Still loitering around…

As noted in the last post, I’ve been browsing a lot. However, I find that the signal-to-noise ratio is, in a way, too low. There are too few things worth writing home about. Of course, OTOH, some of these things are so deep that they can keep one occupied for a long time.

Anyway, let me give many (almost all?) of the interesting links I found since my last post. These are noted in no particular order. In most cases, the sub-title says it all, and so, I need not add comments. However, for a couple of videos related to QM, I do add a significant amount of comments. … BTW, too many hats to do the tipping to. So let me skip that part and directly give you the URLs…

“A ‘digital alchemist’ unravels the mysteries of complexity”:

“Computational physicist Sharon Glotzer is uncovering the rules by which complex collective phenomena emerge from simple building blocks.” [^]

“Up and down the ladder of abstraction. A systematic approach to interactive visualization.” [^]

The tweet that pointed to this URL had this preface: “One particular talent stands out among the world-class programmers I’ve known—namely, an ability to move effortlessly between different levels of abstraction.”—Donald Knuth.

My own thinking processes are such that I use visualization a lot. Nay, I must. That’s the reason I appreciated this link. Incidentally, it also is the reason why I did not play a lot with the interactions here! (I put it in the TBD / Some other day / Etc. category.)

“The 2021 AI index: major growth despite the pandemic.”

“This year’s report shows a maturing industry, significant private investment, and rising competition between China and the U.S.” [^]

“Science relies on constructive criticism. Here’s how to keep it useful and respectful.” [^]

The working researcher, esp. the one who blogs / interacts a lot, probably already knows most of this stuff. But for students, it might be useful to have such tips collected in one place.

“How to criticize with kindness: Philosopher Daniel Dennett on the four steps to arguing intelligently.” [^].

Ummm… Why four, Dan? Why not, say, twelve? … Also, what if one honestly thinks that retards aren’t ever going to get any part of it?… Oh well, let me turn to the next link though…

“Susan Sontag on censorship and the three steps to refuting any argument” [^]

I just asked about four steps, and now comes Sontag. She comes down to just three steps, and also generalizes the applicability of the advice to any argument… But yes, she mentions a good point about censorship. Nice.

“The needless complexity of modern calculus: How 18th century mathematicians complicated calculus to avoid the criticisms of a bishop.” [^]

Well, the article does have a point, but if you ask me, there’s no alternative to plain hard work. No alternative to taking a good text-book or two (like Thomas and Finney, as also Resnick and Halliday (yes, for maths)), paper and pen / pencil, and working your way through. No alternative to that… But if you do that once for some idea, then every idea which depends on it, does become so simple—for your entire life. A hint or a quick reference is all you need, then. [Hints for the specific topic of this piece: the Taylor series, and truncation thereof.] But yes, the article is worth a fast read (if you haven’t read / used calculus in a while). … Also, Twitterati who mentioned this article also recommended the wonderful book from the next link (which I had forgotten)…
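Since the hint above is the Taylor series and its truncation, here is a minimal sketch (in Python, purely for illustration, not from the article) of how the truncation shows up in practice: the forward-difference estimate of a derivative drops the quadratic and higher Taylor terms, so its error shrinks roughly linearly with the step size.

```python
import math

def forward_difference(f, x, h):
    """First-order forward-difference approximation of f'(x).

    Taylor's theorem gives f(x + h) = f(x) + h*f'(x) + O(h^2),
    so truncating after the linear term leaves an O(h) error."""
    return (f(x + h) - f(x)) / h

# The error at x = 1 for f = sin shrinks roughly linearly with h,
# exactly as the truncated Taylor series predicts.
exact = math.cos(1.0)
errors = [abs(forward_difference(math.sin, 1.0, h) - exact)
          for h in (1e-1, 1e-2, 1e-3)]
```

Shrinking h by a factor of ten shrinks the error by about the same factor, which is all the "truncation" business amounts to in practice.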

“Calculus made easy” [^].

The above link is to the Wiki article, which in turn gives the link to the PDF of the book. Check out the preface of the book, first thing.

“The first paper published in the first overlay journal (JTCAM) in Solid Mechanics” [^]

It’s too late for me (I left mechanics as a full-time field quite a while ago), but I do welcome this development. … A few years ago, Prof. Timothy Gowers had begun an overlay journal in maths, and then, there also was an overlay journal for QM, and I had welcomed both these developments back then; see my blog post here [^].

“The only two equations that you should know: Part 1” [^].

Dr. Joglekar makes many good points, but I am not sure if my choice for the two equations is going to be the same.

[In fact, I don’t even like the restriction that there should be just two equations. …And, what’s happening? Four steps. Then, three steps. Now, two equations… How long before we summarily turn negative, any idea?]

But yes, a counter-balance like the one in this article is absolutely necessary. The author touches on E = mc^2 and Newton’s laws, but I will go ahead and add topics like the following too: Big Bang, Standard Model, (and, Quantum Computers, String Theory, Multiverses, …).

“Turing award goes to creators of computer programming building blocks” [^] “Jeffrey Ullman and Alfred Aho developed many of the fundamental concepts that researchers use when they build new software.”

Somehow, there wasn’t as much excitement this year as the Turing award usually generates.

Personally, though, I could see why the committee might have decided to recognize Aho and Ullman’s work. I had once built a “yacc”-like tool that would generate the tables for a table-driven parser, given the abstract grammar specification in the extended Backus–Naur form (EBNF). I did it as a matter of hobby, working in the evenings. The only resource I used was the “dragon book”, which was written by Profs. Aho, Sethi, and Ullman. It was a challenging but neat book. (I am not sure why they left Sethi out. However, my knowledge of the history of development of this area is minimal. So, take it as an idle wondering…)

Congratulations to Profs. Aho and Ullman.
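For readers who haven’t played with such tools: the essence of a table-driven parser is that a generic loop consults a pre-computed table, so all the grammar-specific knowledge lives in the data, not the code. A yacc-like tool emits LALR tables; the toy sketch below (my own illustration, not from the dragon book) uses a much simpler LL(1) predictive table, for the grammar S → a S b | ε, just to show the architecture.

```python
# A toy table-driven parser for the grammar S -> a S b | <empty>
# (recognizing strings of the form a^n b^n). A real yacc-like tool
# emits LALR tables; this LL(1) predictive table just illustrates
# the idea of driving one generic loop off a generated table.
END = '$'
TABLE = {
    # (nonterminal, lookahead) -> production right-hand side
    ('S', 'a'): ['a', 'S', 'b'],
    ('S', 'b'): [],   # S -> empty
    ('S', END): [],   # S -> empty
}

def parse(tokens):
    stack = [END, 'S']
    tokens = list(tokens) + [END]
    i = 0
    while stack:
        top = stack.pop()
        look = tokens[i]
        if top == look:             # terminal (or $) matches input
            i += 1
        elif (top, look) in TABLE:  # expand nonterminal via table
            stack.extend(reversed(TABLE[(top, look)]))
        else:
            return False            # no table entry: syntax error
    return i == len(tokens)
```

Swap in a different table and the same loop parses a different grammar; that separation is what the generated tables buy you.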

“Stop calling everything AI, machine-learning pioneer says” [^] “Michael I. Jordan explains why today’s artificial-intelligence systems aren’t actually intelligent”

Well, “every one” knows that, but the fact is, it still needs to be said (and even explained!)

“How a gene for fair skin spread across India” [^] “A study of skin color in the Indian subcontinent shows the complex movements of populations there.”

No, the interesting thing about this article, IMO, was not that it highlighted Indians’ fascination / obsession for fairness—the article actually doesn’t even passingly mention this part. The real interesting thing, to me, was: the direct visual depiction, as it were, of Indian Indologists’ obsession with just one geographical region of India, viz., the Saraswati / Ghaggar / Mohan Ja Daro / Dwaarkaa / Pakistan / Etc. And, also the European obsession with the same region! … I mean check out how big India actually is, you know…

H/W for those interested: Consult good Sanskrit dictionaries and figure out the difference between निल (“nila”) and नील (“neela”). Hint: One of the translations for one of these two words is “black” in the sense “dark”, but not “blue”, and vice-versa for the other. You only have to determine which one stands for what meaning.

Want some more H/W? OK… Find out the most ancient painting of कृष्ण (“kRSNa”) or even राम (“raama”) that is still extant. What is the colour of the skin as shown in the painting? Why? Has the painting been dated to the times before the Europeans (Portuguese, Dutch, French, Brits, …) arrived in India (say in the second millennium AD)?

“Six lessons from the biotech startup world” [^]

Dr. Joglekar again… Here, I think every one (whether connected with a start-up or not) should go through the first point: “It’s about the problem, not about the technology”.

Too many engineers commit this mistake, and I guess this point can be amplified further—the tools vs. the problem. …It’s but one variant of the “looking under the lamp” fallacy, but it’s an important one. (Let me confess: I tend to repeat the same error too, though with experience, one does also learn to catch the drift in time.)

“The principle of least action—why it works.” [^].

Neat article.

I haven’t read the related book [“The lazy universe: an introduction to the principle of least action”], but looking at the portions available at Google [^], even though I might have objections to raise (or at least comments to make) on the positions taken by the author in the book, I am definitely going to add it to the list of books I recommend [^].

Let me mention the position from which I will be raising my objections (if any), in the briefest (and completely on-the-fly) words:

The principle of the least action (PLA) is a principle that brings out what is common to calculations in a mind-bogglingly large variety of theoretical contexts in physics. These are the contexts which involve either the concept of energy, or some suitable mathematical “generalizations” of the same concept.

As such, PLA can be regarded as a principle for a possible organization of our knowledge from a certain theoretical viewpoint.
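To make that “common calculation” concrete: in each such context, one writes down a Lagrangian and demands that the action be stationary, which yields the Euler–Lagrange equations. A minimal sketch, assuming SymPy’s `euler_equations` helper, for the one-dimensional harmonic oscillator:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.Symbol('t')
m, k = sp.symbols('m k', positive=True)
x = sp.Function('x')(t)

# Lagrangian L = T - V for a one-dimensional harmonic oscillator
L = m * sp.diff(x, t)**2 / 2 - k * x**2 / 2

# Stationarity of the action -- the "common calculation" behind
# PLA -- yields the Euler-Lagrange equation, i.e. m x'' + k x = 0.
eq = euler_equations(L, [x], [t])[0]
```

The same two-line recipe (write L, apply Euler–Lagrange) reappears across mechanics, field theory, and beyond, which is exactly the organizational role being described here.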

However, PLA itself has no definite ontological content; whatever ontological content you might associate with PLA would go on changing as per the theoretical context in which it is used. Consequently, PLA cannot be seen as capturing an actual physical characteristic existing in the world out there; it is not a “thing” or “property” that is shared in common by the objects, facts or phenomena in the physical world.

Let me give you an example. The differential equation for heat conduction has exactly the same form as that for diffusion of chemical species. Both are solved using exactly the same technique, viz., the Fourier theory. Both involve a physical flux which is related to the gradient vector of some physically existing scalar quantity. However, this does not mean that both phenomena are produced by the same physical characteristic or property of the physical objects. The fact that both are parabolic PDEs can be used to organize our knowledge of the physical world, but such organization proceeds by making appeal to what is common to methods of calculations, and not in reference to some ontological or physical facts that are in common to both.
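To put the analogy in code: one finite-difference routine, written once, serves both phenomena; only the name (and the physical meaning) of the coefficient changes. A rough sketch using NumPy and the standard explicit FTCS scheme (my illustration, with made-up parameter values):

```python
import numpy as np

def diffuse_1d(u0, coeff, dx, dt, steps):
    """Explicit FTCS scheme for u_t = coeff * u_xx.

    The identical code serves heat conduction (coeff = thermal
    diffusivity) and chemical diffusion (coeff = mass diffusivity):
    the shared parabolic PDE, not any shared physics, is what this
    routine encodes."""
    u = np.asarray(u0, dtype=float).copy()
    r = coeff * dt / dx**2          # stability requires r <= 0.5
    for _ in range(steps):
        u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
    return u

# The same routine, called with a thermal or a chemical
# diffusivity, smooths an initial spike either way.
x = np.linspace(0.0, 1.0, 51)
spike = np.where(np.abs(x - 0.5) < 0.05, 1.0, 0.0)
temperature = diffuse_1d(spike, coeff=1.0e-3, dx=0.02, dt=0.1, steps=200)
concentration = diffuse_1d(spike, coeff=1.0e-3, dx=0.02, dt=0.1, steps=200)
```

The two output arrays are bit-for-bit identical, even though one is “temperature” and the other “concentration”: the commonality lives entirely in the method of calculation.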

Further, it must also be noted, PLA does not apply to all of physics, but only to the more fundamental theories in it. In particular, try applying it to situations where the governing differential equation is not of the second-order, but is of the first- or the third-order [^]. Also, think about the applicability of PLA for dissipative / path-dependent processes.

… I don’t know whether the author (Dr. Jennifer Coopersmith) covers points like these in the book or not… But even if she doesn’t (and despite any differences I anticipate as of now, and indeed might come to keep also after reading the book), I am sure, the book is going to be extraordinarily enlightening in respect of an array of topics. … Strongly recommended.

Muon g-2.

Let me give some of the links I found useful. (They are not listed in any particular order.)

  • Dennis Overbye covers it for the NYT [^],
  • Natalie Wolchover for the Quanta Mag [^],
  • Dr. Luboš Motl for his blog [^],
  • Dr. Peter Woit for his blog [^],
  • Dr. Adam Falkowski (“Jester”) for his blog [^],
  • Dr. Ethan Siegel for Forbes [^], and,
  • Dr. Sabine Hossenfelder for Sci-Am [^].

If you don’t want to go through all these blog-posts, and only are looking for the right outlook to adopt, then check out the concluding parts of Hossenfelder’s and Siegel’s pieces (which conveniently happen to be the last two in the above list).

As to the discussions: The Best Comment Prize is hereby being awarded, after splitting it equally into two halves, to “Manuel Gnida” for this comment [^], and to “Unknown” for this comment [^].

The five-man quantum mechanics (aka “super-determinism”):

By which, I refer to this video on YouTube: “Warsaw Spacetime Colloquium #11 – Sabine Hossenfelder (2021/03/26)” [^].

In this video, Dr. Hossenfelder talks about… “super-determinism.”

Incidentally, this idea (of super-determinism) had generated a lot of comments at Prof. Dr. Scott Aaronson’s blog. See the reader comments following this post: [^]. In fact, Aaronson had to say in the end: “I’m closing this thread tonight, honestly because I’m tired of superdeterminism discussion.” [^].

Hossenfelder hasn’t yet posted this video at her own blog.

There are five people in the entire world who do research in super-determinism, Hossenfelder seems to indicate. [I know, I know, not all of them are men. But I still chose to say the five-man QM. It has a nice ring to it—if you know a certain bit from the history of QM.]

Given the topic, I expected to browse through the video really rapidly, like a stone that goes on skipping on the surface of water [^], and thus, being done with it right within 5–10 minutes or so.

Instead, I found myself listening to it attentively, not skipping even a single frame, and finishing the video in the sequence presented. Also, going back over some portions for the second time…. And that’s because Hossenfelder’s presentation is so well thought out. [But where is the PDF of the slides?]

It’s only after going through this video that I got to understand what the idea of “super-determinism” is supposed to be like, and how it differs from the ordinary “determinism”. Spoiler: Think “hidden variables”.

My take on the video:

No, the idea (of super-determinism) isn’t at all necessary to explain QM.

However, it still was neat to get to know what (those five) people mean by it, and also, more important: why these people take it seriously.

In fact, given Hossenfelder’s sober (and intelligent!) presentation of it, I am willing to give them a bit of a rope too. …No, not so long that they can hang themselves with it, but long enough that they can perform some more detailed simulations. … I anticipate that when they conduct their simulations, they themselves are going to understand the query regarding the backward causation (raised by a philosopher during the interactive part of the video) in a much better manner. That’s what I anticipate.

Another point. Actually, “super-determinism” is supposed to be “just” a theory of physics, and hence, it should not have any thing to say about topics like consciousness, free-will, etc. But I gather that at least some of them (out of the five) do seem to think that the free-will would have to be denied, may be as a consequence of super-determinism. Taken in this sense, my mind has classified “super-determinism” as being the perfect foil to (or the other side of) panpsychism. … As to panpsychism, if interested, check out my take on it, here [^].

All along, I had always thought that super-determinism is going to turn out to be a wrong idea. Now, after watching this video, I know that it is a wrong idea.

However, precisely for the same reason (i.e., coming to know what they actually have in mind, and also, how they are going about it), I am not going to attack them, their research program. … Not necessary… I am sure that they would want to give up their program on their own, once (actually, some time after) I publish my ideas. I think so. … So, there…

“Video: Quantum mechanics isn’t weird, we’re just too big” YouTube video at: [^]

The speaker is Dr. Philip Ball; the host is Dr. Zlatko Minev. Let me give some highlights of their bios: Ball has a bachelor’s in chemistry from Oxford and a PhD in physics from Bristol. He was an editor at Nature for two decades. Minev has a BS in physics from Berkeley and a PhD in applied physics from Yale. He works in the field of QC at IBM (which used to be the greatest company in the computers industry (including software)).

The abstract given at the YouTube page is somewhat misleading. Ignore it, and head towards the video itself.

The video can be divided into two parts: (i) the first part, ~47 minutes long, is a presentation by Ball; (ii) the second part is a chat between the host (Minev) and the guest (Ball). IMO, if you are in a hurry, you may ignore the second part (the chat).

The first two-thirds of the first part (the presentation) are absolutely excellent. I mean the first 37 minutes. This good (actually excellent) portion gets over once Ball goes to the slide which says “Reconstructing quantum mechanics from informational rules”, which occurs at around 37 minutes. From this point onward, Ball’s rigour dilutes a bit, though he does recover by the 40:00 minutes mark or so. But from ~45:00 to the end (~47:00), it’s all down-hill (IMO). May be Ball was making a small concession to his compatriots.

However, the first 37 minutes are excellent (or super-excellent).

But even if you are absolutely super-pressed for time, then I would still say: Check out at least the first 10-odd minutes. … Yes, I agree 101 percent with Ball when it comes to the portion from ~05:00 through 06:44 and on to 07:40.

Now, a word about the mistakes / mis-takes:

Ball says, in a sentence that begins at 08:10, that Schrödinger devised the equation in 1924. This is a mistake / slip of the tongue. Schrödinger developed his equation in late 1925, and published it in 1926, certainly not in 1924. I wonder how it slipped past Ball.

Also, the title of the video is somewhat misleading. “Bigness” isn’t really the distinguishing criterion in all situations. Large-distance QM entanglements have been demonstrated; in particular, photons are (relativistic) QM phenomena. So, size isn’t necessarily always the issue (even if the ideas of multi-scaling must be used for bridging between “classical” mechanics and QM).

And, oh yes, one last point… People five-and-a-half feet tall also are big enough, Phil! Even the new-borns, for that matter…

A personal aside: Listening to Ball, somehow, I got reminded of some old English English movies I had seen long back, may be while in college. Somehow, my registration of the British accent seems to have improved a lot. (Or may be the Brits these days speak with a more easily understandable accent.)

Status of my research on QM:

If I have something to note about my research, especially that related to the QM spin and all, then I will come back a while later and note something—may be after a week or two. …

As of today, I still haven’t finished taking notes and thinking about it. In fact, the status actually is that I am kindaa “lost”, in the sense: (i) I cannot stop browsing so as to return to the study / research, and (ii) even when I do return to the study, I find that I am unable to “zoom in” and “zoom out” of the topic (by which, I mean, switching the contexts at will, in between all: the classical ideas, the mainstream QM ideas, and the ideas from my own approach). Indeed (ii) is the reason for (i). …

If the same thing continues for a while, I will have to rethink whether I want to address the QM spin right at this stage or not…

You know, there is a very good reason for omitting the QM spin. The fact of the matter is, in the non-relativistic QM, the spin can only be introduced on an ad-hoc basis. … It’s only in the relativistic QM that the spin comes out as a necessary consequence of certain more basic considerations (just the way in the non-relativistic QM, the ground-state energy comes out as a consequence of the eigenvalue nature of the problem; you don’t have to postulate a stable orbit for it as in the Bohr theory). …
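For concreteness, the ad-hoc route amounts to bolting two-component spinors onto the non-relativistic theory, postulating spin operators built from the Pauli matrices, and then verifying that they satisfy the required angular-momentum algebra. A quick NumPy check of that algebra (standard textbook material, at the Eisberg and Resnick level; ħ set to 1):

```python
import numpy as np

# The ad-hoc, non-relativistic route: postulate two-component
# spinors and the Pauli matrices...
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# ...define spin operators S_i = (hbar/2) sigma_i (hbar = 1 here),
# and check the required algebra: [S_x, S_y] = i S_z, and each
# sigma_i squares to the identity.
Sx, Sy, Sz = sx / 2, sy / 2, sz / 2
commutator = Sx @ Sy - Sy @ Sx
```

The point being: in the non-relativistic theory these matrices are simply written down and checked, whereas in the relativistic theory they fall out of the more basic considerations.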

So, it’s entirely possible that my current efforts to figure out a way to relate the ideas from my own approach to the mainstream QM treatment of the spin are, after all, a basically pointless exercise. Even if I do think hard and figure out some good and original ideas / path-ways, they aren’t going to be enough, because they aren’t going to be general enough anyway.

At the same time, I know that I am not going to get into the relativistic QM, because it has to be a completely distinct development—and it’s going to require a further huge effort, perhaps a humongous effort. And, it’s not necessary for solving the measurement problem anyway—which was my goal!

That’s why, I have to really give it a good thought—whether I should be spending so much time on the QM spin or not. Maybe giving some sketchy ideas (rather, making some conceptual-level statements) is really enough… No one throws so much material into just one paper, anyway! Even the founders of QM didn’t! … So, that’s another line of thought that often pops up in my mind. …

My current plan, however, is to finish taking the notes on the mainstream QM treatment of the spin anyway—at least to the level of Eisberg and Resnick, though I can’t finish it, because this desire to connect my approach to the mainstream idea also keeps on interfering…

All in all, it’s a weird state to be in! … And, that’s what the status looks like, as of now…

… Anyway, take care and bye for now…

A song I, ahem, like:

It was while browsing that I gathered, a little while ago, that there is some “research” which “explains why” some people “like” certain songs (like the one listed below) “so much”.

The research in question was this paper [^]; it was mentioned on Twitter (where else?). Someone else, soon thereafter, also pointed out a c. 2014 pop-sci level coverage [^] of a book published even earlier [c. 2007].

From the way this entire matter was now being discussed, it was plain and obvious that the song had been soul-informing for some, not just soul-satisfying. The song in question is the following:

(Hindi) सुन रुबिया तुम से प्यार हो गया (“sun rubiyaa tum se pyaar ho gayaa”)
Music: Anu Malik
Lyrics: Prayag Raj
Singers: S. Jaanaki, Shabbir Kumar

Given the nature of this song, it would be OK to list the credits in any order, I guess. … But if you ask me why I too, ahem, like this song, then recourse must be made not just to the audio of this song [^] but also to its video. Not any random video but the one that covers the initial sequence of the song to an adequate extent; e.g., as in here [^].

2021.04.09 19:22 IST: Originally published.
2021.04.10 20:47 IST: Revised considerably, especially in the section related to the principle of the least action (PLA), and the section on the current status of my research on QM. Also minor corrections and streamlining. Guess now I am done with this post.

Sound separation, voice separation from a song, the cocktail party effect, etc., and AI/ML

A Special note for the Potential Employers from the Data Science field:

Recently, in April 2020, I achieved a World Rank # 5 on the MNIST problem. The initial announcement can be found here [^], and a further status update, here [^].

All my data science-related posts can always be found here [^].

1. The “Revival” series of “Sa Re Ga Ma” music company:

It all began with the two versions of the song which I ran the last time (in the usual songs section). The original song (or so I suppose) is here [^]. The “Revival” edition of the same song is here [^]. For this particular song, I happened to like the “Revival” version just so slightly better. It seemed to “fill” the tonal landscape of this song better—without there being too much of degradation to the vocals, which almost always happens in the “Revival” series. I listened to both these versions back-to-back, a couple of times. Which set me thinking.

I thought that, perhaps, by further developing the existing AI techniques and using them together with some kind of advanced features for manual editing, it should be possible to achieve the same goals that the makers of “Revival” were aiming at, but too often fell too short of.

So, I did a bit of Google search, and found discussions like this one at HiFiVision [^], and this one at [^]. Go through the comments; they are lively!

On the AI side, I found this Q&A at Quora: [^]. One of the comments said: “Just Google this for searching `software to clarify and improve sound recordings’ and you will have several thousand listings for software that does this job.”

Aha! So there already were at least a few software to do the job?

A few more searches later, I landed at Spleeter by Deezer [^]. (In retrospect, this seems inevitable.)

2. AI software for processing songs: Spleeter:

Spleeter uses Python and TensorFlow, and the installation instructions assume Conda. So, I immediately tried to download it, but saw my Conda getting into trouble “solving” for the “environment”. Tch. Too bad! I stopped the Conda process, closed the terminal window, and recalled the sage advice: “If everything fails, refer to the manual.” … Back to the GitHub page on Deezer, and it’s only now that I find that they still use TF 1.x! Bad luck!
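For the record, the route I was attempting looked roughly like the following, going by Spleeter’s README at the time; treat the exact channel, package names, and flags as assumptions on my part, since they may well have changed since.

```shell
# Conda-based install, as per Spleeter's README (early 2021);
# illustrative only -- channel and package names may have changed.
conda install -c conda-forge spleeter

# Separating a track into two stems (vocals + accompaniment):
spleeter separate -p spleeter:2stems -o output/ song.mp3
```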

Theoretically, I could have run Spleeter on Google Colab. But I don’t like Google Colab. Nothing basically wrong with it, except that (i) personally, I think that Jupyter is a great demo tool but a very poor development tool, and (ii) I like my personal development projects unclouded. (I will happily work on a cloud-based project if someone pays me to do it. For that matter, if someone is going to pay me and then also advises me to set up the swap space for my local Ubuntu installation on the cloud, I will happily do it too! But when it comes to my own personal projects, don’t expect me to work on the cloud.)

So, tinkering around with Spleeter was, unfortunately, not possible. But I noticed what they add at the Spleeter’s page: “there are multiple forks exposing spleeter through either a Guided User Interface (GUI) or a standalone free or paying website. Please note that we do not host, maintain or directly support any of these initiatives.” Good enough for me. I mean, the first part…

So, after a couple of searches, I landed at a few sites, and actually tried two of them: [^] and [^]. The song I tried was the original version of the same song I ran the last time (links to YouTube given above). I went for just two stems in both cases—vocals and the rest.

Ummm…. Not quite up to what I had naively expected. Neither software did too well, IMO. Not with this song.

3. Peculiarity of the Indian music:

The thing is this: Indian music—rather, Indian film (and similar) music—tends to be very Indian in its tastes.

We Indians love curries—everything imaginable thrown together. No 5 or 7 or 9 or 11 course dinners for us. All our courses are packed into a big simultaneous serving called “thaali”. Curries are a very convenient device in accomplishing this goal of serving the whole kitchen together. (BTW, talking of Western dinners, it seems to me that they always have an odd number of courses. Why so? Why not an even number? Is it something like that 13 thing?)

Another thing. Traditionally, Indian music has made use of many innovations, but ideas of orchestra and harmony have never been there. Also, Indian music tends to emphasize continuity of sounds, whereas Western music seems to come in a “choppy” sort of style. In Western music, there are clear-cut demarcations at every level, starting from very neatly splitting a symphony into movements, right down to individual phrases and the smallest pieces of variations. Contrast it all with the utter fluidity and flexibility with which Indian music (especially the vocal classical type) gets rendered.

On both counts, Indian music comes across as a very fluid and continuous kind of a thing: a curry, really speaking.

All individual spices (“masaalaa” items) are not just thrown together, they are pounded and ground together into a homogeneous ball first. That’s why, given a good home-made curry, it is tough to even take a good guess at exactly what all spices might have gone into making it. …Yes, even sub-regional variations are hard to figure out, even for expert noses. Just ask a lady from North Maharashtra or Marathwada to name all the spices used in a home-made curry from the north Konkan area of Maharashtra state (which is different from the Malavani sub-region of Konkan), going just by the taste. These days, communications have improved radically, and so, people know the ingredients already. But when I was young, women of my mother’s age would typically fail to guess the ingredients right. So, curries come with everything utterly mixed-up together. The whole here is not just greater than the sum of its parts; the whole here is one whose parts cannot even be teased out all that easily.

In India, our songs too are similar to curries. Even when we use Western instruments and orchestration ideas, the usage patterns still tend to become curry-like. Which means, Indian songs are not at all suitable for automatic/mechanically conducted analysis!

That’s why, the results given by the two services were not very surprising. So, I let it go at that.

But the topic wasn’t prepared to let me go so easily. It kept tugging at my mind.

4. Further searches:

Today, I gave in, and did some further searches. I found Spleeter’s demo video [^]. Of course, there could be other, better demos too. But that’s what I found and pursued. I also found a test of Spleeter done on the Beatles’ song “Help” [^].

Finally, I also found this video which explains how to remove vocals from a song using Audacity [^]. Skip it if you wish, but it was this video which mentioned Melodyne [^], which was a new thing to me. Audacity is Open Source, whereas Melodyne is a commercial product.
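The usual Audacity trick, as I understand it, is center-channel cancellation: vocals are typically mixed identically into both stereo channels, so subtracting one channel from the other cancels them—along with anything else panned dead center, which is exactly why the results are so hit-or-miss. A toy NumPy sketch of the idea (synthetic signals, purely for illustration):

```python
import numpy as np

def remove_center(stereo):
    """Classic 'vocal removal' by center-channel cancellation.

    Assumes the voice is mixed identically into both channels (a
    common but by no means universal convention): subtracting the
    right channel from the left cancels any such center signal."""
    left, right = stereo[:, 0], stereo[:, 1]
    return left - right

# Toy mix: a 'voice' panned dead center plus an 'instrument'
# present only in the left channel.
t = np.linspace(0, 1, 8000)
voice = np.sin(2 * np.pi * 440 * t)
instrument = np.sin(2 * np.pi * 110 * t)
stereo = np.stack([voice + instrument, voice], axis=1)
karaoke = remove_center(stereo)   # only the instrument survives
```

Note what the sketch also makes obvious: any bass, snare, or lead line that happens to sit in the center goes out with the voice—no per-source intelligence is involved at all.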

Further searches later, I also found this video (skip it if you don’t find all these details interesting), using ISSE [^]. Finally, I found this one (and don’t skip it—there’s a wealth of information in it): [^]. Among many things, it also mentions AutoTune [^], a commercial product. Google search suggested AutoTalent as its Open Source alternative; it was written by an MIT prof [^]. I didn’t pursue it a lot, because my focus was on vocals-extraction rather than vocals pitch-correction.

Soooooo, where does that leave us?

Without getting into all the details, let me just state a few conclusions that I’ve reached…

5. My conclusions as of today:

5.1. Spleeter and similar AI/ML-based techniques need to improve a lot. Directly offering voice-separation services is not likely to take the world by storm.

5.2. Actually, my position is better stated by saying this: I think that directly deploying AI/ML in the way it is being deployed, isn’t going to work out—at all. Just throwing tera-bytes of data at the problem isn’t going to solve it. Not because the current ML techniques aren’t very capable, but because music is far too complex. People are aiming for too high-hanging a fruit here.

5.3. At the same time, I also think that in a little more distant future, say over a 5–10 years’ horizon, chances are pretty good that tasks like separating the voice from the instrumental sounds would become acceptably good. Provided, they pursue the right clues.

6. How the development in this field should progress (IMO):

In this context, IMO, a good clue is this: First of all, don’t start with AI/ML, and don’t pursue automation. Instead, start with a very good idea of what problem(s) are at all to be solved.

In the present context, it means: Try to understand why products like Melodyne and AutoTune have at all been successful—despite there being “so little automation” in them.

My answer: Precisely because these software have given so much latitude to the user.

It’s only after understanding the problem to be solved, and the modalities of current software, that we should come to this question of whether the existing capabilities of these software can at all be enhanced using AI/ML, using one feature/aspect at a time.

My answer, in short: Yes, they can (and should) be.

Notice, we don’t start with the AI/ML algorithms and then try to find applications for them. We start with some pieces of good software that have already created certain needs (or expanded them), and are fulfilling them already. Only then do we think of enhancing it—with AI/ML being just a tool. Yes, it’s an enabling technology. But in the end, it’s just a tool to improve other software.

Then, as the next step, consolidate all possible AI-related gains first—doing just enhancements, really speaking. Only then proceed further. In particular, don’t try to automate everything right from the beginning.

IMO, AI/ML techniques simply aren’t so well developed that they can effectively tackle problems of a relatively greater conceptual scope, in which wide-ranging generalizations get involved in complex ways. AI/ML techniques can, and indeed do, excel, even outperforming people, but only in tasks that are very narrowly defined: tasks like identifying handwritten digits, or detecting a few cancerous cells from among hundreds of healthy cells using details of morphology, all without ever getting fatigued.

Sound isolation is not a task very well suited to these algorithms. Not at this stage of development of AI/ML, and of the sound-related software.

“The human element will always be there,” people love to repeat in the context of AI.

Yes, I am an engineer, and I am very much into AI/ML. But when it comes to tasks like sound separation, my point is stronger than what people usually say. IMO, it would actually be stupid to throw away the manual editing aspects.

The human ear is too sensitive an instrument; it takes almost nothing (and almost no time) for most of us to figure out when some sound processing goes bad, or when a reverse-FFT’ed sound begins to feel too shrill at times, too “hollow” at other times, or plain “helium-throat”-like at still others [^][^].
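This sensitivity has a simple numerical counterpart. Below is a toy numpy sketch (my own illustration, not tied to any product mentioned here): a signal reconstructed from its full FFT matches the original to machine precision, whereas a “reconstruction” that discards the phase information and keeps only the magnitudes is grossly distorted. It is exactly this kind of processing defect that the ear flags instantly.

```python
import numpy as np

rng = np.random.default_rng(0)

# A short "audio" signal: two sine tones plus a little noise.
sr = 8000
t = np.arange(1024) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)
x = x + 0.01 * rng.standard_normal(x.size)

X = np.fft.rfft(x)

# Full inverse FFT: the round-trip error sits at machine precision.
x_full = np.fft.irfft(X, n=x.size)
err_full = np.max(np.abs(x - x_full))

# Magnitude-only "reconstruction": throw away the phases, keep only |X|.
x_mag = np.fft.irfft(np.abs(X), n=x.size)
err_mag = np.max(np.abs(x - x_mag))

print(err_full)  # ~1e-15: inaudible
print(err_mag)   # order 1: grossly distorted
```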

Gather some 5–10 people in a room and play some songs on a stereo system that is equipped with a good graphic equalizer. If there is no designated DJ, what is the bet that people are just going to fiddle around with the graphic equalizer every time a new song begins? The reason is not just that people want to impress others with their expertise. The fact of the matter is, people are sensitive to even the minutest variations in sound, and they will simply not accept something which does not sound “just right.” Further, there are individual tastes too, as to what precisely is “just right.” That’s why, if one guy increases the bass, someone else is bound to get closer to the graphic equalizer to set it right! It’s going to happen.

That’s why it’s crucial not even to attempt to “minimize” the human “interference.” Don’t do that. Not for software like this.

Instead, the aim should be to keep that human element right at the forefront, despite using AI/ML.

Ditto, for other similarly complex tasks / domains, like colouring B&W images, generating meaningful passages of text, etc.

That’s what I think. As of today.

7. Guess I also have some ideas for processing of music:

So, yes, I am not at all for directly starting to train deep-learning models on lots of music tracks.

At the same time, I guess I have also already started generating quite a few ideas regarding these topics: analysis of music; sound separation; which ML technique might work out well, and in what respect (for these tasks); what kind of abstract information to make available to the human “operator,” and in what form/presentation; etc. …

…You see, some 15+ years ago, I had actually built a small product called “ToneBrush.” It offered real-time visualizations of music using windowed FFT and further processing (something like the spectrogram-based visualizations that by now have become standard in media players like VLC). My product didn’t sell even a single copy! But it was a valuable experience…
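The core of such a visualization, the windowed FFT (i.e., a magnitude spectrogram), takes only a few lines. Here is a minimal numpy sketch of the idea (not ToneBrush’s actual code; the frame and hop sizes are just illustrative):

```python
import numpy as np

def spectrogram(x, frame=256, hop=128):
    """Windowed-FFT magnitude spectrogram: frames of `x` are Hann-windowed,
    FFT'ed, and stacked as columns (freq bins x time frames)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    cols = []
    for i in range(n_frames):
        seg = x[i * hop : i * hop + frame] * window
        cols.append(np.abs(np.fft.rfft(seg)))
    return np.array(cols).T

# A 440 Hz test tone sampled at 8 kHz, one second long.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

S = spectrogram(tone)
peak_bin = int(np.argmax(S.mean(axis=1)))
# 440 Hz should land near bin round(440 * 256 / 8000) = 14
print(peak_bin)
```

A real-time version would simply apply the same per-frame computation to the latest `frame` samples arriving from the audio stream.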

…Cutting back to the present, all the thinking I did back then has now come back to me once again. … All the same, for the time being, I’m just noting these ideas in my notebook, and otherwise moving this whole topic to the back-burner. I want to finish my ongoing work on QM first.

One final note, an after-thought, actually: I didn’t say anything about the cocktail party effect. Well, if you don’t know what the effect means, start with the Wiki [^]. As to its relevance: I remember seeing some work (I think it was mentioned on Google’s blog) which tried to separate out each speaker’s voice from the mixed-up signals coming from, say, round-table-discussion kind of scenarios. However, I am unable to locate it right now. So, let me leave it as “homework” for you!

8. On the QM front:

As to my progress on the QM side, or the lack of it: I spotted (and even recalled from memory) quite a few more conceptual issues, and have been working through them. The schedule might get affected a bit, but not a lot. Instead of the 3–4 weeks I announced 1–2 weeks ago, these additional items add a further couple of weeks or so, but not more. So, instead of August-end, I might be ready by mid-September. Overall, I am quite happy with the way things are progressing in this respect. However, I’ve realized that this work is not like programming. If I work for more than 7–8 hours a day, I get exhausted. When it’s programming and not just notes/theory-building, I can easily go past 12 hours a day, consistently, for much longer periods (like weeks at a time). So, things are moving more slowly, but quite definitely, and I am happy with the progress so far. Let’s see.

In the meanwhile, of course, thoughts on topics like coloring of B&W pics or sound separation also pass by, and I note them.

OK then, enough is enough. See you after 10–15 days. In the meanwhile, take care, and bye for now…

A song I like:

(Western, Pop) “Rhiannon”
Band: Fleetwood Mac

[I mean this version: [^]. This song has a pretty good melody, and Stevie Nicks’s voice, even if it’s not too smooth and mellifluous, has a certain charm to it, a certain “femininity” as it were. But what I like the most about this song is, once again, its soundscape taken as a whole. Like many songs of its era, this one carries just the right level of richness in its tonal landscape: neither too rich nor too sparse/rarefied. …If I recall right, surprisingly, the first time I heard this song was not in the COEP hostels but in the IIT Madras hostels.]

— 2020.08.09 02:43 IST: First published
— 2020.08.09 20:33 IST: Fixed wrong and broken links, and added a reference to AutoTalent.


Status update on my trials for the MNIST dataset

This post is going to be brief, relatively speaking.

1. My further trials for the MNIST dataset:

You know by now, from my last post about a month ago [^], that I had achieved a World Rank # 5 on the MNIST dataset (with 99.78 % accuracy), and that too, using a relatively slow machine (single CPU-only laptop).

At that time, as mentioned in that post, I also had some high hopes of bettering the result (with definite pointers as to why the results should get better).

Since then, I’ve conducted a lot of trials. Both the machine and I learnt a lot. However, during this second bout of trials, I came to learn much, much more than the machine did!

But naturally! With a great result already behind me, my focus during the last month naturally shifted to better understanding the why’s and how’s of it, rather than chasing a further improvement in accuracy by hook or by crook.

So, I deliberately reduced my computational budget from 30+ hours per trial to 12 hours at the most. [Note again: my CPU-only hardware runs about 4–8 times slower, perhaps even 10+ times slower, compared to GPU-equipped machines.]

Within this reduced computing budget, I pursued a great many different ideas and architectures, with some elements being well known to people already, and some newly invented by me.

The ideas I combined include: batch normalization, learnable pooling layers, models built with the functional API (permitting complex entwinements of data streams, not just sequential stacks), custom-written layers (not much, but I did try a bit), a custom-written learning-rate scheduler, custom-written callback functions to monitor progress at batch- and epoch-ends, a custom-written class for faster loading of augmented data (TF-Keras itself spins out a special thread for this purpose), apart from custom logging, custom-written early stopping, custom-written ensembles, etc. … (If you are a programmer, you will appreciate what all this involves!)
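To give a concrete flavour of just one item from this list: a step-decay learning-rate schedule is framework-agnostic and can be written as a plain function. (A hedged sketch of the general pattern, not my actual scheduler; the decay numbers are made up.) Hooking it into TF-Keras is then a one-liner via `tf.keras.callbacks.LearningRateScheduler`.

```python
def make_step_decay(base_lr=1e-3, drop=0.5, epochs_per_drop=10):
    """Return a schedule that multiplies base_lr by `drop`
    every `epochs_per_drop` epochs."""
    def schedule(epoch, lr=None):  # Keras passes (epoch, current_lr)
        return base_lr * (drop ** (epoch // epochs_per_drop))
    return schedule

step_decay = make_step_decay()

# In TF-Keras this would plug in as:
#   model.fit(..., callbacks=[tf.keras.callbacks.LearningRateScheduler(step_decay)])
print(step_decay(0))   # 0.001
print(step_decay(25))  # 0.00025
```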

But what was the result going to be? Sometimes there was some faint hope for an improvement in accuracy; most times, a relatively low accuracy was expected anyway, and that’s what I saw. (No surprises there; the computing budget, to begin with, had to be kept small.)

Sometime during these investigations into architectures and algorithms, I did initiate a few long trials (in the range of some 30–40 hours of training time). I took only one of these trials to completion. (I interrupted the other long trials more or less arbitrarily, after running them for maybe 2 to 8 hours. A few of them seemed to be veering towards eventual over-fitting; with others, I simply lost patience!)

During the one long trial which I did run to completion, I did achieve a slight improvement in the accuracy. I did go up to 99.79 % accuracy.

However, I cannot be very highly confident about the consistency of this result. The algorithms are statistical in nature, and a slight degradation (or even a slight improvement) from 99.79 % is what is to be expected.
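The size of the expected fluctuation is easy to estimate. Assuming the standard 10,000-image MNIST test set, 99.79 % accuracy means 21 misclassified images; a back-of-the-envelope normal-approximation (Wald) interval on the binomial proportion then gives roughly ±0.09 % at 95 % confidence, which comfortably covers both 99.78 % and 99.79 %. (A sketch of the estimate only, not a claim about the actual trial-to-trial variance.)

```python
import math

n = 10_000   # standard MNIST test-set size
p = 0.9979   # observed accuracy, i.e. 21 errors out of 10,000

# Wald (normal-approximation) 95% confidence interval for a binomial proportion.
se = math.sqrt(p * (1 - p) / n)
half_width = 1.96 * se
lo, hi = p - half_width, p + half_width

print(f"95% CI: [{lo:.4%}, {hi:.4%}]")  # roughly [99.70%, 99.88%]
```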

I do have all the data related to this 99.79 % accuracy result saved with me. (I have saved not just the models, the code, and the outputs, but also all the intermediate data, including the output produced on the terminal by the TensorFlow-Keras library during the training phase. The outputs of the custom-written log also have been saved diligently. And of course, I was careful enough to seed at least the most important three random generators—one each in TF, numpy and basic python.)
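Since seeding is what makes a deep-learning run even approximately repeatable, it is worth spelling out. A minimal sketch of the pattern (the TF call is shown only as a comment so that the snippet stays library-free; even with all three seeds set, thread scheduling and GPU kernels can still introduce small variations):

```python
import random

import numpy as np

def seed_everything(seed=42):
    """Seed the random generators that matter for a typical TF-Keras run."""
    random.seed(seed)        # basic Python
    np.random.seed(seed)     # numpy (and hence much of the data pipeline)
    # tf.random.set_seed(seed)  # TensorFlow, once it is imported

seed_everything(123)
a = np.random.rand(3)
seed_everything(123)
b = np.random.rand(3)
print(np.allclose(a, b))  # True: the draws repeat exactly
```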

Coming back to the statistical nature of training, please note that my new approach does tend to yield statistically much more robust results (with much smaller statistical fluctuations, as is evident from the logs, and as is also only to be expected from the “theory” behind my new approach(es)).

But still, since I was able to conduct only one full trial with this highest-accuracy architecture, I hesitate to make any statement that might be misinterpreted. That’s why I have decided not to claim the 99.79 % result. I will mention the achievement in informal communications, even in blog posts (the way I am doing right now). But I am not going to put it on my resume. (One should promise less, and then deliver more. That’s what I believe in.)

If I were to claim this result, it would also improve my World Rank by one. Thus, I would then get to World Rank # 4.

Still, I am leaving the actual making of this claim to some other day. Checking repeatability would take too much time on my too-slow-for-the-purpose machine, and I need to focus on other areas of data science too. I find that I have begun falling behind on them. (With a powerful GPU-based machine, both would have been possible: MNIST and the rest of data science. But for now, I have to prioritize.)


Since my last post, I learnt a lot about image recognition, classification, deep learning, and all. I also coded some of the most advanced ideas in deep learning for image processing that can at all be implemented with today’s best technology—and then, a few more of my own. Informally, I can now say that I am at World Rank # 4. However, for the reasons given above, I am not going to claim the improvement as of today. So, my official rank on the MNIST dataset remains at # 5.


2. Miscellaneous:

I have closed this entire enterprise of the MNIST trials for now. With my machine, I am happy to settle at the World Rank # 5 (as claimed, and # 4, informally and actually).

I might now explore deep learning for radiology (e.g. detection of abnormalities in chest X-rays or cancers), for just a bit.

However, it seems that I have been stuck in this local minimum of image recognition for too long. (In the absence of gainful employment in this area, despite my world-class result, it still is just a local minimum.) So, to correct the overall tilt in my pursuit of topics, for the time being, I am going to keep image processing relatively on the back-burner.

I have already started exploring time-series analysis for stock markets. I would also be looking into deep learning on text data, esp. NLP. I have not thought a lot about these areas so far, and now I need to correct that.

… Should be back after a few weeks.

In the meanwhile, if you are willing to pay for my stock-market tips, I would surely hasten the designing and perfecting of my algorithms for stock-market “predictions.” … It just so happens that yesterday (Sunday, 10th May), I predicted that the Bombay Stock Exchange’s Sensex indicator would definitely not rise today (Monday), and that while the market should trade in around the same range as on Friday, the Sensex was likely to close a bit lower. This has come true. (It in fact closed just a bit higher than what I had predicted.)

… Of course, “one swallow does not a summer make.”… [Just checked it. Turns out that this one has come from Aristotle [^]. I don’t know why, but I had always carried the impression that the source was Shakespeare, or may be some renaissance English author. Apparently, not so.]

Still, don’t forget: If you have the money, I do have the inclination. And, the time. And, my data science skills. And, my algorithms. And…

…Anyway, take care and bye for now.


A song I like:

(Hindi) तुम से ओ हसीना कभी मुहब्बत मुझे ना करनी थी… (“tum se o haseenaa kabhee muhabbat mujhe naa karanee thee”; roughly, “O beautiful one, I should never have fallen in love with you…”)
Music: Laxmikant Pyarelal
Singers: Mohammad Rafi, Suman Kalyanpur
Lyrics: Anand Bakshi

[I know, I know… This one almost never makes it to anyone’s lists. Had I not heard it (and also loved it) in my childhood, it wouldn’t have made it to my lists either. But given this chronological prior, the logical prior too has changed forever for me. …

This song is big on rhythm (though they overdo it a bit), and all kids always like songs that emphasize rhythm. … I have seen third-class songs from aspiring/actual pot-boilers, songs like तु चीझ बडी है मस्त मस्त (“too cheez baDi hai mast, mast”), being a huge hit with kids. Not just a big hit but a huge one. And that third-rate song was a huge hit even with one of my own nephews, when he was 3–4 years old. …Yes, eventually, he did grow up…

But then, there are some songs that you somehow never grow out of. For me, this song is one of them. (It might be a good idea to start running “second-class,” why, even “third-class” songs about which I am a bit nostalgic. I listened to a lot of them during this boring lock-down, and even more so during all those long-running trials for the MNIST dataset. The boredom, especially on the second count, had to be killed. I did kill it. … So, all in all, from my side, I am ready!)

Anyway, once I grew up, there were a couple of surprises regarding the credits of this song. I used to think, by default as it were, that it was Lata. No, it turned out to be Suman Kalyanpur. (Another thing: that there was no mandatory “kar” after the “pur” also was a surprise to me; but then, I digress here.)

Also, for no particular reason, I didn’t know about the music director. Listening to it now, after quite a while, I tried to take a guess. After weighing between Shankar-Jaikishan and R.D. Burman, also with a faint consideration of Kalyanji-Anandji, I started suspecting RD. …Think about it. This song could easily go well with those from the likes of तीसरी मंज़िल (“Teesri Manzil”) or कारवाँ (“karvaan”), right?

But, as a self-declared RD expert, I couldn’t, however much I shuffled my memory, recall any incident of my boasting to someone in the COEP/IITM hostels that this one was indeed an RD song. … So, it shouldn’t be RD either. … Could it be Shankar-Jaikishan then? The song seemed to fit the late-60s SJ mold too. (Recall Shammi.) … Finally, I gave up, and checked out the music director.

Well, Laxmikant Pyarelal sure was a surprise to me! And it should be, to anyone. Remember, this song is from 1967. This was the time when LP were coming out with songs like those in दोस्ती (“Dosti”), मिलन (“Milan”), उपकार (“Upakar”), and similar. Compositions grounded in the Indian musical sense, through and through.

Well yes, LP have given a lot of songs/tunes that go more in the Western-like fold. (Recall रोज शाम आती थी (“roz shaam aatee thee”), for instance.) Still, this song is a bit too “out of the box” for LP, when you consider their typical “box.” The orchestration, in particular, at times feels as if the SD-RD “gang” (Manohari Singh, Sapan Chakravorty, et al.) wasn’t behind it, and so does the rendering by Rafi (especially with that sharp हा! (“hah!”) coming at the end of the refrain)… Oh well.

Anyway, give it a listen and see how you find it.]