My new year’s resolutions—2021 edition

Before we come to my New Year's Resolutions—a review of last year's, and then the new ones for this year—do take a moment to check out the poll I've posted on Twitter, and see if you wish to respond to it. I would be grateful to you if you do. The tweet in question is the following one:

The poll closes within a day: on 01 January 2021 (in all time-zones of the world), plus a few hours.


1. A quick review of my blog posts in the year 2020:

Not counting the present post, I made 26 posts in all during the year 2020. So, on average, there was one post every fortnight.

This statistic is somewhat misleading. My posts are always much longer than those of your average blogger (i.e., among those few left who still blog!). My posts are almost always > 1.5 k words in length, often > 3 k words, and many times >= 5 k words. Also, my last year's NYRs had actually included making even fewer posts!

Let me pick out the more important posts I made this year:

05 February 2020: Equations in the matrix form for implementing simple artificial neural networks [^]
Don’t underestimate the amount of effort which has gone into writing the document mentioned in this post. The PDF document itself was uploaded at my GitHub account, here [^]. (Let me now make it available also from this blog, see here [^]). In this document, I have worked out each and every equation in a consistent, matrix-based format. Although the title says: “simple” neural networks, that word is somewhat misleading. Even the backward passes have been covered, in 100% detail, for all the fundamentally important layers. These same layers are used even in the most modern Deep Learning networks.
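To give just a flavour of what the matrix-based format amounts to—a minimal toy illustration of my own, not the document's actual notation—here are the forward and backward passes of a single dense layer written purely as matrix operations:

```python
# A toy sketch (my illustration, not the PDF's notation): forward and backward
# passes of one dense layer, written entirely as matrix operations.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 10))        # a batch of 32 inputs, 10 features each
W = 0.1 * rng.standard_normal((10, 5))   # weights
b = np.zeros(5)                          # biases

# Forward pass
Z = X @ W + b                            # pre-activations, shape (32, 5)
A = np.maximum(Z, 0.0)                   # ReLU activations

# Backward pass, given dL/dA from the layer above
dA = rng.standard_normal(A.shape)        # stand-in for the upstream gradient
dZ = dA * (Z > 0)                        # ReLU derivative, applied element-wise
dW = X.T @ dZ                            # dL/dW, shape (10, 5)
db = dZ.sum(axis=0)                      # dL/db, shape (5,)
dX = dZ @ W.T                            # gradient handed to the previous layer
```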

13 April 2020: Status: 99.78 % accuracy on the MNIST dataset (World Rank # 5, using a single CPU-only laptop) [^]
The Covid lockdowns had begun already. I had been on the lookout for jobs for more than a year by then. But now I knew that, due to the lockdowns, no further interview calls were going to come in—not for me anyway. So, I had a lot of time at hand. I had just written the document on the equations of ANNs/DL. So, the knowledge was fresh. So, I decided to put to use some of my research ideas, with the foremost benchmark of Deep Learning. I broke into the World's Top 10! (No Indian had ever been in the top 20, perhaps even in the top 50. That includes Indians from the IITs and those who have gone abroad for higher studies/research/jobs/startups.)

11 May 2020: Status update on my trials for the MNIST dataset [^]
Continuation of the above work. I raised my performance by a tiny but significant 0.01%. I also briefly mention the kind of tricks I tried.

10 June 2020: Python scripts for simulating QM, part 2: Vectorized code for the H atom in a 1D/2D/3D box. [Also, un-lockdown and covid in India.] [^]
This post also mentioned some of the cutting-edge work I had done earlier, in software engineering—like writing a yacc-like tool that works given just the abstract grammar of a computer language.

18 July 2020: On the Bhagavad-Geetaa, ch. 2, v. 47—part 1: कर्मण्येवाधिकारस्ते (“karmaNyevaadhikaaraste”) [^]
This post still remains rather rough. It could do with some good editing. But the real meaning of the Sanskrit phrases isn't going to change. Check it out if you think you know this famous verse from the Gita.

31 July 2020: “Simulating quantum `time travel’ disproves butterfly effect in quantum realm”—not! [^]
An unplanned post. I never meant to write anything on QM from this angle, but a new paper came up, and it was hyped rather too much. And that is a relative statement: it was hyped even more than the super-high level of hype which has always been the old normal in the fields of Foundations of QM and Quantum Computers.

09 August 2020: Sound separation, voice separation from a song, the cocktail party effect, etc., and AI/ML [^]
This was my personal, opinionated, take on some AI/ML-based products/services.

23 August 2020: Talking of my attitudes… [^]
My answers to a questionnaire. This post should be of interest to those who don't want all the details of my new approach, but just want to know how I view the various issues and interpretations concerning QM.

17 October 2020: Update: Almost quitting QM… [^]
I try to be the worst possible critic of my new approach to QM. So, naturally, doubts like these can easily come up. It’s just that once I notice such things, I deal with them.

08 November 2020: Updates: RSI. QM tunnelling time. [^]
Another unplanned post. It covers a very impressive line of experimentation in QM, its coverage in other blogs, as well as my own comments. In particular, I thought that this piece of work was likely to be nominated for a physics Nobel too (and I gave my reasons for saying so).

09 December 2020: Some comments on QM and CM—Part 1: Coleman’s talk. Necessity of ontologies. [^]
and
19 December 2020: Some comments on QM and CM—Part 2: Without ontologies, “classical” mechanics can get very unintuitive too. (Also, a short update.) [^]
Another couple of unplanned posts. I took this opportunity to present something on my (revised) views of ontologies in physics.

The above posts pretty much capture the two issues which kept me pre-occupied during 2020—Data Science, and Foundations of QM.

Other posts were relatively more topical (like updates), or not so important (though I might have made some good points in them, e.g., in this 16 June 2020 post: The singularities closest to you [^]).

Twitter:

Apart from this blog, I also made a lot of tweets. Off-hand, these included: comments on pop-sci articles and videos; comments on papers on QM; comments on developments in Data Science (e.g. the big news related to the Protein-Folding problem), etc. Also, longer Twitter threads (up to 9 or 10 tweets long, not longer) mentioning my thoughts on various topics, e.g., my ideas on the relation of maths with reality and physics, finer points related to the ongoing development of my new approach, etc.

Comments at others’ blogs:

I've been making many comments at others' blogs, and some of them have included my spontaneous/live/latest thoughts too. However, I don't want to go through all of that at this point in time. Most of the time, I have saved these spontaneous responses, and I use them in my R&D too, especially in QM and Data Science.


2. A quick review of my last year’s resolutions (for the year 2020):

The resolutions I made last year are here: My new year's resolutions—2020 edition [^]

How did I fare on those? Let me jot down in brief:

2.1. Review of last year’s NYRs: Data Science:

2.1.1: A set of brief notes

I did write the “Equations in the matrix form” document; see the comment above.

I had actually planned to write a set of "brief notes (in LaTeX) on topics from machine learning." I didn't. Two reasons:

  • Reason 1: Looking at the way the IT industry people treated my applications through job sites, I came to realize that publishing such notes was likely to push them into considering me only for training jobs.
  • Reason 2: I achieved 99.78% accuracy (see above). Writing notes, just to show my understanding, naturally took a back-seat. Now I was actually putting to use the knowledge in research too, not just compiling it in a neatly processed form (as in the Equations document mentioned above).

2.1.2. New ideas:

I was going to develop new ideas concerning ML/DL/AI, and perhaps publish a paper.

Well, I did develop my ideas, at least w.r.t. the image data problems, as noted above. However, I have decided to withhold publication. (I think that, so far, I've also been able to hold the network-hacking/screen-grabbing and bitmap-shipping (remember Citrix?) folks at bay. So, I think I should be able to publish some time later.)

2.2. Review of last year’s NYRs: Quantum Mechanics:

Yes, I worked on all the points noted in the last year’s NYRs, and a lot more.

In particular, my development during the year threw up some completely new conceptual issues for me, and I dealt with them. In the process, my understanding of QM became even better and deeper. However, RSI struck right around the same time, which was yet another unexpected development. As a result, I could not implement all my new ideas in code and see them in action (and verify them!).

2.3. Review of last year’s NYRs: Health etc.:

I had resolved to go "for walks (30 minutes) on at least six days each month", and to "do surya namaskars on at least two days a week".

I failed.

As to walking, it soon enough got ruled out due to Covid. As to “soorya namaskaara”s, I did try to keep up with “at least 12 namaskaar’s every day”, but I couldn’t. And, once there was a break, it soon became a complete break.

On the positive side, I didn't have any alcoholic drink (not even wine) after 16th March 2020. … No, it wasn't a religious thing. I just avoided going out, due to Covid, and soon after, there were the lockdowns. Then, even when the wine shops reopened, I just thought of continuing until (a) the year-end, or (b) a validation-through-simulations of my new ideas in QM, whichever came first. As a result, I didn't have any.

I plan to have a bit tonight, as an exception. I also plan to have a celebration with drinks once I achieve the milestones noted for the new year (2021).

2.4. Review of last year’s NYRs: Blogging:

I had resolved to reduce blogging.

However, there were enough developments that I ended up doing 26 posts instead of the planned 12 to 20 posts.

2.5. Review of last year’s NYR’s: Translations:

I was going to attempt translating उपनिषद (Upanishad). Well, I did pursue this activity but mostly in the mind. OTOH, I did publish on what used to be arguably the most often quoted verse from Gitaa.

2.6. Review of last year’s NYR’s: Meditation, side-readings (on all topics—not just spirituality), etc.

I had said: “Satisfactory pace achieved already. No need to change it; certainly no need to make NYR’s about them as such.” Yes, my reading did continue, satisfactorily enough.


3. My new year’s resolutions for 2021:

3.1. Try to “un-become” “Bohmianism”:

Remember, I had resolved to be a Bohmian via my Diwali-time resolutions? See this 23 November 2020 post: “Now I am become Bohmianism” [^]

Now my NYR is to cancel it!

Reason: I won’t tell you right away. You should be able to figure it out anyway, esp. over the course of the new year!

Aside: However, you know, NYR’s are sooo sooo hard to keep. So, don’t be surprised if I end up saying something like “We the Bohmians…,” also in the new year!

3.2. My new approach to quantum mechanics:

Spinless particles:

Conduct some basic simulations and write *some* preliminary documentation on the spinless electrons (up to 3 electron systems) by Q1-end. If the RSI severity goes up, by Q2-end.

QM with spin:

Conduct some basic simulations and write *some* preliminary documentation on the spinning electrons (up to 3 electron systems) by Q2-end. If the RSI strikes, by Q3-end or year-end.

Depending on the progress, revise my 2019-times Outline document on my new approach. Note, revising that document is optional.

Scope of this NYR:

The above resolutions do not cover the more advanced topics like: photons, detailed studies on the times taken by QM processes, detailed studies of the multi-scaling issues, etc.

However, the above resolutions do cover predicting the binding energies of electrons in the helium atom (the 1D case), and the preliminary three-particle simulations.

3.3. Data science:

Work a bit on some projects I have in mind—at least two or three of them.

Note: Nothing big here. Just some small projects of personal interest. Details will become apparent over the course of the year.

3.4. Health etc.:

As noted above, last year's NYRs in this department failed. So: adjust the resolution further for this year. Accordingly, the NYR for this year is:

Commit to performing one soorya namaskaar every day.

If even this routine fails for whatever reasons (say, a genuine reason like travel etc., or even plain irregularity, forgetfulness, boredom, etc.), then it would still be fine. But the next time the issue crosses the mind, just resume the "at least one" routine again. (Yes, such resumptions are an integral part of this very resolution itself!)

Also included in this resolution is this point: Publish a summary of my actual performance at the end of the next year. (So, keep a record!)

No resolutions concerning food, drink, etc., this year.

3.5. Sanskrit:

Start learning Sanskrit in a more formal manner, by doing an online course (or more than one).

3.6. Miscellaneous:

That’s it! Nothing in the miscellaneous department, this year. The rest of the routines are doing fine (like, e.g., meditation, studies of other topics, and all). No need to change anything about them; no need to make any resolution either.


Apart from it all, take care of yourself, and have a Happy, Productive and Prosperous New Year!

I will return only after I have progressed to a definite extent in my QM-related work, which might be a couple of weeks from now.


A song I like:

(Instrumental, “fusion”): “The River”
Composer: Ananda Shankar

[This was one of the instrumental pieces I would most often listen to, back in my IIT Madras days (1985–87).

… By now, I’ve forgotten whether I had heard this piece first in IIT Madras or in Pune. I do distinctly remember buying the original cassette for this album (“Ananda Shankar: A Life in Music”) in Pune, in particular, from a certain shop in the “Wonderland” shopping complex, on the Main Street in the Pune Camp area, and listening to it often in Pune. So, some chances are that I listened to it first in Pune. OTOH, my nearest retrievable memory also says two things: (1) I would listen to it on a Sony National Panasonic “Walkman” (having a 3-band equalizer), and (2) I had bought such a  Walkman in the Burma Bazaar area of Madras (now Chennai), though I no longer remember whether I bought it while being a student at IIT M or some time later. … So, all in all, I am not sure when or where I listened to it the first time.

… In any case, I am sure that the song became a routine piece of transition music on TV and radio in India only some time later (maybe after a few years or so). At least I hadn't listened to it first on TV/radio. …

This song was one of my all-time top-most favorites for a long time during my youth. Frankly though, listening to it once again only last year, after a gap of some two+ decades, it sounded slightly different. … But yes, it still remains one of my biggest favorites. (It's surprising that I happened not to have run it here so far.)

This piece is short (just about 3 minutes long), but it is absolutely innovative, fresh, and creates a wonderfully unhurried, placid, but not lackadaisical mood. … Just as if you were sitting by the side of a river on an unhurried evening, while resplendent colors slowly unfolded in the sky and also on the placid waters, until it became fully dark, completely unknown to you…. Or, as a small, lone wasp of a cloud loitered around the sky, became thinner, and somehow, the next time you looked at it, it was almost gone… An outstanding piece this one is, IMO, even when compared to other pieces by Ananda Shankar himself… This piece carries that unmistakable Indian touch even as the composition unfolds with a Western-sounding orchestration/trappings. (And it was always a bad idea, IMO, to use this piece for transitioning in between TV/radio programs… But then, that’s an entirely different matter!)

So, anyway, give it a listen and see if you like it too. … Back then, it sounded very fresh and innovative. But with a lot of music of a similar kind around, not to mention the easy access to world music these days, if you are listening to it for the first time, this piece may not sound all that extraordinary. But back then, it was, for me at least. … A good quality audio can be found here [^].

Alright, bye for now and take care…

]


History:
— 2020.12.31 19:19 IST: First published
— 2020.12.31 20:26 IST: Very minor corrections and additions, all in the songs section.

 

Sound separation, voice separation from a song, the cocktail party effect, etc., and AI/ML

A Special note for the Potential Employers from the Data Science field:

Recently, in April 2020, I achieved a World Rank # 5 on the MNIST problem. The initial announcement can be found here [^], and a further status update, here [^].

All my data science-related posts can always be found here [^].


1. The “Revival” series of “Sa Re Ga Ma” music company:

It all began with the two versions of the song which I ran the last time (in the usual songs section). The original song (or so I suppose) is here [^]. The "Revival" edition of the same song is here [^]. For this particular song, I happened to like the "Revival" version just so slightly better. It seemed to "fill" the tonal landscape of this song better—without there being too much degradation of the vocals, which almost always happens in the "Revival" series. I listened to both these versions back-to-back, a couple of times. Which set me thinking.

I thought that, perhaps, by further developing the existing AI techniques and using them together with some kind of advanced features for manual editing, it should be possible to achieve the same goals that the makers of “Revival” were aiming at, but too often fell too short of.

So, I did a bit of Google search, and found discussions like this one at HiFiVision [^], and this one at rec.music [^]. Go through the comments; they are lively!

On the AI side, I found this Q&A at Quora: [^]. One of the comments said: "Just Google this for searching 'software to clarify and improve sound recordings' and you will have several thousand listings for software that does this job."

Aha! So there already were at least a few software tools to do the job?

A few more searches later, I landed at Spleeter by Deezer [^]. (In retrospect, this seems inevitable.)

2. AI software for processing songs: Spleeter:

Spleeter uses Python and TensorFlow, and the installation instructions assume Conda. So, I immediately tried to download it, but saw my Conda getting into trouble "solving" for the "environment". Tch. Too bad! I stopped the Conda process, closed the terminal window, and recalled the sage advice: "If everything fails, refer to the manual." … Back to Deezer's GitHub page, and it was only then that I found that they still use TF 1.x! Bad luck!

Theoretically, I could have run Spleeter on Google Colab. But I don't like Google Colab. Nothing basically wrong with it, except that (i) personally, I think that Jupyter is a great demo tool but a very poor development tool, and (ii) I like my personal development projects unclouded. (I will happily work on a cloud-based project if someone pays me to do it. For that matter, if someone is going to pay me and then also advises me to set up the swap space for my local Ubuntu installation on the cloud, I will happily do it too! But when it comes to my own personal projects, don't expect me to work on the cloud.)

So, tinkering around with Spleeter was, unfortunately, not possible. But I noticed what they add at the Spleeter’s page: “there are multiple forks exposing spleeter through either a Guided User Interface (GUI) or a standalone free or paying website. Please note that we do not host, maintain or directly support any of these initiatives.” Good enough for me. I mean, the first part…
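(For reference, in case you want to try Spleeter yourself: going by my reading of the project's README, the Python-side usage would be roughly the following. I couldn't test it myself, since the install failed here, so treat the exact names as assumptions.)

```python
# Untested on my machine; based only on my reading of Spleeter's README.
from spleeter.separator import Separator

separator = Separator('spleeter:2stems')            # vocals + accompaniment model
separator.separate_to_file('song.mp3', 'output/')   # writes the two stems under output/
```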

So, after a couple of searches, I landed at a few sites, and actually tried two of them: Melody.ml [^] and Lala.ai [^]. The song I tried was the original version of the same song I ran the last time (links to YouTube given above). I went for just two stems in both cases—vocals and the rest.

Ummm…. Not quite up to what I had naively expected. Neither software did too well, IMO. Not with this song.

3. Peculiarity of the Indian music:

The thing is this: Indian music—rather, Indian film (and similar) music—tends to be very Indian in its tastes.

We Indians love curries—everything imaginable thrown together. No 5 or 7 or 9 or 11 course dinners for us. All our courses are packed into a big simultaneous serving called "thaali". Curries are a very convenient device for accomplishing this goal of serving the whole kitchen together. (BTW, talking of Western dinners, it seems to me that they always have an odd number of courses. Why so? Why not an even number of courses? Is it something like that thing about the number 13?)

Another thing. Traditionally, Indian music has made use of many innovations, but ideas of orchestra and harmony have never been there. Also, Indian music tends to emphasize continuity of sounds, whereas Western music seems to come in a “choppy” sort of style. In Western music, there are clear-cut demarcations at every level, starting from very neatly splitting a symphony into movements, right down to individual phrases and the smallest pieces of variations. Contrast it all with the utter fluidity and flexibility with which Indian music (especially the vocal classical type) gets rendered.

On both counts, Indian music comes across as a very fluid and continuous kind of a thing: a curry, really speaking.

All individual spices (“masaalaa” items) are not just thrown together, they are pounded and ground together into a homogenous ball first. That’s why, given a good home-made curry, it is tough to even take a good guess at exactly what all spices might have gone into making it. …Yes, even sub-regional variations are hard to figure out, even for expert noses. Just ask a lady from North Maharashtra or Marathwada to name all the spices used in a home-made curry from the north Konkan area of Maharashtra state (which is different from the Malavani sub-region of Konkan), going just by the taste. These days, communications have improved radically, and so, people know the ingredients already. But when I was young, women of my mother’s age would typically fail to guess the ingredients right. So, curries come with everything utterly mixed-up together. The whole here is not just greater than the sum of its parts; the whole here is one whose parts cannot even be teased out all that easily.

In India, our songs too are similar to curries. Even when we use Western instruments and orchestration ideas, the usage patterns still tend to become curry-like. Which means, Indian songs are not at all suitable for automatic/mechanically conducted analysis!

That’s why, the results given by the two services were not very surprising. So, I let it go at that.

But the topic wasn’t prepared to let me go so easily. It kept tugging at my mind.

4. Further searches:

Today, I gave in, and did some further searches. I found Spleeter's demo video [^]. Of course, there could be other, better demos too. But that's what I found and pursued. I also found a test of Spleeter done on the Beatles' song "Help" [^].

Finally, I also found this video which explains how to remove vocals from a song using Audacity [^]. Skip it if you wish, but it was this video which mentioned Melodyne [^], which was a new thing to me. Audacity is Open Source, whereas Melodyne is a commercial product.
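(As an aside, the simplest trick in this family—the "centre-channel cancellation" that, as far as I understand, the classic vocal-removal effects rely on—fits in a few lines of NumPy. The file name and the use of the soundfile library below are just my illustrative assumptions:)

```python
# A minimal sketch of centre-channel cancellation: vocals are usually mixed equally
# into both stereo channels, so subtracting one channel from the other suppresses them.
import numpy as np
import soundfile as sf

audio, rate = sf.read("song.wav")          # stereo file: shape (n_samples, 2)
left, right = audio[:, 0], audio[:, 1]
karaoke = 0.5 * (left - right)             # centre-panned content (often the vocals) cancels out
sf.write("song_minus_vocals.wav", karaoke, rate)
```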

Further searches later, I also found this video on using ISSE [^] (skip it if you don't find all these details interesting). Finally, I found this one (and don't skip it—there's a wealth of information in it): [^]. Among many things, it also mentions AutoTune [^], a commercial product. Google search suggested AutoTalent as its Open Source alternative; it was written by an MIT prof [^]. I didn't pursue it a lot, because my focus was on vocals-extraction rather than vocals pitch-correction.

Soooooo, where does that leave us?

Without getting into all the details, let me just state a few conclusions that I’ve reached…

5. My conclusions as of today:

5.1. Spleeter and similar AI/ML-based techniques need to improve a lot. Directly offering voice-separation services is not likely to take the world by storm.

5.2. Actually, my position is better stated by saying this: I think that directly deploying AI/ML in the way it is being deployed isn't going to work out—at all. Just throwing terabytes of data at the problem isn't going to solve it. Not because the current ML techniques aren't very capable, but because music is far too complex. People are aiming for too high-hanging a fruit here.

5.3. At the same time, I also think that in the somewhat more distant future, say over a 5–10 years' horizon, chances are pretty good that tools for tasks like separating the voice from the instrumental sounds will become acceptably good—provided the developers pursue the right clues.

6. How the development in this field should progress (IMO):

In this context, IMO, a good clue is this: First of all, don't start with AI/ML, and don't pursue automation. Instead, start with a very good idea of what problem(s) are actually to be solved.

In the present context, it means: Try to understand why products like Melodyne and AutoTune have at all been successful—despite there being “so little automation” in them.

My answer: Precisely because these software products have given so much latitude to the user.

It's only after understanding the problem to be solved, and the modalities of the current software, that we should come to the question of whether the existing capabilities of these software products can at all be enhanced using AI/ML, one feature/aspect at a time.

My answer, in short: Yes, they can (and should) be.

Notice, we don't start with the AI/ML algorithms and then try to find applications for them. We start with some pieces of good software that have already created certain needs (or expanded them), and are already fulfilling them. Only then do we think of enhancing them—with AI/ML being just a tool. Yes, it's an enabling technology. But in the end, it's just a tool to improve other software.

Then, as the next step, consolidate all possible AI-related gains first—doing just enhancements, really speaking. Only then proceed further. In particular, don’t try to automate everything right from the beginning.

IMO, AI/ML techniques simply aren't so well developed that they can effectively tackle problems involving a relatively greater conceptual scope, such that wide-ranging generalizations get involved in complex ways in performing a task. AI/ML techniques can, and indeed do, excel—and even outperform people—but only in those tasks that are very narrowly defined—tasks like identifying handwritten digits, or detecting a few cancerous cells from among hundreds of healthy cells using details of morphology—without ever getting fatigued. Etc.

Sound isolation is not a task very well suited to these algorithms. Not at this stage of development of AI/ML, and of the sound-related software.

“The human element will always be there,” people love to repeat in the context of AI.

Yes, I am an engineer and I am very much into AI/ML. But when it comes to tasks like sound separation and all, my point is stronger than what people say. IMO, it would be actually stupid to throw away the manual editing aspects.

The human ear is too sensitive an instrument; it takes almost nothing (and almost no time) for most of us to figure out when some sound processing goes bad, or when a reverse-FFT'ed sound begins to feel too shrill at times, or too "hollow" at other times, or plain "helium-throat"-like at still others [^][^].

Gather some 5–10 people in a room and play some songs on a stereo system that is equipped with a good graphic equalizer. If there is no designated DJ, what is the bet that people are just going to fiddle around with the graphic equalizer every time a new song begins? The reason is not just that people want to impress others with their expertise. The fact of the matter is, people are sensitive to even the minutest variations in sound, and they will simply not accept something which does not sound "just right." Further, there are individual tastes too—as to what precisely is "just right". That's why, if one guy increases the bass, someone else is bound to get closer to the graphic equalizer to set it right! It's going to happen.

That’s why, it’s crucial not to even just attempt to “minimize” the human “interference.” Don’t do that. Not for software like these.

Instead, the aim should be to keep that human element right at the forefront, despite using AI/ML.

Ditto, for other similarly complex tasks / domains, like colouring B&W images, generating meaningful passages of text, etc.

That’s what I think. As of today.

7. Guess I also have some ideas for processing of music:

So, yes, I am not at all for directly starting to train Deep Learning models on lots of music tracks.

At the same time, I guess, I also have already started generating quite a few ideas regarding these topics: analysis of music, sound separation, which ML technique might work out well and in what respect (for these tasks), what kind of abstract information to make available to the human “operator” and in what form/presentation, etc. …

…You see, some 15+ years ago, I had actually built a small product called "ToneBrush." It offered real-time visualizations of music using windowed FFT and further processing (something like a spectrogram, plus the visualizations which by now have become standard in media players like VLC, etc.). My product didn't sell even a single copy! But it was a valuable experience…
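(Today, the windowed-FFT part of such a visualization can be sketched with off-the-shelf SciPy—a toy illustration of my own, not the ToneBrush code:)

```python
# A toy spectrogram: a windowed FFT of a synthetic two-tone signal, shown as power in dB.
import numpy as np
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

rate = 44100
t = np.linspace(0.0, 2.0, 2 * rate, endpoint=False)
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)   # a toy "tune"

f, times, Sxx = spectrogram(signal, fs=rate, nperseg=2048)   # the windowed FFT
plt.pcolormesh(times, f, 10.0 * np.log10(Sxx + 1e-12))       # power in dB
plt.xlabel("Time [s]"); plt.ylabel("Frequency [Hz]")
plt.show()
```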

…Cutting back to the present, all the thinking which I did back then now came back to me once again. … All the same, for the time being, I'm just noting these ideas in my notebook, and otherwise moving this whole topic to the back-burner. I want to finish my ongoing work on QM first.

One final note, an after-thought, actually: I didn’t say anything about the cocktail party effect. Well, if you don’t know what the effect means, start with the Wiki [^]. As to its relevance: I remember seeing some work (I think it was mentioned at Google’s blog) which tried to separate out each speaker’s voice from the mixed up signals coming from, say, round-table discussion kind of scenarios. However, I am unable to locate it right now. So, let me leave it as “homework” for you!

8. On the QM front:

As to my progress on the QM side—or lack of it: I spotted (and even recalled from memory) quite a few more conceptual issues, and have been working through them. The schedule might get affected a bit, but not a lot. Instead of 3–4 weeks which I announced 1–2 weeks ago, these additional items add a further couple of weeks or so, but not more. So, instead of August-end, I might be ready by mid-September. Overall, I am quite happy with the way things are progressing in this respect. However, I’ve realized that this work is not like programming. If I work for more than just 7–8 hours a day, then I still get exhausted. When it’s programming and not just notes/theory-building, then I can easily go past 12 hours a day, consistently, for a lot longer period (like weeks at a time). So, things are moving more slowly, but quite definitely, and I am happy with the progress so far. Let’s see.

In the meanwhile, of course, thoughts on topics like coloring of B&W pics or sound separation also pass by, and I note them.

OK then, enough is enough. See you after 10–15 days. In the meanwhile, take care, and bye for now…


A song I like:

(Western, Pop) “Rhiannon”
Band: Fleetwood Mac

[I mean this version: [^]. This song has a pretty good melody, and Stevie Nicks's voice, even if it's not too smooth and mellifluous, has a certain charm to it, a certain "femininity" as it were. But what I like the most about this song is, once again, its soundscape taken as a whole. Like many songs of its era, this song too carries just the right level of richness in its tonal landscape—neither too rich nor too sparse/rarefied, but just right. …If I recall it right, surprisingly, the first time I heard this song was not in the COEP hostels, but in the IIT Madras hostels.]


History:
— 2020.08.09 02:43 IST: First published
— 2020.08.09 20:33 IST: Fixed the wrong linkings and the broken links, and added reference to AutoTalent.

 

Python scripts for simulating QM, part 2: Vectorized code for the H atom in a 1D/2D/3D box. [Also, un-lockdown and covid in India.]

A Special note for the Potential Employers from the Data Science field:

Recently, in April 2020, I achieved a World Rank # 5 on the MNIST problem. The initial announcement can be found here [^], and a further status update, here [^].

All my data science-related posts can always be found here [^].


1. The first cut for the H atom in a 3D box:

The last time [^], I spoke of an enjoyable activity, namely, how to make the tea (and also how to have it).

Talking of other, equally enjoyable things, I have completed the Python code for simulating the H atom in a 3D box.

In the first cut for the 3D code (as also in the previous code in this series [^]), I used NumPy's dense matrices and the Python "for" loops. Running this preliminary code, I obtained the following colourful diagrams, and tweeted them:

H atom in a 3D box of 1 angstrom sides. Ground state (i.e., 1s, eigenvalue index = 0). All contours taken together show a single stationary state. Contour surfaces plotted with wiremesh. Plotted with Mayavi's mlab.contour3d().

H atom in a 3D box of 1 angstrom sides. A 'p' state (eigenvalue index = 2). All contours taken together show a single stationary state. Contour surfaces with the Gouraud interpolation. Plotted with Mayavi's mlab.contour3d().

H atom in a 3D box of 1 angstrom sides. A 'p' state (eigenvalue index = 2). All contours taken together show a single stationary state. Contour surfaces with wiremesh. Plotted with Mayavi's mlab.contour3d().

H atom in a 3D box of 1 angstrom sides. Another 'p' state (eigenvalue index = 3). All contours taken together show a single stationary state. Contour surfaces with the Gouraud interpolation. Plotted with Mayavi's mlab.contour3d().
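(In case you want to reproduce this kind of plot, the call involved is roughly as below—my reconstruction, not the actual script:)

```python
# Reshape a computed eigenvector back onto the 3D mesh and hand it to Mayavi.
import numpy as np
from mayavi import mlab

nx = ny = nz = 25
psi = np.random.rand(nx * ny * nz)          # stand-in for an eigenvector from the FD solver
psi3d = psi.reshape(nx, ny, nz)

mlab.contour3d(np.abs(psi3d)**2, contours=8, opacity=0.4)   # iso-surfaces of |psi|^2
mlab.show()
```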

 

OK, as far as many (most?) of you are concerned, the enjoyable part of this post is over. So, go read something else on the ‘net.


Coming back to my enjoyment…

2. Sparse matrices. Vectorization of code:

After getting to the above plots with dense matrices and Python "for" loops, I then completely rewrote the whole code using SciPy's sparse matrices, and put vectorized code in place of the Python "for" loops. (As a matter of fact, in the process of rewriting, I seem to have deleted the plots too. So, today, I took the above plots from my Twitter account!)

2.1 My opinions about vectorizing code, other programmers, interviews, etc:

Vectorization was not really necessary in this problem (an eigenvalue problem), because even if you incrementally build the FD-discretized Hamiltonian matrix, it takes less than 1 percent of the total execution time. 99 % of the execution time is spent in the SciPy library calls.

Python programmers have a habit of always looking down on the simple “for” loops—and hence, on any one who writes them. So, I decided to write this special note.

The first thing you do about vectorization is not to begin wondering how best to implement it for a particular problem. The first thing you do is to ask yourself: Is vectorization really necessary here? Ditto, for lambda expressions. Ditto, for list comprehension. Ditto for itertools. Ditto for almost anything that is a favourite of the dumb interviewers (which means, most Indian Python programmers).

Vectorized code might earn you brownie points among the Python programmers (including those who interview you for jobs). But such code is more prone to bugs, harder to debug, and definitely much harder to understand, even for you, after a gap of time. Why?

That's because, practically speaking, while writing in Python, you hardly, if ever, define C-struct-like things. Python does have classes. But these are rather primitive classes. You are not expected to write code around classes, and then put objects into containers. Technically, you can do that, but it's not at all efficient. So, practically speaking, you are almost always using the NumPy ndarrays, or similar things (like Pandas, xarray, dask, etc.).

Now, once you have these array-like thingies, indexing becomes important. Why? Because, in Python, it is the design of a number of arrays, and the relationships among their indexing schemes, which together "define" the operative data structures. Python, for all its glory, has this un-removable flaw: the design of the data structures is always implicit; it is never directly visible through a language construct.

So, in Python, it's the indexing scheme which plays the same part that classes, inheritance, and genericity play in C++. But it's implicit. So, how you implement the indexing features becomes of paramount importance.

And here, in my informed opinion, the Python syntax for the slicing and indexing operations has been made unnecessarily intricate. I, for one, could easily design an equally powerful semantics that comes with a syntax that’s much easier on the eye.
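To make the contrast concrete, here is the kind of thing I mean—the same off-diagonal assignment written both ways (a toy example, not a snippet from my actual code):

```python
import numpy as np

n = 8
A = np.zeros((n, n))
for i in range(n - 1):          # the plain "for" loop: verbose, but obvious at a glance
    A[i, i + 1] = -1.0

B = np.zeros((n, n))
B[np.arange(n - 1), np.arange(1, n)] = -1.0   # the fancy-indexing one-liner: compact, but denser to read

assert np.allclose(A, B)
```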

In case some professionally employed data scientist (especially a young Indian one) takes offence at my above claim: Yes, I do mean what I say above. And, I also know what I am talking about.

Though I no longer put it on my CV, once, in the late 1990s, I had implemented a Yacc-like tool to output a table-driven parser for LALR(1) languages (like Java and C++). It would take a language specification in EBNF (Extended Backus–Naur Form) as the input file, and produce the tables for table-driven parsing of that language. I had implemented this thing completely on my own, looking just at the Dragon Book (Aho, Sethi, Ullman). I haven't had a university education in CS. So, I taught myself compiler theory, and then began implementing it straight away.

I looked at no previous code. And even if I had looked at something, it would have been horrible. These were ancient projects, written in C, not in C++, and written using arrays, with no STL containers like "map". A lot of hard-coding, pre-processor macros, and all that. Eventually, I did take a look at others' code, but only in the verification stage. How did my code fare? Well, I didn't have to change anything critical.

I had taken about 8 months for this exercise (done part time, on evenings, as a hobby). The closest effort was by some US mountain-time university group (consisting of a professor, one or two post-docs, and four-five graduate students). They had taken some 4 years to reach roughly the same place. To be fair, their code had many more features. But yes, both their code and mine addressed those languages which belonged to the same class of grammar specification, and hence, both would have had the same parsing complexity.

I mention it all here mainly in order to "assure" the Indian/American programmers (you know, those BE/BS CS guys, the ones who are right now itching to fail me in any interview, should their HR arrange one for me in the first place) that I did know what I was talking about when distinguishing the actual computing operations on the one hand from the mere syntax for those operations on the other. There are a lot of highly paid Indian IT professionals who never do learn this difference (but who take care to point out that your degree isn't from the CS field).

So, my conclusion is that despite all its greatness (and I do actually love Python), its syntax does have some serious weaknesses. Not just idiosyncrasies (which are fine) but actual weaknesses. The syntax for slicing and indexing is a prominent part of it.

Anyway, coming back to my present code (for the H atom in the 3D box, using the finite difference method): if the execution time was so short, and if vectorization makes code prone to bugs (and difficult to maintain), why did I bother implementing it?

Two reasons:

  1. I wanted to have a compact-looking code. I was writing this code mainly for myself, so maintenance wasn’t an issue.
  2. In case some programmer / manager interviewing me began acting over-smart, I wanted to have something which I could throw at his face. (Recently, I ran into a woman who easily let out: "What's there in a PoC (proof of concept, in case you don't know)? Any one can do a PoC…" She ranted on a bit, but it was obvious that though she has been a senior manager and all, and lists managing innovations and all, she doesn't know. There are a lot of people like her in the Indian IT industry. People who act over-smart. An already implemented vectorized code, especially one they find difficult to read, would be a nice projectile to have handy.)

2.2 Sparse SciPy matrices:

Coming back to the present code for the H atom: As I was saying, though vectorization was not necessary, I have anyway implemented the vectorization part.

I also started using sparse matrices.

In case you don't know, many of SciPy's and NumPy's matrix-related calls look nearly identical, but they go through different underlying implementations.

From what I have gathered, it seems safe to conclude this much: As a general rule, if doing some serious work, use SciPy’s calls, not NumPy’s. (But note, I am still learning this part.)

With sparse matrices, I can now easily go to a 50 × 50 × 50 domain. I haven't empirically tested the upper limit on my laptop, though an even bigger mesh should easily be possible. In contrast, earlier, with dense matrices, I was stuck at, at most, a 25 × 25 × 25 mesh. The execution time, too, reduced drastically.

In my code, I have used only dok_matrix() to build the sparse matrices, and only the tocsr() and tocoo() calls for faster matrix computations in the SciPy eigenvalue routines. These are the only functions I've used—I haven't tried all the pathways that SciPy opens up. However, I think I have a pretty fast-running code, and that the execution time wouldn't improve to any significant degree by using some other combination of calls.
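To show that pipeline in the abstract—a minimal, self-contained sketch with a 1D toy Hamiltonian (in atomic units), not my actual 3D code:

```python
# dok_matrix -> tocsr() -> sparse eigensolver, for a 1D toy Hamiltonian.
import numpy as np
from scipy.sparse import dok_matrix
from scipy.sparse.linalg import eigsh

n, L = 500, 10.0
h = L / (n + 1)
x = np.linspace(h, L - h, n)                     # interior grid points

H = dok_matrix((n, n))
for i in range(n):                               # incremental build; cheap next to the eigensolve
    H[i, i] = 1.0 / h**2 - 1.0 / (abs(x[i] - L / 2) + 0.05)   # kinetic diagonal + softened Coulomb well
    if i > 0:
        H[i, i - 1] = -0.5 / h**2
    if i < n - 1:
        H[i, i + 1] = -0.5 / h**2

H = H.tocsr()                                    # convert before calling the solver
vals, vecs = eigsh(H, k=4, which='SA')           # a few lowest eigenvalues/eigenvectors
print(vals)
```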

2.3 A notable curiosity:

I also tried, and succeeded to a great degree, in having exactly identical code for all dimensions: 1D, 2D, 3D, and then even, in principle, ND. That is to say, no "if–else" statements leading to different execution paths depending on the dimensionality.

If you understand what I just stated, then you sure would want to have a look at my code, because nothing similar exists anywhere on the ‘net (i.e., within the first 10 pages thrown up by Google during several differently phrased searches covering many different domains).

However, eventually, I abandoned this approach, because it made things too complicated, especially while dealing with computing the Coulomb fields. The part dealing with the discretized Laplacian was, in contrast, easier to implement, and it did begin working fully well, which was when I decided to abandon this entire approach. In case you know a bit about this territory: I had to liberally use numpy.newaxis.

Eventually, I came to abandon this insistence on having only a single set of code lines regardless of the dimensionality, because my programmer’s/engineer’s instincts cried against it. (Remember I don’t even like the slicing syntax of Python?) And so, I scrapped it. (But yes, I do have a copy, just in case someone wants to have a look.)
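(For comparison only: there is also a completely standard, dimension-agnostic way of building the discretized Laplacian—as a Kronecker sum of the 1D operator. This is not the index-matrix/newaxis approach I had tried; I note it here purely as a sketch.)

```python
# A dimension-agnostic discretized Laplacian via Kronecker products (standard construction,
# noted only for comparison with the approach described above).
import numpy as np
from scipy.sparse import diags, identity, kron

def laplacian_1d(n, h):
    """3-point finite-difference Laplacian on n interior points (Dirichlet BCs)."""
    return diags([1.0, -2.0, 1.0], offsets=[-1, 0, 1], shape=(n, n)) / h**2

def laplacian_nd(ns, h):
    """N-dimensional Laplacian: sum over d of  I x ... x L1_d x ... x I."""
    total = None
    for d, n in enumerate(ns):
        left = int(np.prod(ns[:d]))              # np.prod of an empty slice is 1
        right = int(np.prod(ns[d + 1:]))
        term = kron(identity(left), kron(laplacian_1d(n, h), identity(right)))
        total = term if total is None else total + term
    return total

L3 = laplacian_nd((20, 20, 20), h=0.05)          # the same call works for 1D, 2D, or 3D
```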

2.4 When to use the “for” loops and when to use slicing + vectorization: A good example:

I always try to lift code if a suitable piece is available ready-made. So, I did a lot of searching for Python/MATLAB code for such things.

As far as the FD implementations of the Laplacian go, IMO, the best piece of Python code I saw (for this kind of a project) was that by Prof. Christian Hill [^]. His code is available for free from the site for a book he wrote; see here [^] for an example involving the finite difference discretization of the Laplacian.

Yes, Prof. Hill has wisely chosen to use only the Python "for" loops when it comes to specifying the IC (initial condition). Thus, he reserves the vectorization only for the time-stepping part of the code.

Of course, unlike Prof. Hill’s code (transient diffusion), my code involves only eigenvalue computations—no time-stepping. So, one would be even more amply justified in using only the “for” loops for building the Laplacian matrix. Yet, as I noted, I vectorized everything in my code, merely because I felt like doing so. It’s during vectorization that the problem of differing dimensionality came up, which I solved, and then abandoned.

2.5 Use of indexing matrices:

While writing my code, I figured out that a simple trick with using index matrices and arrays makes the vectorization part even more compact (and less susceptible to bugs). So, I implemented this approach—indexing matrices and arrays.

“Well, this is a very well known approach. What’s new?” you might ask. The new part is the use of matrices for indexing, not arrays. Very well known, sure. But very few people use it anyway.

Again, I was cautious. I wrote the code, looked at it again a couple of days later, and made sure that using indices really did make the code easier to understand—to me, of course. Only then did I decide to retain it.

By using the indexing matrices, the code indeed becomes very clean-looking. It certainly looks far better (i.e., its structure is easier to grasp) than the first lines of code in Prof. Hill's "do_timestep" function [^].
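To indicate the flavour of the idea—a simplified sketch, not my actual code: number all the grid nodes using a 3D "indexing matrix", slice it to get the neighbour pairs along each axis, and hand those index arrays straight to a sparse constructor. No per-node "for" loop is needed.

```python
# Vectorized assembly of the 3D 7-point FD Laplacian using an indexing matrix.
import numpy as np
from scipy.sparse import coo_matrix

nx = ny = nz = 20
h = 0.05
N = nx * ny * nz
idx = np.arange(N).reshape(nx, ny, nz)            # the indexing matrix: global node numbers

rows, cols = [], []
for axis in range(3):                             # 3 iterations, not N
    a = np.moveaxis(idx, axis, 0)
    rows.append(a[:-1].ravel())                   # node i ...
    cols.append(a[1:].ravel())                    # ... and its neighbour along this axis

rows = np.concatenate(rows)
cols = np.concatenate(cols)
off = np.ones(rows.size) / h**2                   # off-diagonal FD coefficients
diag = -6.0 * np.ones(N) / h**2                   # 7-point-stencil diagonal (Dirichlet BCs)

lap = coo_matrix((off, (rows, cols)), shape=(N, N))
lap = lap + lap.T + coo_matrix((diag, (np.arange(N), np.arange(N))), shape=(N, N))
```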

2.6 No code-drop:

During my numerous (if not exhaustive) searches, I found that no one posts a 3D code for quantum simulation that also uses finite differences (i.e. the simplest numerical technique).

Note, people do post code for 3D, but only for more complicated approaches like FDTD (finite-difference time-domain), FEM, (pseudo)spectral methods, etc. People also post code for FDM when the domain is 1D. But no one posts code that is both FD and 2D/3D. People only post the maths for such a case. On rare occasions, they also post the results of the simulations. But they don't post the 3D FDM code. I don't know the reason for this.

May be there is some money to be made if you keep some such tricks all to yourself?

Once this idea occurred to me, it was impossible for me not to take it seriously. … You know that I have been going jobless for two years by now. And, further, I did have to invest a definite amount of time and effort in getting those indexing matrices to work right, so that the vectorization part becomes intuitive.

So, I too have decided not to post my 3D code anywhere on the 'net for free. Not immediately, anyway. Let me think about it for a while before I go and post my code.


3. Covid in India:

The process of unlocking down has begun in India. However, the numbers simply aren't right for anyone to relax (except for the entertainment sections of the Indian media like the Times of India, Yahoo!, etc.).

In India, we are nowhere near turning the corner. The data about India are such that the time when the flattening might occur is not just hard to predict; with the current data, predicting it is impossible.

Yes, I said impossible. I could put forward reasoning grounded in sound logic and good mathematics (e.g., things like Shannon's theorem, von Neumann's error analysis, etc.), if you want. But I think that, to anyone who really knows a fair amount of maths, it isn't necessary. I think they will understand my point.

Let me repeat: The data about India are such that the time when the flattening might occur is not just hard to predict; with the current data, predicting it is impossible.

India's data pose a certain unique kind of challenge for the data scientist—and they definitely call for some serious apprehension on the part of everyone concerned. The data themselves are such that predictions have to be made very carefully.

If anyone is telling you that India will cross (or has already crossed), say, 20 lakh cases, then know that he/she is not speaking from the data, the population size, the social structures, the particular diffusive dynamics of this country, etc. He/she is talking purely from imagination—or from very poor maths.

Ditto, if someone tells you that there are going to be so many cases in this city or that, by this date or that, if the date runs into, say, August.

Given the actual data, in India, projections about number of cases in the future are likely to remain very tentative (having very big error bands).

Of course, you may still make some predictions, like those based on the doubling rate. You would even be justified in using this measure, but only for a very short time-span into the future. The reason is that India's data carry these two peculiarities:

  1. The growth rate has been, on a large enough scale, quite steady for a relatively long period of time. In India, there has been no exponential growth with a very large log-factor, not even initially (which I attribute to an early enough lock-down). There has also been no flattening (for whatever reasons, but see the next point).
  2. The number of cases per million population still remains small.

Because of 1., the doubling rate can serve as a good short-term estimator when it comes to activities like large-scale resource planning (but it would be valid only for the short term). You will have to continuously monitor the data, and be willing to adjust your plans. Yet, the fact is also that the doubling rate has remained steady long enough that it can certainly be used for short-term planning (including by corporates).
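(The arithmetic behind such doubling-rate-based short-term projections is trivially simple; a sketch, with entirely made-up numbers, purely for illustration:)

```python
# Short-term projection from a doubling time: cases(t) = cases(0) * 2**(t / T_double).
# Both numbers below are made up, purely for illustration.
import numpy as np

cases_today = 200_000          # made-up figure
doubling_days = 15.0           # made-up doubling time

for days_ahead in np.arange(0, 29, 7):
    projected = cases_today * 2.0 ** (days_ahead / doubling_days)
    print(f"day +{days_ahead:2d}: ~{int(projected):,} cases")
```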

However, because of 2., everyone will have to revise their estimates starting from the third week of June, when the effects of the un-lockdown begin to become visible (not just in the hospitals or the quarantine centres, but also in terms of aggregated numbers).

Finally, realize that 1. matters only to the policy-makers (whether in government or in corporate sectors).

What matters to the general public at large is this one single question: Have we turned the corner already? If not, when will we do that?

The short answers are: "No" and "Can't Tell As of Today."

In India’s case the data themselves are such that no data scientist worth his salt would be able to predict the time of flattening with any good accuracy—as of today. Nothing clear has emerged, even after 2.5 months, in the data. Since this sentence is very likely to be misinterpreted, let me explain.

I am not underestimating the efforts of the Indian doctors, nurses, support staff, police, and even other government agencies. If they had not been in this fight, the data would've been far simpler to analyse—and the situation far more deadly.

Given India's population size, its poverty, its meagre medical resources, the absence of civic discipline, the illiteracy (which makes using symbols for political parties indispensable at the time of elections)… Given all such factors, the very fact that India's data even today (after 2.5 months) still manage to remain hard to analyse suggests, to my mind, this conclusion:

There has been a very hard tussle going on between man and the virus so that no definitive trend could emerge either way.

There weren't enough resources for flattening to have occurred by now. If you held that expectation to begin with, you were ignoring reality.

However, in India, the fight has been such that it must have been very tough on the virus too—else, the exponential function is too bad for us, and it is too easy for the virus.

The inability to project the date by which the flattening might be reached, must be seen in such a light.

The picture will become much clearer starting from two weeks in the future, because it would then begin reflecting the actual effects that the unlocking is producing right now.

So, if you are in India, take care even if the government has now allowed you to step out, go to office, and all that. But remember, you have to take even more care than you did during the lock-down, at least for the next month or so, until some definitely discernible trends, even if faint, do begin to emerge, objectively speaking.

I sincerely hope that everyone takes precautions so that we begin to see even just an approach towards the flattening. Realize, the number of cases and the number of deaths will keep increasing until the flattening occurs. So, take extra care, now that the diffusivity of people has increased.

Good luck!


A song I like:

(Western, instrumental): Mozart, Piano concerto 21, k. 467, second movement (andante in F major).

Listen, e.g., at this [^] YouTube video.

[ I am not too much into Western classical, though I have listened to a fair deal of it. I would spend hours in UAB's excellent music library listening to all sorts of songs, though mostly Western classical. I would also sometimes make on-the-fly requests to the classical music channel of UAB's radio station (or was it a local radio station? I no longer remember). I didn't always like what I listened to, but I continued listening a lot anyway.

Then, as I grew older, I began discovering that, as far as the Western classical music goes, very often, I actually don’t much appreciate even some pieces that are otherwise very highly regarded by others. Even with a great like Mozart, there often are places where I can’t continue to remain in the flow of the music. Unknowingly, I come out of the music, and begin wondering: Here, in this piece, was the composer overtaken by a concern to show off his technical virtuosity rather than being absorbed in the music? He does seem to have a very neat tune somewhere in the neighbourhood of what he is doing here. Why doesn’t he stop tinkling the piano or stretching the violin, stop, think, and resume? I mean, he was composing music, not just blogging, wasn’t he?

The greater the composer or the tune suggested by the piece, the greater is this kind of a disappointment on my part.

Then, at other times, these Western classical folks do the equivalent of analysis-paralysis. They get stuck into the same thing for seemingly forever. If composing music is difficult, composing good music in the Western classical style is, IMHO, exponentially more difficult. That’s the reason why despite showing a definite “cultured-ness,” purely numbers-wise, most Western classical music tends to be boring. … Most Indian classical music also tends to be very boring. But I will cover it on some other day. Actually, one day won’t be enough. But still, this post is already too big…

Coming to the Western classical, Mozart, and the song selected for this time: I think that if Mozart were to do something different with his piano concerto no. 20 (k. 466), then I might have actually liked it as much as k. 467, perhaps even better. (For a good YouTube video on k. 466, see here [^].)

But as things stand, it’s k. 467. It is one of the rarest Western (or Eastern) classical pieces that can auto ride on my mind at some unpredictable moments; also one of the rare pieces that never disappoint me when I play it. Maybe that’s because I don’t play it unless I am in the right mood. A mood that’s not at all bright; a mood that suggests as if someone were plaintively raising the question “why? But why?”. (Perhaps even: “But why me?”) It’s a question not asked to any one in particular. It’s a question raised in the midst of continuing to bear either some tragic something, or, may be, a question raised while in the midst of having to suffer the consequences of someone else’s stupidity or so. … In fact, it’s not even a question explicitly raised. It’s to do with some feeling which comes before you even become aware of it, let alone translate it into a verbal question.  I don’t know, the mood is something like that. … I don’t get in that kind of a mood very often. But sometimes, this kind of a mood is impossible to avoid. And then, if the outward expression of such a mood also is this great, sometimes, you even feel like listening to it… The thing here is, any ordinary composer can evoke pathos. But what Mozart does is in an entirely different class. He captures the process of forming that question clearly, alright. But he captures the whole process in such a subdued manner. Extraordinary clarity, and extraordinary subdued way of expressing it. That’s what appeals to me in this piece… How do I put it?… It’s the epistemological clarity or something like that—I don’t know. Whatever be the reason, I simply love this piece. Even if I play it only infrequently.

Coming back to the dynamic k. 466 vs. the quiet, sombre, even plaintive k. 467, I think, the makers of the “Elvira Madigan” movie were smart; they correctly picked up the k. 467, and only the second movement, not others. It’s the second movement that’s musically extraordinary. My opinion, anyway…

Bye for now.

]