Status update on my trials for the MNIST dataset

This post is going to be brief, relatively speaking.

1. My further trials for the MNIST dataset:

You know by now, from my last post about a month ago [^], that I had achieved a World Rank # 5 on the MNIST dataset (with 99.78 % accuracy), and that too using a relatively slow machine (a single CPU-only laptop).

At that time, as mentioned in that post, I also had some high hopes of bettering the result (with definite pointers as to why the results should get better).

Since then, I’ve conducted a lot of trials. Both the machine and I learnt a lot. However, during this second bout of trials, I came to learn much, much more than the machine did!

But naturally! With a great result already behind me, my focus during the last month shifted to better understanding the whys and the hows of it, rather than chasing a further improvement in accuracy by hook or by crook.

So, I deliberately reduced my computational budget from 30+ hours per trial to 12 hours at the most. [Note again, my CPU-only hardware runs about 4–8 times, perhaps even 10+ times, slower than GPU-equipped machines.]

Within this reduced computing budget, I pursued a great many different ideas and architectures, some elements being well known to people already, and some newly invented by me.

The ideas I combined include: batch normalization; learnable pooling layers; models built with the functional API (permitting complex entwinements of data streams, not just sequential ones); custom-written layers (not many, but I did try a few); a custom-written learning-rate scheduler; custom-written callback functions to monitor progress at batch- and epoch-ends; a custom-written class for faster loading of augmented data (TF-Keras itself spins out a special thread for this purpose); apart from custom logging, custom-written early stopping, custom-written ensembles, etc. … (If you are a programmer, you would respect me!)
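To give a flavour of one item from the list above, here is a minimal sketch of the kind of custom learning-rate schedule one can hand to TF-Keras via `tf.keras.callbacks.LearningRateScheduler`. The initial rate, decay factor, and step size below are purely illustrative values of my own choosing, not the ones actually used in the trials:

```python
# A hypothetical step-decay schedule of the kind one can pass to
# tf.keras.callbacks.LearningRateScheduler(schedule). The initial rate,
# decay factor, and step size are illustrative values only, not the
# ones actually used in my trials.
def step_decay_schedule(epoch, lr=None, initial_lr=1e-3, factor=0.5, step=10):
    """Multiply the learning rate by `factor` every `step` epochs."""
    return initial_lr * (factor ** (epoch // step))

# epochs 0..9 give 1e-3, epochs 10..19 give 5e-4, and so on
```

The `(epoch, lr)` signature matches what the Keras scheduler callback expects; the library calls the function at the start of each epoch and sets the optimizer's learning rate to the returned value.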

But what was going to be the result? Sometimes there was some faint hope of an improvement in accuracy; most times, a relatively low accuracy was expected anyway, and that's what I saw. (No surprises there: the computing budget, to begin with, had to be kept small.)

Sometime during these investigations into architectures and algorithms, I did initiate a few long trials (in the range of some 30–40 hours of training time). I took only one of these trials to completion. (I interrupted and ended all the other long trials more or less arbitrarily, after running them for maybe 2 to 8 hours. A few of them seemed to be veering towards eventual over-fitting; for others, I simply lost patience!)

During the one long trial which I did run to completion, I did achieve a slight improvement in the accuracy. I did go up to 99.79 % accuracy.

However, I cannot be very highly confident about the consistency of this result. The algorithms are statistical in nature, and a slight degradation (or even a slight improvement) from 99.79 % is what is to be expected.

I do have all the data related to this 99.79 % accuracy result saved with me. (I have saved not just the models, the code, and the outputs, but also all the intermediate data, including the output produced on the terminal by the TensorFlow-Keras library during the training phase. The outputs of the custom-written log also have been saved diligently. And of course, I was careful enough to seed at least the three most important random generators—one each in TF, NumPy, and base Python.)
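Seeding those three generators can be done in one small helper. The sketch below seeds base Python and NumPy; with TensorFlow installed, one would additionally call `tf.random.set_seed(seed)` (left as a comment so the snippet stays self-contained):

```python
import random
import numpy as np

def seed_everything(seed):
    """Seed base Python's and NumPy's global generators.
    With TensorFlow available, also call: tf.random.set_seed(seed)."""
    random.seed(seed)
    np.random.seed(seed)

# Two runs from the same seed produce the same draws:
seed_everything(123)
a = np.random.rand(3)
seed_everything(123)
b = np.random.rand(3)
# a and b are identical, so the run is reproducible
```

Note that this makes individual runs repeatable; it does not by itself remove all non-determinism (e.g., thread scheduling in data loading), which is one more reason not to over-claim a single-trial result.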

Coming back to the statistical nature of training, please note that my new approach does tend to yield statistically much more robust results (with much smaller statistical fluctuations, as is evident from the logs, and as is only to be expected from the “theory” behind my new approach(es)).

But still, since I was able to conduct only one full trial with this highest-accuracy architecture, I hesitate to make any statement that might be misinterpreted. That's why I have decided not to claim the 99.79 % result. I will mention the achievement in informal communications, even in blog posts (the way I am doing right now). But I am not going to put it on my resume. (One should promise less and deliver more. That's what I believe in.)

If I were to claim this result, it would also improve my World Rank by one. Thus, I would then get to World Rank # 4.

Still, I am leaving the actual making of this claim to some other day. Checking the repeatability will take too much time with my too-slow-for-the-purposes machine, and I need to focus on other areas of data science too. I find that I have begun falling behind on them. (With a powerful GPU-based machine, both would have been possible—MNIST and the rest of data science. But for now, I have to prioritize.)


Since my last post, I learnt a lot about image recognition, classification, deep learning, and all. I also coded some of the most advanced ideas in deep learning for image processing that can at all be implemented with today's best technology, and then a few more of my own. Informally, I can say that I am now at World Rank # 4. However, for the reasons given above, I am not going to claim the improvement as of today. So, my official rank on the MNIST dataset remains at # 5.


2. Miscellaneous:

I have closed this entire enterprise of the MNIST trials for now. With my machine, I am happy to settle at World Rank # 5 (as claimed; # 4, informally and actually).

I might now explore deep learning for radiology (e.g., detection of abnormalities in chest X-rays, or of cancers), for just a bit.

However, it seems that I have been stuck in this local minimum of image recognition for too long. (In the absence of gainful employment in this area, despite my world-class result, it still is just a local minimum.) So, to correct my overall tilt in the pursuit of topics, for the time being, I am going to keep image processing relatively on the back-burner.

I have already started exploring time-series analysis for stock markets. I will also be looking into deep learning from text data, esp. NLP. I had not thought a lot about these areas, and now I need to effect the correction.

… Should be back after a few weeks.

In the meanwhile, if you are willing to pay for my stock-market tips, I would surely hasten the designing and perfecting of my algorithms for stock-market “predictions”. … It just so happens that I had predicted yesterday (Sunday, 10th May) that the Bombay Stock Exchange's Sensex indicator would definitely not rise today (on Monday), and that while the market should trade in around the same range as it did on Friday, the Sensex was likely to close a bit lower. This has come true. (It in fact closed just a bit higher than what I had predicted.)

… Of course, “one swallow does not a summer make.” … [Just checked it. Turns out that this one comes from Aristotle [^]. I don't know why, but I had always carried the impression that the source was Shakespeare, or maybe some Renaissance English author. Apparently not so.]

Still, don’t forget: If you have the money, I do have the inclination. And, the time. And, my data science skills. And, my algorithms. And…

…Anyway, take care and bye for now.


A song I like:

(Hindi) तुम से ओ हसीना कभी मुहब्बत मुझे ना करनी थी… (“tum se o haseenaa kabhee muhabbat naa” )
Music: Laxmikant Pyarelal
Singers: Mohammad Rafi, Suman Kalyanpur
Lyrics: Anand Bakshi

[I know, I know… This one almost never makes it to anyone's lists. Had I not heard it (and also loved it) in my childhood, it wouldn't make it to my lists either. But given this chronological prior, the logical prior too has changed forever for me. …

This song is big on rhythm (though they overdo it a bit), and all kids like songs that emphasize rhythm. … I have seen third-class songs from aspiring/actual pot-boilers, songs like तु चीझ बडी है मस्त मस्त (“too cheez baDi hai mast, mast”), being a huge hit with kids. Not just a big hit but a huge one. And that third-rate song was a huge hit even with one of my own nephews, when he was 3–4 years old. … Yes, eventually, he did grow up…

But then, there are some songs that you somehow never grow out of. For me, this song is one of them. (It might be a good idea to start running “second-class,” why, even “third-class” songs about which I am a bit nostalgic. I listened to a lot of them during this boring lock-down, and, even more boring, during all those long-running trials for the MNIST dataset. The boredom, especially on the second count, had to be killed. I did kill it. … So, all in all, from my side, I am ready!)

Anyway, once I grew up, there were a couple of surprises regarding the credits of this song. I used to think, by default as it were, that it was Lata. No, it turned out to be Suman Kalyanpur. (Another thing. That there was no mandatory “kar” after the “pur” also was a surprise for me, but then, I digress here.)

Also, for no particular reason, I didn't know about the music director. Listening to it now, after quite a while, I tried to take a guess. After weighing between Shankar-Jaikishan and R.D. Burman, also with a faint consideration of Kalyanji-Anandji, I started suspecting RD. … Think about it. This song could easily go well with those from the likes of तीसरी मंझील (“Teesri Manjhil”) or काँरवा (“karvaan”), right?

But as a self-declared RD expert, I also couldn't shuffle my memory and recall some incident in which I could be found boasting to someone in the COEP/IITM hostels that this one indeed was an RD song. … So, it shouldn't be RD either. … Could it be Shankar-Jaikishan then? The song seemed to fit the late-60s SJ mold too. (Recall Shammi.) … Finally, I gave up, and checked the music director.

Well, Laxmikant Pyarelal sure was a surprise to me! And it should be, to anyone. Remember, this song is from 1967. This was the time when LP were coming out with songs like those in दोस्ती (“Dosti”), मिलन (“Milan”), उपकार (“Upakar”), and similar. Compositions grounded in the Indian musical sense, through and through.

Well yes, LP have given a lot of songs/tunes that go more into the Western-like fold. (Recall रोज शाम आती थी (“roz shyaam aatee thee”), for instance.) Still, this song is a bit too “out of the box” for LP when you consider their typical “box”. The orchestration, in particular, at times feels as if the SD-RD “gang” like Manohari Singh, Sapan Chakravorty, et al. wasn't behind it, and so does the rendering by Rafi (especially with that sharp हा! (“hah!”) coming at the end of the refrain)… Oh well.

Anyway, give a listen and see how you find it.]




Yeah! Just that!


Update on 2020.02.17 16:02 IST:

The above is a snap I took yesterday at the Bhau Institute [^]’s event: “Pune Startup Fest” [^].

The reason I found myself laughing out loud was this: Yesterday, some of the distinguished panelists made one thing very clear: The valuation for the same product is greater in the S.F. Bay Area than in Pune, because the eco-system there is much more mature, with the investors there having seen many more exits—whether successful or otherwise.


When I was in the USA (which was in the 1990s), they would always say that not everyone has to rush to the USA, especially to the S.F. Bay Area, because technology works the same way everywhere, and hence people should rather be going back to India. The “they” of course included the Indians already established there.

In short, their never-stated argument was this much: You can make as much money by working from India as from the SF Bay Area. (Examples of the “big three” of Indian IT Industry would often be cited, esp. of Narayana Moorthy’s.) So, “why flock in here”?

Looks like, even if they took some 2–3 decades to do so, finally, something better seems to have dawned on them. They seem to have gotten to the truth, which is: market valuations for the same product are much greater in the SF Bay Area than elsewhere!

So, this all was in the background, in the context.

Then, I was musing about their rate of learning last night, and that’s when I wrote this post! Hence the title.

But of course, not everything was laughable about, or in, the event.

I particularly liked Vatsal Kanakiya's enthusiasm (the second guy from the right in the above photo; his LinkedIn profile is here [^]). I appreciated his ability to keep on highlighting what they (their firm) are doing, despite a somewhat cocky (if not outright dismissive) way in which his points were being seen, at least initially. Students attending the event might have found his enthusiasm more in line with theirs, especially after he not only mentioned Guy Kawasaki's 10-20-30 rule [^], but also cited a statistic from their own office to support it: 1892 proposals last month (if I got that figure right). … Even though he was very young, it was this point which finally made it impossible for many in that hall to be too dismissive of him. (BTW, he is from Mumbai, not Pune. (Yes, COEP is in Pune.))


A song I like:

(Hindi) ये मेरे अंधेरे उजाले ना होते (“ye mere andhere ujaale naa hote”)
Music: Salil Chowdhury
Singers: Talat Mahmood, Lata Mangeshkar
Lyrics: Rajinder Kishen

[Buildings made from the granite stone [I studied geology in my SE i.e. second year of engineering] have a way of reminding you of a few songs. Drama! Contrast!! Life!!! Money!!!! Success!!!!! Competition Success Review!!!!!!  Governments!!!!!!! *Business*men!!!!!!!!]



Equations in the matrix form for implementing simple artificial neural networks

(Marathi) हुश्श्… [translit.: “hushsh…”; English equivalent: “phewww…”]

I’ve completed the first cut of a document with the same title as this post. I wrote it in LaTeX. (Too many equations!)

I’ve just uploaded the PDF file at my GitHub account, here [^]. Remember, it’s still only in the alpha stage. (A beta release will follow after a few days. The final release may take place after a couple of weeks or so.)

Below the fold, I copy-paste the abstract and the preface of this document.

“Equations in the matrix form for implementing simple artificial neural networks”


This document presents the basic equations in reference to which artificial neural networks are designed and implemented. The scope is restricted to
the simpler feed-forward networks, including those having hidden layers. Convolutional and recurrent networks are out of the scope.

Equations are often initially noted using an index-based notation for the typical element. However, all the equations are eventually cast in the direct matrix form, using a consistent set of notation. Some minor aspects of the notation were invented to make the presentation as simple and direct as possible.
The presentation here regards a layer as the basic unit. The term “layer” is understood in the same sense in which the APIs of modern libraries like TensorFlow-Keras 2.x take it. The presentation here is detailed enough that neural networks with hidden layers could be implemented from scratch.


Raison d’être:

I wrote this document mainly for myself, to straighten out the different notations and formulae used in different sources and contexts.

In particular, I wanted to have a document that better matches the design themes used in today’s libraries (like TensorFlow-Keras 2.x) than the description in the text-books.

For instance, in many sources, the input layer is presented as consisting of both a fully connected layer and its corresponding activation layer. However, for flexibility, libraries like TF-Keras 2.x treat them as separate layers.

Also, some sources uniformly treat the input of any layer as \vec{X} and the output of any layer as the activation \vec{a}, but such usage overloads the term “activation”. Confusions also creep in because different conventions exist: treating the bias by expanding the input vector with 1 and the weights matrix with w_0; the “to–from” vs. “from–to” convention for the weights matrix; etc.

I wanted to have a consistent notation that dealt with all such issues with a uniform, matrix-based notation that came as close to the numpy ndarray interface as possible.
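To make the preceding notational points concrete, here is a tiny NumPy sketch of one fully connected layer followed by a separate activation layer (the TF-Keras 2.x sense of “layer”). The shapes and the “from–to” convention used below are my illustrative choices for this sketch, not necessarily the exact conventions the document finally settles on:

```python
import numpy as np

# One fully connected layer and its activation, kept as two separate
# "layers" in the TF-Keras 2.x sense. Shapes follow the NumPy ndarray
# interface: X is (batch, n_in), W is (n_in, n_out) (a "from-to"
# convention), b is (n_out,). These conventions are illustrative
# choices only.
def dense(X, W, b):
    return X @ W + b           # affine map, shape (batch, n_out)

def relu(Z):
    return np.maximum(Z, 0.0)  # activation treated as its own layer

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # batch of 4 inputs, 3 features each
W = rng.standard_normal((3, 2))   # 3 input features -> 2 units
b = np.zeros(2)
A = relu(dense(X, W, b))          # shape (4, 2), element-wise >= 0
```

Note how the bias is kept as a separate vector `b` here rather than folded into `W` via an expanded input; that is exactly the kind of convention choice the document tries to fix once and for all.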

Level of coverage:

The scope here is restricted to the simplest ANNs, including the simplest DL networks. Convolutional neural networks and recurrent neural networks are out of the scope.

Yet, this document wouldn't make for a good tutorial for a complete beginner; it is likely to confuse him more than explain anything to him. So, if you are completely new to ANNs, it is advisable to go through sources like Nielsen's online book [^] to learn the theory of ANNs. Mazur's fully worked-out example of the back-propagation algorithm [^] should also prove very helpful, before returning to this document.

If you already know ANNs, and don’t want to see equations in the fully expanded forms—or, plain dislike the notation used here—then a good reference, roughly at the same level as this document, is the set of write-ups/notes by Mallya [^].


Any feedback, especially regarding errors, typos, inconsistencies in notation, suggestions for improvements, etc., will be gratefully received.

How to cite this document:

TBD at the time of the final release version.

Further personal notings:

I began writing this document on 24 January 2020. By 30 January 2020, I had some 11 pages done up, which I released via the last post.

Unfortunately, it was too tentative, with a lot of errors, misleading or inconsistent notation, etc. So, I deleted it within a day. No point in having premature documents floating around in cyberspace.

I had mentioned, right in the last post here on this blog (on 30 January 2020), that the post itself also would be gone. I will keep it for a while, and then, maybe after a week or two, delete it.

Anyway, by the time I finished the alpha version today, the document had grown from the initial 11 pages to some 38 pages!

Typing out all the braces, square brackets, parentheses, subscripts for indices, subscripts for sizes of vectors and matrices… it all was tedious. … Somehow, I managed to finish it. (I will think twice before undertaking a similar project, but am already tempted to write a document each on CNNs and RNNs, too!)

Anyway, let me take a break for a while.

If interested in ANNs, please go through the document and let me have your feedback. Thanks in advance, take care, and bye for now.

A song I like:

[Just listen to Lata here! … Not that others don’t get up to the best possible levels, but still, Lata here is, to put it simply, heavenly! [BTW, the song is from 1953.]]

(Hindi) जाने न नजर पहचाने जिगर (“jaane naa najar pahechane jigar”)
Singers: Lata and Mukesh
Music: Shankar-Jaikishen
Lyrics: Hasrat Jaipuri