# Python scripts for simulating QM, part 1: Time evolution of a particle in the infinite potential box

A Special note for the Potential Employers from the Data Science field:

Recently, in April 2020, I achieved a World Rank # 5 on the MNIST problem. The initial announcement can be found here [^], and a further status update, here [^].

All my data science-related posts can always be found here [^].

What’s been happening?

OK, with that special note done, let me now turn my attention to the two of you who regularly read this blog.

… After the MNIST classification problem, I turned my attention to using LSTMs for time-series predictions, simply because I hadn’t tried much on this topic earlier. … So, I implemented a few models. I even seemed to get good accuracy levels.

However, after having continuously worked on Data Science straight for about 2.5 months, I began feeling like taking a little break from it.

I had grown a bit tired, though I had not realized it while actually going through all those tedious trials. (The fact that I was not earning any money might have added to the draining of energy too.) In fact, I didn’t have the energy to pick up something on the reading side either.

Basically, after a lot of intense effort, I wanted something that was light but engaging.

In the end, I decided to look into Python code for QM.

Earlier, in late 2018, I had written a few scripts on QM. I had also blogged about it; see the “part 0” of this series [^]. (Somehow, it has received an unusually large number of hits after I announced my MNIST result.) However, after a gap of 1.5 years, I could not easily locate those scripts. … So, like any self-respecting programmer, I decided to code them again!

Below is the first result, a movie. … Though a movie, it should be boring to anyone who is not interested in QM.

Movie of the wavefunction of an electron inside an infinite potential box:

An electron inside a 3 cm long 1D box with infinite potentials on the sides. Time evolution of the second excited state (n = 3). In the standard QM, such states are supposed to be “stationary”.
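For reference, in the standard QM a stationary state evolves in time only through a global phase factor (a textbook relation, stated here just as background):

\[
\Psi_n(x, t) = \psi_n(x)\, e^{-i E_n t / \hbar},
\qquad
|\Psi_n(x, t)|^2 = |\psi_n(x)|^2 .
\]

The probability density stays constant in time (hence “stationary”), even though the real and imaginary parts of \( \Psi \) keep oscillating, which is exactly what the movie shows.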

The Python code: Main file:

Now, the code. It should be boring to anyone who is not a Python programmer.

"""
01.AJ_PIB_1D_Class.py

Particle in a Box.

Solves the Time-Independent Schrodinger Equation in 1D,
using the Finite Difference Method. Eigenvalues are found using
the direct matrix method.

Also shows the changes in the total wavefunction with time.
[The stationarity in the TISE is not static. In the mainstream
QM, the stationarity is kinematical. In my new approach, it
has been proposed to be kinetic. However, this simulation concerns
itself only with the standard, mainstream, QM.]

Environment: Developed and tested using:
Python 3.7.6 64-bit. All packages as in Anaconda 4.8.3
on Ubuntu-MATE 20.04 (focal fossa) LTS of the date (below).

TBD: It would be nice to use sparse matrices. Also, use eigenvalue
functions from scipy (instead of those from numpy).

History:
This file begun: Friday 2020 May 22 19:50:26 IST
This version: Saturday 2020 May 23 16:07:52 IST
"""

import numpy as np
from scipy.integrate import simps
import matplotlib.pyplot as plt
from matplotlib.animation import ImageMagickFileWriter

# SEE THE ACCOMPANYING FILE. THE NUMERICAL VALUES OF CONSTANTS
# ARE DEFINED IN IT.
from FundaConstants import h, hbar, me, mP, eV2J

################################################################################
# THE MAIN CLASS

class AJ_PIB_1D( object ):

    def __init__( self, nInteriorNodes, dMass, dh ):
        self.nInteriorNodes = nInteriorNodes
        self.nDomainNodes = nInteriorNodes + 2
        self.dMass = dMass # Mass associated with the QM particle
        self.dh = dh # cell-size ( \Delta x ).

        # The following numpy ndarray's get allocated
        # during computations.
        self.aaT = None
        self.aaV = None
        self.aaH = None

        self.aaPsi = None
        self.aE_n = None

        self.A_ana = None
        self.ak_ana = None
        self.aE_ana = None
        self.aPsi_ana = None
        return

    # Creates the kinetic energy matrix for the interior points of the
    # domain.
    def ComputeKE( self ):
        self.aaT = np.zeros( (self.nInteriorNodes, self.nInteriorNodes) )
        for i in range( self.nInteriorNodes ):
            self.aaT[ i ][ i ] = -2.0
        for i in range( self.nInteriorNodes-1 ):
            self.aaT[ i ][ i+1 ] = 1.0
        for i in range( 1, self.nInteriorNodes ):
            self.aaT[ i ][ i-1 ] = 1.0
        dFactorKE = - hbar**2 / ( 2.0 * self.dMass * self.dh**2 )
        self.aaT *= dFactorKE
        return

    # Creates the potential energy matrix for the interior points of the
    # domain. You can supply an arbitrary potential function via the array
    # aV of size = interior points count, and values in joule.
    def ComputePE( self, aV= None ):
        self.aaV = np.zeros( (self.nInteriorNodes, self.nInteriorNodes) )
        # Note: `None != aV` would trigger NumPy's element-wise comparison
        # when an ndarray is passed; test identity with `is not` instead.
        if aV is not None:
            for i in range( self.nInteriorNodes ):
                self.aaV[ i ][ i ] = aV[ i ]
        return

    def ComputeHamiltonian( self, aV= None ):
        self.ComputeKE()
        self.ComputePE( aV )
        self.aaH = self.aaT + self.aaV
        return

    # Note, the argument aX has the size = the count of domain points, not
    # the count of interior points.
    # QM operators are Hermitian. We exploit this fact by using the
    # numpy.linalg.eigh function here. It is faster than numpy.linalg.eig,
    # and, unlike the latter, also returns results sorted in the ascending
    # order.
    # HOWEVER, NOTE, the eigenvectors returned can have signs opposite
    # of what the analytical solution gives. The eigh (or eig)-returned
    # vectors still *are* *eigen* vectors. However, for easier comparison
    # with the analytical solution, we here provide a quick fix.
    # See below in this function.
    def ComputeNormalizedStationaryStates( self, aX, bQuickFixForSigns= False ):
        assert( self.nDomainNodes == len( aX ) )

        # Use the LAPACK library to compute the eigenvalues
        aEigVals, aaEigVecs = np.linalg.eigh( self.aaH )

        # SQUARE-NORMALIZE THE EIGENVECTORS

        # Note:
        # The eigenvectors were found on the interior part of the domain,
        # i.e., after dropping the boundary points at extreme ends. But the
        # wavefunctions are defined over the entire domain (with the
        # Dirichlet condition of 0.0 specified at the boundary points).

        nCntVecs = aaEigVecs.shape[ 1 ]
        assert( nCntVecs == self.nInteriorNodes )

        # eigh returns vectors in *columns*. We prefer to store the
        # normalized vectors in *rows*.
        aaPsi = np.zeros( (self.nInteriorNodes, self.nDomainNodes) )
        for c in range( nCntVecs ):
            aPsi = aaEigVecs[ :, c ]
            # Find the area under the prob. curve
            aPsiSq = aPsi * aPsi
            dArea = simps( aPsiSq, aX[ 1 : self.nDomainNodes-1 ] )
            # Use it to normalize the wavefunction
            aPsi /= np.sqrt( dArea )
            # The analytical solution always has the curve going up
            # (with a +ve gradient) at the left end of the domain.
            # We exploit this fact to have a quick fix for the signs.
            if bQuickFixForSigns is True:
                d0 = aPsi[ 0 ]
                d1 = aPsi[ 1 ]
                if d1 < d0:
                    aPsi *= -1
            aaPsi[ c, 1 : self.nDomainNodes-1 ] = aPsi
        self.aaPsi = aaPsi
        self.aE_n = aEigVals
        return

    # Standard analytical solution. See, e.g., the Wiki article:
    # "Particle in a box"
    def ComputeAnalyticalSolutions( self, aX ):

        xl = aX[ 0 ]
        xr = aX[ self.nDomainNodes-1 ]
        L = xr - xl
        A = np.sqrt( 2.0 / L )
        self.A_ana = A

        # There are as many eigenvalues as there are interior points
        self.ak_ana = np.zeros( self.nInteriorNodes )
        self.aE_ana = np.zeros( self.nInteriorNodes )
        self.aPsi_ana = np.zeros( (self.nInteriorNodes, self.nDomainNodes) )
        for n in range( self.nInteriorNodes ):
            # The 0-based row index n corresponds to the 1-based quantum
            # number nQ = n + 1. (The ground state goes into row 0, which
            # matches the layout of the numerically computed aaPsi.)
            nQ = n + 1
            # The wavevector. (Notice the absence of the factor of '2'.
            # Realize, the 'L' here refers to half of the wavelength of
            # the two travelling waves which make up the standing wave.
            # That's why.)
            k_n = nQ * np.pi / L
            # The energy.
            E_n = nQ**2 * h**2 / (8.0 * self.dMass * L**2)

            # A simplest coordinate transformation:
            # For x in [0,L], phase angle is given as
            # Phase angle = nQ \pi x / L = k_n x.
            # We have arbitrary coordinates for the left- and
            # right-boundary point. So,
            # Phase angle = k_n (x - xl)
            ap = k_n * (aX - xl)

            aPsiAna = A * np.sin( ap )
            self.ak_ana[ n ] = k_n
            self.aE_ana[ n ] = E_n
            # We prefer to store the normalized wavefunction
            # in rows. (Contrast: linalg.eigh and eig store the
            # eigen vectors in columns.)
            self.aPsi_ana[ n, : ] = aPsiAna
        return

    # This function gets the value that is the numerical equivalent to the
    # max wave amplitude 'A', i.e., sqrt(2/L) in the analytical solution.
    def GetMaxAmplNum( self ):
        dMax = np.max( np.abs(self.aaPsi) )
        return dMax

################################################################################
# Utility functions

# NOTE: SAVING MOVIES CAN TAKE A LOT MORE TIME (7--10 MINUTES).
# (This function uses the module-level globals aX, xl, xr and L for the
# plot geometry.)
def Plot( model, n, nTimeSteps, bSaveMovie= False, sMovieFileName= None ):
    # The class computes and stores only the space-dependent part.
    aPsiSpace = model.aaPsi[ n-1 ]

    # Period = 2 \pi / \omega = 1 / \nu
    # Since E = h \nu, \nu = E/h, and so, Period = h/E
    nu = model.aE_n[ n-1 ] / h
    dPeriod = 1.0 / nu

    dt = dPeriod / (nTimeSteps-1)

    # Plotting...

    plt.style.use( 'ggplot' )
    # Plot size is 9 inches X 6 inches. Reduce if you have a smaller
    # screen size.
    fig = plt.figure( figsize=(9,6) )

    if bSaveMovie is True:
        movieWriter = ImageMagickFileWriter()
        movieWriter.setup( fig, sMovieFileName )

    dMaxAmpl = model.GetMaxAmplNum() # Required for setting the plot limits.
    dTime = 0.0 # How much time has elapsed in the model?
    for t in range( nTimeSteps ):
        # TIME-DEPENDENT PART:
        # \psi_t = e^{-i E_n t/\hbar} = e^{-i \omega_n t} = e^{-i 2 \pi \nu t}
        # Compute the phase angle (which appears in the exponent).
        dTheta = 2.0 * np.pi * nu * dTime
        # The Euler identity. Compute the *complete* wavefunction
        # (space and time) at this instant.
        aPsi_R_t = aPsiSpace * np.cos( dTheta )
        aPsi_I_t = - aPsiSpace * np.sin( dTheta )

        plt.clf()
        sTitle = "Particle in an infinite-potential box (n = %d)\n" % (n)
        sTitle += "Domain size: %7.4lf m. Oscillation period: %7.4lf s.\n" % (L, dPeriod)
        sTitle += "Time step: %3d/%3d. Time elapsed in simulation: %7.4lf s." % (t+1, nTimeSteps, dTime)
        plt.title( sTitle )

        plt.xlabel( "Distance, m" )
        plt.ylabel( "Wavefunction amplitude, $m^{-1/2}$" )

        plt.grid( True )

        plt.xlim( (xl - L/10), (xr + L/10) )
        plt.ylim( -1.1*dMaxAmpl, 1.1*dMaxAmpl )

        plt.plot( aX, aPsi_R_t, color= 'darkcyan', label= r'Re($\Psi$)' )
        plt.plot( aX, aPsi_I_t, color= 'purple', label= r'Im($\Psi$)' )

        plt.legend( loc= 'upper right', shadow= True, fontsize= 'small' )

        if bSaveMovie is True:
            movieWriter.grab_frame()
        else:
            plt.pause( 0.001 )

        dTime += dt

    if bSaveMovie is True:
        movieWriter.finish()
    else:
        plt.show()

################################################################################
# MAIN DRIVER CODE
# We use the SI system throughout. [This is a program. It runs on a computer.]

# DOMAIN GEOMETRY

xl = -1.0e-02 # Left end (min. x)
xr = 2.0e-02 # Right end (max. x)
L = xr - xl # Length of the domain
xc = (xl + xr )/ 2.0 # Center point

# MESH
# Count of cells = Count of nodes in the domain - 1.
# It's best to take an odd number for the count of domain nodes. This way,
# the peak(s) of the wavefunction will not be missed.
nDomainNodes = 101
aX, dh = np.linspace( start= xl, stop= xr, num= nDomainNodes,
                      endpoint= True, retstep= True, dtype= float )

# In the PIB model, infinite potential exists at either ends. So, we apply
# the Dirichlet BC of \Psi(x,t) = 0 at all times. Even if the discretized
# Laplacian were to be computed for the entire domain, in handling the
# homogeneous BC, both the boundary-points would get dropped during the
# matrix partitioning. Similarly, V(x) would be infinite there. That's why,
# we allocate the Laplacian and Potential Energy matrices only for the
# interior points.
nInteriorNodes = nDomainNodes - 2

# We model the electron here.
# Constants are defined in a separate file: 'FundaConstants.py'
# Suggestion: Try mP for the proton, and check the \Psi amplitudes.
dMass = me

# Instantiate the main model class.
model = AJ_PIB_1D( nInteriorNodes, dMass, dh )

# Compute the system Hamiltonian.

# If you want, supply a custom-made potential function as an ndarray of
# size nInteriorNodes, as an argument. Values should be in joules.
# 'None' means 0 joules everywhere inside the box.
model.ComputeHamiltonian( None )

# Compute the stationary states. For the second argument, see the
# note in the function implementation.
model.ComputeNormalizedStationaryStates( aX, True )

# You can also have the analytical solution computed. Uncomment the
# line below. The numerical and analytical solutions are kept in
# completely different arrays inside the class. However, the plotting
# code has to be careful.
### model.ComputeAnalyticalSolutions( aX )

# PLOT THE STATIONARY STATES, AND SHOW THEIR OSCILLATIONS WITH TIME.
# (TISE *is* dynamic; the stationarity is dynamical.)

# Note, here, we choose n to be a 1-based index, as is the practice
# in physics. Thus, the ground state is given by n = 1, and not n = 0.
# However, NOTE, the computed arrays of wavefunctions have 0-based
# indices. If such dual-usage for the indices gets confusing, simple!
# Just change the code!

n = 3
# No. of frames to be plotted for a single period of oscillations.
# The 0-th and the (nTimeSteps-1)-th frames are identical because the
# Hamiltonian here is time-independent (the motion is exactly periodic).
nTimeSteps = 200

# You can save a movie, but note, animated GIFs take a lot more time, even
# ~10 minutes or more, depending on the screen-size and dpi.
# Note, ImageMagickFileWriter will write the temp .png files in the current
# directory (i.e. the same directory where this Python file resides).
# In case the program crashes (or you stop the program before it finishes),
# you will have to manually delete the temporary .png files from the
# program directory! (Even if you specify a separate directory for the
# movie, the *temporary* files still get generated in the program directory.)
### Plot( model, n, nTimeSteps, True, './AJ_PIB_e_%d.gif' % (n) )

Plot( model, n, nTimeSteps, False, None )

################################################################################
# EOF
################################################################################



The ancillary file:

The main file imports the following file. It has nothing but the values of fundamental physical constants noted in it (together with the sources). Here it is:

"""
FundaConstants.py

Begun: Thursday 2020 May 21 20:55:37 IST
This version: Saturday 2020 May 23 20:39:22 IST
"""

import numpy as np

"""
Planck's constant
https://en.wikipedia.org/wiki/Planck_constant
“The Planck constant is defined to have the exact value h = 6.62607015×10−34 J⋅s in SI units.”
"""
h = 6.62607015e-34 # J⋅s. Exact value.
hbar = h / (2.0 * np.pi) # J⋅s.

"""
Electron rest mass
https://en.wikipedia.org/wiki/Electron_rest_mass
“9.1093837015(28)×10−31 kg.” 2018 CODATA value, NIST.
"""
me = 9.1093837015e-31 # kg.

"""
Proton rest mass
https://en.wikipedia.org/wiki/Proton
1.67262192369(51)×10−27 kg
"""
mP = 1.67262192369e-27 # kg

"""
eV to Joule
https://en.wikipedia.org/wiki/Electronvolt
1 eV = 1.602176634×10−19 J
"""
eV2J = 1.602176634e-19 # Conversion factor
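As a quick sanity check of these constants (a throwaway sketch of mine, not part of the original scripts), one can reproduce the analytical energy and the oscillation period which the main file computes for the n = 3 state of an electron in the 3 cm box:

```python
# Values as in FundaConstants.py
h = 6.62607015e-34     # J.s. Exact value.
me = 9.1093837015e-31  # kg.

L = 3.0e-02  # m. The 3 cm box of the main file (xr - xl).
n = 3        # The second excited state.

# E_n = n^2 h^2 / (8 m L^2), as in ComputeAnalyticalSolutions()
E_n = n**2 * h**2 / (8.0 * me * L**2)

# Period = h / E_n, as in Plot()
dPeriod = h / E_n

print( "E_3 = %.3e J, period = %.3f s" % (E_n, dPeriod) )
```

With these numbers, the period works out to roughly 1.1 s, which is why a single full oscillation of the n = 3 state is comfortably watchable in the movie.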



And, that’s about it, folks! No documentation. But I have added a lot of (otherwise unnecessary) comments.

Take care, and bye for now.

A song I like:

Music: Chinar Mahesh
Lyrics: Chandrashekhar Sanekar
Singer: Swapnil Bandodkar

# Status update on my trials for the MNIST dataset

This post is going to be brief, relatively speaking.

1. My further trials for the MNIST dataset :

You know by now, from my last post about a month ago [^], that I had achieved a World Rank # 5 on the MNIST dataset (with 99.78 % accuracy), and that too, using a relatively slow machine (single CPU-only laptop).

At that time, as mentioned in that post, I also had some high hopes of bettering the result (with definite pointers as to why the results should get better).

Since then, I’ve conducted a lot of trials. Both the machine and I learnt a lot. However, during this second bout of trials, I came to learn much, much more than the machine did!

But naturally! With a great result already behind me, my focus during the last month shifted to better understanding the why’s and the how’s of it, rather than chasing a further improvement in accuracy by hook or by crook.

So, I deliberately reduced my computational budget from 30+ hours per trial to 12 hours at the most. [Note again, my CPU-only hardware runs about 4–8 times slower, perhaps even 10+ times slower, as compared to the GPU-carrying machines.]

Within this reduced computing budget, I pursued many different ideas and architectures, with some elements being well known to people already, and some elements having been newly invented by me.

The ideas I combined include: batch normalization, learnable pooling layers, models built with functional API (permitting complex entwinements of data streams, not just sequential), custom-written layers (not much, but I did try a bit), custom-written learning-rate scheduler, custom-written call-back functions to monitor progress at batch- and epoch-ends, custom-written class for faster loading of augmented data (TF-Keras itself spins out a special thread for this purpose), apart from custom logging, custom-written early stopping, custom-written ensembles, etc. … (If you are a programmer, you would respect me!)

But what was going to be the result? Sometimes there was some faint hope for some improvement in accuracy; most times, a relatively low accuracy was anyway expected, and that’s what I saw. (No surprises there; the computing budget, to begin with, had to be kept small.)

Sometime during these investigations into the architectures and algorithms, I did initiate a few long trials (in the range of some 30–40 hours of training time). I took only one of these trials to completion. (I interrupted and ended all the other long trials more or less arbitrarily, after running them for, may be, 2 to 8 hours. A few of them seemed to be veering towards eventual over-fitting; for others, I simply lost patience!)

During the one long trial which I did run to completion, I did achieve a slight improvement in the accuracy. I did go up to 99.79 % accuracy.

However, I cannot be very highly confident about the consistency of this result. The algorithms are statistical in nature, and a slight degradation (or even a slight improvement) from 99.79 % is what is to be expected.

I do have all the data related to this 99.79 % accuracy result saved with me. (I have saved not just the models, the code, and the outputs, but also all the intermediate data, including the output produced on the terminal by the TensorFlow-Keras library during the training phase. The outputs of the custom-written log also have been saved diligently. And of course, I was careful enough to seed at least the most important three random generators—one each in TF, numpy and basic python.)
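For the record, the seeding mentioned above follows the usual pattern. The sketch below shows it for the basic Python and NumPy generators, with the TensorFlow 2.x call (`tf.random.set_seed`) left commented out so that the snippet runs without TF. (This is only an illustration of the general pattern, not my actual training code.)

```python
import random

import numpy as np

SEED = 42  # any fixed value

random.seed( SEED )     # basic Python's generator
np.random.seed( SEED )  # NumPy's legacy global generator
# import tensorflow as tf
# tf.random.set_seed( SEED )  # TensorFlow 2.x

# With the seed fixed, the stream of random numbers becomes repeatable:
a = np.random.rand( 3 )
np.random.seed( SEED )
b = np.random.rand( 3 )
assert np.allclose( a, b )
```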

Coming back to the statistical nature of training, please note that my new approach does tend to yield statistically much more robust results (with much smaller statistical fluctuations, as is evident from the logs, and as is also only to be expected from the “theory” behind my new approach(es)).

But still, since I was able to conduct only one full trial with this highest-accuracy architecture, I hesitate to make any statement that might be mis-interpreted. That’s why, I have decided not to claim the 99.79 % result. I will mention the achievement in informal communications, even on blog posts (the way I am doing right now). But I am not going to put it on my resume. (One should promise less, and then deliver more. That’s what I believe in.)

If I were to claim this result, it would also improve my World Rank by one. Thus, I would then get to World Rank # 4.

Still, I am leaving the actual making of this claim to some other day. Checking the repeatability will take too much time with my too-slow-for-the-purposes machine, and I need to focus on other areas of data science too. I find that I have begun falling behind on them. (With a powerful GPU-based machine, both would have been possible—MNIST and the rest of data science. But for now, I have to prioritize.)

Summary:

Since my last post, I learnt a lot about image recognition, classification, deep learning, and all. I also coded some of the most advanced ideas in deep learning for image processing that can at all be implemented with today’s best technology, and then, a few more of my own. Informally, I can now say that I am at World Rank # 4. However, for the reasons given above, I am not going to make the claim for the improvement, as of today. So, my official rank on the MNIST dataset remains at 5.

2. Miscellaneous:

I have closed this entire enterprise of the MNIST trials for now. With my machine, I am happy to settle at the World Rank # 5 (as claimed, and # 4, informally and actually).

I might now explore deep learning for radiology (e.g. detection of abnormalities in chest X-rays or cancers), for just a bit.

However, it seems that I have been stuck in this local minimum of image recognition for too long. (In the absence of gainful employment in this area, despite my world-class result, it still is just a local minimum.) So, to correct my overall tilt in the pursuit of topics, for the time being, I am going to keep image processing relatively on the back-burner.

I have already started exploring time-series analysis for stock-markets. I would also be looking into deep learning from text data, esp. NLP. I have not thought a lot about these areas so far, and now I need to correct that.

… Should be back after a few weeks.

In the meanwhile, if you are willing to pay for my stock-market tips, I would surely hasten the designing and perfecting of my algorithms for stock-market “predictions”. … It just so happens that I had predicted yesterday (Sunday 10th May) that the Bombay Stock Exchange’s Sensex indicator would definitely not rise today (on Monday), and that while the market should trade in around the same range as it did on Friday, the Sensex was likely to close a bit lower. This has come true. (It in fact closed just a bit higher than what I had predicted.)

… Of course, “one swallow does not a summer make.” … [Just checked it. Turns out that this one comes from Aristotle [^]. I don’t know why, but I had always carried the impression that the source was Shakespeare, or maybe some Renaissance English author. Apparently, not so.]

Still, don’t forget: If you have the money, I do have the inclination. And, the time. And, my data science skills. And, my algorithms. And…

…Anyway, take care and bye for now.

A song I like:

(Hindi) तुम से ओ हसीना कभी मुहब्बत मुझे ना करनी थी… (“tum se o haseenaa kabhee muhabbat naa” )
Music: Laxmikant Pyarelal
Lyrics: Anand Bakshi

[I know, I know… This one almost never makes it to anyone’s lists. Had I not heard it (and also loved it) in my childhood, it wouldn’t have made it to my lists either. But given this chronological prior, the logical prior too has changed forever for me. …

This song is big on rhythm (though they overdo it a bit), and all kids always like songs that emphasize rhythm. … I have seen third-class songs from aspiring/actual pot-boilers, songs like तु चीझ बडी है मस्त मस्त (“too cheez baDi hai mast, mast”), being a huge hit with kids. Not just a big hit but a huge one. And that third-rate song was a huge hit even with one of my own nephews, when he was 3–4 years old. … Yes, eventually, he did grow up…

But then, there are some songs that you somehow never grow out of. For me, this song is one of them. (It might be a good idea to start running “second-class,” why, even “third-class” songs, about which I am a bit nostalgic. I listened to a lot of them during this boring lock-down, and during all those even more boring long-running trials for the MNIST dataset. The boredom, especially on the second count, had to be killed. I did kill it. … So, all in all, from my side, I am ready!)

Anyway, once I grew up, there were a couple of surprises regarding the credits of this song. I used to think, by default as it were, that it was Lata. No, it turned out to be Suman Kalyanpur. (Another thing. That there was no mandatory “kar” after the “pur” also was a surprise for me, but then, I digress here.)

Also, for no particular reason, I didn’t know about the music director. Listening to it now, after quite a while, I tried to take a guess. After weighing in between Shankar-Jaikishan and R.D. Burman, also with a faint consideration of Kalyanji-Anandji, I started suspecting RD. …Think about it. This song could easily go well with those from the likes of तीसरी मंझील (“Teesri Manjhil”) or काँरवा (“karvaan”) right?

But as a self-declared RD expert, I also couldn’t search my memory and recall any incident in which I could be found boasting to someone in the COEP/IITM hostels that this one indeed was an RD song. … So, it shouldn’t be RD either. … Could it be Shankar-Jaikishan then? The song seemed to fit in with the late-60s SJ mold too. (Recall Shammi.) … Finally, I gave up, and checked out the music director.

Well, Laxmikant Pyarelal sure was a surprise to me! And it should be, to any one. Remember, this song is from 1967. This was the time that LP were coming out with songs like those in दोस्ती (“Dosti”),  मिलन (“Milan”), उपकार (“Upakar”), and similar. Compositions grounded in the Indian musical sense, through and through.

Well yes, LP have given a lot of songs/tunes that go more into the Western-like fold. (Recall रोज शाम आती थी (“roz shyaam aatee thee”), for instance.) Still, this song is a bit too “out of the box” for LP when you consider their typical “box”. The orchestration, in particular, at times feels as if the SD-RD “gang” like Manohari Singh, Sapan Chakravorty, et al. wasn’t behind it, and so does the rendering by Rafi (especially with that sharp हा! (“hah!”) coming at the end of the refrain)… Oh well.

Anyway, give a listen and see how you find it.]

# Status: 99.78 % accuracy on the MNIST dataset (World Rank # 5, using a single CPU-only laptop)

As you know from my last post, for the past 5–6 weeks, I have been undertaking (very) extensive data-science trials. The one specific problem on which I have been working, during this time, is: the MNIST dataset.

Essentially, the MNIST problem is to teach a computer how to recognize hand-written digits. There are tons of material on this topic; google “MNIST” to know more.

MNIST has been called the “hello world” program of machine learning. The idea is that MNIST is merely a toy problem with which you begin exploring the vast area of Artificial Intelligence.

MNIST also has been described as the “Drosophila” of machine learning. IMO, this second characterization is more accurate. After all, OCR of digits is one problem that has kept an entire generation of PhD students and researchers burning the midnight oil!

Progress over time in the prediction accuracy on the MNIST dataset:

The accuracy in recognizing the MNIST digits has gone up from the earlier levels like some 80 % 20 years ago to 90+ % about 10 years ago, and then on to 99 % being routine today.

As the accuracy kept on increasing, to make reports easier on the eye (i.e., to avoid too many 9’s), people started reporting errors rather than correct results. Thus, instead of saying “I got 99.52 % accurate result”, people these days routinely just say: “I got 0.48% error rate.”

For testing your algorithms/models, the standardized test data provided by the MNIST dataset consists of exactly 10,000 samples. This round figure turns out to be quite convenient: when you achieve, say, a 99.65 % accuracy level, you not only are 0.35 % wrong, but the actual number of samples on which you went wrong is also precisely 35. So, it’s very easy to convert between absolute error counts and percentage rates.
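The conversion in the paragraph above is trivial to script; a throwaway sketch (the function names are mine):

```python
N_TEST = 10000  # size of the standardized MNIST test set

def errors_to_accuracy( nErrors, nTotal= N_TEST ):
    # e.g., 35 errors out of 10,000 -> 99.65 % accuracy
    return 100.0 * (nTotal - nErrors) / nTotal

def accuracy_to_errors( dAccuracyPct, nTotal= N_TEST ):
    # e.g., 99.65 % accuracy on 10,000 samples -> 35 errors
    return round( nTotal * (100.0 - dAccuracyPct) / 100.0 )

print( errors_to_accuracy( 35 ) )     # 99.65
print( accuracy_to_errors( 99.78 ) )  # 22
```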

To come back to the progress in accuracy, with today’s libraries like PyTorch, TensorFlow and Keras, it indeed seems that MNIST has become a toy problem. Even a simple ‘net search will get you to hundreds of tutorials which show you “how to get into the top 1 %”. [Sometimes, this is just a piece of hype which means nothing more than: how to get 99.0 % accuracy, not how to get a 99th-percentile rank!]

In other words, the trouble lies in the upper reaches of the accuracy scores. Each incremental progress from 99.65 % to 99.70 % to 99.75 % becomes harder and still harder—or even plain impossible to achieve, unless you basically improve your algorithm.

Despite throwing super-computers at it, no one has been able to achieve a 100.0% score on this dataset.

One reason for failing to achieve the 100% accuracy level is that some of the samples have come from people with astoundingly bad hand-writing.

Another reason for the failure is that a few of the standardized test samples (on which everyone measures his accuracy) have in fact been wrongly labelled in the first place! For instance, the actual written-down digit may be “4”, but its standardized label says that it is “5” (perhaps because the guy who wrote that digit himself mislabelled it, or was simply careless). Now, once it’s standardized, if your algorithm actually guesses this sample right, then you lose on the accuracy as measured against the standardized labels! This is part of the reason why it’s impossible to achieve 100.0 % accuracy on the MNIST dataset.

Another trouble: There perhaps are too many ambiguous samples in this dataset. Two different algorithms may report the same accuracy (or error) level, say 26 errors each. However, the particular samples on which they fail may not be exactly the same. In other words, there are many samples which tend to get mis-classified by one algorithm but not by others!
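To make that last point concrete, here is a toy sketch (the index sets are made up by me for illustration, not actual MNIST results): two models with the same error count can still disagree on which samples they get wrong.

```python
# Hypothetical indices of misclassified test samples, one set per model.
# Both models make exactly 6 errors, but not on the same samples.
errors_model_A = {12, 947, 1621, 2654, 3902, 5874}
errors_model_B = {12, 947, 2110, 2654, 4333, 9745}

common = errors_model_A & errors_model_B  # failed by both models
only_A = errors_model_A - errors_model_B  # failures peculiar to A
only_B = errors_model_B - errors_model_A  # failures peculiar to B

print( sorted( common ) )              # [12, 947, 2654]
print( len( only_A ), len( only_B ) )  # 3 3
```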

Anyway, enough about these general remarks. Now, on to some hard data on the world-records.

The world’s best of the best:

While there are several Web pages that track the progress in the accuracy on the MNIST dataset, I found that many of the reliable ones have become outdated by now. In fact, today’s rank # 1 result does not appear in any of these compiled lists; I got to know of it through several Google searches.

So, here are the top 10 performances on the MNIST dataset, as found by me, as of today:

• Rank # 1: Number of Errors: 16. Accuracy: 99.84%
  • Byerly, Kalganova and Dear. (31 Jan. 2020) “A branching and merging convolutional network with homogeneous filter capsules,” arXiv:2001.09136v3 [^]. Noted at “Papers with Code” [^]
• Rank # 2: Number of Errors: 17. Accuracy: 99.83%
  • Assiri. (24 January 2020) “Stochastic optimization of plain convolutional neural networks with simple methods,” arXiv:2001.08856v1 [^]. Noted at “Papers with Code” [^]
  • “Matuzas77”. (28 January 2020) Jupyter notebook at GitHub [^]. Result noted at the Wiki [^]
• Rank # 3: Number of Errors: 18. Accuracy: 99.82%
  • Kowsari, Heidarysafa, Brown, Meimandi, Barnes. (2018) “RMDL: Random Multimodel Deep Learning for classification,” arXiv:1805.01890 [^]. Noted at “Papers with Code” [^]
• Rank # 4: Number of Errors: 21. Accuracy: 99.79%
  • Wan, Zeiler, Zhang, LeCun, and Fergus. (2013) [PDF from LeCun’s site: ^]
• Rank # 5: Number of Errors: 23. Accuracy: 99.77%
  • Cireşan, Meier, and Schmidhuber. (2012)
  • Sato, Nishimura, and Yokoi. (2015)
• Rank # 6: Number of Errors: 24. Accuracy: 99.76%
  • Chang, and Chen. (2015)
  • Bochinski, Senst, and Sikora. (2017)
  • Deotte. (2020) Rank # 1 on the Kaggle Leaderboard for the MNIST dataset. Noted on the Kaggle Leaderboard [^]. Comment: I suppose Deotte’s model carries over to the original MNIST dataset too, with at least this much accuracy. The two datasets differ; Kaggle has only 48 K of training data.
• Rank # 7: Number of Errors: 25. Accuracy: 99.75%
  • Baldominos, Saez, and Isasi. (2018)
• Rank # 8: Number of Errors: 27. Accuracy: 99.73%
  • Wan, Zeiler, Zhang, LeCun, and Fergus. (2013)
  • Cireşan, Meier, Gambardella, and Schmidhuber. (2011)
• Rank # 9: Number of Errors: 28. Accuracy: 99.72%
  • Baldominos, Saez, and Isasi. (2019)
• Rank # 10: Number of Errors: 28. Accuracy: 99.72%
  • Lee, Gallagher, and Tu. (2016)
  • Alom, Hasan, Yakopcic, and Taha. (2017)

Entries without explicit references to papers were obtained from the following source:

Alejandro Baldominos, Yago Saez, and Pedro Isasi, “A survey of handwritten character recognition with MNIST and EMNIST,” Appl. Sci. 2019, vol. 9, pp. 3169; doi:10.3390/app9153169 [^]

This source is more comprehensive and up to date than most (though it does not include the entries for the top 3 ranks, which appeared after the paper was published). Other sources, well-known but not always very up to date, are those by Prof. LeCun himself [^], Dr. Rodrigo Benenson [^], Benchmarks.AI [^], and the Wiki [^].
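For reference, each accuracy figure in the list above is simply the complement of the error count over MNIST’s standard 10,000-sample test set. A quick sanity check:

```python
def mnist_accuracy(num_errors, test_size=10_000):
    """Accuracy (in %) given the number of misclassified samples
    on MNIST's 10,000-image test set."""
    return 100.0 * (test_size - num_errors) / test_size

# Rank # 1: 16 errors -> 99.84 %
print(mnist_accuracy(16))
# Rank # 5: 23 errors -> 99.77 %
print(mnist_accuracy(23))
```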

My current results (as of 13th April 2020, 11:00 IST):

Out of tens of trials conducted, let me pick out the two latest (and best) results which I’ve got so far:

• Trials group 1 (my private code: 03.05)
  • Total training time: 23.3 hours (on my CPU-only machine)
  • Number of Errors: 24. Accuracy: 99.76 %.
  • Estimated world-rank: Tied with the existing rank # 6.
• Trials group 2 (my private code: 03.06)
  • Total training time: 27.5 hours (on my CPU-only machine)
  • Number of Errors: 22. Accuracy: 99.78 %.
  • Estimated world-rank: Above the existing rank # 5.

It must be noted that a significant reason for these good accuracy levels is that I implement some new ideas of my own, parallels to which I have not seen anywhere else so far.

How do my results compare with the best in the world? Would it be possible to improve them?

Recall that I have a CPU-only machine. (I can’t buy a better machine because I don’t have any money, and in fact am in loans. I have been out of job for 2 years by now.)

I anticipate that if I had a GPU-enabled laptop, I would have crossed the 99.80 % accuracy level. This is not a vacuous claim; it rests on two different observations:

Point No. 1:

Dr. Chris Deotte uses an ensemble of 15 models, each of which is run to 45 epochs. He gets to 99.75% accuracy. He is currently at # 1 position on the Kaggle leaderboard for MNIST.

Running one of Deotte’s models for 45 epochs takes about 132 minutes on my machine. So, running an ensemble of 15 models would take about 1980 minutes, i.e., about 33 hours.

On the plus side, the Kaggle dataset has only 48 K training samples, and Deotte’s model still gets to 99.757 % accuracy (after averaging). It’s conceivable that with the full 60 K training samples of the original MNIST, this architecture might go a few points higher.

On the minus side, when I ran Deotte’s architecture on my machine (as mentioned just above), I used the full 60 K of the original MNIST dataset. I still got only 99.70 % as the best accuracy. So, going much beyond 99.76 % may not be possible, due to the statistical nature of the training process.

To summarize this point:

33 hours on Deotte’s architecture might take you to 99.76 to 99.78 % accuracy, if you use the full 60 K training samples of the original MNIST dataset.

In comparison, my architecture (which makes use of Deotte’s insights) takes 24 to 28 hours, and definitely has gone up to 99.78 % accuracy already.
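The ensembling that Deotte uses (and that my own trials also rely on) is conceptually simple: train several models independently, average their predicted class probabilities per sample, and take the argmax. A minimal NumPy sketch; the three small arrays below are toy stand-ins for real softmax outputs, not actual model predictions:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model softmax outputs and pick the most likely class.

    prob_list: list of arrays, each of shape (n_samples, n_classes),
    one array per independently trained model.
    """
    avg = np.mean(prob_list, axis=0)   # element-wise mean -> (n_samples, n_classes)
    return np.argmax(avg, axis=1)      # predicted class index per sample

# Toy illustration: 3 "models", 2 samples, 3 classes.
m1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
m2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]])
m3 = np.array([[0.4, 0.5, 0.1], [0.3, 0.4, 0.3]])
print(ensemble_predict([m1, m2, m3]))  # -> [0 1]
```

Averaging probabilities (rather than majority-voting hard labels) lets a confidently correct model outvote two marginally wrong ones, which is part of why ensembles squeeze out those last few test errors.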

Point No. 2:

I have merely browsed through the architectures of the top 5 papers, not studied them. I presume that all of them use powerful parallel processing, at least GPUs, and that none uses a CPU-only machine (the way I do).

The top papers don’t usually give actual execution times required for training. However, one thing is striking.

The Rank # 1 paper uses a single model of about 1.5 million parameters, with which it goes up to 99.82 % (its statistically best single-model result). Using an ensemble of such models, it goes to 99.84 %.

In contrast, in my biggest architecture, each model of the ensemble was limited to fewer than half a million parameters (around 400 k).

The total number of parameters per model, taken by itself, is only an indicative measure, not a conclusive one. Details of the architecture matter, and so does the fine-tuning of a plethora of hyper-parameters. The processing cost also differs: for instance, batch normalization hardly adds any trainable parameters, but still results in a lot of extra training time (while also helping reduce overfitting and improve accuracy).
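To make the batch-normalization point concrete: in the standard Keras convention, a Conv2D layer’s trainable parameters scale with kernel size and channel counts, while BatchNormalization trains only two small vectors (gamma and beta) per channel. A back-of-the-envelope count:

```python
def conv2d_params(k_h, k_w, c_in, c_out, use_bias=True):
    """Trainable parameters of a Conv2D layer (Keras convention):
    one (k_h x k_w x c_in) kernel plus optional bias, per output channel."""
    return (k_h * k_w * c_in + (1 if use_bias else 0)) * c_out

def batchnorm_trainable_params(channels):
    """BatchNormalization trains only gamma and beta per channel.
    (The moving mean/variance are non-trainable running statistics.)"""
    return 2 * channels

# A 3x3 convolution from 32 to 64 channels:
print(conv2d_params(3, 3, 32, 64))        # 18496 trainable parameters
# A BN layer after it adds almost nothing trainable:
print(batchnorm_trainable_params(64))     # 128 trainable parameters
```

Yet that near-free BN layer forces extra per-batch statistics computation in every forward and backward pass, which is where the additional training time comes from.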

To reach a conclusion on this point:

There is definite reason to think that my new ideas might significantly enhance accuracy with even bigger architectures, up to the tune of 1.5 million parameters.

Reasons for my present level of success, and future directions:

Let me highlight it again. I got what are, to my mind, great results because of some new ideas that I thought of and tried.

I mean to say, I tried everything “in the book,” and then some more. Thus, my model uses a deep network of CNNs, and ideas like batch normalization, dropout, data augmentation, and ensembling. That is to say, all the tricks that Dr. Chris Deotte explains so helpfully here [^] and puts to use here [^]. I used almost all of those tricks (or as many as I could, given my computing budget). And then, I added a few more tricks, based on my own ideas and thinking.
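Of the standard tricks mentioned above, data augmentation is perhaps the easiest to illustrate without any framework: small random shifts (plus rotations and zooms, in the full version) of the 28×28 digit images effectively multiply the training set. A minimal shift-only sketch in NumPy; Keras’ augmentation utilities do this and much more:

```python
import numpy as np

def random_shift(img, max_shift=2, rng=None):
    """Shift a 2-D image by up to max_shift pixels in each direction,
    padding the exposed border with zeros (the MNIST background)."""
    rng = rng or np.random.default_rng()
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    h, w = img.shape
    # Destination window in the output, source window in the input:
    out[max(dy, 0):min(h, h + dy), max(dx, 0):min(w, w + dx)] = \
        img[max(-dy, 0):min(h, h - dy), max(-dx, 0):min(w, w - dx)]
    return out

# With max_shift=0 the image is returned unchanged:
digit = np.arange(16, dtype=float).reshape(4, 4)
print(np.array_equal(random_shift(digit, max_shift=0), digit))  # True
```

Generating a few such shifted copies per training image per epoch gives the network slightly different inputs every time, which is one of the cheapest defences against overfitting on a small dataset like MNIST.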

I have been abstractly thinking about some new ideas for image recognition for quite some time—for months, maybe, and in one case, for almost 1.5 years or so. A group of closely related ideas from this brain-storming began acquiring a better shape late last year—though it was nothing immediately translatable into code. Then, these ideas got to a fairly well concretized form in the first half of February 2020 (on the evening of the 12th of that month, while in a restaurant, if I remember right).

However, in late February 2020, we had to make a move (i.e., shift everything to a new house). A lot of time got “wasted” in it. So, I could start writing code only from 03 March 2020.

By 15th March 2020 (the date of my last post), it was already becoming increasingly clear to me that my ideas weren’t going to work out in their initial form. Yet, as I continued writing code and trying out improvements, these earlier ideas got transmutated into further new ideas. It is these latter ideas which were then implemented in various bigger models, using TensorFlow and Keras.

A few ideas from this last group (the “transmutated” ones) remain, which I do have in a fairly direct form but which I could not actually try out so far, mainly because of the low computational power I have. Trials take too much time and occupy the only machine that I have.

Trials are boring while they are in progress; extremely boring, in fact. For hours and hours, you can do nothing but almost helplessly watch the stream of numbers appearing on the screen. You can’t even make out whether the idea you are trying is changing things for the better or the worse, because a handful of numbers at a time don’t give you any sense of the progress over several hours. Even if the current effort is actually worthless, you get to know only after hours, maybe only by the next evening. So, it’s all boring in the meanwhile, and tends to be draining. That’s why, with 6 weeks spent like this, I am also nearing the fag-end of my patience.

Therefore, for now, I have decided to undertake just one more trial—the biggest one that I can still manage on my machine. I estimate that this single trial will go on for some 48 hours of run-time on my machine, or more. Thanks to TF-Keras, I can interrupt the trials once in a while and resume training. So, it might take something like 4–5 days for this set of trial runs to get over.

I will see how this planned trial goes, and then will close this research on MNIST one way or the other. The reason is, I need to pursue other datasets like Fashion-MNIST, EMNIST, Not-MNIST, CIFAR-10, etc. My ideas need to be tested on those grounds too.

In the meanwhile, I also plan to update my resume soon enough, to include the current success on this research. (Yes, it more than amply qualifies for a full-time engagement running for about 1.5 months!)

So, as far as MNIST is concerned, expect at the most one more blog post, if at all. In any case, for obvious reasons, I am going off the blog for another week.

One final clarification: I will not share the ideas or the code:

Take my report as being truthful or otherwise. I don’t care. I am not going to share the code. The ideas are patentable, and some of them are of a more general nature. So, I am going to proceed very cautiously in sharing any details with anyone. Sorry, but that’s how it is.

Credits and Acknowledgement:

While a whole range of materials by a lot of people helped me get here, let me single out three references/people over all others:

• Dr. Michael Nielsen, for his online book on ANNs [^]. He expedited my learning by a few factors over what would have happened if I were to use the usual university text-books. I got most of my ANN theory from him. Some of the ideas I now tried here had struck me, at least in seed form, while learning ANNs from this book in 2018. (It’s good that I never shared those ideas back then!)
• Dr. Jason Brownlee (again an Aussie!) [^], for his excellent online tutorials, and very simple and elegant toy code that helps get a feeling of having mastered some particular topic.
• Dr. Chris Deotte, for his most excellent code [^], and even more helpful to me, his article detailing how he got to his Kaggle rank # 1 result [^]. I might have come to know about the best existing practices on CNNs and MNIST by consulting the original papers. But Deotte cut the time for people like me by months, perhaps even a couple of years, via just those two articles at Kaggle.

Thanks to all the three of them! And, thanks also to hundreds of people on different fora (like StackExchange) sharing insights, comments, fastest running code snippets, etc.

To conclude this post:

A poor jobless Indian man has achieved a 99.78% accuracy score on the MNIST dataset using a CPU-only machine. He thus has an estimated rank of # 5 in the world today. His result has an accuracy better than that of the top-ranking kernel for the MNIST dataset on the Google-sponsored Kaggle’s leaderboard. He has further ideas to enhance accuracy, which, to his mind, are promising. However, he is limited by the available computational power of his machine. But in any case, he is not going to share any of the ideas or code with anyone else.

Addendum: Also, the same man hopes that at least now, with this world-class achievement, the Indian/Foreign IT companies/recruiters/HR/Top Management idiots would stop asking him to take their tests before proceeding with any employment consideration. If they can manage to get their “high-quality” people / geniuses with their tests and without reference to world-class achievements, well and good for them; he wishes them further “success.”

A song I like:

(Hindi) पीछे पीछे आ कर, छू लो हमें पा कर (“peechhe peechhe aa kar, chhoo lo hame paa kar”)
Singers: Lata Mangeshkar, Hemant Kumar
Music: S. D. Burman
Lyrics: Sahir Ludhianvi

[This song has a very fresh feel overall—just like so many songs that came from the Dev Anand + SD Burman combination. I don’t know why this song escaped me for such a long time while writing the songs section. (I’ve been listening to it off and on. It’s just that while actually writing the songs section, this song didn’t strike me, that’s all.)

… As an exception, I can recommend the video too. I liked the picturization here, though the photography is technically not so sound (the contrast is not well managed). But it’s Mahabaleshwar—my favorite area. Kalpana Kartik’s dancing (jumping around, really speaking!) is so natural. Hardly anything of today’s will ever even approach this kind of an utter naturalness, this kind of a freshness. BTW, to those who don’t know, Kalpana Kartik and Dev Anand were husband and wife in real life. I guess they had already married (1954) by the time this movie was shot (released 1955). … So, this song is like 65 years old!…

Anyway, stay at home, wear a mask (the droplet sizes are relatively big, so masks can definitely help, and even the air-flow dynamics around a single virus particle, at that small a scale, should be almost Brownian, not of Navier-Stokes), wash your hands, and engage your mind in something else… Should be back after a week or 10 days. Take care and bye for now…]