Python scripts for simulating QM, part 1: Time evolution of a particle in the infinite potential box

A Special note for the Potential Employers from the Data Science field:

Recently, in April 2020, I achieved a World Rank # 5 on the MNIST problem. The initial announcement can be found here [^], and a further status update, here [^].

All my data science-related posts can always be found here [^].


What’s been happening?

OK, with that special note done, let me now turn my attention to the two of you who regularly read this blog.

… After the MNIST classification problem, I turned my attention to using LSTMs for time-series predictions, simply because I hadn’t tried much on this topic earlier. … So, I implemented a few models. I even seemed to get good accuracy levels.

However, after having worked on Data Science continuously for about 2.5 months, I began feeling like taking a little break from it.

I had grown a bit tired, though I had not realized it while actually going through all those tedious trials. (The fact that I was not earning any money might have added to the draining of energy too.) In fact, I didn’t have the energy to pick up something on the reading side either.

Basically, after a lot of intense effort, I wanted something that was light but engaging.

In the end, I decided to look into Python code for QM.

Earlier, in late 2018, I had written a few scripts on QM. I had also blogged about it; see the “part 0” of this series [^]. (Somehow, it has received an unusually large number of hits since I announced my MNIST result.) However, after a gap of 1.5 years, I could not easily locate those scripts. … So, like any self-respecting programmer, I decided to code them again!

Below is the first result, a movie. … Though a movie, it should be boring to anyone who is not interested in QM.


Movie of the wavefunction of an electron inside an infinite potential box:

 

An electron in an infinite potential box.

An electron inside a 3 cm long 1D box with infinite potentials on the sides. Time evolution of the second excited state (n = 3). In the standard QM, such states are supposed to be “stationary”.
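For reference, the textbook sense in which such a state is “stationary”: the full wavefunction is the space-dependent part multiplied by a pure phase factor,

\Psi_n(x,t) = \psi_n(x) \, e^{-i E_n t / \hbar}

so that |\Psi_n(x,t)|^2 = |\psi_n(x)|^2 stays constant in time, even though the real and imaginary parts (the two curves plotted in the movie) keep oscillating.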

 


The Python code: Main file:

Now, the code. It should be boring to anyone who is not a Python programmer.

"""
01.AJ_PIB_1D_Class.py

Written by and copyright (c) Ajit R. Jadhav. All rights reserved.

Particle in a Box.

Solves the Time-Independent Schrodinger Equation in 1D,
using the Finite Difference Method. Eigenvalues are found using
the direct matrix method.

Also shows the changes in the total wavefunction with time.
[The stationarity in the TISE is not static. In the mainstream 
QM, the stationarity is kinematical. In my new approach, it 
has been proposed to be kinetic. However, this simulation concerns
itself only with the standard, mainstream, QM.]

Environment: Developed and tested using:
Python 3.7.6 64-bit. All packages as in Anaconda 4.8.3 
on Ubuntu-MATE 20.04 (focal fossa) LTS of the date (below).

TBD: It would be nice to use sparse matrices. Also, use eigenvalue 
functions from scipy (instead of those from numpy).

History:
This file begun: Friday 2020 May 22 19:50:26 IST 
This version: Saturday 2020 May 23 16:07:52 IST 
"""

import numpy as np 
from scipy.integrate import simps
import matplotlib.pyplot as plt 
from matplotlib.animation import ImageMagickFileWriter

# SEE THE ACCOMPANYING FILE. THE NUMERICAL VALUES OF CONSTANTS 
# ARE DEFINED IN IT.
from FundaConstants import h, hbar, me, mP, eV2J

################################################################################
# THE MAIN CLASS 

class AJ_PIB_1D( object ):

    def __init__( self, nInteriorNodes, dMass, dh ):
        self.nInteriorNodes = nInteriorNodes
        self.nDomainNodes = nInteriorNodes + 2
        self.dMass = dMass # Mass associated with the QM particle
        self.dh = dh # cell-size ( \Delta x ).

        # The following numpy ndarray's get allocated 
        # during computations.
        self.aaT = None 
        self.aaV = None 
        self.aaH = None 

        self.aaPsi = None 
        self.aE_n = None 

        self.A_ana = None 
        self.ak_ana = None 
        self.aE_ana = None 
        self.aPsi_ana = None 
        return 

    # Creates the kinetic energy matrix for the interior points of the
    # domain.
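    # The second spatial derivative is discretized using the standard
    # central difference:
    #     d^2 \psi / dx^2  ~=  ( \psi_{i-1} - 2 \psi_i + \psi_{i+1} ) / dh^2
    # which is why the matrix below has -2 on the diagonal and +1 on the
    # two off-diagonals, scaled by the factor -\hbar^2 / (2 m dh^2).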
    def ComputeKE( self ):
        self.aaT = np.zeros( (self.nInteriorNodes, self.nInteriorNodes) )
        for i in range( self.nInteriorNodes ):
            self.aaT[ i ][ i ] = -2.0
        for i in range( self.nInteriorNodes-1 ):
            self.aaT[ i ][ i+1 ] = 1.0
        for i in range( 1, self.nInteriorNodes ):
            self.aaT[ i ][ i-1 ] = 1.0 
        dFactorKE = - hbar**2 / ( 2.0 * self.dMass * self.dh**2 )
        self.aaT *= dFactorKE
        return

    # Creates the potential energy matrix for the interior points of the
    # domain. You can supply an arbitrary potential via the array aV, of
    # size equal to the count of interior points, with values in joules.
    def ComputePE( self, aV= None ):
        self.aaV = np.zeros( (self.nInteriorNodes, self.nInteriorNodes) )
        if aV is not None:
            for i in range( self.nInteriorNodes ):
                self.aaV[ i ][ i ] = aV[ i ]
        return

    def ComputeHamiltonian( self, aV= None ):
        self.ComputeKE() 
        self.ComputePE( aV )
        self.aaH = self.aaT + self.aaV 
        return 

    # Note, the argument aX has the size = the count of domain points, not 
    # the count of interior points.
    # QM operators are Hermitian. We exploit this fact by using the 
    # numpy.linalg.eigh function here. It is faster than numpy.linalg.eig, 
    # and, unlike the latter, also returns results sorted in the ascending 
    # order. 
    # HOWEVER, NOTE, the eigenvectors returned can have signs opposite 
    # of what the analytical solution gives. The eigh (or eig)-returned 
    # vectors still *are* *eigen* vectors. However, for easier comparison 
    # with the analytical solution, we here provide a quick fix. 
    # See below in this function.
    def ComputeNormalizedStationaryStates( self, aX, bQuickFixForSigns= False ):
        assert( self.nDomainNodes == len( aX ) )
        
        # Use the LAPACK library to compute the eigenvalues
        aEigVals, aaEigVecs = np.linalg.eigh( self.aaH )
        
        # SQUARE-NORMALIZE THE EIGENVECTORS

        # Note:
        # The eigenvectors were found on the interior part of the domain, 
        # i.e., after dropping the boundary points at extreme ends. But the 
        # wavefunctions are defined over the entire domain (with the 
        # Dirichlet condition of 0.0 specified at the boundary points).
        
        nCntVecs = aaEigVecs.shape[ 1 ]
        assert( nCntVecs == self.nInteriorNodes )

        # eigh returns vectors in *columns*. We prefer to store the 
        # normalized vectors in *rows*.
        aaPsi = np.zeros( (self.nInteriorNodes, self.nDomainNodes) )
        for c in range( nCntVecs ):
            aPsi = aaEigVecs[ :, c ]
            # Find the area under the prob. curve
            aPsiSq = aPsi * aPsi
            dArea = simps( aPsiSq, aX[ 1 : self.nDomainNodes-1 ] )
            # Use it to normalize the wavefunction
            aPsi /= np.sqrt( dArea )
            # The analytical solution always has the curve going up 
            # (with a +ve gradient) at the left end of the domain. 
            # We exploit this fact to have a quick fix for the signs.
            if bQuickFixForSigns is True:
                d0 = aPsi[ 0 ]
                d1 = aPsi[ 1 ]
                if d1 < d0:
                    aPsi *= -1
            aaPsi[ c, 1 : self.nDomainNodes-1 ] = aPsi
        self.aaPsi = aaPsi
        self.aE_n = aEigVals
        return 

    # Standard analytical solution. See, e.g., the Wiki article: 
    # "Particle in a box"
    def ComputeAnalyticalSolutions( self, aX ):

        xl = aX[ 0 ]
        xr = aX[ self.nDomainNodes-1 ]
        L = xr - xl 
        A = np.sqrt( 2.0 / L )
        self.A_ana = A 

        # There are as many eigenvalues as there are interior points
        self.ak_ana = np.zeros( self.nInteriorNodes )
        self.aE_ana = np.zeros( self.nInteriorNodes )
        self.aPsi_ana = np.zeros( (self.nInteriorNodes, self.nDomainNodes) )
        for n in range( self.nInteriorNodes ):
            # The wavevector. (Notice the absence of the usual factor of 2:
            # the n-th standing wave has wavelength \lambda_n = 2L/n, so that
            # k_n = 2 \pi / \lambda_n = n \pi / L. In other words, L is half
            # the wavelength of the two travelling waves which make up the
            # ground-state standing wave.)
            k_n = n * np.pi / L 
            # The energy.
            E_n = n**2 * h**2 / (8.0 * self.dMass * L**2)

            # A simplest coordinate transformation:
            # For x in [0,L], phase angle is given as
            # Phase angle = n \pi x / L = k_n x. 
            # We have arbitrary coordinates for the left- and 
            # right-boundary point. So, 
            # Phase angle = k_n (x - xl)
            ap = k_n * (aX - xl)
            
            aPsiAna = A * np.sin( ap )
            self.ak_ana[ n ] = k_n 
            self.aE_ana[ n ] = E_n 
            # We prefer to store the normalized wavefunction 
            # in rows. (Contrast: linalg.eigh and eig store the 
            # eigen vectors in columns.)
            self.aPsi_ana[ n, : ] = aPsiAna
        return 
        
    # Returns the numerical equivalent of the maximum wave amplitude 'A', 
    # i.e., of sqrt(2/L) in the analytical solution. 
    def GetMaxAmplNum( self ):
        dMax = np.max( np.abs(self.aaPsi) )
        return dMax


################################################################################
# Utility functions

# NOTE: SAVING MOVIES CAN TAKE A LOT MORE TIME (7--10 MINUTES).
def Plot( model, n, nTimeSteps, bSaveMovie= False, sMovieFileName= None ):
    # The class computes and stores only the space-dependent part.
    aPsiSpace = model.aaPsi[ n-1 ]

    # Period = 2 \pi / \omega = 1 / \nu
    # Since E = h \nu, \nu = E/h, and so, Period = h/E
    nu = model.aE_n[ n-1 ] / h 
    dPeriod = 1.0 / nu 

    dt = dPeriod / (nTimeSteps-1) 

    # Plotting...

    plt.style.use( 'ggplot' )
    # Plot size is 9 inches X 6 inches. Reduce if you have smaller 
    # screen size.
    fig = plt.figure( figsize=(9,6) ) 

    if bSaveMovie is True:
        movieWriter = ImageMagickFileWriter()
        movieWriter.setup( fig, sMovieFileName )

    dMaxAmpl = model.GetMaxAmplNum() # Required for setting the plot limits.
    dTime = 0.0 # How much time has elapsed in the model?
    for t in range( nTimeSteps ):
        # TIME-DEPENDENT PART: 
        # \psi_t = e^{-i E_n t / \hbar} = e^{-i \omega_n t} = e^{-i 2 \pi \nu t}
        # Compute the phase factor (which appears in the exponent).
        dTheta = 2.0 * np.pi * nu * dTime 
        # The Euler identity. Compute the *complete* wavefunction (space and time)
        # at this instant.
        aPsi_R_t = aPsiSpace * np.cos( dTheta )
        aPsi_I_t = - aPsiSpace * np.sin( dTheta )
        
        plt.clf()
        sTitle = "Particle in an infinite-potential box (n = %d)\n" % (n)
        sTitle += "Domain size: %7.4lf m. Oscillation period: %7.4lf s.\n" % (L, dPeriod)
        sTitle += "Time step: %3d/%3d. Time elapsed in simulation: %7.4lf s." % (t+1, nTimeSteps, dTime) 
        plt.title( sTitle )

        plt.xlabel( "Distance, m" )
        plt.ylabel( "Wavefunction amplitude, $m^{-1/2}$" )

        plt.grid( True )

        plt.xlim( (xl - L/10), (xr + L/10) )
        plt.ylim( -1.1*dMaxAmpl, 1.1*dMaxAmpl )

        plt.plot( aX, aPsi_R_t , color= 'darkcyan', label= r'Re($\Psi$)' )
        plt.plot( aX, aPsi_I_t , color= 'purple', label= r'Im($\Psi$)' )

        plt.legend( loc= 'upper right', shadow= True, fontsize= 'small' ) 
        
        if bSaveMovie is True:
            movieWriter.grab_frame()
        else:
            plt.pause( 0.001 )

        dTime += dt

    if bSaveMovie is True:
        movieWriter.finish()
    else:
        plt.show()


################################################################################
# MAIN DRIVER CODE
# We use the SI system throughout. [This is a program. It runs on a computer.]

# DOMAIN GEOMETRY

xl = -1.0e-02 # Left end (min. x)
xr = 2.0e-02 # Right end (max. x)
L = xr - xl # Length of the domain
xc = (xl + xr )/ 2.0 # Center point

# MESH
# Count of cells = Count of nodes in the domain - 1. 
# It's best to take an odd number for the count of domain nodes. This way, 
# the peak(s) of the wavefunction will not be missed.
nDomainNodes = 101
aX, dh = np.linspace(   start= xl, stop= xr, num= nDomainNodes, 
                        endpoint= True, retstep= True, dtype= float )

# In the PIB model, infinite potential exists at either ends. So, we apply 
# the Dirichlet BC of \Psi(x,t) = 0 at all times. Even if the discretized 
# Laplacian were to be computed for the entire domain, in handling the 
# homogeneous BC, both the boundary-points would get dropped during the
# matrix partitioning. Similarly, V(x) would be infinite there. That's why,
# we allocate the Laplacian and Potential Energy matrices only for the
# interior points. 
nInteriorNodes = nDomainNodes - 2 

# We model the electron here. 
# Constants are defined in a separate file: 'FundaConstants.py'
# Suggestion: Try mP for the proton, and check the \Psi amplitudes.
dMass = me 

# Instantiate the main model class.
model = AJ_PIB_1D( nInteriorNodes, dMass, dh )

# Compute the system Hamiltonian.

# If you want, supply a custom-made potential function as an ndarray of 
# size nInteriorNodes, as an argument. Values should be in joules.
# 'None' means 0 joules everywhere inside the box.
model.ComputeHamiltonian( None )

# Compute the stationary states. For the second argument, see the 
# note in the function implementation.
model.ComputeNormalizedStationaryStates( aX, True )

# You can also have the analytical solution computed. Uncomment the 
# line below. The numerical and analytical solutions are kept in 
# completely different arrays inside the class. However, the plotting
# code has to be careful.
### model.ComputeAnalyticalSolutions( aX )
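# If you do uncomment the line above, a quick sanity check could look like
# the following. (Note the differing index conventions: the numerical arrays
# are 0-based, so aE_n[0] holds the ground state n = 1, whereas the
# analytical arrays are indexed by the quantum number n itself, with row 0
# holding the trivial n = 0 entry.)
### for i in range( 1, 6 ):
###     print( "n = %d: E (num) = %e J, E (ana) = %e J" % (i, model.aE_n[ i-1 ], model.aE_ana[ i ]) )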


# PLOT THE STATIONARY STATES, AND SHOW THEIR OSCILLATIONS WITH TIME.
# (TISE *is* dynamic; the stationarity is dynamical.) 

# Note, here, we choose n to be a 1-based index, as is the practice
# in physics. Thus, the ground state is given by n = 1, and not n = 0.
# However, NOTE, the computed arrays of wavefunctions have 0-based 
# indices. If such dual-usage for the indices gets confusing, simple! 
# Just change the code!

n = 3 
# No. of frames to be plotted for a single period of oscillations
# The 0-th and the (nTimeSteps-1)-th frames are identical, because the 
# Hamiltonian here is time-independent and we march over exactly one period. 
nTimeSteps = 200 

# You can save a movie, but note, animated GIFs take a lot more time, even 
# ~10 minutes or more, depending on the screen-size and dpi.
# Note, ImageMagickFileWriter will write the temp .png files in the current 
# directory (i.e. the same directory where this Python file resides). 
# In case the program crashes (or you stop the program before it finishes), 
# you will have to manually delete the temporary .png files from the 
# program directory! (Even if you specify a separate directory for the 
# movie, the *temporary* files still get generated in the program directory.)
### Plot( model, n, nTimeSteps, True, './AJ_PIB_e_%d.gif' % (n) )

Plot( model, n, nTimeSteps, False, None )

################################################################################
# EOF
################################################################################


The ancillary file:

The main file imports the following file. It contains nothing but the values of the fundamental physical constants (together with their sources). Here it is:

"""
FundaConstants.py

Written by and copyright (c) Ajit R. Jadhav. All rights reserved.

Begun: Thursday 2020 May 21 20:55:37 IST 
This version: Saturday 2020 May 23 20:39:22 IST 
"""

import numpy as np 

"""
Planck's constant
https://en.wikipedia.org/wiki/Planck_constant 
``The Planck constant is defined to have the exact value h = 6.62607015×10^−34 J⋅s in SI units.'' 
"""
h = 6.62607015e-34 # J⋅s. Exact value.
hbar = h / (2.0 * np.pi) # J⋅s. 

"""
Electron rest mass
https://en.wikipedia.org/wiki/Electron_rest_mass
``9.1093837015(28)×10^−31'' 2018 CODATA value. NIST
"""
me = 9.1093837015e-31 # kg. 


"""
Proton rest mass
https://en.wikipedia.org/wiki/Proton 
1.67262192369(51)×10^−27 kg
"""
mP = 1.67262192369e-27 # kg

"""
eV to Joule
https://en.wikipedia.org/wiki/Electronvolt 
1 eV = 1.602176634×10^−19 J
"""
eV2J = 1.602176634e-19 # Conversion factor

And, that’s about it, folks! No documentation. But I have added a lot of (otherwise unnecessary) comments.

Take care, and bye for now.


A song I like:

(Marathi) सावली उन्हामध्ये (“saavali unaamadhe”)
Music: Chinar Mahesh
Lyrics: Chandrashekhar Sanekar
Singer: Swapnil Bandodkar

I need a [very well paying] job in data science. Now.

I need a very well paying job in data science. Now. In Pune, India.


 



Yes, I was visiting Kota for some official work when, at the railway station of the [back then, a simple little] town, on a “whim” (borne out of a sense of curiosity, having heard the author’s name), I bought it. That was on 14th July 1987. The stamp of A. H. Wheeler and Company (Rupa Publications), so well known to us all (including IITians and IIM graduates) back then, stands in mute testimony to the same, as do the price and the fact that this little book was imported by them. My signature and the noted date also bear testimony to the event. (I would ordinarily have no motivation to note a fake date, what do you say?) Also notable is the price of the book: Rs. 59/-, bought out of Rs. 1800/- per month, if I remember those days right (and that, plainly because I was an M. Tech. from one of the then five IITs; my juniors from my own UG college, COEP, would have had to start with a Rs. 1200/- or Rs. 1400/- package, and rise to my level in about 3 years, back then).

Try to convince my back-then self that I would be jobless today.

No, really. Do that.

And see if I don’t call you names. Right here.

Americans!


A song I like:

(English, pop-song): “Another town, another train…”
Band (i.e. music, composition, lyrics, etc., to the best of my knowledge): ABBA

Bye for now.


And develop a habit of reading, and understanding, books. That’s important, as my example serves to illustrate, whether I go jobful or jobless. It’s a good habit to cultivate.

But then, Americans have grown so insensitive to the authentic pains of others—including real works by others. The said attitude must reflect inwards too. The emphasis is on the word “authentic.” If a man doesn’t care for another honest, really very hard-working man in pain but spends his intellect and time in finding rationalizations to enhance his own prestige and money-obtaining powers, then, by the law of the integrative mechanism of consciousness that is the law of “karma,” the same thing must haunt him back—whether he be a Republican, or a Democrat. (Just a familiarity with the word “karma” is not enough to escape its bad—or good—effects. What matters are actions (“karma”s), ultimately. But given the fact that man has intellect, these are helped, not obscured, by it.)

Go, convince Americans to give me a good, well-paying job, in data science, and in Pune—the one that matches my one-sentence profile (mentioned here) and my temperament. As to the latter, simple it is, to put it in one sentence: “When the time calls for it, I am known to call a spade a spade.”

And, I can call Americans (and JPBTIs) exactly what they have earned.

But the more important paragraph was the second one in this section, viz., the one starting with “But then, Americans have grown so insensitive to the authentic… .”

Instead of “pains,” you could even substitute a value or a virtue. The statement would still hold.

 

 

Data science code snippets—2: Using world’s simplest ANN to visualize that deep learning is hard

Good morning! [LOL!]


In the context of machine learning, “learning” only means: changes that get effected to the weights and biases of a network. Nothing more, nothing less.
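Concretely, with the plain gradient descent used in the code below, those changes after each training case amount to nothing more than

w \leftarrow w - \eta \, \frac{\partial C}{\partial w}, \qquad b \leftarrow b - \eta \, \frac{\partial C}{\partial b}

where C is the cost for that case and \eta is the learning rate (dEta in the code).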

I wanted to understand exactly how the learning propagates through the layers of a network. So, I wrote the following piece of code. Actually, the development of the code went through two distinct stages.


First stage: A two-layer network (without any hidden layer); only one neuron per layer:

In the first stage, I wrote what is demonstrably the world’s simplest possible artificial neural network. This network comes with only two layers: the input layer, and the output layer. Thus, there is no hidden layer at all.

The advantage with such a network (“linear” or “1D”, and without any hidden layer) is that the total error at the output layer is a function of just one weight and one bias. Thus the total error (or the cost function) is a function of only two independent variables, and so the cost-function surface can be directly visualized or plotted at all. For ease of coding, I plotted this function as a contour plot, but a 3D surface plot should also be possible.
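Just to make the idea concrete before the actual code, here is a minimal, self-contained sketch of plotting such a two-variable cost surface as a contour plot. (This is not the plotting routine from the code further below, which draws the activation surface as the backdrop for the descent path; the fixed input of 0.5 here is assumed purely for illustration, with the target of 0.15 as in this post.)

import numpy as np
import matplotlib.pyplot as plt

def Sigmoid( dZ ):
    return 1.0 / ( 1.0 + np.exp( -dZ ) )

# Assumed purely for illustration: one fixed input; target as in this post
dInput = 0.5
dTarget = 0.15

adB = np.linspace( -5.0, 5.0, 200 )
adW = np.linspace( -5.0, 5.0, 200 )
B, W = np.meshgrid( adB, adW )
# The total cost as a function of the bias and the weight alone
C = 0.5 * ( dTarget - Sigmoid( W * dInput + B ) )**2

CS = plt.contour( B, W, C, levels= 15 )
plt.clabel( CS, inline= 1, fontsize= 7 )
plt.xlabel( "Bias" )
plt.ylabel( "Weight" )
plt.title( "Cost surface for the simplest 2-layer, 1-neuron-per-layer network" )
plt.show()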

Here are a couple of pictures it produces. The input randomly varies in the interval [0.0, 1.0) (i.e., excluding 1.0), and the target is kept at 0.15.

The following plot shows an intermediate stage during training, a stage when the gradient descent algorithm is still very much in progress.

Simplest network with just input and output layers, with only one neuron per layer: Gradient descent in progress.

The following plot shows when the gradient descent has taken the configuration very close to the local (and global) minimum:

Simplest network with just input and output layers, with only one neuron per layer: Gradient descent near the local (and global) minimum.


Second stage: An n-layer network (with many hidden layers); only one neuron per layer

In the second stage, I wanted to show that even if you add one or more hidden layers, the gradient descent algorithm works in such a way that most of the learning occurs only near the output layer.

So, I generalized the code a bit to have an arbitrary number of hidden layers. However, the network continues to have only one neuron per layer (i.e., it maintains the topology of the bogies of a train). I then added a visualization showing the percent changes in the biases and weights at each layer, as learning progresses.
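Why should it work out that way? In a nutshell, it is the familiar vanishing-gradient effect, sketched here only for this 1-neuron-per-layer topology and with the constant initialization of w = 2.0 and b = 2.0 assumed, as in the first case below: during back-propagation, the error signal picks up one more factor of w \sigma'(z) for every layer it travels back from the output, and \sigma'(z) can never exceed 0.25. A rough numerical illustration:

import numpy as np

def Sigmoid( dZ ):
    return 1.0 / ( 1.0 + np.exp( -dZ ) )

def SigmoidDerivative( dZ ):
    dA = Sigmoid( dZ )
    return dA * ( 1.0 - dA )

# Assumed for illustration: every compute layer has w = 2.0 and b = 2.0,
# and the input activation is 0.5.
dW, dB, dA = 2.0, 2.0, 0.5
adDerivs = []
for nL in range( 4 ):    # 4 compute layers, i.e., a 5-layer network
    dZ = dW * dA + dB
    adDerivs.append( SigmoidDerivative( dZ ) )
    dA = Sigmoid( dZ )

# Relative to the gradient at the output layer, the error signal reaching
# a layer k steps behind it is damped by roughly this cumulative product:
dDamping = 1.0
for k in range( 1, len( adDerivs ) ):
    dDamping *= dW * adDerivs[ -(k + 1) ]
    print( "Layers behind the output: %d, damping ~ %.6f" % (k, dDamping) )

The exact numbers differ from case to case, but the qualitative point remains: every additional hidden layer multiplies the gradient by another factor that is typically much smaller than 1, which is why most of the visible learning shows up at the output layer.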

Here is a representative picture it produces when the total number of layers is 5 (i.e. when there are 3 hidden layers). It was made with all the biases and all the weights initialized to the value of 2.0:

It is clearly seen that almost all of the learning is limited to the output layer; the hidden layers have learnt almost nothing!

Now, by way of a contrast, here is what happens when you have all initial biases of 0.0, and all initial weights of 2.0:

Here, the last hidden layer has begun learning enough that it shows some visible change during training, even though the output layer learns much more than does the last hidden layer. Almost all learning is via changes to weights, not biases.

Next, here is the reverse situation: when you have all initial biases of 2.0, but all initial weights of 0.0:

The bias of the hidden layer does undergo a slight change, but in an opposite (positive) direction. Compared to the case just above, the learning now is relatively more concentrated in the output layer.

Finally, here is what happens when you initialize both biases and weights to 0.0.

The network does learn (the difference between the predicted and the target values does keep diminishing as the training progresses). However, the percentage change is too small to be visually registered (when plotted to the same scale as what was used earlier).


The code:

Here is the code which produced all the above plots (but you have to suitably change the hard-coded parameters to get to each of the above cases):

'''
SimplestANNInTheWorld.py
Written by and Copyright (c) Ajit R. Jadhav. All rights reserved.
-- Implements the case-study of the simplest possible 1D ANN.
-- Each layer has only one neuron. Naturally, there is only one target!
-- It need not have any hidden layers at all. 
-- However, it can also have an arbitrary number of hidden layers. This 
feature makes it a good test-bed to see why and how the neurons 
in the hidden layers don't learn much during deep learning, during 
a ``straight-forward'' application of the gradient descent algorithm. 
-- Please do drop me a comment or an email 
if you find this code useful in any way, 
say, in a corporate training setup or 
in academia. Thanks in advance!
-- History:
* 30 December 2018 09:27:57  IST: 
Project begun
* 30 December 2018 11:57:44  IST: 
First version that works.
* 01 January 2019 12:11:11  IST: 
Added visualizations for activations and gradient descent for the 
last layer, for no. of layers = 2 (i.e., no hidden layers). 
* 01 January 2019 18:54:36  IST:
Added visualizations for percent changes in biases and 
weights, for no. of layers >= 3 (i.e. at least one hidden layer).
* 02 January 2019 08:40:17  IST: 
The version as initially posted on my blog.
'''
import numpy as np 
import matplotlib.pyplot as plt 

################################################################################
# Functions to generate the input and test data

def GenerateDataRandom( nTrainingCases ):
    # Note: randn() returns samples from the normal distribution, 
    # but rand() returns samples from the uniform distribution: [0,1). 
    adInput = np.random.rand( nTrainingCases )
    return adInput

def GenerateDataSequential( nTrainingCases ):
    adInput = np.linspace( 0.0, 1.0, nTrainingCases )
    return adInput

def GenerateDataConstant( nTrainingCases, dVal ):
    adInput = np.full( nTrainingCases, dVal )
    return adInput

################################################################################
# Functions to generate biases and weights

def GenerateBiasesWeightsRandom( nLayers ):
    adAllBs = np.random.randn( nLayers-1 )
    adAllWs = np.random.randn( nLayers-1 )
    return adAllBs, adAllWs

def GenerateBiasesWeightsConstant( nLayers, dB, dW ):
    adAllBs = np.ndarray( nLayers-1 )
    adAllBs.fill( dB )
    adAllWs = np.ndarray( nLayers-1 )
    adAllWs.fill( dW )
    return adAllBs, adAllWs

################################################################################
# Other utility functions

def Sigmoid( dZ ):
    return 1.0 / ( 1.0 + np.exp( - dZ ) )

def SigmoidDerivative( dZ ):
    dA = Sigmoid( dZ )
    dADer = dA * ( 1.0 - dA )
    return dADer

# Helper function. Called with the activation at the output layer. 
# In this script, the target value is always one and the 
# same (it is set in the main script below). 
# Assumes that the form of the cost function is: 
#       C_x = 0.5 * ( dT - dA )^2 
# where, note carefully, that target comes first. 
# Hence the partial derivative is:
#       \partial C_x / \partial dA = - ( dT - dA ) = ( dA - dT ) 
# where note carefully that the activation comes first.
def CostDerivative( dA, dTarget ):
    return ( dA - dTarget ) 

def Transpose( dA ):
    return np.transpose( dA )

################################################################################
# Feed-Forward

def FeedForward( dA ):
    ## print( "\tFeed-forward" )
    l_dAllZs = []
    # Note, this makes l_dAllAs have one extra data member
    # as compared to l_dAllZs, with the first member being the
    # supplied activation of the input layer 
    l_dAllAs = [ dA ]
    nL = 1
    for w, b in zip( adAllWs, adAllBs ):
        dZ = w * dA + b
        l_dAllZs.append( dZ )
        # Notice, dA has changed because it now refers
        # to the activation of the current layer (nL) 
        dA = Sigmoid( dZ )  
        l_dAllAs.append( dA )
        ## print( "\tLayer: %d, Z: %lf, A: %lf" % (nL, dZ, dA) )
        nL = nL + 1
    return ( l_dAllZs, l_dAllAs )

################################################################################
# Back-Propagation

def BackPropagation( l_dAllZs, l_dAllAs ):
    ## print( "\tBack-Propagation" )
    # Step 1: For the Output Layer
    dZOP = l_dAllZs[ -1 ]
    dAOP = l_dAllAs[ -1 ]
    dZDash = SigmoidDerivative( dZOP )
    dDelta = CostDerivative( dAOP, dTarget ) * dZDash
    dGradB = dDelta
    adAllGradBs[ -1 ] = dGradB

    # Since the last hidden layer has only one neuron, no need to take transpose.
    dAPrevTranspose = Transpose( l_dAllAs[ -2 ] )
    dGradW = np.dot( dDelta, dAPrevTranspose )
    adAllGradWs[ -1 ] = dGradW
    ## print( "\t* Layer: %d\n\t\tGradB: %lf, GradW: %lf" % (nLayers-1, dGradB, dGradW) )

    # Step 2: For all the hidden layers
    for nL in range( 2, nLayers ):
        dZCur = l_dAllZs[ -nL ]
        dZCurDash = SigmoidDerivative( dZCur )
        dWNext = adAllWs[ -nL+1 ]
        dWNextTranspose = Transpose( dWNext )
        dDot = np.dot( dWNextTranspose, dDelta ) 
        dDelta = dDot * dZCurDash
        dGradB = dDelta
        adAllGradBs[ -nL ] = dGradB

        dAPrev = l_dAllAs[ -nL-1 ]
        dAPrevTrans = Transpose( dAPrev )
        dGradWCur = np.dot( dDelta, dAPrevTrans )
        adAllGradWs[ -nL ] = dGradWCur

        ## print( "\tLayer: %d\n\t\tGradB: %lf, GradW: %lf" % (nLayers-nL, dGradB, dGradW) )

    return ( adAllGradBs, adAllGradWs )

def PlotLayerwiseActivations( c, l_dAllAs, dTarget ):
    plt.subplot( 1, 2, 1 ).clear()
    dPredicted = l_dAllAs[ -1 ]
    sDesc = "Activations at Layers. Case: %3d\nPredicted: %lf, Target: %lf" % (c, dPredicted, dTarget) 
    plt.xlabel( "Layers" )
    plt.ylabel( "Activations (Input and Output)" )
    plt.title( sDesc )
    
    nLayers = len( l_dAllAs )
    dES = 0.2	# Extra space, in data units
    plt.axis( [-dES, float(nLayers) -1.0 + dES, -dES, 1.0+dES] )
    
    # Plot a vertical line at the input layer, just to show variations
    plt.plot( (0,0), (0,1), "grey" )
    
    # Plot the dots for the input and hidden layers
    for i in range( nLayers-1 ):
        plt.plot( i, l_dAllAs[ i ], 'go' )
    # Plot the dots for the output layer
    plt.plot( nLayers-1, dPredicted, 'bo' )
    plt.plot( nLayers-1, dTarget, 'ro' )
    
def PlotGradDescent( c, dOrigB, dOrigW, dB, dW ):
    plt.subplot( 1, 2, 2 ).clear()
    
    d = 5.0
    ContourSurface( d )
    plt.axis( [-d, d, -d, d] ) 
    plt.plot( dOrigB, dOrigW, 'bo' )
    plt.plot( dB, dW, 'ro' )
    plt.grid()
    plt.xlabel( "Biases" )
    plt.ylabel( "Weights" )
    sDesc = "Gradient Descent for the Output Layer.\n" \
    "Case: %3d\nWeight: %lf, Bias: %lf" % (c, dW, dB) 
    plt.title( sDesc )
    
    
def ContourSurface( d ):
    nDivs = 10
    dDelta = d / nDivs
    w = np.arange( -d, d, dDelta )
    b = np.arange( -d, d, dDelta )
    W, B = np.meshgrid( w, b ) 
    A = Sigmoid( W + B )
    plt.imshow( A, interpolation='bilinear', origin='lower',
                cmap=plt.cm.Greys, # cmap=plt.cm.RdYlBu_r, 
                extent=(-d, d, -d, d), alpha=0.8 )
    CS = plt.contour( B, W, A )
    plt.clabel( CS, inline=1, fontsize=7 )

def PlotLayerWiseBiasesWeights( c, adOrigBs, adAllBs, adOrigWs, adAllWs, dPredicted, dTarget ):
    plt.clf()
    
    nComputeLayers = len( adOrigBs )
    plt.axis( [-0.2, nComputeLayers+0.7, -320.0, 320.0] )
    
    adBPct = GetPercentDiff( nComputeLayers, adAllBs, adOrigBs )
    adWPct = GetPercentDiff( nComputeLayers, adAllWs, adOrigWs )
    print( "Case: %3d" \
    "\nPercent Changes in Biases:\n%s" \
    "\nPercent Changes in Weights:\n%s\n" \
     % (c, adBPct, adWPct)  )
    adx = np.linspace( 0.0, nComputeLayers-1, nComputeLayers )
    plt.plot( adx + 1.0, adWPct, 'ro' )
    plt.plot( adx + 1.15, adBPct, 'bo' )
    plt.grid()
    plt.xlabel( "Layer Number" )
    plt.ylabel( "Percent Change in Weight (Red) and Bias (Blue)" )
    sTitle = "How most learning occurs only at an extreme layer\n" \
    "Percent Changes to Biases and Weights at Each Layer.\n" \
    "Training case: %3d, Target: %lf, Predicted: %lf" % (c, dTarget, dPredicted) 
    plt.title( sTitle )

def GetPercentDiff( n, adNow, adOrig ):
    adDiff = adNow - adOrig
    print( adDiff )
    adPct = np.zeros( n )
    dSmall = 1.0e-10
    if np.all( np.abs( adDiff ) > dSmall ) and np.all( np.abs( adOrig ) > dSmall ):
        adPct = adDiff / adOrig * 100.0
    return adPct


################################################################################
# The Main Script
################################################################################

dEta = 1.0 # The learning rate
nTrainingCases = 100
nTestCases = nTrainingCases // 5
adInput = GenerateDataRandom( nTrainingCases ) #, 0.0 )
adTest = GenerateDataRandom( nTestCases )

np.random.shuffle( adInput )
## print( "Data:\n %s" % (adInput) )

# Must be at least 2. Tested up to 10 layers.
nLayers = 2
# Just a single target! Keep it in the interval (0.0, 1.0), 
# i.e., excluding both the end-points of 0.0 and 1.0.

dTarget = 0.15

# The input layer has no biases or weights. Even the output layer 
# here has only one target, and hence, only one neuron.
# Hence, the weights matrix for all layers now becomes just a 
# vector.
# For visualization with a 2 layer-network, keep biases and weights 
# between [-4.0, 4.0]

# adAllBs, adAllWs = GenerateBiasesWeightsRandom( nLayers )
adAllBs, adAllWs = GenerateBiasesWeightsConstant( nLayers, 2.0, 2.0 )
dOrigB = adAllBs[-1]
dOrigW = adAllWs[-1]
adOrigBs = adAllBs.copy()
adOrigWs = adAllWs.copy()

## print( "Initial Biases\n", adAllBs )
## print( "Initial Weights\n", adAllWs )

plt.figure( figsize=(10,5) )
    
# Do the training...
# For each input-target pair,
for c in range( nTrainingCases ):
    dInput = adInput[ c ]
    ## print( "Case: %d. Input: %lf" % (c, dInput) )
    
    adAllGradBs = [ np.zeros( b.shape ) for b in adAllBs ]
    adAllGradWs = [ np.zeros( w.shape ) for w in adAllWs ]
    
    # Do the feed-forward, initialized to dA = dInput
    l_dAllZs, l_dAllAs = FeedForward( dInput )
    
    # Do the back-propagation
    adAllGradBs, adAllGradWs = BackPropagation( l_dAllZs, l_dAllAs )

    ## print( "Updating the network biases and weights" )
    adAllBs = [ dB - dEta * dDeltaB 
                for dB, dDeltaB in zip( adAllBs, adAllGradBs ) ]
    adAllWs = [ dW - dEta * dDeltaW 
                for dW, dDeltaW in zip( adAllWs, adAllGradWs ) ]
    
    ## print( "The updated network biases:\n", adAllBs )
    ## print( "The updated network weights:\n", adAllWs )

    if 2 == nLayers:
        PlotLayerwiseActivations( c, l_dAllAs, dTarget )
        dW = adAllWs[ -1 ]
        dB = adAllBs[ -1 ]
        PlotGradDescent( c, dOrigB, dOrigW, dB, dW )
    else:
        # Plot in case of many layers: Original and Current Weights, Biases for all layers
        # and Activations for all layers 
        dPredicted = l_dAllAs[ -1 ]
        PlotLayerWiseBiasesWeights( c, adOrigBs, adAllBs, adOrigWs, adAllWs, dPredicted, dTarget )
    plt.pause( 0.1 )

plt.show()

# Do the testing
print( "\nTesting..." )
for c in range( nTestCases ):
    
    dInput = adTest[ c ]
    print( "\tTest Case: %d, Value: %lf" % (c, dInput) )
    
    l_dAllZs, l_dAllAs = FeedForward( dInput )
    dPredicted = l_dAllAs[ -1 ]
    dDiff = dTarget - dPredicted
    dCost = 0.5 * dDiff * dDiff
    print( "\tInput: %lf, Predicted: %lf, Target: %lf, Difference: %lf, Cost: %lf\n" % (dInput, dPredicted, dTarget, dDiff, dCost) )

print( "Done!" )

 


Things you can try:

  • Change one or more of the following parameters, and see what happens:
    • Target value
    • Values of initial weights and biases
    • Number of layers
    • The learning rate, dEta
  • Change the activation function; e.g., try a linear function in place of the Sigmoid, and change the code accordingly (a minimal sketch follows this list). You could similarly try changing the cost function.
  • Also, try to conceptually see what would happen when the number of neurons per layer is 2 or more…
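For the activation-function item above, here is a minimal sketch of what the swapped-in pair of functions could look like (hypothetical names; you would call them in place of Sigmoid and SigmoidDerivative inside FeedForward and BackPropagation):

def LinearActivation( dZ ):
    # The identity activation: the layer output is just w * a + b
    return dZ

def LinearActivationDerivative( dZ ):
    # The derivative of the identity map is 1 everywhere
    return 1.0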

Have fun!


A song I like:

(Marathi) “pahaaTe pahaaTe malaa jaag aalee”
Music and Singer: C. Ramchandra
Lyrics: Suresh Bhat