Data Science links—1

Okay… My bookmarks library has grown too big. Time to move at least a few of them to a blog post. Here they are. … The last one is not on Data Science, but it happens to be the most important one of them all!



On Bayes’ theorem:

Oscar Bonilla. “Visualizing Bayes’ theorem” [^].

Jayesh Thukarul. “Bayes’ Theorem explained” [^].

Victor Powell. “Conditional probability” [^].


Explanations with visualizations:

Victor Powell. “Explained Visually.” [^]

Christopher Olah. Many topics [^]. For instance, see “Calculus on computational graphs: backpropagation” [^].


Fooling the neural network:

Julia Evans. “How to trick a neural network into thinking a panda is a vulture” [^].

Andrej Karpathy. “Breaking linear classifiers on ImageNet” [^].

A. Nguyen, J. Yosinski, and J. Clune. “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images” [^].

Melanie Mitchell. “Artificial Intelligence hits the barrier of meaning” [^].


The Most Important link!

Ijad Madisch. “Why I hire scientists, and why you should, too” [^].


A song I like:

(Western, pop) “Billie Jean”
Artist: Michael Jackson

[Back in the ’80s, this song used to get played in the restaurants in the Pune camp area, and also in cinema halls like West-End, Rahul, Alka, etc. The camp area was so beautiful, back then—also uncrowded, and quiet.

This song would also come floating on the air of an evening as I sat at the Quark cafe, situated in the middle of all the IITM hostels (next to the skating rink). Some guy or the other would be playing it in a nearby hostel room on one of those stereo systems that came with one- or two-foot-tall “hi-fi” speaker-boxes. Each box typically had three stacked speakers. The combination of a separately sitting sub-woofer with a few other small boxes, or a soundbar, so ubiquitous today, had not been invented yet… Back then, Quark was a completely open-air cafe—a small patch of ground surrounded by small trees, and a tiny hexagonal hut, built in RCC, for serving snacks. There were no benches, even, at Quark. People would sit on those small concrete blocks (brought over from the civil engineering department, where they would come for testing). Deer would be roaming around very nearby. A daring one or two could venture forward and eat pizza out of your (fully) extended hand!…

…Anyway, coming back to the song itself: I had completely forgotten it, but was reminded of it when @curiouswavefn mentioned it in one of his tweets recently. … When I read the tweet, I couldn’t make out that it was this song (apart from Bach’s variations) that he was referring to. I just idly checked out both of them, and then, while listening to it, I suddenly recognized this song. … You see, unlike so many other guys from the e-schools of our times, I wouldn’t listen to a lot of Western pop songs those days (and still don’t). The Beatles, ABBA and a few other groups/singers, maybe; also Western instrumentals (a lot) and Western classical music (some, but definitely). Somehow, though, I was never much into Western pop songs. … Another thing. The way these Western singers sing, it used to be very, very hard for me to figure out the lyrics back then—and the situation continues mostly the same way even today! So, recognizing a song by its name was simply out of the question….

… Anyway, do check out the links (even if some of them appear to be out of your reach on the first reading), and enjoy the song. … Take care, and bye for now…]

 


Machine Learning as an Expert System

To cut a somewhat long story short, I think that I can “see” that Machine Learning (including Deep Learning) can actually be regarded as a rules-based expert system, albeit of a special kind.
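To make the analogy a bit more concrete, here is a toy sketch of my own (the weight, the bias and the threshold below are all purely hypothetical, chosen only for the illustration): even a single trained neuron can be read off as exactly the kind of explicit if-then rule that a rules-based expert system would store.

# A toy sketch of the analogy. The "learned" values are hypothetical.
w, b = 4.2, -2.1   # a learned weight and bias of a single neuron

def neuron_decision( x ):
    # The neuron fires iff w*x + b > 0
    return 1 if ( w * x + b ) > 0.0 else 0

# The very same decision, transcribed as an expert-system rule:
#     IF x > 0.5 THEN output 1 ELSE output 0
# (because w*x + b > 0  is the same as  x > -b/w = 0.5)
print( neuron_decision( 0.7 ) )   # prints 1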

I am sure that people must have written articles expressing this view. However, simple googling didn’t get me to any useful material.

I would deeply appreciate it if someone could please point out references in this direction. Thanks in advance.


BTW, here is a very neat infographic on AI: [^]; h/t [^]. … Once you finish reading it, re-read this post, please! Exactly once again, and only the first part—i.e., without recursion! …


A song I like:

(Marathi) “visar preet, visar geet, visar bheT aapuli”
Music: Yashwant Dev
Lyrics: Shantaram Nandgaonkar
Singer: Sudhir Phadke

 

Data science code snippets—2: Using world’s simplest ANN to visualize that deep learning is hard

Good morning! [LOL!]


In the context of machine learning, “learning” only means: changes that get effected to the weights and biases of a network. Nothing more, nothing less.
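In code, that definition reduces to nothing but the familiar gradient-descent update. Here is a minimal standalone sketch (the gradient values below are made-up numbers, just for the illustration; the full script further below computes them properly via back-propagation):

# "Learning", in its entirety: nudge the weight and the bias
# down the gradient of the cost function.
dEta = 1.0                      # the learning rate
dW, dB = 2.0, 2.0               # the current weight and bias
dGradW, dGradB = 0.04, 0.08     # hypothetical gradients, already computed
dW = dW - dEta * dGradW
dB = dB - dEta * dGradB
print( dW, dB )                 # the "learnt" weight and bias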

I wanted to understand exactly how the learning propagates through the layers of a network. So, I wrote the following piece of code. Actually, the development of the code went through two distinct stages.


First stage: A two-layer network (without any hidden layer); only one neuron per layer:

In the first stage, I wrote what demonstrably is the world’s simplest possible artificial neural network. This network comes with only two layers: the input layer, and the output layer. Thus, there is no hidden layer at all.

The advantage with such a network (“linear” or “1D”, and without any hidden layer) is that the total error at the output layer is a function of just one weight and one bias. Thus the total error (or the cost function) is a function of only two independent variables. Therefore, the cost-function surface can be directly visualized or plotted at all. For ease of coding, I plotted this function as a contour plot, but a 3D plot should also be possible.
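Here is a minimal standalone sketch of my own for that idea (the grid range and the resolution are arbitrary choices of mine): for one fixed input and target, the cost C(w, b) = 0.5*(target - Sigmoid(w*input + b))^2 depends on just the two variables w and b, and so it can be contour-plotted directly.

import numpy as np
import matplotlib.pyplot as plt

def Sigmoid( dZ ):
    return 1.0 / ( 1.0 + np.exp( -dZ ) )

dInput, dTarget = 0.5, 0.15
w = np.linspace( -5.0, 5.0, 101 )
b = np.linspace( -5.0, 5.0, 101 )
W, B = np.meshgrid( w, b )
# The cost at every point of the (b, w) plane, for this one input-target pair
C = 0.5 * ( dTarget - Sigmoid( W * dInput + B ) )**2

CS = plt.contour( B, W, C )
plt.clabel( CS, inline=1, fontsize=7 )
plt.xlabel( "Bias" )
plt.ylabel( "Weight" )
plt.title( "The cost surface C(w, b) for a single input-target pair" )
plt.show()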

Here are a couple of pictures it produces. The input randomly varies in the interval [0.0, 1.0) (i.e., excluding 1.0), and the target is kept at 0.15.

The following plot shows an intermediate stage during training, when the gradient descent algorithm is still very much in progress.

Simplest network with just input and output layers, with only one neuron per layer: Gradient descent in progress.

The following plot shows when the gradient descent has taken the configuration very close to the local (and global) minimum:

Simplest network with just input and output layers, with only one neuron per layer: Gradient descent near the local (and global) minimum.


Second stage: An n-layer network (with many hidden layers); only one neuron per layer

In the second stage, I wanted to show that even if you add one or more hidden layers, the gradient descent algorithm works in such a way that most of the learning occurs only near the output layer.

So, I generalized the code a bit to have an arbitrary number of hidden layers. However, the network continues to have only one neuron per layer (i.e., it maintains the topology of the bogies of a train). I then added a visualization showing the percent changes in the biases and weights at each layer, as learning progresses.
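The reason can be seen directly from the back-propagation formulas used in the code below: the gradient at a hidden layer is the gradient at the next layer multiplied by that next layer’s weight and by the derivative of the Sigmoid, and the Sigmoid’s derivative never exceeds 0.25 (and is far smaller once the activations saturate). Going backward from the output layer, the gradients therefore shrink roughly geometrically. Here is a minimal sketch of my own that traces this shrinkage for the 5-layer, all-2.0 configuration shown in the first plot below (the cost-derivative factor at the output layer is set to 1.0, just for the illustration):

import numpy as np

def Sigmoid( dZ ):
    return 1.0 / ( 1.0 + np.exp( -dZ ) )

def SigmoidDerivative( dZ ):
    dA = Sigmoid( dZ )
    return dA * ( 1.0 - dA )

dW, dB = 2.0, 2.0     # all weights and biases set to 2.0
dA = 0.5              # some representative input activation
adZs = []
for i in range( 4 ):  # 4 compute layers, i.e., a 5-layer network
    dZ = dW * dA + dB
    adZs.append( dZ )
    dA = Sigmoid( dZ )

# The bias-gradient at a layer is the running product of the
# downstream weights and Sigmoid derivatives (cost-derivative
# factor taken as 1.0 here).
dDelta = 1.0
for nL in range( len( adZs ), 0, -1 ):
    dDelta = dDelta * SigmoidDerivative( adZs[ nL-1 ] )
    print( "Compute layer %d: |gradient| ~ %e" % ( nL, dDelta ) )
    dDelta = dDelta * dW

Running it shows the gradient magnitude dropping by a couple of orders of magnitude with each step away from the output layer, which is exactly why the hidden layers barely learn.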

Here is a representative picture it produces when the total number of layers is 5 (i.e. when there are 3 hidden layers). It was made with both the biases and the weights all being set to the value of 2.0:

It is clearly seen that almost all of the learning is limited to the output layer alone; the hidden layers have learnt almost nothing!

Now, by way of contrast, here is what happens when you have all initial biases of 0.0, and all initial weights of 2.0:

Here, the last hidden layer has begun learning enough that it shows some visible change during training, even though the output layer learns much more than does the last hidden layer. Almost all learning is via changes to weights, not biases.

Next, here is the reverse situation: when you have all initial biases of 2.0, but all initial weights of 0.0:

The bias of the hidden layer does undergo a slight change, but in an opposite (positive) direction. Compared to the case just above, the learning now is relatively more concentrated in the output layer.

Finally, here is what happens when you initialize both biases and weights to 0.0.

The network does learn (the difference between the predicted and the target values does keep diminishing as the training progresses). However, the percentage change is too small to be visually registered (when plotted to the same scale as was used earlier).


The code:

Here is the code which produced all the above plots (but you have to suitably change the hard-coded parameters to get to each of the above cases):

'''
SimplestANNInTheWorld.py
Written by and Copyright (c) Ajit R. Jadhav. All rights reserved.
-- Implements the case-study of the simplest possible 1D ANN.
-- Each layer has only one neuron. Naturally, there is only one target!
-- It may not even have hidden layers. 
-- However, it can have an arbitrary number of hidden layers. This 
feature makes it a good test-bed to see why and how the neurons 
in the hidden layers don't learn much during deep learning, during 
a ``straight-forward'' application of the gradient descent algorithm. 
-- Please do drop me a comment or an email 
if you find this code useful in any way, 
say, in a corporate training setup or 
in academia. Thanks in advance!
-- History:
* 30 December 2018 09:27:57  IST: 
Project begun
* 30 December 2018 11:57:44  IST: 
First version that works.
* 01 January 2019 12:11:11  IST: 
Added visualizations for activations and gradient descent for the 
last layer, for no. of layers = 2 (i.e., no hidden layers). 
* 01 January 2019 18:54:36  IST:
Added visualizations for percent changes in biases and 
weights, for no. of layers >=3 (i.e. at least one hidden layer).
* 02 January 2019 08:40:17  IST: 
The version as initially posted on my blog.
'''
import numpy as np 
import matplotlib.pyplot as plt 

################################################################################
# Functions to generate the input and test data

def GenerateDataRandom( nTrainingCases ):
    # Note: randn() returns samples from the normal distribution, 
    # but rand() returns samples from the uniform distribution: [0,1). 
    adInput = np.random.rand( nTrainingCases )
    return adInput

def GenerateDataSequential( nTrainingCases ):
    adInput = np.linspace( 0.0, 1.0, nTrainingCases )
    return adInput

def GenerateDataConstant( nTrainingCases, dVal ):
    adInput = np.full( nTrainingCases, dVal )
    return adInput

################################################################################
# Functions to generate biases and weights

def GenerateBiasesWeightsRandom( nLayers ):
    adAllBs = np.random.randn( nLayers-1 )
    adAllWs = np.random.randn( nLayers-1 )
    return adAllBs, adAllWs

def GenerateBiasesWeightsConstant( nLayers, dB, dW ):
    adAllBs = np.ndarray( nLayers-1 )
    adAllBs.fill( dB )
    adAllWs = np.ndarray( nLayers-1 )
    adAllWs.fill( dW )
    return adAllBs, adAllWs

################################################################################
# Other utility functions

def Sigmoid( dZ ):
    return 1.0 / ( 1.0 + np.exp( - dZ ) )

def SigmoidDerivative( dZ ):
    dA = Sigmoid( dZ )
    dADer = dA * ( 1.0 - dA )
    return dADer

# Server function. Called with activation at the output layer. 
# In this script, the target value is always one and the 
# same (viz. dTarget, as set in the main script). 
# Assumes that the form of the cost function is: 
#       C_x = 0.5 * ( dT - dA )^2 
# where, note carefully, that target comes first. 
# Hence the partial derivative is:
#       \partial C_x / \partial dA = - ( dT - dA ) = ( dA - dT ) 
# where note carefully that the activation comes first.
def CostDerivative( dA, dTarget ):
    return ( dA - dTarget ) 

def Transpose( dA ):
    # np.transpose() does not modify its argument in place; its result
    # must be returned. (For the scalars used in this 1D network, the
    # transpose is a no-op anyway.)
    return np.transpose( dA )

################################################################################
# Feed-Forward

def FeedForward( dA ):
    ## print( "\tFeed-forward" )
    l_dAllZs = []
    # Note, this makes l_dAllAs have one extra data member
    # as compared to l_dAllZs, with the first member being the
    # supplied activation of the input layer 
    l_dAllAs = [ dA ]
    nL = 1
    for w, b in zip( adAllWs, adAllBs ):
        dZ = w * dA + b
        l_dAllZs.append( dZ )
        # Notice, dA has changed because it now refers
        # to the activation of the current layer (nL) 
        dA = Sigmoid( dZ )  
        l_dAllAs.append( dA )
        ## print( "\tLayer: %d, Z: %lf, A: %lf" % (nL, dZ, dA) )
        nL = nL + 1
    return ( l_dAllZs, l_dAllAs )

################################################################################
# Back-Propagation

def BackPropagation( l_dAllZs, l_dAllAs ):
    ## print( "\tBack-Propagation" )
    # Step 1: For the Output Layer
    dZOP = l_dAllZs[ -1 ]
    dAOP = l_dAllAs[ -1 ]
    dZDash = SigmoidDerivative( dZOP )
    dDelta = CostDerivative( dAOP, dTarget ) * dZDash
    dGradB = dDelta
    adAllGradBs[ -1 ] = dGradB

    # Since the last hidden layer has only one neuron, the transpose below is a no-op; it is kept only for generality.
    dAPrevTranspose = Transpose( l_dAllAs[ -2 ] )
    dGradW = np.dot( dDelta, dAPrevTranspose )
    adAllGradWs[ -1 ] = dGradW
    ## print( "\t* Layer: %d\n\t\tGradB: %lf, GradW: %lf" % (nLayers-1, dGradB, dGradW) )

    # Step 2: For all the hidden layers
    for nL in range( 2, nLayers ):
        dZCur = l_dAllZs[ -nL ]
        dZCurDash = SigmoidDerivative( dZCur )
        dWNext = adAllWs[ -nL+1 ]
        dWNextTranspose = Transpose( dWNext )
        dDot = np.dot( dWNextTranspose, dDelta ) 
        dDelta = dDot * dZCurDash
        dGradB = dDelta
        adAllGradBs[ -nL ] = dGradB

        dAPrev = l_dAllAs[ -nL-1 ]
        dAPrevTrans = Transpose( dAPrev )
        dGradWCur = np.dot( dDelta, dAPrevTrans )
        adAllGradWs[ -nL ] = dGradWCur

        ## print( "\tLayer: %d\n\t\tGradB: %lf, GradW: %lf" % (nLayers-nL, dGradB, dGradW) )

    return ( adAllGradBs, adAllGradWs )

def PlotLayerwiseActivations( c, l_dAllAs, dTarget ):
    plt.subplot( 1, 2, 1 ).clear()
    dPredicted = l_dAllAs[ -1 ]
    sDesc = "Activations at Layers. Case: %3d\nPredicted: %lf, Target: %lf" % (c, dPredicted, dTarget) 
    plt.xlabel( "Layers" )
    plt.ylabel( "Activations (Input and Output)" )
    plt.title( sDesc )
    
    nLayers = len( l_dAllAs )
    dES = 0.2	# Extra space, in inches
    plt.axis( [-dES, float(nLayers) -1.0 + dES, -dES, 1.0+dES] )
    
    # Plot a vertical line at the input layer, just to show variations
    plt.plot( (0,0), (0,1), "grey" )
    
    # Plot the dots for the input and hidden layers
    for i in range( nLayers-1 ):
        plt.plot( i, l_dAllAs[ i ], 'go' )
    # Plot the dots for the output layer
    plt.plot( nLayers-1, dPredicted, 'bo' )
    plt.plot( nLayers-1, dTarget, 'ro' )
    
def PlotGradDescent( c, dOrigB, dOrigW, dB, dW ):
    plt.subplot( 1, 2, 2 ).clear()
    
    d = 5.0
    ContourSurface( d )
    plt.axis( [-d, d, -d, d] ) 
    plt.plot( dOrigB, dOrigW, 'bo' )
    plt.plot( dB, dW, 'ro' )
    plt.grid()
    plt.xlabel( "Biases" )
    plt.ylabel( "Weights" )
    sDesc = "Gradient Descent for the Output Layer.\n" \
    "Case: %3d\nWeight: %lf, Bias: %lf" % (c, dW, dB) 
    plt.title( sDesc )
    
    
def ContourSurface( d ):
    nDivs = 10
    dDelta = d / nDivs
    w = np.arange( -d, d, dDelta )
    b = np.arange( -d, d, dDelta )
    W, B = np.meshgrid( w, b ) 
    A = Sigmoid( W + B )
    plt.imshow( A, interpolation='bilinear', origin='lower',
                cmap=plt.cm.Greys, # cmap=plt.cm.RdYlBu_r, 
                extent=(-d, d, -d, d), alpha=0.8 )
    CS = plt.contour( B, W, A )
    plt.clabel( CS, inline=1, fontsize=7 )

def PlotLayerWiseBiasesWeights( c, adOrigBs, adAllBs, adOrigWs, adAllWs, dPredicted, dTarget ):
    plt.clf()
    
    nComputeLayers = len( adOrigBs )
    plt.axis( [-0.2, nComputeLayers+0.7, -320.0, 320.0] )
    
    adBPct = GetPercentDiff( nComputeLayers, adAllBs, adOrigBs )
    adWPct = GetPercentDiff( nComputeLayers, adAllWs, adOrigWs )
    print( "Case: %3d" \
    "\nPercent Changes in Biases:\n%s" \
    "\nPercent Changes in Weights:\n%s\n" \
     % (c, adBPct, adWPct)  )
    adx = np.linspace( 0.0, nComputeLayers-1, nComputeLayers )
    plt.plot( adx + 1.0, adWPct, 'ro' )
    plt.plot( adx + 1.15, adBPct, 'bo' )
    plt.grid()
    plt.xlabel( "Layer Number" )
    plt.ylabel( "Percent Change in Weight (Red) and Bias (Blue)" )
    sTitle = "How most learning occurs only at an extreme layer\n" \
    "Percent Changes to Biases and Weights at Each Layer.\n" \
    "Training case: %3d, Target: %lf, Predicted: %lf" % (c, dTarget, dPredicted) 
    plt.title( sTitle )

def GetPercentDiff( n, adNow, adOrig ):
    adDiff = adNow - adOrig
    ## print( adDiff )
    adPct = np.zeros( n )
    dSmall = 1.0e-10
    # Guard against division by (near-)zero initial values. Note: the
    # comparisons must go inside all(), element-wise, not outside it.
    if np.all( np.abs( adDiff ) > dSmall ) and np.all( np.abs( adOrig ) > dSmall ):
        adPct = adDiff / adOrig * 100.0
    return adPct


################################################################################
# The Main Script
################################################################################

dEta = 1.0 # The learning rate
nTrainingCases = 100
nTestCases = nTrainingCases // 5
adInput = GenerateDataRandom( nTrainingCases )   # or: GenerateDataConstant( nTrainingCases, 0.0 )
adTest = GenerateDataRandom( nTestCases )

np.random.shuffle( adInput )
## print( "Data:\n %s" % (adInput) )

# Must be at least 2. Tested up to 10 layers.
nLayers = 2
# Just a single target! Keep it in the interval (0.0, 1.0), 
# i.e., excluding both the end-points of 0.0 and 1.0.

dTarget = 0.15

# The input layer has no biases or weights. Even the output layer 
# here has only one target, and hence, only one neuron.
# Hence, the weights matrix for all layers now becomes just a 
# vector.
# For visualization with a 2 layer-network, keep biases and weights 
# between [-4.0, 4.0]

# adAllBs, adAllWs = GenerateBiasesWeightsRandom( nLayers )
adAllBs, adAllWs = GenerateBiasesWeightsConstant( nLayers, 2.0, 2.0 )
dOrigB = adAllBs[-1]
dOrigW = adAllWs[-1]
adOrigBs = adAllBs.copy()
adOrigWs = adAllWs.copy()

## print( "Initial Biases\n", adAllBs )
## print( "Initial Weights\n", adAllWs )

plt.figure( figsize=(10,5) )
    
# Do the training...
# For each input-target pair,
for c in range( nTrainingCases ):
    dInput = adInput[ c ]
    ## print( "Case: %d. Input: %lf" % (c, dInput) )
    
    adAllGradBs = [ np.zeros( b.shape ) for b in adAllBs ]
    adAllGradWs = [ np.zeros( w.shape ) for w in adAllWs ]
    
    # Do the feed-forward, initialized to dA = dInput
    l_dAllZs, l_dAllAs = FeedForward( dInput )
    
    # Do the back-propagation
    adAllGradBs, adAllGradWs = BackPropagation( l_dAllZs, l_dAllAs )

    ## print( "Updating the network biases and weights" )
    adAllBs = [ dB - dEta * dDeltaB 
                for dB, dDeltaB in zip( adAllBs, adAllGradBs ) ]
    adAllWs = [ dW - dEta * dDeltaW 
                for dW, dDeltaW in zip( adAllWs, adAllGradWs ) ]
    
    ## print( "The updated network biases:\n", adAllBs )
    ## print( "The updated network weights:\n", adAllWs )

    if 2 == nLayers:
        PlotLayerwiseActivations( c, l_dAllAs, dTarget )
        dW = adAllWs[ -1 ]
        dB = adAllBs[ -1 ]
        PlotGradDescent( c, dOrigB, dOrigW, dB, dW )
    else:
        # Plot in case of many layers: Original and Current Weights, Biases for all layers
        # and Activations for all layers 
        dPredicted = l_dAllAs[ -1 ]
        PlotLayerWiseBiasesWeights( c, adOrigBs, adAllBs, adOrigWs, adAllWs, dPredicted, dTarget )
    plt.pause( 0.1 )

plt.show()

# Do the testing
print( "\nTesting..." )
for c in range( nTestCases ):
    
    dInput = adTest[ c ]
    print( "\tTest Case: %d, Value: %lf" % (c, dInput) )
    
    l_dAllZs, l_dAllAs = FeedForward( dInput )
    dPredicted = l_dAllAs[ -1 ]
    dDiff = dTarget - dPredicted
    dCost = 0.5 * dDiff * dDiff
    print( "\tInput: %lf, Predicted: %lf, Target: %lf, Difference: %lf, Cost: %lf\n" % (dInput, dPredicted, dTarget, dDiff, dCost) )

print( "Done!" )

 


Things you can try:

  • Change one or more of the following parameters, and see what happens:
    • Target value
    • Values of initial weights and biases
    • Number of layers
    • The learning rate, dEta
  • Change the activation function; e.g., try a linear function in place of the Sigmoid (see the sketch just after this list). Change the code accordingly. You could similarly try a different cost function.
  • Also, try to conceptually see what would happen when the number of neurons per layer is 2 or more…
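
For the activation-function variation mentioned above, here is a minimal sketch of my own: drop-in replacements for the Sigmoid() and SigmoidDerivative() functions in the script. (Note: with a linear activation the gradient no longer saturates, so a smaller dEta may be needed to keep the updates stable.)

def Linear( dZ ):
    return dZ

def LinearDerivative( dZ ):
    # The derivative of the identity function is 1 everywhere
    return 1.0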

Have fun!


A song I like:

(Marathi) “pahaaTe pahaaTe malaa jaag aalee”
Music and Singer: C. Ramchandra
Lyrics: Suresh Bhat

 

The quantum mechanical features of my laptop…

My laptop has developed certain quantum mechanical features after its recent repairs [^]. In particular, if I press the “power on” button, it does not always get “measured” into the “power-on” state.

That’s right. In starting the machine, it is not possible to predict whether the power-on button will work, i.e., whether pressing it will lead to an actual boot-up. Sometimes it does, sometimes it doesn’t.

For instance, the last time I shut it down was on the last night, just before dinner. Then, after dinner, when I tried to restart it, the quantum mechanical features kicked in and the associated randomness was such that it simply refused the request. Ditto, this morning. Ditto, early afternoon today. But now (at around 18:00 hrs on 09 October), it somehow got up and going!


Fortunately, I have taken a backup of the crucial data (though not of all of it). So, I can afford to look at it with a sense of humour.

But still, if I don’t come back for a somewhat longer period of time than is usual (about 8–10 days), then know that, in all probability, I was just waiting helplessly to get this thing repaired, once again. (I plan to take it to the repairman tomorrow morning.) …

…The real bad part isn’t this forced break in browsing or blogging. The real bad part is: my inability to continue with my ANN studies. It’s not possible to maintain any tempo in studies in this now-on-now-off sort of a manner—i.e., when the “off” periods are not chosen by you.

Yes, I do like browsing, but once I get into the mood of studying a new topic (and, BTW, just reading through pop-sci articles does not count as studies) and especially if the studies also involve programming, then having these forced breaks is really bad. …

Anyway, bye for now, and take care.


PS: I added that note on browsing, and then it struck me: check out a few resources while I am gone, following up with the laptop repairs. (No links, because right while writing this postscript the machine crashed, and so I am somehow completing it using a smartphone—I hate this stuff, I mean typing using at most two fingers, mostly just one):

  1. As to Frauchiger and Renner’s controversial, much-discussed result, Chris Lee’s account at ArsTechnica is the simplest to follow. Go through it before any other sources/commentaries, whether on the version published recently in Nature Communications or on the earlier ones, going back to 2016.
  2. Carver Mead’s interview in the American Spectator makes for an interesting read even after almost two decades.
  3. Vinod Khosla’s prediction in 2017 that AI will make radiologists obsolete in 5 years’ time. One year is down already. And, for that matter, the first time he made remarks to that sort of an effect was some 6+ years ago, in 2012!
  4. As to AI’s actual status today, see the Quanta Magazine article: “Machine learning confronts the elephant in the room” by Kevin Hartnett. Both funny and illuminating (esp. if you have some idea about how ML works).
  5. And, finally, a pretty interesting coverage of something about which I didn’t have any idea beforehand whatsoever: “New AI strategy mimics how brains learn to smell” by Jordana Cepelwicz in Quanta Mag.

Ok. Bye, really, for now. See you after the laptop begins working.


A Song I Like:
Indian, instrumental: Theme song of “Malgudi Days”
Music: L. Vaidyanathan

 

 

Machine “Learning”—An Entertainment [Industry] Edition

Yes, “Machine ‘Learning’,” too, has been one of my “research” interests for some time now. … Machine learning, esp. ANN (Artificial Neural Networks), esp. Deep Learning. …

Yesterday, I wrote a comment about it at iMechanica. Though it was made in a certain technical context, today I thought that the comment could, perhaps, make sense to many of my general readers, too, if I supply a bit of context to it. So, let me report it here (after a bit of editing). But before coming to my comment, let me first give you the context in which it was made:


Context for my iMechanica comment:

It all began with a fellow iMechanician, one Mingchuan Wang, writing a post with the title “Is machine learning a research priority now in mechanics?” at iMechanica [^]. Biswajit Banerjee responded by pointing out that

“Machine learning includes a large set of techniques that can be summarized as curve fitting in high dimensional spaces. [snip] The usefulness of the new techniques [in machine learning] should not be underestimated.” [Emphasis mine.]

Then Biswajit pointed out an arXiv paper [^] in which machine learning was reported as having produced some good DFT-like results for quantum mechanical systems, too.

A word about DFT for those who (still) don’t know about it:

DFT, i.e. Density Functional Theory, is a “formally exact description of a many-body quantum system through the density alone. In practice, approximations are necessary” [^]. DFT thus is a computational technique; it is used for simulating the electronic structure in quantum mechanical systems involving several hundreds of electrons (i.e. hundreds of atoms). Here is the obligatory link to the Wiki [^], though a better introduction perhaps appears here [(.PDF) ^]. Here is a StackExchange on its limitations [^].

Trivia: Kohn received a Nobel (the 1998 Chemistry Nobel, as it happened) for inventing DFT. It was a very, very rare instance of a Nobel being awarded for an invention—not a discovery. But the Nobel committee, once again, turned out to have put old Nobel’s money in the right place. Even if the work itself was only an invention, it directly led to a lot of discoveries in condensed matter physics! That was because DFT was fast—fast enough that it could bring the physics of the larger quantum systems within the scope of (any) study at all!

And now, it seems, Machine Learning has advanced enough to be able to produce results that are similar to DFT’s, but without using any QM theory at all! The computer does have to “learn” its “art” (i.e. “skill”), but it does so from the results of previous DFT-based simulations, not from the theory at the base of DFT. But once the computer does that—“learning”—and the paper shows that it is possible for the computer to do that—it is able to compute very similar-looking simulations much, much faster than even the rather fast technique of DFT itself.

OK. Context over. Now, here in the next section, is my comment from yesterday at iMechanica. (Also note that the previous exchange on this thread at iMechanica had occurred almost a year ago.) Since it has been edited quite a bit, I will not format it using a quotation block.


[An edited version of my comment begins]

A very late comment, but still, just because something struck me only this late… May as well share it….

I think that, as Biswajit points out, it’s a question of matching a technique to an application area where it is likely to be of “good enough” a fit.

I mean to say, consider fluid dynamics, and contrast it to QM.

In (C)FD, the nonlinearity present in the advective term is a major headache. As far as I can gather, this nonlinearity has all but been “proved” to be the basic cause behind the phenomenon of turbulence. If so, using machine learning in CFD would be, by this simple-minded “analysis”, a basically hopeless endeavour. The very idea of using a potential presupposes differential linearity. Therefore, machine learning may be thought of as viable in computational Quantum Mechanics (viz. DFT), but not in the more mundane, classical-mechanical CFD.

But then, consider the role of the BCs and the ICs in any simulation. It is true that if you don’t handle nonlinearities right, then as the simulation time progresses, errors are soon enough going to multiply (sort of), and lead to a blowup—or at least a dramatic departure from a realistic simulation.

But then, also notice that there still is some small but nonzero interval of time which has to pass before a really bad amplification of the errors actually begins to occur. Now, what if a new “BC-IC” gets imposed right within that time-interval—i.e., within the interval during which the simulation still shows “good enough” an accuracy? In that case, you can expect the simulation to remain “sufficiently” realistic-looking for a long, very long time!

Something like that seems to have been the line of thought implicit in the results reported by this paper: [(.PDF) ^].

Machine learning seems to work even in CFD because, in an interactive session, a new “modified BC-IC” is, every now and then, manually introduced by none other than the end-user himself! And, the location of the modification is precisely the region from where the flow in the rest of the domain would get most dominantly affected during the subsequent, small, time evolution.

It’s somewhat like an electron rushing through a cloud chamber. By the uncertainty principle, the electron “path” sure begins to get hazy immediately after it is “measured” (i.e. absorbed and re-emitted) by a vapor molecule at a definite point in space. The uncertainty in the position grows quite rapidly. However, what actually happens in a cloud chamber is that, before this cone of haziness becomes too big, along comes another vapor molecule, and “zaps” i.e. “measures” the electron back on to a classical position. … After a rapid succession of such going-hazy-getting-zapped events, the end result turns out to be a very, very classical-looking (line-like) path—as if the electron always were only a particle, never a wave.

Conclusion? Be realistic about how smart the “dumb” “curve-fitting” involved in machine learning can at all get. Yet, at the same time, also remain open to all the application areas where it can be made to work—even including those areas where, “intuitively”, you wouldn’t expect it to have any chance of working!

[An edited version of my comment is over. Original here at iMechanica [^]]


 

“Boy, we seem to have covered a lot of STEM territory here… Mechanics, DFT, QM, CFD, nonlinearity. … But where is either the entertainment or the industry you had promised us in the title?”

You might be saying that….

Well, the CFD paper I cited above was about the entertainment industry. It was, in particular, about the computer games industry. Go check out SoHyeon Jeong’s Web site for more cool videos and graphics [^], all using machine learning.


And, here is another instance connected with entertainment, even though now I am going to make it (mostly) explanation-free.

Check out the following piece of art—a watercolor landscape of a monsoon-time but placid sea-side, in fact. Let me just say that a certain famous artist produced it; in any case, the style is plain unmistakable. … Can you name the artist simply by looking at it? See the picture below:

A sea beach in the monsoons. Watercolor.

If you are unable to name the artist, then check out this story here [^], and a previous story here [^].


A Song I Like:

And finally, to those who have always loved the Beatles’ songs…

Here is one song which, I am sure, most of you have never heard before. In any case, it came to be distributed only recently. When and where was it recorded? For both the song and its recording details, check out this site: [^]. Here is another story about it: [^]. And, if you liked what you read (and heard), here is some more stuff of the same kind [^].


Endgame:

I am of the Opinion that 99% of the “modern” “artists” and “music composers” ought to be replaced by computers/robots/machines. Whaddya think?

[Credits: “Endgame” used to be the way Mukul Sharma would end his weekly Mindsport column in the yesteryears’ Sunday Times of India. (The column perhaps also used to appear in The Illustrated Weekly of India before ToI began running it; at least I have a vague recollection of something of that sort, though I can’t be quite sure. … I would have been a school-boy back then, when the Weekly perhaps ran it.)]