Equations in the matrix form for implementing simple artificial neural networks

(Marathi) हुश्श्… [translit.: “hushsh…”, equivalent word prevalent among the English-speaking peoples: “phewww…”]

I’ve completed the first cut of a document with the same title as this post. I wrote it in LaTeX. (Too many equations!)

I’ve just uploaded the PDF file at my GitHub account, here [^]. Remember, it’s still only in the alpha stage. (A beta release will follow after a few days. The final release may take place after a couple of weeks or so.)

Below the fold, I copy-paste the abstract and the preface of this document.

“Equations in the matrix form for implementing simple artificial neural networks”


This document presents the basic equations in reference to which artificial neural networks are designed and implemented. The scope is restricted to
the simpler feed-forward networks, including those having hidden layers. Convolutional and recurrent networks are out of the scope.

Equations are often initially noted using an index-based notation for the typical element. However, all the equations are eventually cast in the direct
matrix form, using a consistent set of notation. Some minor aspects of the notation were invented to make the presentation as simple and direct as possible.

The presentation here regards a layer as the basic unit. The term “layer” is understood in the same sense in which the APIs of modern libraries like
TensorFlow-Keras 2.x use it. The presentation here is detailed enough that neural networks with hidden layers could be implemented from scratch.
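To anchor the abstract in something concrete, here is a minimal numpy sketch of the kind of forward pass the document's equations describe: a network with one hidden layer, each layer being a matrix product plus a bias. (This is my own toy illustration; the shapes, layer sizes, and use of the sigmoid are arbitrary choices, not taken from the document.)

```python
import numpy as np

def dense_forward(X, W, b):
    # One fully connected layer in direct matrix form:
    # X: (batch, n_in), W: (n_in, n_out), b: (n_out,)
    return X @ W + b

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                    # a batch of 4 inputs, 3 features each
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)  # hidden layer: 3 -> 5
W2, b2 = rng.normal(size=(5, 2)), np.zeros(2)  # output layer: 5 -> 2

A1 = sigmoid(dense_forward(X, W1, b1))   # hidden activation, shape (4, 5)
Y  = sigmoid(dense_forward(A1, W2, b2))  # network output, shape (4, 2)
```

The point is only that, once the equations are in matrix form, the whole forward pass is a handful of `@` products.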


Raison d’être:

I wrote this document mainly for myself, to straighten out the different notations and formulae used in different sources and contexts.

In particular, I wanted to have a document that matches the design themes used in today’s libraries (like TensorFlow-Keras 2.x) better than the descriptions in the textbooks do.

For instance, in many sources, the input layer is presented as consisting of both a fully connected layer and its corresponding activation layer. However, for flexibility, libraries like TF-Keras 2.x treat them as separate layers.
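The separation can be sketched in plain numpy, with the affine map and the nonlinearity as two distinct layer objects. (The `Dense` and `Activation` classes below are hypothetical stand-ins of my own, mimicking the design theme, not the TF-Keras classes themselves.)

```python
import numpy as np

class Dense:
    # A purely affine layer; no activation is applied here.
    def __init__(self, W, b):
        self.W, self.b = W, b
    def __call__(self, X):
        return X @ self.W + self.b

class Activation:
    # The nonlinearity lives in its own, separate layer object.
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, Z):
        return self.fn(Z)

relu = Activation(lambda Z: np.maximum(Z, 0.0))
dense = Dense(np.eye(2), np.array([-1.0, 1.0]))

X = np.array([[2.0, 2.0]])
print(relu(dense(X)))   # [[1. 3.]]
```

Keeping the two as separate objects is what gives the library the flexibility to, say, swap the nonlinearity without touching the weights.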

Also, some sources uniformly denote the input of any layer as \vec{X} and the output of any layer as the activation, \vec{a}, but such usage overloads the term “activation”. Confusion also creeps in because different conventions exist: treating the bias by expanding the input vector with 1 and the weights matrix with w_0; the “to–from” vs. “from–to” convention for the weights matrix; etc.
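These conventions are all numerically equivalent, which a few lines of numpy make obvious. (A sketch of mine, not from the document; the variable names are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(4, 3))   # batch of 4 inputs, 3 features each
W = rng.normal(size=(3, 2))   # "from-to" convention: shape (n_in, n_out)
b = rng.normal(size=(2,))

# Convention 1: keep the bias as a separate vector.
Z1 = X @ W + b

# Convention 2: absorb the bias by prepending a column of ones to X
# and a corresponding row w_0 = b to W.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])   # shape (4, 4)
W_aug = np.vstack([b, W])                          # shape (4, 2)
Z2 = X_aug @ W_aug

assert np.allclose(Z1, Z2)

# The "to-from" convention stores the transpose, shape (n_out, n_in),
# so the product is written W_tf @ x for a column vector x.
W_tf = W.T
assert np.allclose((W_tf @ X.T).T + b, Z1)
```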

I wanted to have a single, consistent notation that dealt with all such issues in a uniform, matrix-based way, and that came as close to the numpy ndarray interface as possible.

Level of coverage:

The scope here is restricted to the simplest ANNs, including the simplest DL networks. Convolutional neural networks and recurrent neural networks are out of the scope.

Yet, this document wouldn’t make for a good tutorial for a complete beginner; it is likely to confuse a beginner more than explain anything. So, if you are completely new to ANNs, it is advisable to go through sources like Nielsen’s online book [^] to learn the theory of ANNs. Mazur’s fully worked out example of the back-propagation algorithm [^] should also prove very helpful, before returning to this document.

If you already know ANNs, and don’t want to see equations in the fully expanded forms—or, plain dislike the notation used here—then a good reference, roughly at the same level as this document, is the set of write-ups/notes by Mallya [^].


Any feedback, especially regarding errors, typos, inconsistencies in notation, suggestions for improvement, etc., will be gratefully received.

How to cite this document:

TBD at the time of the final release version.

Further personal notes:

I began writing this document on 24 January 2020. By 30 January 2020, I had some 11 pages done, which I released via the last post.

Unfortunately, it was too tentative, with a lot of errors and misleading or inconsistent notation. So, I deleted it within a day. No point in having premature documents floating around in cyberspace.

I had mentioned, right in the last post here on this blog (on 30 January 2020), that the post itself would also be gone. I will keep it for a while, and then, maybe after a week or two, delete it.

Anyway, by the time I finished the alpha version today, the document had grown from the initial 11 pages to some 38 pages!

Typing out all the braces, square brackets, parentheses, subscripts for indices, subscripts for sizes of vectors and matrices… It was all tedious. … Somehow, I managed to finish it. (I will think twice before undertaking a similar project, but am already tempted to write a document each on CNNs and RNNs, too!)

Anyway, let me take a break for a while.

If interested in ANNs, please go through the document and let me have your feedback. Thanks in advance, take care, and bye for now.

A song I like:

[Just listen to Lata here! … Not that others don’t get up to the best possible levels, but still, Lata here is, to put it simply, heavenly! [BTW, the song is from 1953.]]

(Hindi) जाने न नजर पहचाने जिगर (“jaane naa najar pahechane jigar”)
Singers: Lata and Mukesh
Music: Shankar-Jaikishen
Lyrics: Hasrat Jaipuri


Equations using the matrix notation for a simple artificial neural network

Update on 2020.01.30 16:58 IST:

I have taken the document offline. Yes, it was too incomplete, tentative, and in fact also had errors. The biggest and most obvious error was about the error vector. 🙂

No, no one pointed out any of the errors or flaws to me.

Yes, I will post an expanded and revised version later, hopefully in the first week of February. (I started work on this document on 24th Jan., but was also looking into other issues.) When I am done, I will delete this entire post, and make a new entry to announce the availability of the corrected, expanded and revised document.

The original post appears below.

Go, read [^].

Let me know about the typos. Also, errors. [Though I don’t expect you to do that. [I will eat my estimate of your moral character, on this count, at least for the time being.]]

I can, and might, take it out of the published domain any time I want. [Yes, I am irresponsible, careless, unreliable, etc. [Also, the “imposing” type. Without “team-spirit”. One who looks down on his colleagues as being way, way inferior to him.]]

I will also improve on it—I mean the document. In fact, I even intend to expand it, with some brief notes to be added on various activation- and loss-functions.

Eventually, I may even publish it at GitHub. Or, at arXiv. [If they let me do that. But then, another consideration: there are physicists there!]


A song I like:

(Hindi) “कभी तो मिलेगी” (kabhee to milegi)
Singer: Lata
Music: Roshan
Lyrics: Majrooh Sultanpuri

[Credits happily listed in a mostly random order. [The issue was only with the last two; it was clear which one had to appear first.]]



A recruiter calls me to talk about a Data Science position in Pune…

A recruiter calls me this morning, from Hyderabad, quite unexpectedly. No emails beforehand, no recruiter messages at a jobs site, no SMSs, nothing. Just a direct call. She says they are considering me for a Data Science position in Pune, involving Data Science and Python.

She asks about my total and relevant experience. I tell her: 23 years in all, ~12 years in software development. She asks about my Python experience. I tell her: familiarity for maybe 10 years if not more; actual use for maybe 5–6 years. (It turns out to be since 2006, and since at least 2013–14 in connection with scripting while using open-source FEM libraries, respectively.)

She then asks me about my data science experience.

I tell that I’ve been into it for about a year by now, but no professional, paid experience as such. Also add that I do understand kernels from the Kaggle competitions. (In fact, I can think of bringing about meaningful variations in them too.)

She asks about my last job. I tell her: academia, recently, after my PhD. (She sounds a bit concerned, maybe confused. She must be looking at my resume.) But before that, I was in the software field, I say, and now I am looking for a Data Science position. I then add: in the software development field, my last job was as a Systems Architect, reporting directly to the CEO. … By this time, she must have spotted the software experience listed in my resume. She says “OK,” with just a shade of satisfaction audible in the way she sounds.

She then again asks me about my Data Science experience. I now tell her directly: Paid experience, 0 (zero) years.

Hearing it, she keeps the phone down. Just like that. Without any concluding remarks. Not even a veneer of courtesy, like a hurried “OK, if you are found suitable, we will get back to you.” Nothing. Not even that. No thanks, nothing.

She. Just. Keeps. The. Phone. Down.

It must be a project for one of those companies from America, especially from California, especially from the San Francisco Bay Area. Only they can be as dumbidiots* as that. And, they could very well be one of those “Capitalist”s, esp. Indians—there and here. “You are just as good as your performance on your last job!” Said sternly. And, the quote taken literally. In the current context, it is obviously taken to mean that I am as good as zero, when it comes to Data Science positions.

Dumbidiots*. Zeno’s descendants. They don’t deserve to hire me.

But these stupididiots* do amass a lot of money for themselves. Help build the nation. Etc.

Rich idiocy.

*By the rules of Sanskrit grammar, this “sandhi” is correct. English is an Indo-European language, so such a “sandhi” should be allowed. The joined word means something like “k’mt’om” [^] “moorkha”. (You look up “moorkha”.)

A song I like:
(Hindi) “hum the, woh thee, aur, samaa rangeen…”
Singer: Kishore Kumar
Lyrics: Majrooh Sultanpuri
Music: S. D. Burman