Equations in the matrix form for implementing simple artificial neural networks

(Marathi) हुश्श्… [translit.: “hushsh…”, equivalent word prevalent among the English-speaking peoples: “phewww…”]

I’ve completed the first cut of a document bearing the same title as this post. I wrote it in LaTeX. (Too many equations!)

I’ve just uploaded the PDF file to my GitHub account, here [^]. Remember, it’s still only in the alpha stage. (A beta release will follow after a few days; the final release may take place after a couple of weeks or so.)

Below the fold, I copy-paste the abstract and the preface of this document.


“Equations in the matrix form for implementing simple artificial neural networks”

Abstract:

This document presents the basic equations with reference to which artificial neural networks are designed and implemented. The scope is restricted to
the simpler feed-forward networks, including those with hidden layers. Convolutional and recurrent networks are out of scope.

Equations are often first stated using an index-based notation for the typical element. However, all the equations are eventually cast in the direct
matrix form, using a consistent set of notation. Some minor aspects of the notation were invented to make the presentation as simple and direct as
possible.

The presentation here regards a layer as the basic unit. The term “layer” is understood in the same sense in which the APIs of modern libraries like
TensorFlow-Keras 2.x take it. The presentation here is detailed enough that neural networks with hidden layers could be implemented starting from
scratch.


Preface:

Raison d’être:

I wrote this document mainly for myself, to straighten out the different notations and formulae used in different sources and contexts.

In particular, I wanted a document that matches the design themes used in today’s libraries (like TensorFlow-Keras 2.x) better than the descriptions found in textbooks do.

For instance, in many sources, the input layer is presented as consisting of both a fully connected layer and its corresponding activation layer. However, for flexibility, libraries like TF-Keras 2.x treat them as separate layers.
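To make the contrast concrete, here is a minimal sketch (my own illustration, not taken from the document) of the two styles in TF-Keras 2.x; the layer sizes are arbitrary and chosen only for demonstration.

import tensorflow as tf

# Style 1: the fully connected (affine) transform and its activation fused
# into a single Dense layer.
fused = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="sigmoid", input_shape=(4,)),
])

# Style 2: the same computation, but with the activation kept as a layer of
# its own; this is the separation the preface refers to.
split = tf.keras.Sequential([
    tf.keras.layers.Dense(3, input_shape=(4,)),    # affine part only: W x + b
    tf.keras.layers.Activation("sigmoid"),         # activation as a separate layer
])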

Also, some sources uniformly denote the input of any layer by \vec{X} and the output of any layer by the activation \vec{a}, but such usage overloads the term “activation”. Confusion also creeps in because different conventions exist: treating the bias either explicitly or by expanding the input vector with a 1 and the weights matrix with w_0; the “to–from” vs. “from–to” convention for the weights matrix; etc.
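As a quick check (again my own sketch, with arbitrary sizes, and not the document’s notation), the following numpy snippet shows that the explicit-bias treatment, the expanded-input treatment, and the transposed orientation of the weights matrix all yield the same pre-activation values; the differing conventions cause confusion in notation, not differences in results.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # input vector with 4 features
W = rng.standard_normal((3, 4))   # "to-from" orientation: (n_out, n_in)
b = rng.standard_normal(3)        # bias vector, one entry per output node

# Convention 1: explicit bias vector.
z1 = W @ x + b

# Convention 2: absorb the bias by prepending a 1 to the input and w_0 (= b)
# as an extra column of the weights matrix.
x_aug = np.concatenate(([1.0], x))
W_aug = np.hstack((b[:, None], W))
z2 = W_aug @ x_aug

# Convention 3: the transposed, "from-to" orientation, with the input taken
# as a row vector.
z3 = x @ W.T + b

print(np.allclose(z1, z2), np.allclose(z1, z3))   # True True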

I wanted a single, uniform, matrix-based notation that dealt with all such issues and came as close to the numpy ndarray interface as possible.
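For instance, a forward pass through a network with one hidden layer, written directly in matrix form with the batch laid out along the rows (the layout numpy ndarrays and TF-Keras use), could look like the following. This is only my own sketch with arbitrary sizes, not the document’s finalized notation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 4))    # 5 samples, 4 input features

W1 = rng.standard_normal((4, 3))   # input-to-hidden weights
b1 = np.zeros(3)
W2 = rng.standard_normal((3, 2))   # hidden-to-output weights
b2 = np.zeros(2)

A1 = sigmoid(X @ W1 + b1)          # hidden-layer activations, shape (5, 3)
Y  = sigmoid(A1 @ W2 + b2)         # network outputs, shape (5, 2)
print(Y.shape)                     # (5, 2)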

Level of coverage:

The scope here is restricted to the simplest ANNs, including the simplest DL networks. Convolutional neural networks and recurrent neural networks are out of scope.

Yet, this document wouldn’t make for a good tutorial for a complete beginner; it is likely to confuse him more than explain anything to him. So, if you are completely new to ANNs, it is advisable to go through sources like Nielsen’s online book [^] to learn the theory of ANNs. Mazur’s fully worked-out example of the back-propagation algorithm [^] should also prove very helpful before returning to this document.

If you already know ANNs and don’t want to see equations in their fully expanded forms (or simply dislike the notation used here), then a good reference, at roughly the same level as this document, is the set of write-ups/notes by Mallya [^].

Feedback:

Any feedback, especially regarding errors, typos, inconsistencies in notation, suggestions for improvement, etc., will be gratefully received.

How to cite this document:

TBD at the time of the final release version.


Further personal notes:

I began writing this document on 24 January 2020. By 30 January 2020, I had some 11 pages written up, which I released via the previous post.

Unfortunately, it was too tentative, with a lot of errors, misleading or inconsistent notation, etc. So, I deleted it within a day. No point in having premature documents floating around in cyberspace.

I had mentioned, right in the last post here on this blog (on 30 January 2020), that the post itself would also be gone. I will keep it for a while and then, maybe after a week or two, delete it.

Anyway, by the time I finished the alpha version today, the document had grown from the initial 11 pages to some 38 pages!

Typing out all the braces, square brackets, parentheses, subscripts for indices, subscripts for sizes of vectors and matrices… It was all tedious. … Somehow, I managed to finish it. (Will think twice before undertaking a similar project, but am already tempted to write a document each on CNNs and RNNs, too!)

Anyway, let me take a break for a while.

If interested in ANNs, please go through the document and let me have your feedback. Thanks in advance, take care, and bye for now.


A song I like:

[Just listen to Lata here! … Not that others don’t get up to the best possible levels, but still, Lata here is, to put it simply, heavenly! [BTW, the song is from 1953.]]

(Hindi) जाने न नजर पहचाने जिगर (“jaane naa najar pahechane jigar”)
Singers: Lata and Mukesh
Music: Shankar-Jaikishen
Lyrics: Hasrat Jaipuri