Loitering around…

Update:

OK. I am getting back to working on the remaining topics, in particular, taking down detailed notes on the QM spin. I plan to begin this activity starting this evening. Also, I can now receive queries, from “any” one, regarding my work on QM, including the bit mentioned in the post below. [The meaning of “any” one is explained below.]

[2021.03.24 13:17 IST]


Am just about completing one full day of plain loitering around, doing nothing.

No, of course, it couldn’t possibly have been literally nothing—whether of the शून्य (“shoonya”) variety, or the शुन्य (“shunya”) one. (Go consult a good Sanskrit dictionary for the subtle differences in the meaning of these two terms.)

So, what I mean to say by “doing nothing” is this:

The last entry in my research journal has the time-stamp of 2021.03.18 21:40:34 IST. So, by now, it’s almost like a full day of doing “nothing” for me.

It’s actually worse than that…

In fact, I started loitering around, including on the ‘net, even earlier, i.e., a few days ago. Maybe from 16th March, maybe earlier. However, my journal pages (still loose, still not filed into the plastic folder) do show some entries, which get shorter and shorter, well until the above time-stamp. …The entry before the afore-mentioned has the time-stamp: 2021.03.18 19:12:52 IST.

But not a single entry over the past whole day.


So, what did I do over the last one day, and also over the few days before it?

Well, the list is long.

I browsed. (Yes, including Twitter, Instagram, and FaceBook—others’ accounts, of course!)

I also downloaded a couple of research papers, and one short history-like book. I generally tried to read through them. Unsurprisingly, I found that I could not. The reason is: I just don’t have any mental energy left to do anything meaningful.

Apparently, I have exhausted all my energy in thinking about the linear momentum operator.

I think that by now I have thought about this one topic in most every way any human being possibly could. At least in parts (i.e., one part taken at a time). I have analyzed and re-analyzed, and re-re-analyzed. I kept on jotting down my “thoughts” (also in a way that would be mostly undecipherable to any one).

I kept getting exhausted, and still, I kept pushing myself. I kept on analyzing. Going back to my earlier thoughts, refining them and/or current thoughts. Et cetera.

In the end, finally, I reached the point where I couldn’t even push myself any longer—in spite of all my stamina to keep pursuing threads in “thinking”. I’ve some stories to share here, but some other time. …To cut all of them long stories short:

Some 12 hours after I thus fully crashed out of all my mental energies, at some moment, I somehow realized that:

I had already built a way to visualize a path in between my approach and the mainstream QM, regarding the linear momentum operator.

I made a briefest possible entry (consisting of exactly one small sketch over some 2″ by 5″ space). Which was at 2021.03.18 21:40:34 IST.

Then, I stopped pursuing it too.

Why bother? Especially when I can now visualize “it” any time I want?


But how good is it?

I think, it should work. But it also appears to be too shaky and too tenuous a connection to me—between the mainstream QM and my new approach.

Of course I’ve noted down a bit of maths to go with it too, and also the physical units for the quantities involved. Yet, two points remain:

As a relatively minor point: I haven’t had the energy to work out (let alone to do even the quick and dirty simulations for) all possible permutations and combinations of the kind of elements I am dealing with. So, there is a slim possibility that terms may cancel each other and so the formulation may not turn out to be general enough. (I’ve been fighting with such a thing for a long time by now.)

But as a relatively much more important point: As I said, this whole way of thinking about it seems too tenuous to me. Even if it works out OK (i.e., after considering all the permutations and combinations involved), this very way of looking at the things would still look at best tenuous to any one.

The only consolation I have is this idea (which had already become absolutely banal even decades ago):

Every thing about QM is different from the pre-quantum theories.

That’s the only thin thread of logic by which my ideas hang. … Not as good as I wanted it. But not as bad as hanging all loose either…

And, yes, I’ve thought through the ontological aspects as well. … The QM ontology is radically different from the ontologies of all the pre-quantum theories. Especially, that of NM (Newtonian mechanics of particles and rigid bodies). But it is not so radically different from the ontology already required for EM (the Maxwell-Lorentz electrodynamics)—though there is a lot of difference between the EM and the QM ontologies.

And that’s what the current status looks like.


“So, when do you plan to publish it?”

Ummm… Not a good question. A better question, for me, is this:

What do I propose to do with my time, now?

The answer is simple. I will go in for what I know is going to be the most productive route.

Which is: I am going to continue loitering around.

Then, I will begin with taking detailed notes on the QM spin—the next topic from the mainstream QM—as soon as my mental energy returns.

That’s right. I won’t be even considering writing down my thoughts about that goddamn linear momentum operator. Not for any time in the near future. That’s the only way to optimize productivity. My productivity, that is.

So, sorry, I won’t be writing anything on the linear momentum any time soon, even if it precisely was the topic that kept me pre-occupied for such a long time, and also formed the topic of blogging for quite some time over the recent past. So, sorry, this entire blog-post (the present one) is going to remain quite vague to you, for quite some time. You might even feel cheated at this point.

Well, but I do have a strong defence from my side: I’ve always said, time and again, that I was always ready to share all my thoughts with “any” one. I mean, any one who (i) knows the theory of the mainstream QM (including its foundational issues), and (ii) also has looked into the experimental aspects of it (at least in the schematic form).

So, any such person can always drop a line to me.

Oh wait!

Don’t write anything to me right away. Hold on for a few days. I just want to kill my time around for now. That’s why.

I’ll let you know (maybe via an update here), once I begin actually taking down my notes on the QM spin. That’s the time you (“you” the “any” one) may get in touch with me. That is, if “you” want to know what I’ve thought about that goddamn linear momentum operator. [OK. As the update at the top of the post indicates, now I’m ready.]

OK, bye for now, take care in the meanwhile, and don’t be surprised if I also visit your blog and all…


Many songs I like:

[I also listened to a lot of songs over the past few days. I couldn’t find a single song that went very well with any one of my overall moods over the past few days… So, don’t try to read too much into this choice. And, I’ve got bored, so I won’t offer any further comment on this song either. (And, one way or the other, I actually don’t know why I like this song or the extent to which I actually like it. Not as of now, any way!)

(Hindi) जनम जनम का साथ है निभाने को (“janam janam kaa saath hai nibhaane ko”)
Singer: Mohammad Rafi
Music: Shankar-Jaikishan
Lyrics: Hasrat Jaipuri

I could not find a good quality original audio track. The “revival” version is here: [^]. It was this version which I first listened to, and used to listen to, while taking leisurely evening drives (for up to, say, 50 miles almost every day) in the area around Santa Rosa. Which was in California. But it didn’t feel that way. (It also was the home town of the “Peanuts” comics creator.) …

…OK, I will throw in one more:

(Marathi) तूं तेव्हा तशी (“too, tevhaa tashee”)
Music and Singer: Pt. Hridayanath Mangeshkar
Lyrics: Aaratee Prabhu

Which is yet another poem by Aaratee Prabhu, converted into a song by Hridayanath. But I won’t be able to talk about it. Not as of today anyway. Listening is good. A good quality audio is here [^].

…And, since I have been listening to songs a lot over the past few days, one more, just for this time around…

(Western, Pop) “How deep is your love”
Band: Bee Gees

I don’t know what the “Official Video” means, but it is here: [^]. I also don’t know what the “Deluxe Edition” of the audio means, but it’s here [^]. … I always happened to listen to the audio, which was, you know, at many places in Pune like in the `H’ club (of the student-run mess at COEP hostels); at the movie theatres running English movies in Pune (like Rahul, the old West-End, and Alka); most all restaurants from the Pune Camp area (and also a few from the Deccan area); also in the IIT Madras hostels; etc. All of this was during the ’80s, only. I don’t know why, but seems like I never came across this song, even at any of these places, once it was ’90s. … As usual, I didn’t even know the words, and so, couldn’t have searched for it. A few days ago, I was just going through a compilation of songs of ’70s when I spotted this one, and then searched on its lyrics and credits and all. I had remembered—and actually known—only the music… But yes, now that I know them, the words too seem pretty good…

Anyway, enough is enough. I already wrote a lot!  High time to go back to doing nothing…
]


History:
2021.03.19 22:27 IST: Originally published.
2021.03.24 13:25 IST: Update noted at the top of the post and also inline. Some minor corrections/editing.

 

 

Non-Interview Questions on Data Science—Part 1

This entry is the first in a series of posts which will note some of the questions that no one will ever ask you during any interview for any position in the Data Science industry.

Naturally, if you ask for my opinion, you should not consider modifying these questions a bit and posting them as a part of your own post on Medium.com, AnalyticsVidhya, KDNuggets, TowardsDataScience, ComingFromDataScience, etc.

No, really! There would be no point in lifting these questions and posting them as if they were yours, because no one in the industry is ever going to get impressed by you because you raised them. … I am posting them here simply because… because “I am like that only.”

OK, so here is the first installment in this practically useless series. (I should know. I go jobless.)

(Part 1 mostly covers linear and logistic regression, and just a bit of probability.)


Q.1: Consider the probability theory. How are the following ideas related to each other?: random phenomenon, random experiment, trial, result, outcome, outcome space, sample space, event, random variable, and probability distribution. In particular, state precisely the difference between a result and an outcome, and between an outcome and an event.

Give a few examples of finite and countably infinite sample spaces. Give one example of a random variable whose domain is not the real number line. (Hint: See the Advice at the end of this post concerning which books to consult.)


Q.2: In the set theory, when a set is defined through enumeration, repeated instances are not included in the definition. In the light of this fact, answer the following question: Is an event a set? Or is it just a primitive instance subsumed in a set? What precisely is the difference between a trial, a result of a trial, and an event? (Hint: See the Advice at the end of this post concerning which books to consult.)


Q.3: Select the best alternative: In regression for making predictions with a continuous target data, if a model is constructed in reference to the equation y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3, then:
(a) It is a sub-type of the linear regression model.
(b) It is a polynomial regression model.
(c) It is a nonlinear regression model because powers > 1 of the independent variable x_i are involved.
(d) It is a nonlinear regression model because more than two \beta_m terms are involved.
(e) Both (a) and (b)
(f) Both (b) and (c)
(g) Both (c) and (d)
(h) All of (b), (c), and (d)
(i) None of the above.
(Hint: Don’t rely too much on the textbooks being used by the BE (CS) students in the leading engineering colleges in Pune and Mumbai.)
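
One way to probe Q.3 numerically, offered only as a rough sketch: fit the cubic model with an ordinary linear least-squares solver and see whether it recovers the coefficients. The data, the noise level, and the values in `true_beta` are all made up for illustration; only the form of the equation comes from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 50)
true_beta = np.array([1.0, -2.0, 0.5, 3.0])  # hypothetical beta_0..beta_3

# Design matrix with columns 1, x, x^2, x^3.
# The columns are nonlinear in x, but y depends linearly on the betas.
X = np.column_stack([x**p for p in range(4)])
y = X @ true_beta + rng.normal(scale=0.1, size=x.size)

# A plain linear least-squares solve, with no iterative nonlinear optimizer.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

If the estimates land close to `true_beta`, that says something about which sense of "linear" matters for the classification asked about in the question.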


Q.4: Consider a data-set consisting of performance of students on a class test. It has three columns: student ID, hours studied, and marks obtained. Suppose you decide to use the simple linear regression technique to make predictions.

Let’s say that you assume that the hours studied are the independent variable (predictor), and the marks obtained are the dependent variable (response). Making this assumption, you make a scatter plot, carry out the regression, and plot the regression line predicted by the model too.

The question now is: If you interchange the designations of the dependent and independent variables (i.e., if you take the marks obtained as predictors and the hours studied as responses), build a second linear model on this basis, and plot the regression line thus predicted, will it coincide with the line plotted earlier or not? Why or why not?

Repeat the question for the polynomial regression. Repeat the question if you include the simplest interaction term in the linear model.
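
For the simple linear case of Q.4, a small experiment along the following lines may help in forming a guess before working out the "why". The hours and marks below are synthetic (an assumption for illustration), and `np.polyfit` stands in for whatever regression routine you prefer.

```python
import numpy as np

rng = np.random.default_rng(1)
hours = rng.uniform(0.0, 10.0, size=40)                       # made-up data
marks = 30.0 + 5.0 * hours + rng.normal(scale=8.0, size=40)   # made-up data

# Model A: marks = a0 + a1 * hours
a1, a0 = np.polyfit(hours, marks, 1)
# Model B: hours = b0 + b1 * marks
b1, b0 = np.polyfit(marks, hours, 1)

# Draw model B in the same (hours, marks) plane: marks = (hours - b0) / b1
slope_b_in_same_plane = 1.0 / b1
r = np.corrcoef(hours, marks)[0, 1]

print(a1, slope_b_in_same_plane, r**2)
```

For ordinary least squares, the product `a1 * b1` equals the squared correlation coefficient, so comparing `a1` against `1 / b1` shows how far apart the two lines are for a given scatter.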


Q.5: Draw a schematic diagram showing circles for nodes and straight-lines for connections (as in the ANN diagrams) for a binary logistic regression machine that operates on just one feature. Wonder why your text-book didn’t draw it in the chapter on the logistic regression.


Q.6: Suppose that the training input for a classification task consists of r number of distinct data-points and c number of features. If logistic regression is to be used for classification of this data, state the number of the unknown parameters there would be. Make suitable assumptions as necessary, and state them.


Q.7: Obtain (or write) some simple Python code for implementing from scratch a single-feature binary logistic regression machine that uses the simple (non-stochastic) gradient descent method that computes the gradient for each row (batch-size of 1).

Modify the code to show a real-time animation of how the model goes on changing as the gradient descent algorithm progresses. The animation should depict a scatter plot of the sample data (y vs. x) and not the parameters space (\beta_0 vs. \beta_1). The animation should highlight the data-point currently being processed in a separate color. It should also show a plot of the logistic function on the same graph.

Can you imagine, right before running (or even building) the animation, what kind of visual changes the animation is going to depict? How?
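
A minimal sketch of the first part of Q.7 (the model only, without the animation) might look like the following. The function name `fit_logistic_per_row`, the learning rate, the epoch count, and the synthetic two-cluster data are all assumptions made for illustration. The rows are shuffled once up front so that the two classes are interleaved, and are then visited in that same fixed order on every pass.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_per_row(x, y, lr=0.1, epochs=200):
    """Single-feature binary logistic regression; one parameter update per row."""
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        for xi, yi in zip(x, y):        # same row order on every pass
            p = sigmoid(b0 + b1 * xi)   # predicted probability of class 1
            err = p - yi                # derivative of the log-loss w.r.t. the logit
            b0 -= lr * err
            b1 -= lr * err * xi
    return b0, b1

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-1.5, 1.0, 50), rng.normal(1.5, 1.0, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])
perm = rng.permutation(x.size)          # one-time shuffle to interleave the classes
x, y = x[perm], y[perm]

b0, b1 = fit_logistic_per_row(x, y)
accuracy = np.mean((sigmoid(b0 + b1 * x) > 0.5) == y)
print(b0, b1, accuracy)
```

For the animation part, the values of `b0` and `b1` after each row update are what would need to be redrawn as the logistic curve over the scatter plot.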


Q.8: What are the important advantages of the stochastic gradient descent method over the simple (non-stochastic) gradient descent?


Q.9: State true or false: (i) The output of the logistic function is continuous. (ii) The minimization of the cost function in logistic regression involves a continuous dependence on the undetermined parameters.

In the light of your answers, explain the reason why the logistic regression can at all be used as a classification mechanism (i.e. for targets that are “discrete”, not continuous). State only those axioms of the probability theory which are directly relevant here.


Q.10: Draw diagrams in the parameters-space for the Lasso regression and the Ridge regression. The question now is to explain precisely what lies inside the square or circular region. In each case, draw an example path that might get traced during the gradient descent, and clearly explain why the progress occurs the way it does.


Q.11: Briefly explain how the idea of the logistic regression gets applied in the artificial neural networks (ANNs). Suppose that a training data-set has c number of features, r number of data-rows, and M number of output bins (i.e. classification types). Assuming that the neural network does not carry any hidden layers, calculate the number of logistic regressions that would be performed in a single batch. Make suitable assumptions as necessary.

Does your answer change if you consider the multinomial logistic regression?


Q.12: State the most prominent limitation of the gradient descent methods. State the name of any one technique which can overcome this limitation.


Advice: To answer the first two questions, don’t refer to the programming books. In fact, don’t even rely too much on the usual textbooks. Even Wasserman skips over the topic and Stirzaker is inadequate. Kreyszig is barely OK. A recommended text (more rigorous but UG-level, and brief) for this topic is: “An Introduction to Probability and Statistics” (2015) Rohatgi and Saleh, Wiley.


Awww… Still with me?

If you read this far, chances are very bright that you are really^{really} desperately looking for a job in the data science field. And, as it so happens, I am also a very, very kind hearted person. I don’t like to disappoint nice, ambitious… err… “aspiring” people. So, let me offer you some real help before you decide to close this page (and this blog) forever.

Here is one question they might actually ask you during an interview—especially if the interviewer is an MBA:

A question they might actually ask you in an interview: What are the three V’s of big data? four? five?

(Yes, MBAs do know arithmetic. At least, it was there on their CAT / GMAT entrance exams. Yes, you can use this question for your posts on Medium.com, AnalyticsVidhya, KDNuggets, TowardsDataScience, ComingFromDataScience, etc.)


A couple of notes:

  1. I might come back and revise the questions to make them less ambiguous or more precise.
  2. Also, please do drop a line if any of the questions is not valid, or shows a poor understanding on my part—this is easily possible.

 


A song I like:

[Credits listed in a random order. Good!]

(Hindi) “mausam kee sargam ko sun…”
Music: Jatin-Lalit
Singer: Kavita Krishnamoorthy
Lyrics: Majrooh Sultanpuri


History:

First written: Friday 14 June 2019 11:50:25 AM IST.
Published online: 2019.06.16 12:45 IST.
The songs section added: 2019.06.16 22:18 IST.

The seven books challenge—my list

“Accepted challenge to post covers of 7 books I love: no explanations, no reviews – just the cover”

You might have run into tweets of the above kind in the recent past. Here, I would like to accept that challenge. [Unlike those tweets, there is no “from” clause in the above sentence because no one actually challenged me to it! I just noticed this challenge in Ash Joglekar’s twitter feed, and decided to pick it up on my own!]


A few notes:

No reviews or explanations regarding the choices of books, but still, a few notes are due—e.g., why I supply only a list and not the snaps of the covers.

1. Many of my books still remain packed up in the movers-and-packers’ boxes. These boxes are kept tightly sticking to each other and right in front of the wall-cupboard that is full of even more books (stacked up several layers deep). Since there is no place elsewhere in the house, the boxes stay there—they cannot be opened because if they are, I don’t have the space to keep those books at some other place. Further, since the boxes are heavy, I cannot easily move them aside and reach into the cupboard either. In short, these days, most of my books happen to be physically inaccessible to me. (The apartment where we currently live is too small for us.) Unless there is a strong reason for reference, the books don’t get out; they just stay where they are.

Further, I don’t have paper copies for all the books that struck me when I took up this challenge, because a couple of them I only read in the university library (i.e. the Hill library of UAB), or later on, as PDF documents (not paper copies).

For all such reasons, instead of posting the covers, here, I will supply only the titles.

2. There were other books that had struck me even more preferentially, but I decided not to include them in this list here because they were in Marathi. Drop me a line if you wish to know which ones those were.

3. All in all, I spent roughly less than 2 minutes (possibly less than 1 minute) in getting to the following list. However, later on, I decided to re-arrange it in the chronological order in which I first ran into these books. The year of my first acquaintance with the book is given in the square brackets.


The list:

  • Introduction to Objectivist Epistemology, 1e, by Ayn Rand [1981]
  • Physics (the old paperback Indian ed. with yellow-and-black cover, in 2 volumes), by Resnick and Halliday [1984]
  • In Search of Schrödinger’s Cat, 1e, by John Gribbin (i.e., the cat book, not the kitten one) [1987 or 1988]
  • Mathematical Thought from Ancient to Modern Times (3 volumes), by Morris Kline [1992]
  • Twenty Cases Suggestive of Reincarnation, by Ian Stevenson [1993]
  • Computational Physics: Problem Solving with Python, by Landau, Paez and Bordeianu [2010 or so]
  • Quantum Chemistry, by Donald McQuarrie [2011]

Afterthoughts:

  • Since the initial posting, there is a change in one of the books. Now I list the 20 cases book by Ian Stevenson instead of his 4 volumes, because I now remember that the former was what I had completely read through; the latter I had only browsed through. … Hey, others get an entire day per book, OK?
  • On second thoughts, I wanted to have Quantum Chemistry by Donald McQuarrie [17 February 2011] in there. … So I have removed a CS book which used to appear on the list (viz., Structured Computer Organization, by Andrew Tanenbaum [1995]). In fact, since McQuarrie’s book is easily accessible to me right now, I am right away posting its cover here; see below.
  • … Guess I will have to post a second list some time later on! … I mean to say, there is no book of solid or fluid mechanics in there, none on CFD or FEM… And, none on so many other topics / other authors…

 

I guess the songs section is not really necessary for this post. So I will drop it for this time round.