# Non-Interview Questions on Data Science—Part 1

This entry is the first in a series of posts which will note some of the questions that no one will ever ask you during any interview for any position in the Data Science industry.

Naturally, if you ask for my opinion, you should not consider modifying these questions a bit and posting them as a part of your own post on Medium.com, AnalyticsVidhya, KDNuggets, TowardsDataScience, ComingFromDataScience, etc.

No, really! There would be no point in lifting these questions and posting them as if they were yours, because no one in the industry is ever going to get impressed by you because you raised them. … I am posting them here simply because… because “I am like that only.”

OK, so here is the first installment in this practically useless series. (I should know. I go jobless.)

(Part 1 mostly covers linear and logistic regression, and just a bit of probability.)

Q.1: Consider the probability theory. How are the following ideas related to each other?: random phenomenon, random experiment, trial, result, outcome, outcome space, sample space, event, random variable, and probability distribution. In particular, state precisely the difference between a result and an outcome, and between an outcome and an event.

Give a few examples of finite and countably infinite sample spaces. Give one example of a random variable whose domain is not the real number line. (Hint: See the Advise at the end of this post concerning which books to consult.)

Q.2: In the set theory, when a set is defined through enumeration, repeated instances are not included in the definition. In the light of this fact, answer the following question: Is an event a set? or is it just a primitive instance subsumed in a set? What precisely is the difference between a trial, a result of a trial, and an event? (Hint: See the Advise at the end of this post concerning which books to consult.)

Q.3: Select the best alternative: In regression for making predictions with a continuous target data, if a model is constructed in reference to the equation $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3$, then:
(a) It is a sub-type of the linear regression model.
(b) It is a polynomial regression model.
(c) It is a nonlinear regression model because powers $> 1$ of the independent variable $x_i$ are involved.
(d) It is a nonlinear regression model because more than two $\beta_m$ terms are involved.
(e) Both (a) and (b)
(g) Both (b) and (c)
(f) Both (c) and (d)
(g) All of (b), (c), and (d)
(h) None of the above.
(Hint: Don’t rely too much on the textbooks being used by the BE (CS) students in the leading engineering colleges in Pune and Mumbai.)

Q.4: Consider a data-set consisting of performance of students on a class test. It has three columns: student ID, hours studied, and marks obtained. Suppose you decide to use the simple linear regression technique to make predictions.

Let’s say that you assume that the hours studied are the independent variable (predictor), and the marks obtained are the dependent variable (response). Making this assumption, you make a scatter plot, carry out the regression, and plot the regression line predicted by the model too.

The question now is: If you interchange the designations of the dependent and independent variables (i.e., if you take the marks obtained as predictors and the hours studied as responses), build a second linear model on this basis, and plot the regression line thus predicted, will it coincide with the line plotted earlier or not. Why or why not?

Repeat the question for the polynomial regression. Repeat the question if you include the simplest interaction term in the linear model.

Q.5: Draw a schematic diagram showing circles for nodes and straight-lines for connections (as in the ANN diagrams) for a binary logistic regression machine that operates on just one feature. Wonder why your text-book didn’t draw it in the chapter on the logistic regression.

Q.6: Suppose that the training input for a classification task consists of $r$ number of distinct data-points and $c$ number of features. If logistic regression is to be used for classification of this data, state the number of the unknown parameters there would be. Make suitable assumptions as necessary, and state them.

Q.7: Obtain (or write) some simple Python code for implementing from the scratch a single-feature binary logistic regression machine that uses the simple (non-stochastic) gradient descent method that computes the gradient for each row (batch-size of 1).

Modify the code to show a real-time animation of how the model goes on changing as the gradient descent algorithm progresses. The animation should depict a scatter plot of the sample data ($y$ vs. $x$) and not the parameters space ($\beta_0$ vs. $\beta_1$). The animation should highlight the data-point currently being processed in a separate color. It should also show a plot of the logistic function on the same graph.

Can you imagine, right before running (or even building) the animation, what kind of visual changes is the animation going to depict? how?

Q.8: What are the important advantage of the stochastic gradient descent method over the simple (non-stochastic) gradient descent?

Q.9: State true or false: (i) The output of the logistic function is continuous. (ii) The minimization of the cost function in logistic regression involves a continuous dependence on the undetermined parameters.

In the light of your answers, explain the reason why the logistic regression can at all be used as a classification mechanism (i.e. for targets that are “discrete”, not continuous). State only those axioms of the probability theory which are directly relevant here.

Q.10: Draw diagrams in the parameters-space for the Lasso regression and the Ridge regression. The question now is to explain precisely what lies inside the square or circular region. In each case, draw an example path that might get traced during the gradient descent, and clearly explain why the progress occurs the way it does.

Q.11: Briefly explain how the idea of the logistic regression gets applied in the artificial neural networks (ANNs). Suppose that a training data-set has $c$ number of features, $r$ number of data-rows, and $M$ number of output bins (i.e. classification types). Assuming that the neural network does not carry any hidden layers, calculate the number of logistic regressions that would be performed in a single batch. Make suitable assumptions as necessary.

Q.12: State the most prominent limitation of the gradient descent methods. State the name of any one technique which can overcome this limitation.

Advise: To answer the first two questions, don’t refer to the programming books. In fact, don’t even rely too much on the usual textbooks. Even Wasserman skips over the topic and Stirzaker is inadquate. Kreyszig is barely OK. A recommended text (more rigorous but UG-level, and brief) for this topic is: “An Introduction to Probability and Statistics” (2015) Rohatgi and Saleh, Wiley.

Awww… Still with me?

If you read this far, chances are very bright that you are really^{really} desperately looking for a job in the data science field. And, as it so happens, I am also a very, very kind hearted person. I don’t like to disappoint nice, ambitious… err… “aspiring” people. So, let me offer you some real help before you decide to close this page (and this blog) forever.

Here is one question they might actually ask you during an interview—especially if the interviewer is an MBA:

A question they might actually ask you in an interview: What are the three V’s of big data? four? five?

(Yes, MBA’s do know arithmetic. At least, it was there on their CAT / GMAT entrance exams. Yes, you can use this question for your posts on Medium.com, AnalyticsVidhya, KDNuggets, TowardsDataScience, ComingFromDataScience, etc.)

A couple of notes:

1. I might come back and revise the questions to make them less ambiguous or more precise.
2. Also, please do drop a line if any of the questions is not valid, or shows a poor understanding on my part—this is easily possible.

A song I like:

[Credits listed in a random order. Good!]

(Hindi) “mausam kee sargam ko sun…”
Music: Jatin-Lalit
Singer: Kavita Krishnamoorthy
Lyrics: Majrooh Sultanpuri

History:

First written: Friday 14 June 2019 11:50:25 AM IST.
Published online: 2019.06.16 12:45 IST.
The songs section added: 2019.06.16 22:18 IST.

/

# The Machine Learning as an Expert System

To cut a somewhat long story short, I think that I can “see” that Machine Learning (including Deep Learning) can actually be regarded as a rules-based expert system, albeit of a special kind.

I am sure that people must have written articles expressing this view. However, simple googling didn’t get me to any useful material.

I would deeply appreciate it if someone could please point out references in this direction. Thanks in advance.

BTW, here is a very neat infographic on AI: [^]; h/t [^]. … Once you finish reading it, re-read this post, please! Exactly once again, and only the first part—i.e., without recursion!. …

A song I like:

(Marathi) “visar preet, visar geet, visar bheT aapuli”
Music: Yashwant Dev
Lyrics: Shantaram Nandgaonkar