Data Science links—1

Oakay… My bookmarks library has grown too big. Time to move at least a few of them to a blog-post. Here they are. … The last one is not on Data Science, but it happens to be the most important one of them all!



On Bayes’ theorem:

Oscar Bonilla. “Visualizing Bayes’ theorem” [^].

Jayesh Thukarul. “Bayes’ Theorem explained” [^].

Victor Powell. “Conditional probability” [^].


Explanations with visualizations:

Victor Powell. “Explained Visually.” [^]

Christopher Olah. Many topics [^]. For instance, see “Calculus on computational graphs: backpropagation” [^].


Fooling the neural network:

Julia Evans. “How to trick a neural network into thinking a panda is a vulture” [^].

Andrej Karpathy. “Breaking linear classifiers on ImageNet” [^].

A. Nguyen, J. Yosinski, and J. Clune. “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images” [^]

Melanie Mitchell. “Artificial Intelligence hits the barrier of meaning” [^]


The Most Important link!

Ijad Madisch. “Why I hire scientists, and why you should, too” [^]


A song I like:

(Western, pop) “Billie Jean”
Artist: Michael Jackson

[Back in the ’80s, this song used to get played in the restaurants from the Pune camp area, and also in the cinema halls like West-End, Rahul, Alka, etc. The camp area was so beautiful, back then—also uncrowded, and quiet.

This song would also come floating on the air, while sitting in the evening at the Quark cafe, situated in the middle of all the IITM hostels (next to skating rink). Some or the other guy would be playing it in a nearby hostel room on one of those stereo systems which would come with those 1 or 2 feet tall “hi-fi” speaker-boxes. Each box typically had three stacked speakers. A combination of a separately sitting sub-woofer with a few small other boxes or a soundbar, so ubiquitous today, had not been invented yet… Back then, Quark was a completely open-air cafe—a small patch of ground surrounded by small trees, and a tiny hexagonal hut, built in RCC, for serving snacks. There were no benches, even, at Quark. People would sit on those small concrete blocks (brought from the civil department where they would come for testing). Deer would be roaming very nearby around. A daring one or two could venture to come forward and eat pizza out of your (fully) extended hand!…

…Anyway, coming back to the song itself, I had completely forgotten it, but got reminded when @curiouswavefn mentioned it in one of his tweets recently. … When I read the tweet, I couldn’t make out that it was this song (apart from Bach’s variations) that he was referring to. I just idly checked out both of them, and then, while listening to it, I suddenly recognized this song. … You see, unlike so many other guys of e-schools of our times, I wouldn’t listen to a lot of Western pop-songs those days (and still don’t). Beatles, ABBA and a few other groups/singers, may be, also the Western instrumentals (a lot) and the Western classical music (some, but definitely). But somehow, I was never too much into the Western pop songs. … Another thing. The way these Western singers sing, it used to be very, very hard for me to figure out the lyrics back then—and the situation continues mostly the same way even today! So, recognizing a song by its name was simply out of the question….

… Anyway, do check out the links (even if some of them appear to be out of your reach on the first reading), and enjoy the song. … Take care, and bye for now…]

 


Update: Pursuing some simple (and possibly new) ideas in Data Science

Last Saturday, I attended a Data Science-related meetup in Pune (the one organized by DataGiri). I enjoyed all the four sessions covered in it (one each on logistic regression, SVM, clustering, and ensemble methods). … Out of the past 4/5 events or 1-day introductory workshops on ML/DL which I have attended so far in Pune, I think this one was by far the best.

Attending events like these (also conferences) often has an effect: due to the informality of the interaction, you begin to look at the same things from a slightly different perspective. That precisely is what seems to have happened to me this time round.

Cutting straight to the point, I think that after attending this event, I might have stumbled across a couple of small little ideas concerning the techniques that were discussed. These ideas could have an element of novelty. At least that’s what I feel. … Several Internet searches (and consulting standard books up to Bishop and ESLII) haven’t thrown up anything similar so far. So, who knows… And yes, it’s not just the novelty; there also should be some advantages to be had in practical applications too.

Of course, Data Science is relatively a new field for me, and so, my knowledge of these topics is pretty limited. Still, currently, I am engaged in taking these ideas a little further. From what I have come across thus far, it does look like there should be something to these ideas. But I need to both flesh out the ideas and take the literature-search further… much, much further.

At the same time, I am also having a look at the angle of whether a patent or two can come out of these ideas or not. So far, the prospects do seem promising. So, if you have the means to sponsor patents, and if NDAs are OK by you, then feel free to get in touch with me for some more details and the current status of development.

Bottom line: Nothing major here; just a couple of small ideas (or small variations on the known techniques). But they do seem neat and novel. In any case, they certainly are worth pursuing a bit further.

…Take care and bye for now…


A song I like:

(Hindi) “mere jaise ban jaaoge…”
Singers: Jagjit and Chitra Singh
Lyrics: Saeed Rahi (?)
Music: Jagjit Singh

 

Non-Interview Questions on Data Science—Part 1

This entry is the first in a series of posts which will note some of the questions that no one will ever ask you during any interview for any position in the Data Science industry.

Naturally, if you ask for my opinion, you should not consider modifying these questions a bit and posting them as a part of your own post on Medium.com, AnalyticsVidhya, KDNuggets, TowardsDataScience, ComingFromDataScience, etc.

No, really! There would be no point in lifting these questions and posting them as if they were yours, because no one in the industry is ever going to get impressed by you because you raised them. … I am posting them here simply because… because “I am like that only.”

OK, so here is the first installment in this practically useless series. (I should know. I go jobless.)

(Part 1 mostly covers linear and logistic regression, and just a bit of probability.)


Q.1: Consider the probability theory. How are the following ideas related to each other?: random phenomenon, random experiment, trial, result, outcome, outcome space, sample space, event, random variable, and probability distribution. In particular, state precisely the difference between a result and an outcome, and between an outcome and an event.

Give a few examples of finite and countably infinite sample spaces. Give one example of a random variable whose domain is not the real number line. (Hint: See the Advice at the end of this post concerning which books to consult.)


Q.2: In the set theory, when a set is defined through enumeration, repeated instances are not included in the definition. In the light of this fact, answer the following question: Is an event a set? or is it just a primitive instance subsumed in a set? What precisely is the difference between a trial, a result of a trial, and an event? (Hint: See the Advice at the end of this post concerning which books to consult.)


Q.3: Select the best alternative: In regression for making predictions with a continuous target data, if a model is constructed in reference to the equation y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3, then:
(a) It is a sub-type of the linear regression model.
(b) It is a polynomial regression model.
(c) It is a nonlinear regression model because powers > 1 of the independent variable x_i are involved.
(d) It is a nonlinear regression model because more than two \beta_m terms are involved.
(e) Both (a) and (b)
(f) Both (b) and (c)
(g) Both (c) and (d)
(h) All of (b), (c), and (d)
(i) None of the above.
(Hint: Don’t rely too much on the textbooks being used by the BE (CS) students in the leading engineering colleges in Pune and Mumbai.)


Q.4: Consider a data-set consisting of performance of students on a class test. It has three columns: student ID, hours studied, and marks obtained. Suppose you decide to use the simple linear regression technique to make predictions.

Let’s say that you assume that the hours studied are the independent variable (predictor), and the marks obtained are the dependent variable (response). Making this assumption, you make a scatter plot, carry out the regression, and plot the regression line predicted by the model too.

The question now is: If you interchange the designations of the dependent and independent variables (i.e., if you take the marks obtained as predictors and the hours studied as responses), build a second linear model on this basis, and plot the regression line thus predicted, will it coincide with the line plotted earlier or not? Why or why not?

Repeat the question for the polynomial regression. Repeat the question if you include the simplest interaction term in the linear model.
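One way to explore Q.4 before reasoning it out is to try it numerically. Here is a minimal sketch for the simple linear case only. The data are synthetic (the numbers 35, 5 and 8 are made up for illustration), and the helper name fit_line is hypothetical; it is just the textbook least-squares formula.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data, made up purely for illustration.
random.seed(0)
hours = [random.uniform(0.0, 10.0) for _ in range(50)]
marks = [35.0 + 5.0 * h + random.gauss(0.0, 8.0) for h in hours]

a0, a1 = fit_line(hours, marks)  # marks regressed on hours
b0, b1 = fit_line(marks, hours)  # hours regressed on marks (roles interchanged)

# To draw the second line on the same (hours, marks) axes, solve it for marks:
# hours = b0 + b1 * marks  =>  marks = (hours - b0) / b1
slope_first = a1
slope_second = 1.0 / b1

print(f"slope of the first line : {slope_first:.3f}")
print(f"slope of the second line: {slope_second:.3f}")
print(f"product a1 * b1         : {a1 * b1:.3f}")
```

Plot both lines over the scatter plot, and then vary the noise level (the 8.0 above) to see how the two slopes respond.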


Q.5: Draw a schematic diagram showing circles for nodes and straight-lines for connections (as in the ANN diagrams) for a binary logistic regression machine that operates on just one feature. Wonder why your text-book didn’t draw it in the chapter on the logistic regression.


Q.6: Suppose that the training input for a classification task consists of r number of distinct data-points and c number of features. If logistic regression is to be used for classification of this data, state the number of the unknown parameters there would be. Make suitable assumptions as necessary, and state them.


Q.7: Obtain (or write) some simple Python code for implementing from scratch a single-feature binary logistic regression machine that uses the simple (non-stochastic) gradient descent method that computes the gradient for each row (batch-size of 1).
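As a possible starting point (a sketch only, not the one right answer), here is a bare-bones version using nothing beyond the standard library. The toy data, the learning rate, the number of epochs, and the names sigmoid and train are all assumptions made for illustration. The rows are visited in a fixed order, without any shuffling.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(xs, ys, learning_rate=0.1, epochs=200):
    """Fit p(y=1 | x) = sigmoid(beta0 + beta1 * x), one row at a time."""
    beta0, beta1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # rows visited in a fixed order
            error = sigmoid(beta0 + beta1 * x) - y
            # Gradient of the log-loss for this single row.
            beta0 -= learning_rate * error
            beta1 -= learning_rate * error * x
    return beta0, beta1


# Made-up toy data: x could be hours studied, y is pass (1) or fail (0).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

beta0, beta1 = train(xs, ys)
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
print(f"p(y=1 | x=1.0) = {sigmoid(beta0 + beta1 * 1.0):.3f}")
print(f"p(y=1 | x=4.5) = {sigmoid(beta0 + beta1 * 4.5):.3f}")
```

The inner loop is the natural place to hook in a redraw, since that is where the current data-point and the current parameter values are both known.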

Modify the code to show a real-time animation of how the model goes on changing as the gradient descent algorithm progresses. The animation should depict a scatter plot of the sample data (y vs. x) and not the parameters space (\beta_0 vs. \beta_1). The animation should highlight the data-point currently being processed in a separate color. It should also show a plot of the logistic function on the same graph.

Can you imagine, right before running (or even building) the animation, what kind of visual changes the animation is going to depict, and how?


Q.8: What are the important advantages of the stochastic gradient descent method over the simple (non-stochastic) gradient descent?


Q.9: State true or false: (i) The output of the logistic function is continuous. (ii) The minimization of the cost function in logistic regression involves a continuous dependence on the undetermined parameters.

In the light of your answers, explain the reason why the logistic regression can at all be used as a classification mechanism (i.e. for targets that are “discrete”, not continuous). State only those axioms of the probability theory which are directly relevant here.


Q.10: Draw diagrams in the parameters-space for the Lasso regression and the Ridge regression. The question now is to explain precisely what lies inside the square or circular region. In each case, draw an example path that might get traced during the gradient descent, and clearly explain why the progress occurs the way it does.


Q.11: Briefly explain how the idea of the logistic regression gets applied in the artificial neural networks (ANNs). Suppose that a training data-set has c number of features, r number of data-rows, and M number of output bins (i.e. classification types). Assuming that the neural network does not carry any hidden layers, calculate the number of logistic regressions that would be performed in a single batch. Make suitable assumptions as necessary.

Does your answer change if you consider the multinomial logistic regression?


Q.12: State the most prominent limitation of the gradient descent methods. State the name of any one technique which can overcome this limitation.


Advice: To answer the first two questions, don’t refer to the programming books. In fact, don’t even rely too much on the usual textbooks. Even Wasserman skips over the topic and Stirzaker is inadequate. Kreyszig is barely OK. A recommended text (more rigorous but UG-level, and brief) for this topic is: “An Introduction to Probability and Statistics” (2015) Rohatgi and Saleh, Wiley.


Awww… Still with me?

If you read this far, chances are very bright that you are really^{really} desperately looking for a job in the data science field. And, as it so happens, I am also a very, very kind hearted person. I don’t like to disappoint nice, ambitious… err… “aspiring” people. So, let me offer you some real help before you decide to close this page (and this blog) forever.

Here is one question they might actually ask you during an interview—especially if the interviewer is an MBA:

A question they might actually ask you in an interview: What are the three V’s of big data? four? five?

(Yes, MBA’s do know arithmetic. At least, it was there on their CAT / GMAT entrance exams. Yes, you can use this question for your posts on Medium.com, AnalyticsVidhya, KDNuggets, TowardsDataScience, ComingFromDataScience, etc.)


A couple of notes:

  1. I might come back and revise the questions to make them less ambiguous or more precise.
  2. Also, please do drop a line if any of the questions is not valid, or shows a poor understanding on my part—this is easily possible.

 


A song I like:

[Credits listed in a random order. Good!]

(Hindi) “mausam kee sargam ko sun…”
Music: Jatin-Lalit
Singer: Kavita Krishnamoorthy
Lyrics: Majrooh Sultanpuri


History:

First written: Friday 14 June 2019 11:50:25 AM IST.
Published online: 2019.06.16 12:45 IST.
The songs section added: 2019.06.16 22:18 IST.

The Machine Learning as an Expert System

To cut a somewhat long story short, I think that I can “see” that Machine Learning (including Deep Learning) can actually be regarded as a rules-based expert system, albeit of a special kind.

I am sure that people must have written articles expressing this view. However, simple googling didn’t get me to any useful material.

I would deeply appreciate it if someone could please point out references in this direction. Thanks in advance.


BTW, here is a very neat infographic on AI: [^]; h/t [^]. … Once you finish reading it, re-read this post, please! Exactly once again, and only the first part, i.e., without recursion! …


A song I like:

(Marathi) “visar preet, visar geet, visar bheT aapuli”
Music: Yashwant Dev
Lyrics: Shantaram Nandgaonkar
Singer: Sudhir Phadke

 

Learnability of machine learning is provably an undecidable?—part 3: closure

Update on 23 January 2019, 17:55 IST:

In this series of posts, which was just a step further from the initial, brain-storming kind of a stage, I had come to the conclusion that based on certain epistemological (and metaphysical) considerations, Ben-David et al.’s conclusion (that learnability can be an undecidable) is logically untenable.

However, now, as explained here [^], I find that this particular conclusion which I drew, was erroneous. I now stand corrected, i.e., I now consider Ben-David et al.’s result to be plausible. Obviously, it merits a further, deeper, study.

However, even while acknowledging the above-mentioned mistake, let me also hasten to clarify that I still stick to my other positions, especially the central theme in this series of posts. The central theme here was that there are certain core features of the set theory which make implications such as Godel’s incompleteness theorems possible. These features (of the set theory) demonstrably carry a glaring epistemological flaw such that applying Godel’s theorem outside of its narrow technical scope in mathematics or computer science is not permissible. In particular, Godel’s incompleteness theorem does not apply to knowledge or its validation in the more general sense of these terms. This theme, I believe, continues to hold as is.

Update over.


Gosh! I gotta get this series out of my hand—and also head! ASAP, really!! … So, I am going to scrap the bits and pieces I had written for it earlier; they would have turned this series into a 4- or 5-part one. Instead, I am going to start entirely afresh, and I am going to approach this topic from an entirely different angle—a somewhat indirect but a faster route, sort of like a short-cut. Let’s get going.


Statements:

Open any article, research paper, book or a post, and what do you find? Basically, all these consist of sentences after sentences. That is, a series of statements, in a way. That’s all. So, let’s get going at the level of statements, from a “logical” (i.e. logic-theoretical) point of view.

Statements are made to propose or to identify (or at least to assert) some (or the other) fact(s) of reality. That’s what their purpose is.


The conceptual-level consciousness as being prone to making errors:

Coming to the consciousness of man, there are broadly two levels of cognition at which it operates: the sensory-perceptual, and the conceptual.

Examples of the sensory-perceptual level consciousness would consist of reaching a mental grasp of such facts of reality as: “This object exists, here and now;” “this object has this property, to this much degree, in reality,” etc. Notice that what we have done here is to take items of perception, and put them into the form of propositions.

Propositions can be true or false. However, at the perceptual level, a consciousness has no choice in regard to the truth-status. If the item is perceived, that’s it! It’s “true” anyway. Rather, perceptions are not subject to a test of truth- or false-hood; they are the very base standards for deciding truth- or false-hood.

A consciousness—better still, an organism—does have some choice, even at the perceptual level. The choice which it has exists in regard to such things as: what aspect of reality to focus on, with what degree of focus, with what end (or purpose), etc. But we are not talking about such things here. What matters to us here is just the truth-status, that’s all. Thus, keeping only the truth-status in mind, we can say that this very idea itself (of a truth-status) is inapplicable at the purely perceptual level. However, it is very much relevant at the conceptual level. The reason is that at the conceptual level, the consciousness is prone to err.

The conceptual level of consciousness may be said to involve two different abilities:

  • First, the ability to conceive of (i.e. create) the mental units that are the concepts.
  • Second, the ability to connect together the various existing concepts to create propositions which express different aspects of the truths pertaining to them.

It is possible for a consciousness to go wrong in either of the two respects. However, mistakes are much easier to make when it comes to the second respect.

Homework 1: Supply an example of going wrong in the first way, i.e., right at the stage of forming concepts. (Hint: Take a concept that is at least somewhat higher-level so that mistakes are easier in forming it; consider its valid definition; then modify its definition by dropping one of its defining characteristics and substituting a non-essential in it.)

Homework 2: Supply a few examples of going wrong in the second way, i.e., in forming propositions. (Hint: I guess almost any logical fallacy can be taken as a starting point for generating examples here.)


Truth-hood operator for statements:

As seen above, statements (i.e. complete sentences that formally can be treated as propositions) made at the conceptual level can, and do, go wrong.

We therefore define a truth-hood operator which, when it operates on a statement, yields the result as to whether the given statement is true or non-true. (Aside: Without getting into further epistemological complexities, let me note here that I reject the idea of the arbitrary, and thus regard non-true as nothing but a sub-category of the false. Thus, in my view, a proposition is either true or it is false. There is no middle (as Aristotle said), or even an “outside” (like the arbitrary) to its truth-status.)

Here are a few examples of applying the truth-status (or truth-hood) operator to a statement:

  • Truth-hood[ California is not a state in the USA ] = false
  • Truth-hood[ Texas is a state in the USA ] = true
  • Truth-hood[ All reasonable people are leftists ] = false
  • Truth-hood[ All reasonable people are rightists ] = false
  • Truth-hood[ Indians have significantly contributed to mankind’s culture ] = true
  • etc.

For ease in writing and manipulation, we propose to give names to statements. Thus, first declaring

A: California is not a state in the USA

and then applying the Truth-hood operator to “A”, is fully equivalent to applying this operator to the entire sentence appearing after the colon (:) symbol. Thus,

Truth-hood[ A ] <==> Truth-hood[ California is not a state in the USA ] = false
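To make the naming idea concrete, here is a tiny sketch in Python. It is only an illustration: the operator is modeled as a plain look-up in a hand-written table, and the names FACTS, NAMED and truth_hood are hypothetical. The truth values in the table are the ones from the examples above.

```python
# Truth values copied from the examples in the text.
FACTS = {
    "California is not a state in the USA": False,
    "Texas is a state in the USA": True,
}

# Named statements: the name on the left stands for the body on the right.
NAMED = {
    "A": "California is not a state in the USA",
}


def truth_hood(statement):
    """Apply the operator to a name or to a full statement body."""
    body = NAMED.get(statement, statement)  # a name is replaced by its body
    return FACTS[body]


print(truth_hood("A"))
print(truth_hood("California is not a state in the USA"))
```

Both calls go through the same body, which is all that the equivalence sign in the line above is meant to say.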


Just a bit of the computer languages theory: terminals and non-terminals:

To take a short-cut through this entire theory, we would like to approach the idea of statements from a slightly abstract perspective. Accordingly, borrowing some terminology from the area of computer languages, we define and use two types of symbols: terminals and non-terminals. The overall idea is this. We regard any program (i.e. a “write-up”) written in any computer-language as consisting of a sequence of statements. A statement, in turn, consists of a certain well-defined arrangement of words or symbols. Now, we observe that symbols (or words) can be either terminals or non-terminals.

You can think of a non-terminal symbol in different ways: as higher-level or more abstract words, as “potent” symbols. The non-terminal symbols have a “definition”—i.e., an expansion rule. (In CS, it is customary to call an expansion rule a “production” rule.) Here is a simple example of a non-terminal and its expansion:

  • P => S1 S2

where the symbol “=>” is taken to mean things like: “is the same as” or “is fully equivalent to” or “expands to.” What we have here is an example of an abstract statement. We interpret this statement as the following. Wherever you see the symbol “P,” you may substitute it using the train of the two symbols, S1 and S2, written in that order (and without anything else coming in between them).

Now consider the following non-terminals, and their expansion rules:

  • P1 => P2 P S1
  • P2 => S3

The question is: Given the expansion rules for P, P1, and P2, what exactly does P1 mean? what precisely does it stand for?

Answer:

  • P1 => (P2) P S1 => S3 (P) S1 => S3 S1 S2 S1

In the above, we first take the expansion rule for P1. Then, we expand the P2 symbol in it. Finally, we expand the P symbol. When no non-terminal symbol is left to expand, we arrive at our answer that “P1” means the same as “S3 S1 S2 S1.” We could have said the same fact using the colon symbol, because the colon (:) and the “expands to” symbol “=>” mean one and the same thing. Thus, we can say:

  • P1: S3 S1 S2 S1

The left hand-side and the right hand-side are fully equivalent ways of saying the same thing. If you want, you may regard the expression on the right hand-side as a “meaning” of the symbol on the left hand-side.

It is at this point that we are able to understand the terms: terminals and non-terminals.

The symbols which do not have any further expansion for them are called, for obvious reasons, the terminal symbols. In contrast, non-terminal symbols are those which can be expanded in terms of an ordered sequence of non-terminals and/or terminals.
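The expansion of P1 worked out above can also be done mechanically. Here is a minimal sketch, assuming that the rules are stored in a dictionary and that any symbol without a rule is a terminal. The names RULES and expand are hypothetical. The sketch also assumes that no rule refers back to itself, directly or indirectly.

```python
# Expansion (production) rules from the text: non-terminal -> ordered symbols.
RULES = {
    "P": ["S1", "S2"],
    "P1": ["P2", "P", "S1"],
    "P2": ["S3"],
}


def expand(symbol, rules):
    """Expand a symbol until only terminals are left, keeping the order."""
    if symbol not in rules:  # no rule for it, so it is a terminal
        return [symbol]
    terminals = []
    for part in rules[symbol]:
        terminals.extend(expand(part, rules))
    return terminals


print("P1:", " ".join(expand("P1", RULES)))  # P1: S3 S1 S2 S1
```

Note that expand calls itself, but the calls bottom out at the terminals, so the process completes.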

We can now connect our present discussion (which is in terms of computer languages) to our prior discussion of statements (which is in terms of symbolic logic), and arrive at the following correspondence:

The name of every named statement is a non-terminal; and the statement body itself is an expansion rule.

This correspondence works also in the reverse direction.

You can always think of a non-terminal (from a computer language) as the name of a named proposition or statement, and you can think of an expansion rule as the body of the statement.

Easy enough, right? … I think that we are now all set to consider the next topic, which is: liar’s paradox.


Liar’s paradox:

The liar paradox is a topic from the theory of logic [^]. It has been resolved by many people in different ways. We would like to treat it from the viewpoint of the elementary computer languages theory (as covered above).

The simplest example of the liar paradox is, using the terminology of the computer languages theory, the following named statement or expansion rule:

  • A: A is false.

Notice, it wouldn’t be a paradox if the same non-terminal symbol, viz. “A” were not to appear on both sides of the expansion rule.

To understand why the above expansion rule (or “definition”) involves a paradox, let’s get into the game.

Our task will be to evaluate the truth-status of the named statement that is “A”. This is the “A” which comes on the left hand-side, i.e., before the colon.

In symbolic logic, a statement is nothing but its expansion; the two are exactly and fully identical, i.e., they are one and the same. Accordingly, to evaluate the truth-status of “A” (the one which comes before the colon), we consider its expansion (which comes after the colon), and get the following:

  • Truth-hood[ A ] = Truth-hood[ A is false ] = false           (equation 1)

Alright. From this point onward, I will drop explicitly writing down the Truth-hood operator. It is still there; it’s just that to simplify typing out the ensuing discussion, I am not going to note it explicitly every time.

Anyway, coming back to the game, what we have got thus far is the truth-hood status of the given statement in this form:

  • A: “A is false”

Now, realizing that the “A” appearing on the right hand-side itself also is a non-terminal, we can substitute for its expansion within the aforementioned expansion. We thus get to the following:

  • A: “(A is false) is false”

We can apply the Truth-hood operator to this expansion, and thereby get the following: The statement which appears within the parentheses, viz., the “A is false” part, itself is false. Accordingly, the Truth-hood operator must now evaluate thus:

  • Truth-hood[ A ] = Truth-hood[ A is false ] = Truth-hood[ (A is false) is false ] = Truth-hood[ A is true ] = true            (equation 2)

Fun, isn’t it? Initially, via equation 1, we got the result that A is false. Now, via equation 2, we get the result that A is true. That is the paradox.

But the fun doesn’t stop there. It can continue. In fact, it can continue indefinitely. Let’s see how.

If we do not halt the expansions, i.e., if we continue a bit further with the game, we can just as well make one more expansion, and get the following:

  • A: ((A is false) is false) is false.

The Truth-hood status of the immediately preceding expansion now is: false. Convince yourself that it is so. Hint: Always expand the inner-most parentheses first.

Homework 3: Convince yourself that what we get here is an indefinitely long alternating sequence of the Truth-hood statuses: A is false, A is true, A is false, A is true, and so on.
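If you want to see the alternation without doing the expansions by hand, here is a small sketch. It merely mimics the procedure used above: the innermost “A is false” is read as false (as in equation 1), and every enclosing “is false” flips the status. The function names expand_liar and truth_status are hypothetical.

```python
def expand_liar(depth):
    """Text of A after applying the rule 'A: A is false' depth times."""
    text = "A is false"
    for _ in range(depth - 1):
        text = f"({text}) is false"
    return text


def truth_status(depth):
    """Status read off as in the text, working from the innermost parentheses."""
    status = False  # the innermost "A is false" is taken as false (equation 1)
    for _ in range(depth - 1):
        status = not status  # each enclosing "is false" flips the status
    return status


for depth in range(1, 7):
    print(depth, expand_liar(depth), "->", truth_status(depth))
```

The loop stops at 6 only because we chose to stop it there; nothing in the rule itself supplies a stopping point.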

What can we say by way of a conclusion?

Conclusion: The truth-status of “A” is not uniquely decidable.

The emphasis is on the word “uniquely.”

We have used all the seemingly simple rules of logic, and yet have stumbled on to the result that, apparently, logic does not allow us to decide something uniquely or meaningfully.


Liar’s paradox and the set theory:

The importance of the liar paradox to our present concerns is this:

Godel himself believed, correctly, that the liar paradox was a semantic analogue to his Incompleteness Theorem [^].

Go read the Wiki article (or anything else on the topic) to understand why. For our purposes here, I will simply point out what the connection of the liar paradox is to the set theory, and then (more or less) call it a day. The key observation I want to make is the following:

You can think of every named statement as an instance of an ordered set.

What the above key observation does is to tie the symbolic logic of propositions with the set theory. We thus have three equivalent ways of describing the same idea: symbolic logic (name of a statement and its body), computer languages theory (non-terminals and their expansions to terminals), and set theory (the label of an ordered set and its enumeration).

As an aside, the set in question may have further properties, or further mathematical or logical structures and attributes embedded in itself. But at its minimal, we can say that the name of a named statement can be seen as a non-terminal, and the “body” of the statement (or the expansion rule) can be seen as an ordered set of some symbols—an arbitrarily specified sequence of some (zero or more) terminals and (zero or more) non-terminals.

Two clarifications:

  • Yes, in case there is no sequence in a production at all, it can be called the empty set.
  • When you have the same non-terminal on both sides of an expansion rule, it is said to form a recursion relation.

An aside: It might be fun to convince yourself that the liar paradox cannot be posed or discussed in terms of Venn’s diagram. The “sheet” on which Venn’s diagram is drawn, by some simple intuitive notions we all bring to bear on Venn’s diagram, cannot have a “recursion” relation.

Yes, the set theory itself was always “powerful” enough to allow for recursions. People like Godel merely made this feature explicit, and took full “advantage” of it.


Recursion, the continuum, and epistemological (and metaphysical) validity:

In our discussion above, I had merely asserted, without giving even a hint of a proof, that the three ways (viz., the symbolic logic of statements or propositions, the computer languages theory, and the set theory) were all equivalent ways of expressing the same basic idea (i.e. the one which we are concerned about, here).

I will now once again make a few more observations, but without explaining them in detail or supplying even an indication of their proofs. The factoids I must point out are the following:

  • You can start with the natural numbers, and by using simple operations such as addition and its inverse, and multiplication and its inverse, you can reach the rational numbers; a further limiting process (e.g., taking limits of sequences of rationals) then takes you to the real number system. The generalization goes as: Natural to Whole to Integers to Rationals to Reals. Another name for the real number system is: the continuum.
  • You can use the computer languages theory to generate a machine representation for the natural numbers. You can also mechanize the addition etc. operations. Thus, you can “in principle” (i.e. with infinite time and infinite memory) represent the continuum in the CS terms.
  • Generating a machine representation for natural numbers requires the use of recursion.
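To illustrate the last two points, here is one common way to build such a representation, sketched in Python. It is an assumption on my part that a Peano-style encoding is an acceptable stand-in here: a number is either zero or the successor of another number, and addition is defined recursively. The counting starts at zero for convenience. All the names (Nat, ZERO, succ, add, to_int) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nat:
    """A number is zero (pred is None) or the successor of another Nat."""
    pred: Optional["Nat"] = None


ZERO = Nat()


def succ(n):
    """The number that comes right after n."""
    return Nat(n)


def add(a, b):
    """Recursive addition: a + 0 = a, and a + succ(b') = succ(a + b')."""
    if b.pred is None:
        return a
    return succ(add(a, b.pred))


def to_int(n):
    """Count the successor steps, for printing only."""
    count = 0
    while n.pred is not None:
        count += 1
        n = n.pred
    return count


two = succ(succ(ZERO))
three = succ(two)
print(to_int(add(two, three)))  # 5
```

Both the type Nat and the function add refer to themselves in their own definitions; that is where the recursion enters.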

Finally, a few words about epistemological (and metaphysical) validity.

  • The concepts of numbers (whether natural or real) have a logical precedence, i.e., they come first. The entire arithmetic and the calculus must come before the computer-representation of some of their concepts.
  • A machine-representation (or, equivalently, a set-theoretic representation) is merely a representation. That is to say, it captures only some aspects or attributes of the actual concepts from maths (whether arithmetic or the continuum). This issue is exactly like what we saw in the first and second posts in this series: a set is a concrete collection, unlike a concept which involves a consciously cast unit perspective.
  • If you try to translate the idea of recursion into the usual cognitive terms, you get absurdities such as: You can be your child, literally speaking. Not in the sense that, using scientific advances in biology, you can create a clone of yourself and regard that clone as both yourself and your child. No, not that way. Actually, such a clone is always your twin, not your child, but still, the idea here is even worse. The idea here is that you can literally father your own self.
  • Aristotle got it right. Look up the distinction between completed processes and the uncompleted ones. Metaphysically, only those objects or attributes can exist which correspond to completed mathematical processes. (Yes, as an extension, you can throw in the finite limiting values, too, provided they otherwise do mean something.)
  • Recursion, by its very definition, involves not just the absence of completion but the essence of the very inability to complete.
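
As a rough illustration of the distinction between a completed and an uncompleted process, and of a finite limiting value, here is a small sketch that generates exact rational approximations to the square root of 2 using the Newton (Babylonian) step. The choice of example and the names `sqrt2_approximations` and `first_terms` are my own assumptions, made only for illustration.

```python
from fractions import Fraction
from itertools import islice
from typing import Iterator, List


def sqrt2_approximations() -> Iterator[Fraction]:
    """Yield an endless sequence of exact rationals approaching sqrt(2)."""
    x = Fraction(1)
    while True:
        yield x
        # Newton (Babylonian) step; stays exact because x is a Fraction.
        x = (x + 2 / x) / 2


def first_terms(count: int) -> List[Fraction]:
    """Stop the endless process after `count` terms and return them."""
    return list(islice(sqrt2_approximations(), count))
```

Calling `first_terms(4)` gives the fractions 1, 3/2, 17/12 and 577/408. Each term is the result of a completed, finite computation, and each is a rational whose square differs from 2. The generator as a whole never finishes on its own; it is only the caller who cuts it off after a finite number of steps.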

Closure on the “learnability issue”:

Homework 4: Go through the last two posts in this series as well as this one, and figure out that the only reason that the set theory allows a “recursive” relation is because a set is, by the design of the set theory, a concrete object whose definition does not have to involve an epistemologically valid process—a unit perspective as in a properly formed concept—and so, its name does not have to stand for an abstract mentally held unit. Call this happenstance “The Glaring Epistemological Flaw of the Set Theory” (or TGEFST for short).

Homework 5: Convince yourself that any lemma or theorem that makes use of Godel’s Incompleteness Theorem is necessarily based on TGEFST, and for the same reason, its truth-status is: it is not true. (In other words, any lemma or theorem based on Godel’s theorem is an invalid or untenable idea, i.e., essentially, a falsehood.)

Homework 6: Realize that the learnability issue, as discussed in Prof. Lev Reyzin’s news article (discussed in the first part of this series [^]), must be one that makes use of Godel’s Incompleteness Theorem. Then convince yourself that for precisely the same reason, it too must be untenable.

[Yes, Betteridge’s law [^] holds.]


Other remarks:

Remark 1:

As “asymptotical” pointed out at the relevant Reddit thread [^], the authors themselves say, in another paper posted at arXiv [^] that

While this case may not arise in practical ML applications, it does serve to show that the fundamental definitions of PAC learnability (in this case, their generalization to the EMX setting) is vulnerable in the sense of not being robust to changing the underlying set theoretical model.

What I now remark here is stronger. I am saying that it can be shown, on rigorously theoretical (epistemological) grounds, that the “learnability as undecidable” thesis by itself is, logically speaking, entirely and in principle untenable.

Remark 2:

Another point. My preceding conclusion does not mean that the work reported in the paper itself is, in all its aspects, completely worthless. For instance, it might perhaps come in handy while characterizing some tricky issues related to learnability. I certainly do admit of this possibility. (To give a vague analogy, this issue is something like running, in a mathematically somewhat novel way, into a known type of mathematical singularity, or so.) Of course, I am not competent enough to judge how valuable the work of the paper(s) might turn out to be in narrow technical contexts like that.

However, what I can, and will, say is this: the result does not—and cannot—bring the very learnability of ANNs itself into doubt.


Phew! First, Panpsychism, and immediately then, Learnability and Godel. … I’ve had to deal with two untenable claims back to back here on this blog!

… My head aches….

… Code! I have to write some code! Or write some neat notes on ML in LaTeX. Only then will, I guess, my head stop aching so much…

Honestly, I just downloaded TensorFlow yesterday, and configured an environment for it in Anaconda. I am excited, and look forward to trying out some tutorials on it…

BTW, I also honestly hope that I don’t run into anything untenable, at least for a few weeks or so…

…BTW, I also feel like taking a break… Maybe I should go visit IIT Bombay or some place in Konkan. … But there are money constraints… Anyway, bye, really, for now…


A song I like:

(Marathi) “hirvyaa hirvyaa rangaachi jhaaDee ghanadaaTa”
Music: Sooraj (the pen-name of “Shankar” from the Shankar-Jaikishan pair)
Lyrics: Ramesh Anavakar
Singers: Jaywant Kulkarni, Sharada


[Any editing would be minimal; guess I will not even note it down separately.] Did an extensive revision by 2019.01.21 23:13 IST. Now I will leave this post in the shape in which it is. Bye for now.