TL;DR: I think I have made a mistake in drawing my conclusions regarding the research conclusions in the paper by Ben-David et al., though not regarding the epistemological or cultural points concerning the glaring weaknesses of the set theory, or regarding the error of attempting to apply Godel’s infamous Incompleteness theorem more generally (i.e., outside its narrow technical scope in mathematics and logic as used in computer science). Accordingly, I am going to take down this entire series of posts after about a month or two, at which time I will come back with a PDF article that collects together all the valid points.
I caught a mistake in my reasoning only after finishing this series last night:
Just yesterday, I “finished” this mini-series on this issue of whether the learnability in machine learning could be proved, on the basis of the Godel theorem and at least in one kind of a machine learning scenario, to be an undecidable or not.
Effectively, what I said (right from post 1 through post 3 of this series) was that the thesis of this new Nature paper was logically untenable. I gave my reasons.
However, right last night, right within one hour after posting the third part in this short series of posts, I came to realize that I must have made a mistake in reaching my conclusions regarding the CS-part of it!
However, it was pretty late in night (almost mid-night here in India). I was afraid that if I get out of the bed and start the machine again, chances were high that I would end up staying awake until very late in the night, possibly all the way up to early morning. So, I decided to postpone owning up my mistake and issuing the clarification until in the morning.
Now that it’s morning, here we go.
The technical-scientific and the broader cultural-philosophical aspects of the news story:
The way I think about this Nature news story now, there are two separate issues entwined together here.
One part is the purely technical aspect, which is concerned entirely with a topic from machine learning, viz. the learnability issue. (I take learnability to mean the ability of a class of algorithms to effectively cause changes in the values of their built-in parameters (like weights and biases) in response to the data presented to them—nothing more, nothing less.) This part is entirely computer science-related.
Another part is seemingly more general. Here, we start from certain features of the mathematical set-theory which were used by Godel in formulating his incompleteness theorems. We then use these theorems to draw conclusions about knowledge and axiom systems in general. This part is crucially philosophical in nature; in particular, it is epistemological (and to some extent, metaphysical) in nature.
Obviously, the two parts are very different in nature. They have different contexts, methods, goals, and implications.
However, the culture of our times is such that, at least since the appearance of the 1979 book [^] by Hoftstader (which had become very popular once upon a time in India), the aforementioned two parts usually come very intimately intertwined together. The arguments so easily and blithely slip from one part to the other that it becomes a separate task by itself to isolate them apart.
It is against this backdrop (which has run for decades) that we have to pursue the following (opening) quote from Reyzin’s story:
“In 1931, the logician Kurt Godel showed that, in any system of axioms that is expressive enough to model arithmetic, some true statements will be unprovable.”
By “any system of axioms” what the author in all probability might have meant is: any conceivable system (known or unknown) of _mathematical_ axioms.
However, note the wording. The way the statement is formulated, it also seems to hold some more general implications. It seems to imply that even if you take a broader system of axioms (one that is more general in scope, or is more fundamental than being concerned only with mathematics), it would necessarily have to suffer from the same damage of unprovability—so long as arithmetic were to be one of its by-products. That is the broader sense I am concerned with, here.
Try it right now. See if you cannot re-read Reyzin’s statement to mean this kind of a thing or not. In any case, with Godel, this kind of an implication is precisely what commentators over decades have led the layman to believe.
What I said:
Now, coming to my own posts in this series.
Here, I think I did pretty well on the second count, i.e., on the philosophic side. I think I also did fairly well in explaining the ideas from CS and showing the inter-connections between set theory, CS languages theory and the usual form of symbolic logic of propositions.
However, I drew wrong conclusions on the computer science-side. Let me explain very briefly.
In my last post, in Remark 1, I said that:
“What I now remark here is stronger. I am saying that it can be shown, on rigorously theoretical (epistemological) grounds, that the “learnability as undecidable” thesis by itself is, logically speaking, entirely and in principle untenable.”
As if this were not enough, in Remark 2, I further said:
“However, what I can, and will say is this: the result does not—and cannot—bring the very learnability of ANNs itself into doubt.”
In both cases, the context actually was only machine learning. But I still ended up denying even plausibility to the results reported in the research paper.
There could also be other places in my previous three posts to a similar effect.
My revised position:
However, my reconsidered position is that it is possible that Ben-David et al. might have a valid result.
Note, I still have not gone through Ben-David et al.’s research paper (though by now, I have completed Prof. Reyzin’s news story). In fact I don’t even have the requisite knowledge to be able to comment anything on their paper.
All that I am now saying is that one cannot take the kind of reasoning as was presented in the previous three posts, and use it to deny even just a logical plausibility to Ben-David et al.’s claims. In other words, I am admitting the possibility that they could be right.
Thus, it is possible that machine learning—at least some type of data structures and algorithms in it—indeed may be susceptible to this kind of a weakness whereby you cannot say that the implementation architecture will necessarily learn (even if enough time and computational power is thrown at it).
But why do I now say that the result is plausible? How did I reach this conclusion?
Because I am now assuming that as a technical research result in CS, Ben-David et al. would be concerned with learnability in only one sense of that term: as an operational feature of execution of certain machine learning algorithm(s) on a digital computer (actually, on a Turing machine). If you stay entirely within the confines of this set of restrictions, then it is possible that learnability could be an undecidable feature, at least in some cases.
(That is to say, parameters of the implementation architecture such as the weights and the biases, may not always change in a way that the input-to-output mapping determined by them comes to be more or less equally applicable (with sufficient accuracy) to the test data as it is to the training / validation data. It may be possible even to theoretically prove, in at least some specific scenarios and at least for certain range(s) of the problem data, that the machine architecture would not respond favorably well over the entire range of the data.)
The reason why such a circumstance is possible is this: concepts of mathematical objects and operations are, from an epistemological point a view, of a very narrowly restricted class. It is possible that certain interrelations among them do present certain limitations like undecidability of machine learning.
You always find such (the so-called) pathological cases in maths. For instance, consider this XI/XII standard problem: what is the limit of the Heaviside step function in the band through ? Whatever be your answer (the technically correct answer of there being no limit, or the practical working solution sometimes implemented in software of outputting 0.5 as the limit, the logic being based on, as the limit of vanishing areas under the curve as we approach the step, or via regarding an approximate Fourier representation to be valid), it is important to realize that the entire problem is concerned only with mathematics—not with the issue of whether a physical object can exist in two different places at the same time or not. If you pose the latter question, the answer is known, and it is: an unambiguous no. (In contrast, in maths, we can only say that the limit is not defined for the step function—not that that idea of limits is applicable and that it does not exist. There is a subtle difference between the two.)
To cut a long story short, I was confusing in between applications of machine learning algorithms (including locating the referents for their learnability aspects) vs. their theoretical execution on an ideal computing machine. An analogous situation would be to say that I was confusing between the Venn diagrams which do not permit recursion to be represented, and the possibility of having recursion during execution of programs on a computer that has infinite memory and execution time (or for having recursion for a finite number of times on an actual machine).
Only this part went wrong; the rest was right all the time anyway.
Next course of action:
This is a little odd circumstance to handle.
- I do want to keep this series of posts intact, because they do contain a good deal of points that still remain valid. They also are original in the sense that I have not seen them written elsewhere. In other places, the points are known but the presentation still has at least elements of novelty.
- Further, I also don’t care anything at all about admitting my mistakes. I also don’t care much about keeping my mistakes online for future references, for that matter. (For instance, see this blog as well as my interactions at iMechanica.)
- However, at the same time, I also do not want confusions to persist concerning the good points—especially the philosophical ones. After all, I am inserting an update in each post saying that it in part is wrong / erroneous. Such an admission is likely to go against the good points too.
After thinking a bit about it, I have decided the following course of action.
- I will keep this series online for a long enough time, say about a month or two, so that the “acknowleding mistakes” part is well taken care of.
- However, eventually, I will take down this particular series. At the time of taking it offline, I will make sure to write a PDF article which isolates together what I consider are the good and valid points, and I will post it here at the same time.
An important point: If you wish to keep these posts (in their current shape) for purely academic or similar purposes, then you may save them to your personal drives. However, I would ask you not to put them on to an archiving or “look-back” service of any kind (public or private). After all, my write-ups are, basically and originally, a personal property of mine, and publishing them doesn’t mean that I have given up my ownership rights to them. So, please treat the write-ups accordingly; thanks in advance.
No songs section for this post. (It anyway will be gone after a month or two, or so!)
Take care, and bye for now…