# Do you really need a QC in order to have a really unpredictable stream of bits?

0. Preliminaries:

This post has reference to Roger Schlafly’s recent post [^] in which he refers to Prof. Scott Aaronson’s post touching on the issue of the randomness generated by a QC vis-a-vis that obtained using the usual classical hardware [^], in particular, to Aaronson’s remark:

“the whole point of my scheme is to prove to a faraway skeptic—one who doesn’t trust your hardware—that the bits you generated are really random.”

I do think (based on my new approach to QM [(PDF) ^]) that building a scalable QC is an impossible task.

I wonder if they (the QC enthusiasts) haven’t already begun realizing the hopelessness of their endeavours, and thus haven’t slowly begun preparing for a graceful exit, say via the QC-as-a-RNG route.

While Aaronson’s remarks also saliently involve the element of the “faraway” skeptic, I will mostly ignore that consideration here in this post. I mean to say, initially, I will ignore the scenario in which you have to transmit random bits over a network, and still have to assure the skeptic that what he was getting at the receiving end was something coming “straight from the oven”—something which was not tampered with, in any way, during the transit. The skeptic would have to be specially assured in this scenario, because a network is inherently susceptible to a third-party attack wherein the attacker seeks to exploit the infrastructure of the random keys distribution to his advantage, via injection of systematic bits (i.e. bits of his choice) that only appear random to the intended receiver. A system that quantum-mechanically entangles the two devices at the two ends of the distribution channel, does logically seem to have a very definite advantage over a combination of ordinary RNGs and classical hardware for the network. However, I will not address this part here—not for the most part, and not initially, anyway.

Instead, for most of this post, I will focus on just one basic question:

Can anyone be justified in thinking that an RNG that operates at the QM-level might have even the slightest possible advantage, at least logically speaking, over another RNG that operates at the CM-level? Note, the QM-level RNG need not always be a general-purpose and scalable QC; it can be any simple or special-purpose device that exploits, and at its core operates at, the specifically QM-level.

Even if I am a 100% skeptic of the scalable QC, I also think that the answer on this latter count is: yes, perhaps you could argue that way. But then, I think, your argument would still be pointless.

Let me explain, following my approach, why I say so.

2. RNGs as based on nonlinearities. Nonlinearities in QM vs. those in CM:

QM does involve either IAD (instantaneous action at a distance), or very, very large (decidedly super-relativistic) speeds for propagation of local changes over all distant regions of space.

From the experimental evidence we have, it seems that there have to be very, very high speeds of propagation, for even smallest changes that can take place in the $\Psi$ and $V$ fields. The Schrodinger equation assumes infinitely large speeds for them. Such obviously cannot be the case—it is best to take the infinite speeds as just an abstraction (as a mathematical approximation) to the reality of very, very high actual speeds. However, the experimental evidence also indicates that even if there has to be some or the other upper bound to the speeds $v$, with $v \gg c$, the speeds still have to be so high as to seemingly approach infinity, if the Schrodinger formalism is to be employed. And, of course, as you know it, Schrodinger’s formalism is pretty well understood, validated, and appreciated [^]. (For more on the speed limits and IAD in general, see the addendum at the end of this post.)

I don’t know the relativity theory or the relativistic QM. But I guess that since the electric fields of massive QM particles are non-uniform (they are in fact singular), their interactions with $\Psi$ must be such that the system has to suddenly snap out of some one configuration and in the same process snap into one of the many alternative possible configurations. Since there is a huge (astronomically large) number of particles in the universe, the number of alternative configurations would be {astronomically large}^{very large}; after all, the particles’ positions and motions are continuous. Thus, we couldn’t hope to calculate the propagation speeds for the changes in the local features of a configuration in terms of all those irreversible snap-out and snap-in events taken individually. We must take them in an ensemble sense. Further, the electric charges are massive, identical, and produce singular and continuous fields. Overall, it is the ensemble-level effects of these individual quantum mechanical snap-out and snap-in events whose end-result would be: the speed-of-light limitation of the special relativity (SR). After all, SR holds on the gross scale; it is a theory from classical electrodynamics. The electric and magnetic fields of classical EM can be seen as being produced by the quantum $\Psi$ field (including the spinor function) of large ensembles of particles in the limit that the number of their configurations approaches infinity, and the classical EM waves i.e. light are nothing but the second-order effects in the classical EM fields.

I don’t know. I was just thinking aloud. But it’s certainly possible to have IAD for the changes in $\Psi$ and $V$, and thus to have instantaneous energy transfers via photons across two distant atoms in a QM-level description, and still end up with a finite limit for the speed of light ($c$) for large collections of atoms.

OK. Enough of setting up the context.

2.2: The domain of dependence for the nonlinearity in QM vs. that in CM:

If QM is not linear, i.e., if there is a nonlinearity in the $\Psi$ field (as I have proposed), then to evaluate the merits of the QM-level and CM-level RNGs, we have to compare the two nonlinearities: those in the QM vs. those in the CM.

The classical RNGs are always based on the nonlinearities in CM. For example:

• the nonlinearities in the atmospheric electricity (the “static”) [^], or
• the fluid-dynamical nonlinearities (as shown in the lottery-draw machines [^], or the lava lamps [^]), or
• some or the other nonlinear electronic circuits (available for less than \$10 in hardware stores)
• etc.

All of them are based on two factors: (i) a large number of components (in the core system generating the random signal, not necessarily in the part that probes its state), and (ii) nonlinear interactions among all such components.

The number of variables in the QM description is anyway always larger: a single classical atom is seen as composed from tens, even hundreds of quantum mechanical charges. Further, due to the IAD present in the QM theory, the domain of dependence (DoD) [^] in QM remains, at all times, literally the entire universe—all charges are included in it, and the entire $\Psi$ field too.

On the other hand, the DoD in the CM description remains limited to only that finite region which is contained in the relevant past light-cone. Even when a classical system is nonlinear, and thus gets crazy very rapidly with even small increases in the number of degrees of freedom (DOFs), its DoD still remains finite and rather very small at all times. In contrast, the DoD of QM is the whole universe—all physical objects in it.

2.3 Implication for the RNGs:

Based on the above-mentioned argument, which in my limited reading and knowledge Aaronson has never presented (and neither has anyone else, basically because they all continue to believe in von Neumann’s characterization of QM as a linear theory), an RNG operating at the QM level does seem to have, “logically” speaking, an upper hand over an RNG operating at the CM level.

Then why do I still say that arguing for the superiority of a QM-level RNG is still pointless?

3. The MVLSN principle, and its epistemological basis:

If you apply a proper epistemology (and I have in my mind here the one by Ayn Rand), then the supposed “logical” difference between the two descriptions becomes completely superfluous. That’s because the quantities whose differences are being examined, themselves begin to lose any epistemological standing.

The reason for that, in turn, is what I call the MVLSN principle: the law of the Meaninglessness of the Very Large or very Small Numbers (or scales).

What the MVLSN principle says is that if your argument crucially depends on the use of very large (or very small) quantities and relationships between them, i.e., if the fulcrum of your argument rests on some great extrapolations alone, then it begins to lose all cognitive merit. “Very large” and “very small” are contextual terms here, to be used judiciously.

Roughly speaking, if this principle is applied to our current situation, what it says is that when in your thought you cross a certain limit of DOFs and hence a certain limit of complexity (which anyway is sufficiently large as to be much, much beyond the limit of any and every available and even conceivable means of predictability), then any differences in the relative complexities (here, of the QM-level RNGs vs. the CM-level RNGs) ought to be regarded as having no bearing at all on knowledge, and therefore, as having no relevance in any practical issue.

Both QM-level and CM-level RNGs would be far too complex for you to devise any algorithm or a machine that might be able to predict the sequence of the bits coming out of either. Really. The complexity levels already grow so huge, even with just the classical systems, that it’s pointless trying to predict the bits, or to try and compare the complexity of the classical RNGs with that of the quantum RNGs.

A clarification: I am not saying that there won’t be any systematic errors or patterns in the otherwise random bits that a CM-based RNG produces. Sure enough, due statistical testing and filtering is absolutely necessary. For instance, what the radio-stations or cell-phone towers transmit are, from the viewpoint of a RNG based on radio noise, systematic disturbances that do affect its randomness. See random.org [^] for further details. I am certainly not denying this part.
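As a hedged illustration of the kind of statistical testing and filtering meant here, the sketch below applies von Neumann debiasing to a biased bit stream and then checks the fraction of ones. The biased source is simulated with Python’s `random` module purely as a stand-in for a physical noise source; the function names and the bias value of 0.7 are assumptions made for this example and are not taken from random.org or from any real RNG. Note also that this filter only removes a constant bias from independent bits; it would not remove correlated, systematic disturbances such as a radio station’s signal.

```python
import random

def biased_bits(n, p_one=0.7, seed=12345):
    """Simulated stand-in for a physical source: independent bits with P(1) = p_one."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_one else 0 for _ in range(n)]

def von_neumann_debias(bits):
    """Read non-overlapping pairs: 01 -> 0, 10 -> 1; discard 00 and 11."""
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)  # the first bit of an unequal pair is the output bit
    return out

def fraction_of_ones(bits):
    """Crude frequency check: should be close to 0.5 for an unbiased stream."""
    return sum(bits) / len(bits)

raw = biased_bits(200_000)
clean = von_neumann_debias(raw)
print(f"raw:   {fraction_of_ones(raw):.3f} ones in {len(raw)} bits")
print(f"clean: {fraction_of_ones(clean):.3f} ones in {len(clean)} bits")
```

The price of the filtering is throughput: most of the raw bits are thrown away, which is one reason why practical generators combine several tests and filters rather than relying on one.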

All that I am saying is that the sheer number of DOFs involved is itself so huge that the very randomness of the bits produced even by a classical RNG is beyond every reasonable doubt.

BTW, in this context, do see my previous couple of posts dealing with probability, indeterminism, randomness, and the all-important system vs. the law distinction here [^], and here [^].

4. To conclude my main argument here…:

In short, even “purely” classical RNGs can be way, way too complex for any one to be concerned in any way about their predictability. They are unpredictable. You don’t have to go chase the QM level just in order to ensure unpredictability.

Just take one of those WinTV lottery draw machines [^], start the air flow, get your prediction algorithm running on your computer (whether classical or quantum), and try to predict the next ball that would come out once the switch is pressed. Let me be generous. Assume that the switch gets pressed at exactly predictable intervals.

5. The Height of the Tallest Possible Man (HTPM):

If you still insist on the supposedly “logical” superiority of the QM-level RNGs, make sure to understand the MVLSN principle well.

The issue here is somewhat like asking this question:

What could possibly be the upper limit to the height of man, taken as a species? Not any other species (like the legendary “yeti”), but human beings, specifically. How tall can any man at all get? Where do you draw the line?

People could perhaps go on arguing, with at least some fig-leaf of epistemological legitimacy, over numbers like 12 feet vs. 14 feet as the true limit. (The world record mentioned in the Guinness Book is slightly under 9 feet [^]. The ceiling in a typical room is about 10 feet high.) Why, they could even perhaps go like: “Ummmm… maybe 12 feet is more likely a limit than 24 feet? whaddaya say?”

Being very generous of spirit, I might still describe this as a borderline case of madness. The reason is, in the act of undertaking even just a probabilistic comparison like that, the speaker has already agreed to assign non-zero probabilities to all the numbers belonging to that range. Realize, no one would invoke the ideas of likelihood or probability theory if he thought that the probability for an event, however calculated, was always going to be zero. He would exclude certain kinds of ranges from his analysis to begin with—even for a stochastic analysis. … So, madness it is, even if, in my most generous mood, I might regard it as a borderline madness.

But if you assume that a living being has all the other characteristics of only a human being (including being naturally born to human parents), and if you still say that in between the two statements: (A) a man could perhaps grow to be 100 feet tall, and (B) a man could perhaps grow to be 200 feet tall, it is the statement (A) which is relatively and logically more reasonable, then what the principle (MVLSN) says is this: “you basically have lost all your epistemological bearing.”

That’s nothing but a complex (actually, philosophic) way of saying that you have gone mad, full-stop.

The law of the meaninglessness of the very large or very small numbers does have a certain basis in epistemology. It goes something like this:

Abstractions are abstractions from the actually perceived concretes. Hence, even while making just conceptual projections, the range over which a given abstraction (or concept) can remain relevant is determined by the actual ranges in the direct experience from which they were derived (and the nature, scope and purpose of that particular abstraction, the method of reaching it, and its use in applications including projections). Abstractions cannot be used in disregard of the ranges of the measurements over which they were formed.

I think that after having seen the sort of crazy things that even simplest nonlinear systems with fewest variables and parameters can do (for instance, which weather agency in the world can make predictions (to the accuracy demanded by newspapers) beyond 5 days? who can predict which way is the first vortex going to be shed even in a single cylinder experiment?), it’s very easy to conclude that the CM-level vs. QM-level RNG distinction is comparable to the argument about the greater reasonableness of a 100 feet tall man vs. that of a 200 feet tall man. It’s meaningless. And, madness.

6. Aaronson’s further points:

To be fair, much of the above write-up was not meant for Aaronson; he does readily grant the CM-level RNGs validity. What he says, immediately after the quote mentioned at the beginning of this post, is that if you don’t have the requirement of distributing bits over a network,

“…then generating random bits is obviously trivial with existing technology.”

However, since Aaronson believes that QM is a linear theory, he does not even consider making a comparison of the nonlinearities involved in QM and CM.

I thought that it was important to point out that even the standard (i.e., Schrodinger’s equation-based) QM is nonlinear, and further, that even if this fact leads to some glaring differences between the two technologies (based on the IAD considerations), such differences still do not lead to any advantages whatsoever for the QM-level RNG, as far as the task of generating random bits is concerned.

As far as the task of transmitting them over a network is concerned, Aaronson then notes:

“If you do have the requirement, on the other hand, then you’ll have to do something interesting—and as far as I know, as long as it’s rooted in physics, it will either involve Bell inequality violation or quantum computation.”

Sure, it will have to involve QM. But then, why does it have to be only a QC? Why not have just special-purpose devices that are quantum mechanically entangled over wires / EM-waves?

And finally, let me come to yet another issue: But why would you at all have to have that requirement?—of having to transmit the keys over a network, and not using any other means?

Why does something as messy as a network have to get involved for a task that is as critical and delicate as distribution of some super-specially important keys? If 99.9999% of your keys-distribution requirements can be met using “trivial” (read: classical) technologies, and if you can also generate random keys using equipment that costs less than \$100 at most, then why do you have to spend billions of dollars in just distributing them to distant locations of your own offices / installations—especially if the need for changing the keys is going to be only on an infrequent basis? … And if bribing or murdering a guy who physically carries a sealed box containing a thumb-drive having secret keys is possible, then what makes the guys manning the entangled stations suddenly go all morally upright and also immortal?

From what I have read, Aaronson does consider such questions even if he seems to do so rather infrequently. The QC enthusiasts, OTOH, never do.

As I said, this QC as an RNG thing does show some marks of trying to figure out a respectable exit-way out of the scalable QC euphoria—now that they have already managed to wrest millions and billions in their research funding.

My two cents.

Addendum on speed limits and IAD in general:

Speed limits are needed out of the principle that infinity is a mathematical concept and cannot metaphysically exist. However, the nature of the ontology involved in QM compels us to rethink many issues right from the beginning. In particular, we need to carefully distinguish between all the following situations:

1. The transportation of a massive classical object (a distinguishable, i.e. finite-sized, bounded piece of physical matter) from one place to another, in literally no time.
2. The transmission of the momentum or changes in it (like forces or changes in them) being carried by one object, to a distant object not in direct physical contact, in literally no time.
3. Two mutually compensating changes in the local values of some physical property (like momentum or energy) suffered at two distant points by the same object, a circumstance which may be viewed from some higher-level or abstract perspective as transmission of the property in question over space but in no time. In reality, it’s just one process of change affecting only one object, but it occurs in a special way: in mutually compensating manner at two different places at the same time.

Only the first really qualifies to be called spooky. The second is curious but not necessarily spooky—not if you begin to regard two planets as just two regions of the same background object, or alternatively, as two clearly different objects which are being pulled in various ways at the same time and in mutually compensating ways via some invisible strings or fields that shorten or extend appropriately. The third one is not spooky at all—the object that effects the necessary compensations is not even a third object (like a field). Both the interacting “objects” and the “intervening medium” are nothing but different parts of one and the same object.

What happens in QM is the third possibility. I have been describing such changes as occurring with an IAD (instantaneous action at a distance), but now I am not too sure if such a usage is really correct or not. I now think that it is not. The term IAD should be reserved only for the second category—it’s an action that gets transported there. As to the first category, a new term should be coined: ITD (instantaneous transportation to distance). As to the third category, the new term could be IMCAD (instantaneous and mutually compensating actions at a distance). However, this all is an afterthought. So, in this post, I only have ended up using the term IAD even for the third category.

Some day I will think more deeply about it and straighten out the terminology, maybe invent some new terms to describe all the three situations with adequate directness, and then choose the best… Until then, please excuse me and interpret what I am saying in reference to context. Also, feel free to suggest good alternative terms. Also, let me know if there are any further distinctions to be made, i.e., if the above classification into three categories is not adequate or refined enough. Thanks in advance.

A song I like:

[A wonderful “koLi-geet,” i.e., a fisherman’s song. Written by a poet who hailed not from the coastal “konkaN” region but from the interior “desh.” But it sounds so authentically coastal… Listening to it today instantly transported me back to my high-school days.]

Singing, Music and Lyrics: Shaahir Amar Sheikh

History: Originally published on 2019.07.04 22:53 IST. Extended and streamlined considerably on 2019.07.05 11:04 IST. The songs section added: 2019.07.05 17:13 IST. Further streamlined, and also further added a new section (no. 6.) on 2019.07.05 22:37 IST. … Am giving up on this post now. It grew from about 650 words (in a draft for a comment at Schlafly’s blog) to 3080 words as of now. Time to move on.

Still made further additions and streamlining for a total of ~3500 words, on 2019.07.06 16:24 IST.

# Determinism, Indeterminism, Probability, and the nature of the laws of physics—a second take…

After I wrote the last post [^], several points struck me. Some of the points that were mostly implicit needed to be addressed systematically. So, I began writing a small document containing these after-thoughts, focusing more on the structural side of the argument.

However, I don’t find time to convert these points + statements into a proper write-up. At the same time, I want to get done with this topic, at least for now, so that I can better focus on some other tasks related to data science. So, let me share the write-up in whatever form it is in, currently. Sorry for its uneven tone and all (compared to even my other writing, that is!)

Causality as a concept is very poorly understood by present-day physicists. They typically understand only one sense of the term: evolution in time. But causality is a far broader concept. Here I agree with Ayn Rand / Leonard Peikoff (OPAR). See the Ayn Rand Lexicon entry, here [^]. (However, I wrote the points below without re-reading it, and instead, relying on whatever understanding I have already come to develop starting from my studies of the same material.)

Physical universe consists of objects. Objects have identity. Identity is the sum total of all characteristics, attributes, properties, etc., of an object. Objects act in accordance with their identity; they cannot act otherwise. Interactions are not primary; they do not come into being without there being objects that undergo the interactions. Objects do not change their respective identities when they take actions—not even during interactions with other objects. The law of causality is a higher-level view taken of this fact.

In the cause-effect relationship, the cause refers to the nature (identity) of an object, and the effect refers to an action that the object takes (or undergoes). Both refer to one and the same object. TBD: Trace the example of one moving billiard ball undergoing a perfectly elastic collision with another billiard ball. Bring out how the interaction—here, the pair of the contact forces—is a name for each ball undergoing an action in accordance with its nature. An interaction is a pair of actions.

A physical law may be viewed as a mapping (e.g., a function, or even a functional) from inputs to outputs.

The quantitative laws of physics often use the real number system, i.e., quantification with infinite precision. An infinite precision is a mathematical concept, not a physical one. (Expect physicists to eternally keep on confusing the two kinds of concepts.)

Application of a physical law traces the same conceptual linkages as are involved in the formulation of law, but in the reverse direction.

In both formulation of a physical law and in its application, there is always some regime of applicability which is at least implicitly understood for both inputs and outputs. A pertinent idea here is: range of variations. A further idea is the response of the output to small variations in the input.

Example: Prediction by software whether a cricket ball would have hit the stumps or not, in an LBW situation.

The input position being used by the software in a certain LBW decision could be off from reality by millimeters, or at least, by a fraction of a millimeter. Still, the law (the mapping) is such that it produces predictions that are within small limits, so that it can be relied on.

Two input values, each theoretically infinitely precise, but differing by a small magnitude from each other, may be taken to define an interval or zone of input variations. As to the zone of the corresponding output, it may be thought of as an oval produced in the plane of the stumps, using the deterministic method used in making predictions.

The nature of the law governing the motion of the ball (even after factoring in aspects like effects of interaction with air and turbulence, etc.) itself is such that the size of the O/P zone remains small enough. (It does not grow exponentially.) Hence, we can use the software confidently.

That is to say, the software can be confidently used for predicting (i.e., determining) the zone of possible landing of the ball in the plane of the stumps.
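To make the idea of the input and output zones concrete, here is a minimal sketch. It assumes a deliberately simplified, drag-free model of the ball after pitching, and all the numbers (speeds, distance to the stumps, sizes of the input zone) are hypothetical values chosen only for illustration; real ball-tracking software is far more elaborate. Each corner of the input zone is an exact number, the mapping is exact, and what the application cares about is the size of the resulting output zone.

```python
from itertools import product

G = 9.81  # gravitational acceleration, m/s^2

def height_at_stumps(y0, vy, vx=30.0, distance=2.0):
    """Drag-free height (m) of the ball when it reaches the plane of the stumps."""
    t = distance / vx                  # time taken to cover the remaining distance
    return y0 + vy * t - 0.5 * G * t * t

def output_zone(y0_zone, vy_zone):
    """Map every corner of the input zone through the law; return (lowest, highest)."""
    heights = [height_at_stumps(y0, vy) for y0, vy in product(y0_zone, vy_zone)]
    return min(heights), max(heights)

y0_zone = (0.2995, 0.3005)  # impact height known to within 1 mm (hypothetical)
vy_zone = (1.99, 2.01)      # vertical speed known to within 0.02 m/s (hypothetical)
low, high = output_zone(y0_zone, vy_zone)
print(f"input zone width:  {1000 * (y0_zone[1] - y0_zone[0]):.2f} mm")
print(f"output zone width: {1000 * (high - low):.2f} mm")
```

Because this mapping is linear in both inputs, checking the corners of the input zone is enough to bound the output zone, and the output zone comes out only a little over twice as wide as the 1 mm input zone. A nonlinear mapping would in general need a denser sampling of the input zone.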

Overall, here are three elements that must be noted: (i) Each of the input positions lying at the extreme ends of the input zone of variations itself does have an infinite precision. (ii) Further, the mapping (the law) has theoretically infinite precision. (iii) Each of the outputs lying at extreme ends of the output zone also itself has theoretically infinite precision.

Existence of such infinite precision is a given. But it is not at all the relevant issue.

What matters in applications is something more than these three. It is the fact that applications always involve zones of variations in the inputs and outputs.

Such zones are then used in error estimates. (Also for engineering control purposes, say as in automation or robotic applications.) But the fact that quantities being fed to the program as inputs themselves may be in error is not the crux of the issue. If you focus too much on errors, you will simply get into an infinite regress of error bounds for error bounds for error bounds…

Focus, instead, on the infinity of precision of the three kinds mentioned above, and focus on the fact that in addition to those infinitely precise quantities, the application procedure does involve having zones of possible variations in the input, and it also involves the problem of estimating how large the corresponding zone of variations in the output is, i.e., whether it is sufficiently small for the law and a particular application procedure or situation.

In physics, such details of application procedures are kept merely understood. They are hardly, if ever, mentioned and discussed explicitly. Physicists again show their poor epistemology. They discuss such things in terms not of the zones but of “error” bounds. This already inserts the wedge of dichotomy: infinitely precise laws vs. errors in applications. This dichotomy is entirely uncalled for. But, physicists simply aren’t that smart, that’s all.

“Indeterministic mapping,” for the above example (LBW decisions), would be the one in which the ball can be mapped as going anywhere over, and perhaps even beyond, the stadium.

Such a law and the application method (including the software) would be useless as an aid in the LBW decisions.

However, phenomenologically, the very dynamics of the cricket ball’s motion itself is simple enough that it leads to a causal law whose nature is such that for a small variation in the input conditions (a small input variations zone), the predicted zone of the O/P also is small enough. It is for this reason that we say that predictions are possible in this situation. That is to say, this is not an indeterministic situation or law.

Not all physical situations are exactly like the example of predicting the motion of the cricket ball. There are physical situations which show a certain common, and confusing, characteristic.

They involve interactions that are deterministic when occurring between two (or few) bodies. Thus, the laws governing a simple interaction between one or two bodies are deterministic—in the above sense of the term (i.e., in terms of infinite precision for mapping, and an existence of the zones of variations in the inputs and outputs).

But these physical situations also involve: (i) a nonlinear mapping, (ii) a sufficiently large number of interacting bodies, and further, (iii) coupling of all the interactions.

It is these physical situations which produce such an overall system behaviour that it can produce an exponentially diverging output zone even for a small zone of input variations.

So, a small change in I/P is sufficient to produce a huge change in O/P.
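The following sketch illustrates this kind of behaviour with a toy system: a ring of logistic maps, each coupled to its two neighbours. The choice of this particular system, the coupling strength `eps`, the number of sites, and the size of the perturbation are all assumptions made for the sake of a small runnable example; it is not a model of any of the physical situations discussed in this post. It does, however, have the three ingredients listed above: a nonlinear local rule, many interacting “bodies”, and coupling among them.

```python
import random

def step(xs, eps=0.1):
    """One update of a ring of logistic maps, each coupled to its two neighbours."""
    f = [4.0 * x * (1.0 - x) for x in xs]  # simple, deterministic local law
    n = len(xs)
    return [(1 - eps) * f[i] + 0.5 * eps * (f[i - 1] + f[(i + 1) % n])
            for i in range(n)]

def separation_history(n_sites=50, steps=200, delta=1e-12, seed=1):
    """Evolve two copies that differ by `delta` at one site; record their largest gap."""
    rng = random.Random(seed)
    a = [rng.uniform(0.1, 0.9) for _ in range(n_sites)]
    b = list(a)
    b[0] += delta  # a tiny change in just one input
    history = []
    for _ in range(steps):
        history.append(max(abs(p - q) for p, q in zip(a, b)))
        a, b = step(a), step(b)
    return history

hist = separation_history()
for k in (0, 25, 50, 100, 199):
    print(f"step {k:3d}: largest difference = {hist[k]:.3e}")
```

Every individual update here is an exact, deterministic rule; it is only the system taken as a whole that blows a tiny input zone up into an output zone comparable to the full range of the variables.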

However, note the confusing part. Even if the system behaviour for a large number of bodies does show an exponential increase in the output zone, the mapping itself is such that when it is applied to only one pair of bodies in isolation from all the others, then the output zone does remain non-exponential.

It is this characteristic which tricks people into forming two camps that go on arguing eternally. One side says that it is deterministic (making reference to a single-pair interaction), the other side says it is indeterministic (making reference to a large number of interactions, based on the same law).

The fallacy arises out of confusing a characteristic of the application method or model (variations in input and output zones) with the precision of the law or the mapping.

Example: N-body problem.

Example: NS equations as capturing a continuum description (a nonlinear one) of a very large number of bodies.

Example: Several other physical laws entering the coupled description, apart from the NS equations, in the bubbles collapse problem.

Example: Quantum mechanics

The Law vs. the System distinction: What is indeterministic is not a law governing a simple interaction taken abstractly (in which context the law was formed), but the behaviour of the system. A law (a governing equation) can be deterministic, but still, the system behavior can become indeterministic.

Even indeterministic models or system designs, when they are described using a different kind of maths (the one which is formulated at a higher level of abstractions, and, relying on the limiting values of relative frequencies i.e. probabilities), still do show causality.

Yes, probability is a notion which itself is based on causality; after all, it uses limiting values for the relative frequencies. The ability to use the limiting processes squarely rests on there being some definite features which, by being definite, do help reveal the existence of the identity. If such features (enduring, causal) were not to be part of the identity of the objects that are abstractly seen to act probabilistically, then no application of a limiting process would be possible, and so not even a definition of probability or randomness would be possible.
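A small numerical sketch may help here. It assumes, purely for illustration, that the “object” is a logistic map iterated in floating-point arithmetic, and that the event of interest is the state landing above 0.5; the function name, the starting value and the checkpoints are all hypothetical choices. The point is only that a fixed, definite rule is what allows the relative frequency to settle down as the number of trials grows.

```python
def running_relative_frequency(n_max, x0=0.123456789,
                               checkpoints=(100, 10_000, 200_000)):
    """Iterate a fixed deterministic rule and record how often x > 0.5 occurs."""
    x, hits, freqs = x0, 0, {}
    for n in range(1, n_max + 1):
        x = 4.0 * x * (1.0 - x)  # the same definite rule at every step
        if x > 0.5:
            hits += 1
        if n in checkpoints:
            freqs[n] = hits / n  # relative frequency after n trials
    return freqs

freqs = running_relative_frequency(200_000)
for n, f in freqs.items():
    print(f"after {n:>7} trials: relative frequency = {f:.4f}")
```

If the rule were to change arbitrarily from one step to the next, there would be no reason to expect the recorded frequencies to approach any particular value at all.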

The notion of probability is more fundamental than that of randomness. Randomness is an abstract notion that idealizes the notion of absence of every form of order. … You can use the axioms of probability even when sequences are known to be not random, can’t you? Also, hierarchically, order comes before randomness. Randomness is defined as the absence of (all applicable forms of) orderliness; orderliness is not defined as absence of randomness; it is defined via the “some but any” principle, in reference to various more concrete instances that show some or the other definable form of order.

But expect not just physicists but also mathematicians, computer scientists, and philosophers, to eternally keep on confusing the issues involved here, too. They all are dumb.

Summary:

Let me now mention a few important take-aways (though some new points not discussed above also crept in, sorry!):

• Physical laws are always causal.
• Physical laws often use the infinite precision of the real number system, and hence, they do show the mathematical character of infinite precision.
• The solution paradigm used in physics requires specifying some input numbers and calculating the corresponding output numbers. If the physical law is based on the real number system, then all the numbers used too are supposed to have infinite precision.
• Applications always involve a consideration of the zone of variations in the input conditions and the corresponding zone of variations in the output predictions. The relation between the sizes of the two zones is determined by the nature of the physical law itself. If for a small variation in the input zone the law predicts a sufficiently small output zone, people call the law itself deterministic.
• Complex systems are not always composed from parts that are in themselves complex. Complex systems can be built by arranging essentially very simple parts that are put together in complex configurations.
• Each of the simpler parts may be governed by a deterministic law. However, when the input-output zones are considered for the complex system taken as a whole, the system behaviour may show an exponential increase in the size of the output zone. In such a case, the system must be described as indeterministic.
• Indeterministic systems still are based on causal laws. Hence, with appropriate methods and abstractions (including mathematical ones), they can be made to reveal the underlying causality. One useful theory is that of probability. The theory turns the supposed disadvantage (a large number of interacting bodies) on its head, and uses limiting values of relative frequencies, i.e., probability. The probability theory itself is based on causality, and so are indeterministic systems.
• Systems may be deterministic or indeterministic, and in the latter case, they may be described using the maths of probability theory. Physical laws are always causal. However, if they have to be described using the terms of determinism or indeterminism, then we will have to say that they are always deterministic. After all, if the physical laws showed exponentially large output zone even when simpler systems were considered, they could not be formulated or regarded as laws.
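The point about input and output zones can be sketched with a toy calculation in Python. Everything here is an assumption made purely for illustration: the "simple part" is modelled as a linear map with a hypothetical gain of 1.5, and the "complex system" as a plain chain of such parts. Real systems are of course not this tidy.

```python
def simple_part(x, gain=1.5):
    """One simple, deterministic part: a small input change gives a small output change."""
    return gain * x

def system(x, n_parts, gain=1.5):
    """A toy 'complex system': n_parts simple parts applied one after another."""
    for _ in range(n_parts):
        x = simple_part(x, gain)
    return x

def output_zone(x0, input_zone, n_parts, gain=1.5):
    """Width of the output zone for an input zone of the given width centred on x0."""
    lo = system(x0 - input_zone / 2, n_parts, gain)
    hi = system(x0 + input_zone / 2, n_parts, gain)
    return abs(hi - lo)

# Same input zone, increasingly many parts: the output zone grows exponentially.
for n in (1, 10, 50):
    print(n, output_zone(1.0, 1e-6, n))
```

For a single part the output zone stays comparable to the input zone; for the chain it grows as the gain raised to the number of parts.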

In conclusion: Physical laws are always causal. They may also always be regarded as being deterministic. However, if systems are complex, then even if the laws governing their simpler parts were all deterministic, the system behavior itself may turn out to be indeterministic. Some indeterministic systems can be well described using the theory of probability. The theory of probability itself is based on the idea of causality, albeit measures defined over a large number of instances are taken, thereby exploiting the fact that there are far too many objects interacting in a complex manner.

A song I like:

(Hindi) “ho re ghungaroo kaa bole…”
Singer: Lata Mangeshkar
Music: R. D. Burman
Lyrics: Anand Bakshi

# Learnability of machine learning is provably an undecidable?—part 3: closure

Update on 23 January 2019, 17:55 IST:

In this series of posts, which was just a step further from the initial, brain-storming kind of a stage, I had come to the conclusion that based on certain epistemological (and metaphysical) considerations, Ben-David et al.’s conclusion (that learnability can be an undecidable) is logically untenable.

However, now, as explained here [^], I find that this particular conclusion which I drew, was erroneous. I now stand corrected, i.e., I now consider Ben-David et al.’s result to be plausible. Obviously, it merits a further, deeper, study.

However, even while acknowledging the above-mentioned mistake, let me also hasten to clarify that I still stick to my other positions, especially the central theme in this series of posts. The central theme here was that there are certain core features of the set theory which make implications such as Godel’s incompleteness theorems possible. These features (of the set theory) demonstrably carry a glaring epistemological flaw such that applying Godel’s theorem outside of its narrow technical scope in mathematics or computer science is not permissible. In particular, Godel’s incompleteness theorem does not apply to knowledge or its validation in the more general sense of these terms. This theme, I believe, continues to hold as is.

Update over.

Gosh! I gotta get this series out of my hand—and also head! ASAP, really!! … So, I am going to scrap the bits and pieces I had written for it earlier; they would have turned this series into a 4- or 5-part one. Instead, I am going to start entirely afresh, and I am going to approach this topic from an entirely different angle—a somewhat indirect but a faster route, sort of like a short-cut. Let’s get going.

Statements:

Open any article, research paper, book or a post, and what do you find? Basically, all these consist of sentences after sentences. That is, a series of statements, in a way. That’s all. So, let’s get going at the level of statements, from a “logical” (i.e. logic-theoretical) point of view.

Statements are made to propose or to identify (or at least to assert) some (or the other) fact(s) of reality. That’s what their purpose is.

The conceptual-level consciousness as being prone to making errors:

Coming to the consciousness of man, there are broadly two levels of cognition at which it operates: the sensory-perceptual, and the conceptual.

Examples of the sensory-perceptual level consciousness would consist of reaching a mental grasp of such facts of reality as: “This object exists, here and now;” “this object has this property, to this much degree, in reality,” etc. Notice that what we have done here is to take items of perception, and put them into the form of propositions.

Propositions can be true or false. However, at the perceptual level, a consciousness has no choice in regard to the truth-status. If the item is perceived, that’s it! It’s “true” anyway. Rather, perceptions are not subject to a test of truth or falsehood at all; they are the very base, the standards by which truth or falsehood is decided.

A consciousness—better still, an organism—does have some choice, even at the perceptual level. The choice which it has exists in regard to such things as: what aspect of reality to focus on, with what degree of focus, with what end (or purpose), etc. But we are not talking about such things here. What matters to us here is just the truth-status, that’s all. Thus, keeping only the truth-status in mind, we can say that this very idea itself (of a truth-status) is inapplicable at the purely perceptual level. However, it is very much relevant at the conceptual level. The reason is that at the conceptual level, the consciousness is prone to err.

The conceptual level of consciousness may be said to involve two different abilities:

• First, the ability to conceive of (i.e. create) the mental units that are the concepts.
• Second, the ability to connect together the various existing concepts to create propositions which express different aspects of the truths pertaining to them.

It is possible for a consciousness to go wrong in either of the two respects. However, mistakes are much easier to make when it comes to the second respect.

Homework 1: Supply an example of going wrong in the first way, i.e., right at the stage of forming concepts. (Hint: Take a concept that is at least somewhat higher-level so that mistakes are easier in forming it; consider its valid definition; then modify its definition by dropping one of its defining characteristics and substituting a non-essential in it.)

Homework 2: Supply a few examples of going wrong in the second way, i.e., in forming propositions. (Hint: I guess almost any logical fallacy can be taken as a starting point for generating examples here.)

Truth-hood operator for statements:

As seen above, statements (i.e. complete sentences that formally can be treated as propositions) made at the conceptual level can, and do, go wrong.

We therefore define a truth-hood operator which, when it operates on a statement, yields the result as to whether the given statement is true or non-true. (Aside: Without getting into further epistemological complexities, let me note here that I reject the idea of the arbitrary, and thus regard non-true as nothing but a sub-category of the false. Thus, in my view, a proposition is either true or it is false. There is no middle (as Aristotle said), or even an “outside” (like the arbitrary) to its truth-status.)

Here are a few examples of applying the truth-status (or truth-hood) operator to a statement:

• Truth-hood[ California is not a state in the USA ] = false
• Truth-hood[ Texas is a state in the USA ] = true
• Truth-hood[ All reasonable people are leftists ] = false
• Truth-hood[ All reasonable people are rightists ] = false
• Truth-hood[ Indians have significantly contributed to mankind’s culture ] = true
• etc.

For ease in writing and manipulation, we propose to give names to statements. Thus, first declaring

A: California is not a state in the USA

and then applying the Truth-hood operator to “A”, is fully equivalent to applying this operator to the entire sentence appearing after the colon (:) symbol. Thus,

Truth-hood[ A ] <==> Truth-hood[ California is not a state in the USA ] = false
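As a rough sketch (in Python) of how naming a statement leaves the result of the Truth-hood operator unchanged, consider the following. The lookup tables `FACTS` and `NAMED`, and the function name `truth_hood`, are hypothetical constructs introduced only for this illustration; the two entries simply mirror examples given above.

```python
# Hypothetical table of facts; the entries mirror two of the examples above.
FACTS = {
    "California is not a state in the USA": False,
    "Texas is a state in the USA": True,
}

# Named statements: the name on the left, the statement body on the right.
NAMED = {
    "A": "California is not a state in the USA",
}

def truth_hood(statement):
    """Return the truth-status of a statement, given either by name or by its full body."""
    body = NAMED.get(statement, statement)  # a name is first replaced by its body
    return FACTS[body]

print(truth_hood("A"))
print(truth_hood("California is not a state in the USA"))
```

Both calls give the same answer, because the name is replaced by the statement body before the truth-status is looked up.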

Just a bit of the computer languages theory: terminals and non-terminals:

To take a short-cut through this entire theory, we would like to approach the idea of statements from a slightly abstract perspective. Accordingly, borrowing some terminology from the area of computer languages, we define and use two types of symbols: terminals and non-terminals. The overall idea is this. We regard any program (i.e. a “write-up”) written in any computer-language as consisting of a sequence of statements. A statement, in turn, consists of a certain well-defined arrangement of words or symbols. Now, we observe that symbols (or words) can be either terminals or non-terminals.

You can think of a non-terminal symbol in different ways: as higher-level or more abstract words, as “potent” symbols. The non-terminal symbols have a “definition”—i.e., an expansion rule. (In CS, it is customary to call an expansion rule a “production” rule.) Here is a simple example of a non-terminal and its expansion:

• P => S1 S2

where the symbol “=>” is taken to mean things like: “is the same as” or “is fully equivalent to” or “expands to.” What we have here is an example of an abstract statement. We interpret this statement as the following. Wherever you see the symbol “P,” you may substitute it using the train of the two symbols, S1 and S2, written in that order (and without anything else coming in between them).

Now consider the following non-terminals, and their expansion rules:

• P1 => P2 P S1
• P2 => S3

The question is: Given the expansion rules for P, P1, and P2, what exactly does P1 mean? What precisely does it stand for?

• P1 => (P2) P S1 => S3 (P) S1 => S3 S1 S2 S1

In the above, we first take the expansion rule for P1. Then, we expand the P2 symbol in it. Finally, we expand the P symbol. When no non-terminal symbol is left to expand, we arrive at our answer that “P1” means the same as “S3 S1 S2 S1.” We could have said the same fact using the colon symbol, because the colon (:) and the “expands to” symbol “=>” mean one and the same thing. Thus, we can say:

• P1: S3 S1 S2 S1

The left hand-side and the right hand-side are fully equivalent ways of saying the same thing. If you want, you may regard the expression on the right hand-side as a “meaning” of the symbol on the left hand-side.

It is at this point that we are able to understand the terms: terminals and non-terminals.

The symbols which do not have any further expansion for them are called, for obvious reasons, the terminal symbols. In contrast, non-terminal symbols are those which can be expanded in terms of an ordered sequence of non-terminals and/or terminals.
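The expansion of P1 worked out above can be mechanized in a few lines of Python. This is a minimal sketch, not a full grammar engine: the dictionary `RULES` and the function `expand` are names chosen here for illustration, and any symbol without an entry in `RULES` is treated as a terminal.

```python
# Expansion ("production") rules from the text: non-terminal -> ordered sequence of symbols.
RULES = {
    "P": ["S1", "S2"],
    "P1": ["P2", "P", "S1"],
    "P2": ["S3"],
}

def expand(symbol, rules=RULES):
    """Expand a symbol until only terminals are left, keeping the order of symbols."""
    if symbol not in rules:
        return [symbol]  # no expansion rule exists, so this is a terminal
    result = []
    for part in rules[symbol]:
        result.extend(expand(part, rules))
    return result

print(" ".join(expand("P1")))  # prints: S3 S1 S2 S1
```

Note that this simple expander stops only because none of the three rules mentions its own name on the right hand-side.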

We can now connect our present discussion (which is in terms of computer languages) to our prior discussion of statements (which is in terms of symbolic logic), and arrive at the following correspondence:

The name of every named statement is a non-terminal; and the statement body itself is an expansion rule.

This correspondence works also in the reverse direction.

You can always think of a non-terminal (from a computer language) as the name of a named proposition or statement, and you can think of an expansion rule as the body of the statement.

Easy enough, right? … I think that we are now all set to consider the next topic, which is: the liar paradox.

The liar paradox is a topic from the theory of logic [^]. It has been resolved by many people in different ways. We would like to treat it from the viewpoint of the elementary computer languages theory (as covered above).

The simplest example of the liar paradox is, using the terminology of the computer languages theory, the following named statement or expansion rule:

• A: A is false.

Notice, it wouldn’t be a paradox if the same non-terminal symbol, viz. “A” were not to appear on both sides of the expansion rule.

To understand why the above expansion rule (or “definition”) involves a paradox, let’s get into the game.

Our task will be to evaluate the truth-status of the named statement that is “A”. This is the “A” which comes on the left hand-side, i.e., before the colon.

In symbolic logic, a statement is nothing but its expansion; the two are exactly and fully identical, i.e., they are one and the same. Accordingly, to evaluate the truth-status of “A” (the one which comes before the colon), we consider its expansion (which comes after the colon), and get the following:

• Truth-hood[ A ] = Truth-hood[ A is false ] = false           (equation 1)

Alright. From this point onward, I will drop explicitly writing down the Truth-hood operator. It is still there; it’s just that to simplify typing out the ensuing discussion, I am not going to note it explicitly every time.

Anyway, coming back to the game, what we have got thus far is the truth-hood status of the given statement in this form:

• A: “A is false”

Now, realizing that the “A” appearing on the right hand-side itself also is a non-terminal, we can substitute its expansion within the aforementioned expansion. We thus get to the following:

• A: “(A is false) is false”

We can apply the Truth-hood operator to this expansion, and thereby get the following: The statement which appears within the parentheses, viz., the “A is false” part, itself is false. Accordingly, the Truth-hood operator must now evaluate thus:

• Truth-hood[ A ] = Truth-hood[ A is false ] = Truth-hood[ (A is false) is false ] = Truth-hood[ A is true ] = true            (equation 2)

Fun, isn’t it? Initially, via equation 1, we got the result that A is false. Now, via equation 2, we get the result that A is true. That is the paradox.

But the fun doesn’t stop there. It can continue. In fact, it can continue indefinitely. Let’s see how.

If only we had not halted the expansions, i.e., if only we had continued a bit further with the game, we could just as well have made one more expansion, and got to the following:

• A: ((A is false) is false) is false.

The Truth-hood status of the immediately preceding expansion now is: false. Convince yourself that it is so. Hint: Always expand the inner-most parentheses first.

Homework 3: Convince yourself that what we get here is an indefinitely long alternating sequence of the Truth-hood statuses that: A is false, A is true, A is false, A is true, and so on.
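The alternating sequence can also be generated mechanically. The Python sketch below follows the convention used above: the first expansion, “A is false,” is read as false, and every further substitution wraps the previous expansion in one more “is false,” which flips the result. The function names `expansion`, `truth_status` and `naive_truth_of_A` are hypothetical, introduced only for this illustration.

```python
def expansion(depth):
    """Text of the liar statement after `depth` expansions (depth >= 1)."""
    text = "A is false"
    for _ in range(depth - 1):
        text = f"({text}) is false"
    return text

def truth_status(depth):
    """Truth-status read off after `depth` expansions, evaluating inner-most parentheses first."""
    status = False  # equation 1: the first expansion evaluates to false
    for _ in range(depth - 1):
        status = not status  # each further "is false" negates the previous result
    return status

for d in range(1, 5):
    print(d, expansion(d), "->", truth_status(d))

def naive_truth_of_A():
    """Literal reading of 'A: A is false': the value of A is the negation of the value of A."""
    return not naive_truth_of_A()

try:
    naive_truth_of_A()
except RecursionError:
    print("The expansion never bottoms out; Python gives up with a RecursionError.")
```

The first two functions stop only because a depth is imposed from the outside; the literal rule, left to itself, never reaches a terminal.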

What can we say by way of a conclusion?

Conclusion: The truth-status of “A” is not uniquely decidable.

The emphasis is on the word “uniquely.”

We have used all the seemingly simple rules of logic, and yet have stumbled on to the result that, apparently, logic does not allow us to decide something uniquely or meaningfully.

Liar’s paradox and the set theory:

The importance of the liar paradox to our present concerns is this:

Godel himself believed, correctly, that the liar paradox was a semantic analogue to his Incompleteness Theorem [^].

Go read the Wiki article (or anything else on the topic) to understand why. For our purposes here, I will simply point out what the connection of the liar paradox is to the set theory, and then (more or less) call it a day. The key observation I want to make is the following:

You can think of every named statement as an instance of an ordered set.

What the above key observation does is to tie the symbolic logic of propositions to the set theory. We thus have three equivalent ways of describing the same idea: symbolic logic (name of a statement and its body), computer languages theory (non-terminals and their expansions to terminals), and set theory (the label of an ordered set and its enumeration).

As an aside, the set in question may have further properties, or further mathematical or logical structures and attributes embedded in itself. But at a minimum, we can say that the name of a named statement can be seen as a non-terminal, and the “body” of the statement (or the expansion rule) can be seen as an ordered set of some symbols, i.e., an arbitrarily specified sequence of some (zero or more) terminals and (zero or more) non-terminals.

Two clarifications:

• Yes, in case there is no sequence in a production at all, it can be called the empty set.
• When you have the same non-terminal on both sides of an expansion rule, it is said to form a recursion relation.

An aside: It might be fun to convince yourself that the liar paradox cannot be posed or discussed in terms of Venn’s diagram. The “sheet” on which Venn’s diagram is drawn, by the simple intuitive notions we all bring to bear on Venn’s diagram, cannot have a “recursion” relation.

Yes, the set theory itself was always “powerful” enough to allow for recursions. People like Godel merely made this feature explicit, and took full “advantage” of it.

Recursion, the continuum, and epistemological (and metaphysical) validity:

In our discussion above, I had merely asserted, without giving even a hint of a proof, that the three ways (viz., the symbolic logic of statements or propositions, the computer languages theory, and the set theory) were all equivalent ways of expressing the same basic idea (i.e. the one which we are concerned about, here).

I will now once again make a few more observations, but without explaining them in detail or supplying even an indication of their proofs. The factoids I must point out are the following:

• You can start with the natural numbers, and by using simple operations such as addition and its inverse, and multiplication and its inverse (plus, for the final step from the rationals, a limiting process), you can reach the real number system. The generalization goes as: Natural to Whole to Integers to Rationals to Reals. Another name for the real number system is: the continuum.
• You can use the computer languages theory to generate a machine representation for the natural numbers. You can also mechanize the addition etc. operations. Thus, you can “in principle” (i.e. with infinite time and infinite memory) represent the continuum in the CS terms.
• Generating a machine representation for natural numbers requires the use of recursion.
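Here is a minimal Python sketch of what a recursive machine representation of the natural numbers can look like. The particular encoding (zero as the empty tuple, the successor of k as a tuple containing k) and the names `nat`, `to_int` and `add` are assumptions chosen for brevity; they are one possible representation, not the only one.

```python
def nat(n):
    """Representation of the natural number n: zero is (), successor of k is (k,)."""
    return () if n == 0 else (nat(n - 1),)

def to_int(rep):
    """Read a representation back as an ordinary Python int."""
    return 0 if rep == () else 1 + to_int(rep[0])

def add(a, b):
    """Mechanized addition: 0 + b = b, and succ(a') + b = succ(a' + b)."""
    return b if a == () else (add(a[0], b),)

two, three = nat(2), nat(3)
print(two)                      # prints: (((),),)
print(to_int(add(two, three)))  # prints: 5
```

Each of the three definitions refers to itself, with zero serving as the stopping case.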

Finally, a few words about epistemological (and metaphysical) validity.

• The concepts of numbers (whether natural or real) have a logical precedence, i.e., they come first. The entire arithmetic and the calculus must come before the computer-representation of some of their concepts.
• A machine-representation (or, equivalently, a set-theoretic representation) is merely a representation. That is to say, it captures only some aspects or attributes of the actual concepts from maths (whether arithmetic or the continuum hypothesis). This issue is exactly like what we saw in the first and second posts in this series: a set is a concrete collection, unlike a concept which involves a consciously cast unit perspective.
• If you try to translate the idea of recursion into the usual cognitive terms, you get absurdities such as: You can be your child, literally speaking. Not in the sense that using scientific advances in biology, you can create a clone of yourself and regard that clone to be both yourself and your child. No, not that way. Actually, such a clone is always your twin, not child, but still, the idea here is even worse. The idea here is you can literally father your own self.
• Aristotle got it right. Look up the distinction between completed processes and the uncompleted ones. Metaphysically, only those objects or attributes can exist which correspond to completed mathematical processes. (Yes, as an extension, you can throw in the finite limiting values, too, provided they otherwise do mean something.)
• Recursion by its very definition involves not just absence of completion but the essence of the very inability to do so.

Closure on the “learnability issue”:

Homework 4: Go through the last two posts in this series as well as this one, and figure out that the only reason that the set theory allows a “recursive” relation is because a set is, by the design of the set theory, a concrete object whose definition does not have to involve an epistemologically valid process—a unit perspective as in a properly formed concept—and so, its name does not have to stand for an abstract mentally held unit. Call this happenstance “The Glaring Epistemological Flaw of the Set Theory” (or TGEFST for short).

Homework 5: Convince yourself that any lemma or theorem that makes use of Godel’s Incompleteness Theorem is necessarily based on TGEFST, and for the same reason, its truth-status is: it is not true. (In other words, any lemma or theorem based on Godel’s theorem is an invalid or untenable idea, i.e., essentially, a falsehood.)

Homework 6: Realize that the learnability issue, as discussed in Prof. Lev Reyzin’s news article (discussed in the first part of this series [^]), must be one that makes use of Godel’s Incompleteness Theorem. Then convince yourself that for precisely the same reason, it too must be untenable.

[Yes, Betteridge’s law [^] holds.]

Other remarks:

Remark 1:

As “asymptotical” pointed out at the relevant Reddit thread [^], the authors themselves say, in another paper posted at arXiv [^] that

While this case may not arise in practical ML applications, it does serve to show that the fundamental definitions of PAC learnability (in this case, their generalization to the EMX setting) is vulnerable in the sense of not being robust to changing the underlying set theoretical model.

What I now remark here is stronger. I am saying that it can be shown, on rigorously theoretical (epistemological) grounds, that the “learnability as undecidable” thesis by itself is, logically speaking, entirely and in principle untenable.

Remark 2:

Another point. My preceding conclusion does not mean that the work reported in the paper itself is, in all its aspects, completely worthless. For instance, it might perhaps come in handy while characterizing some tricky issues related to learnability. I certainly do admit of this possibility. (To give a vague analogy, this issue is something like running, in a mathematically somewhat novel way, into a known type of mathematical singularity, or so.) Of course, I am not competent enough to judge how valuable the work of the paper(s) might turn out to be, in the narrow technical contexts like that.

However, what I can, and will say is this: the result does not—and cannot—bring the very learnability of ANNs itself into doubt.

Phew! First, Panpsychism, and immediately then, Learnability and Godel. … I’ve had to deal with two untenable claims back to back here on this blog!

… Code! I have to write some code! Or write some neat notes on ML in LaTeX. Only then will, I guess, my head stop aching so much…

Honestly, I just downloaded TensorFlow yesterday, and configured an environment for it in Anaconda. I am excited, and look forward to trying out some tutorials on it…

BTW, I also honestly hope that I don’t run into anything untenable, at least for a few weeks or so…

…BTW, I also feel like taking a break… Maybe I should go visit IIT Bombay or some place in Konkan. … But there are money constraints… Anyway, bye, really, for now…

A song I like: