Why is polynomial regression linear?

1. The problem statement etc.:

Consider the polynomial regression equation:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2 + \epsilon    [Eq. I]

where it is understood that y, x_1, x_2 and \epsilon actually are “vector”s, i.e., there is a separate column for each of them. A given dataset contains a large number of rows; each row has some value of y, x_1, and x_2 given in it.

Following the “standard” usage in statistics:

  • y is called the
    • “response” variable or
    • “outcome” variable;
  • \beta_i are called:
    • “parameters” or
    • “effects” or
    • “(estimated) regression coefficients”;
  • x_1, x_2, etc. are called:
    • “predictor” variables, or
    • “explanatory” variables, or
    • “controlled” variables, or
    • also as “factors”;
  • and \epsilon is called:
    • “errors”, or
    • “disturbance term”, or
    • “residuals” (or “residual errors”), or
    • “noise”, etc.

I hate statisticians.

For simplicity, I will call y the dependent variable, the x‘s the independent variables, and \epsilon the errors. The \beta‘s will be called coefficients.

Consider an Excel spreadsheet having one column for y and possibly many columns for x‘s. There are N rows.

Let’s use the index i for rows, and j for columns. Accordingly, y_i is the value of the dependent variable in the ith row; x_{ij} are the values of the independent variables in the same row; \beta_m‘s are the undetermined coefficients for the dataset as a whole; and \epsilon_i is the error of the ith row when some model is assumed.

Notice, here I refer to the coefficients \beta_ms using an index other than i or j. This is deliberate. The index m for the \beta‘s runs from 0 up to however many coefficients the model has. In general, the number of coefficients can be greater than the number of independent variables, and in general, it has no relation to how many rows there are in the dataset.

Obviously, a model which is as “general” as above sure comes with products of x_j terms or their squares (or, in a bigger model, with as many higher-order terms as you like).

Obviously, therefore, if you ask most any engineer, he will say that Eq. I is an instance of a non-linear regression.

However, statisticians disagree. They say that the particular regression model mentioned previously is linear.

They won’t clarify on their own, but if you probe them, they will supply the following as a simple example of a nonlinear model. (Typically, what they would supply would be a far more complicated model, but trust me, the following simplest example too satisfies their requirements of a nonlinear regression.)

y = \beta_0 + \beta_1^2 x + \epsilon            [Eq II].

(And, yes, you could have raised \beta_0 to a higher order too.)
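The distinction between Eq. I and Eq. II can be probed numerically. The following is only a minimal sketch, assuming NumPy is available; the function names (y_hat_eq1, y_hat_eq2, is_linear_in_beta) and the sample x values are made up for illustration. It tests the superposition property in the coefficients: a model is linear in the \beta‘s if evaluating it at s*a + t*b gives the same result as s times the model at a plus t times the model at b.

```python
import numpy as np

def y_hat_eq1(beta, x1, x2):
    """Eq. I without the error term: polynomial in x1, x2, but linear in beta."""
    b0, b1, b2, b3, b4, b5 = beta
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + b4 * x1**2 + b5 * x2**2

def y_hat_eq2(beta, x):
    """Eq. II without the error term: beta_1 enters as a square."""
    b0, b1 = beta
    return b0 + b1**2 * x

def is_linear_in_beta(model, n_beta, *xs, trials=5, seed=0):
    """Check superposition in the coefficients for a few random draws."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        a = rng.normal(size=n_beta)
        b = rng.normal(size=n_beta)
        s, t = rng.normal(size=2)
        lhs = model(s * a + t * b, *xs)
        rhs = s * model(a, *xs) + t * model(b, *xs)
        if not np.allclose(lhs, rhs):
            return False
    return True

# Illustrative values for the independent variables (not from any real dataset)
x1 = np.array([0.5, 1.0, 2.0])
x2 = np.array([-1.0, 3.0, 0.25])

print(is_linear_in_beta(y_hat_eq1, 6, x1, x2))  # True
print(is_linear_in_beta(y_hat_eq2, 2, x1))      # False
```

Note that the x values are held fixed throughout; only the coefficients are varied.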

I hate statisticians. They mint at least M number of words for the same concept (with M \geq 5), with each word having been very carefully chosen in such a way that it minimizes all your chances of understanding the exact nature of that concept. Naturally, I hate them.

Further, I also sometimes hate them because they don’t usually tell you, right in your first course, some broad but simple examples of nonlinear regression, right at the same time when they introduce you to the topic of linear regression.

Finally, I also hate them because they never give adequate enough an explanation as to why they call the linear regression “linear”. Or, for that matter, why they just can’t get together and standardize on terminology.

Since I hate statisticians so much, but since the things they do also are mentioned in practical things like Data Science, I also try to understand why they do what they do.

In this post, I will jot down the reason behind their saying that Eq. I is linear regression, but Eq. II is nonlinear regression. I will touch upon several points of the context, but in a somewhat arbitrary order. (This is a very informally written, and very lengthy a post. It also comes with homework.)


2. What is the broadest purpose of regression analysis?:

The purpose of regression analysis is to give you a model—an equation—which you could use so as to predict y from a given tuple (x_1, x_2, \dots). The prediction should be as close as possible.


3. When do you use regression?

You use regression only when a given system is overdetermined. Mark this point well. Make sure to understand it.

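A system of linear algebraic equations is exactly determined when there are as many independent equations as there are unknown variables; it then has a unique solution. In this post I will loosely call such a system “consistent” (strictly speaking, that word covers any system having at least one solution). For instance, a system like: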
2 x + 7y = 10
9 x + 5 y = -12
is consistent.

There are two equations, and two unknowns x and y. You can use any direct method such as Cramer’s rule, Gaussian elimination, matrix-factorization, or even matrix-inversion (!), and get to know the unknown variables. When the number of unknowns (and hence the number of equations) is large (FEM/CFD can easily produce systems of millions of equations in an equal number of unknowns), you can use iterative approaches like SOR etc.
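As a small illustration, the two-equation system above can be solved directly. This is just a sketch assuming NumPy; numpy.linalg.solve uses an LU-factorization-based direct method internally.

```python
import numpy as np

# Coefficient matrix and right-hand side for:
#   2x + 7y = 10
#   9x + 5y = -12
A = np.array([[2.0, 7.0],
              [9.0, 5.0]])
b = np.array([10.0, -12.0])

solution = np.linalg.solve(A, b)
x, y = solution
print(x, y)  # x = -134/53, y = 114/53

# Substituting back reproduces the right-hand side exactly (up to round-off)
print(np.allclose(A @ solution, b))
```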

When the number of equations is smaller than the number of unknowns, the system is under-determined. You can’t solve such a system and get to a unique value by way of a solution.

When the number of equations is greater than the number of unknowns, the system is over-determined.

I am over-determined to see to it that I don’t explain you everything about this topic of under- and over-determined systems. Go look up on the ‘net. I will only mention that, in my knowledge (and I could be wrong, but it does seem that):

Regression becomes really relevant only for the over-determined systems.

You could use regression even in the large consistent systems, but there, it would become indistinguishable from the iterative approaches to solutions. You could also use regression for the under-determined systems (and you can anyway use the least squares for them). But I am not sure if you would want to use specifically regression here. In any case…

Data Science is full of problems where systems of equations are over-determined.

Every time you run into an Excel spreadsheet that has more rows than columns, you have an over-determined system.

Such a system is not consistent: in general, no single set of values for the unknowns satisfies all of its equations exactly. That’s why regression comes in handy; it becomes important. If all systems were to be consistent, people would be happy using deterministic solutions; not regression.
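To make the over-determined case concrete, here is a minimal sketch (assuming NumPy, with a three-row toy dataset invented for illustration) of a system with three equations but only two unknowns. No exact solution exists, so numpy.linalg.lstsq returns the least-squares compromise instead.

```python
import numpy as np

# Toy "spreadsheet": three rows of (x, y); the model is y = beta_0 + beta_1 * x.
# Three equations, two unknowns: over-determined.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 4.0])

# Each row of X is [1, x_i], so that X @ beta gives beta_0 + beta_1 * x_i
X = np.column_stack([np.ones_like(x), x])

beta, residual_ss, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(beta)           # approximately [1.1667, 1.5]
print(y - X @ beta)   # row-wise errors; not all zero, so no exact solution
```

The three points do not lie on one straight line, which is exactly why the row-wise errors cannot all vanish.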


4. How do you use a regression model?

4.1 The \hat{y} function as the placeholder:

In regression, we first propose (i.e. assume) a mathematical model, which can be broadly stated as:

y = \hat{y} + \epsilon,

where y are the row-wise given data values, and \hat{y} are their respective estimated values. Thus, \hat{y} is a function of the given dataset values x_j‘s; it stands for the value of y as estimated by the regression model. The error term gives you the row-wise differences between the actual and the predicted values.

4.2 An assumed \hat{y} function:

But while y_i‘s and x_{ij}‘s are the given values, the function \hat{y} is what we first have to assume.

Accordingly, we may say, perhaps, that:

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2                   ……….[Eq. 1]

or, alternatively, that:

\hat{y} = \beta_0 + \beta_1^2 x_1                   ……….[Eq. 2]

These two equations would give you two different estimates \hat{y}s for the “true” (and unknown) y‘s, given the same dataset (the same spreadsheet).

Now, notice a few things about the form of the mathematical equation being assumed—for a regression model.

4.3 \beta_m‘s refer to the entire data-table taken as a whole—not row-wise:

In any given regression model, \beta_m coefficients do not change their values from one row to another row. They remain the same for all the rows of a given, concrete, dataset. It’s only the values of x_1, x_2 and y that change from row to row. The values of \beta_0, \beta_1, \dots remain the same for all the rows of the dataset taken together.
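The point that one set of \beta_m values serves the whole table can be sketched in code. The following assumes NumPy and uses synthetic data; the helper name design_matrix_eq1 and the “true” coefficient values are made up for illustration. Every row of the table contributes one row of the matrix X, while the coefficient vector is shared by all rows.

```python
import numpy as np

def design_matrix_eq1(x1, x2):
    """One column per term of Eq. 1: 1, x1, x2, x1*x2, x1^2, x2^2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

rng = np.random.default_rng(42)
n_rows = 200
x1 = rng.uniform(-2.0, 2.0, size=n_rows)
x2 = rng.uniform(-2.0, 2.0, size=n_rows)

# Made-up coefficients used only to generate the synthetic y column
beta_true = np.array([1.0, 2.0, -1.0, 0.5, 0.25, -0.75])

X = design_matrix_eq1(x1, x2)
y = X @ beta_true + rng.normal(scale=0.01, size=n_rows)  # small noise

# A single coefficient vector is fitted against all 200 rows at once
beta_fit, *_ = np.linalg.lstsq(X, y, rcond=None)
print(X.shape)               # (200, 6): many rows, only six coefficients
print(np.round(beta_fit, 2))
```

Although the columns of X contain products and squares of x_1 and x_2, the fit itself is an ordinary linear least-squares problem in the six coefficients.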

4.4 Applying a regressed model to new data works with one row at a time:

In using a regression model, we assume a model equation (Eq. 1 or Eq. 2), then plug a new tuple of known values of the independent variables (x_1, x_2, \dots) into it, and voila! Out comes the \hat{y}, as predicted by that model. We treat \hat{y} as the best estimate for some true but unknown y for that specific combination of (x_1, x_2, \dots) values.

Here, realize, for making predictions, just one row of the test dataset is enough—provided the values of \beta_m‘s are already known.

In contrast, to build a regression model, we need all the N number of rows of the training dataset, where N can be a very large number.

By the time you come to use a regression model, somehow, you have already determined the initially “undetermined” coefficients \beta_m‘s. They have already become known to you. That’s why, all that you need to do now is to just plug in some new values of x_j‘s, and get the prediction \hat{y} out.
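A sketch of this “use” phase, assuming NumPy: the coefficient values below are hypothetical stand-ins for the output of some earlier fit, and predict_eq1 is an illustrative name. A single new row is all that is needed.

```python
import numpy as np

# Hypothetical, already-determined coefficients beta_0 ... beta_5 for Eq. 1
beta = np.array([1.0, 2.0, -1.0, 0.5, 0.25, -0.75])

def predict_eq1(beta, x1, x2):
    """Plug one new (x1, x2) tuple into Eq. 1; beta is treated as constant."""
    features = np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])
    return float(features @ beta)

y_hat = predict_eq1(beta, x1=2.0, x2=3.0)
print(y_hat)  # -0.75
```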

This fact provides us with the first important clue:

The use of regression treats \beta_m coefficients as the known values that remain the same for all data-rows. That is, the use of regression treats \beta_m values as constants!

But how do you get to know what values to use for \beta_ms? How do you get to a regressed model in the first place?


5. How do you conduct regression?—i.e., how do you build the regression model?

In regression analysis, you first assume a model (say, Eq. 1 or Eq. 2 above). You then try to determine the values of the \beta_m coefficients using some method like the least squares or gradient descent.

I will not go here into why you use the method of the least squares. In briefest: for linear systems, the cost function surface of the sum of the squared errors turns out to be convex (which is not true for the logistic regression model), and the least squares method gives you the demonstrably “best” estimate for the coefficients. Check out at least this much, here [^]. If interested, check out the Gauss-Markov theorem it mentions on the ‘net.

I also ask you to go, teach yourself, the gradient descent method. The best way to do that is by “stealing” a simple-to-understand Python code. Now, chuck aside the Jupyter notebook, and instead use an IDE, and debug-step through the code. Add at least two print statements for every Python statement. (That’s an order!)

Once done with it, pick up the mathematical literature. You will now find that you can go through the maths as easily as a hot knife moves through a piece of an already warm butter. [I will try to post a simple gradient descent code here in the next post, but no promises.]

So, that’s how you get to determine the (approximate but workable) values of \beta_m coefficients. You use the iterative technique of gradient descent, treating regression as a kind of an optimization problem. [Don’t worry if you don’t know the latter.]

“This is getting confusing. Tell me the real story” you say?

Ok. Vaguely speaking, the real story behind the gradient descent and the least squares goes something like in the next section.


6. How the GD works—a very high-level view:

To recap: Using a regression model treats the \beta_m coefficients as constants. Reaching (or building) a regression model from a dataset treats the \beta_m coefficients as variables. (This is regardless of whether you use gradient descent or any other method.)

In the gradient descent algorithm (GD), you choose a model equation like Eq. 1 or Eq. 2.

Then, you begin with some arbitrarily (but reasonably!) chosen \beta_m values as the initial guess.

Then, for each row of the dataset, you substitute the guessed \beta_m in the chosen model equation. Since y, x_1, and x_2 are known for each row of the dataset, you can find the error for each row—if these merely guessed \beta_m values were to be treated as being an actual solution for all of them.

Since the merely guessed values of \beta_m aren’t in general the same as the actual solution, their substitution on the right hand-side of the model (Eq. 1 or Eq. 2) doesn’t reproduce the left hand-side. Therefore, there are errors:

\epsilon = y - \hat{y}

Note, there is a separate error value for each row of the dataset.

You then want to know some single number which tells you how bad your initial guess of \beta_m values was, when this guess was applied to every row in the dataset as such. (Remember, \beta_m always remain the same for the entire dataset; they don’t change from row to row.) In other words, you want a measure for the “total error” for the dataset.

Since the row-wise errors can be both positive and negative, if you simply sum them up, they might partially or fully cancel each other out. We therefore need some positive-valued measures for the row-wise errors, before we add all of them together. One simple (and for many mathematical reasons, a very good) choice to turn both positive and negative row-wise errors into an always positive measure is to take squares of the individual row-wise errors, and then add them up together, so as to get to the sum of the squared errors, i.e., the total error for the entire dataset.

The dataset error measure typically adopted then is the average of the squares of the row-wise errors.
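The cancellation issue and the averaged squared error can be seen on a toy example. This is a minimal sketch assuming NumPy; the three-row dataset and the two coefficient guesses are invented for illustration, and the names row_errors and mse_cost are hypothetical.

```python
import numpy as np

# Toy dataset for the model y_hat = beta_0 + beta_1 * x
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # first column of ones multiplies beta_0
y = np.array([1.0, 3.0, 4.0])

def row_errors(beta, X, y):
    """epsilon_i = y_i - y_hat_i, one value per row."""
    return y - X @ beta

def mse_cost(beta, X, y):
    """Average of the squared row-wise errors: one number per beta guess."""
    return np.mean(row_errors(beta, X, y) ** 2)

guess = np.array([0.0, 0.0])
better = np.array([7.0 / 6.0, 1.5])

print(mse_cost(guess, X, y))          # 26/3, about 8.667
print(mse_cost(better, X, y))         # 1/18, about 0.0556
print(row_errors(better, X, y).sum()) # about 0: signed errors cancel out
```

For the second guess the signed errors add up to (nearly) zero even though no row is fitted exactly, which is why the plain sum is not a usable measure.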

In this way, you come to assume a cost function for the total dataset. A cost function is a function of the \beta_m values. If there are two \beta‘s in your model, say \beta_0 and \beta_1, then the cost function C would vary as either or both the coefficient values were varied. In general, it would form a cost function surface constructed in reference to a parameters-space.

You then calculate an appropriate correction to be applied to the initially guessed values of \beta_ms, so as to improve your initial guess. This correction is typically taken as being directly proportional to the negative of the gradient of the cost function with respect to the \beta_m‘s, evaluated at the current guess; the constant of proportionality is called the learning rate.

In short, you take a guess for \beta_m‘s, and find the current value of the total cost function C_n for the dataset (which is obtained by substituting the guessed \beta_m values into the assumed model equation) together with its gradient. Then, you apply the correction to the \beta_m values, and thereby update them.

You then repeat this procedure, improving your guess during each iteration.
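The loop described above can be sketched as follows. This is a bare-bones illustration assuming NumPy and a model that is linear in the coefficients; it is not a production implementation. In the standard formulation, each correction is the negative gradient of the mean-squared-error cost scaled by a learning rate. The function name, the learning rate of 0.1, and the iteration count are arbitrary choices for this toy dataset.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_iters=2000):
    """Minimize the mean squared error of y_hat = X @ beta by plain GD."""
    n_rows, n_coef = X.shape
    beta = np.zeros(n_coef)                      # initial guess
    for _ in range(n_iters):
        errors = y - X @ beta                    # one error per row
        grad = -(2.0 / n_rows) * (X.T @ errors)  # dC/dbeta for the MSE cost
        beta = beta - learning_rate * grad       # apply the correction
    return beta

# Same toy dataset as a three-row spreadsheet: y_hat = beta_0 + beta_1 * x
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 4.0])

beta_gd = gradient_descent(X, y)
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_gd)  # close to the direct least-squares answer
print(beta_ls)
```

Too large a learning rate makes the iterations diverge; the value used here was chosen small enough for this particular dataset.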

That’s what every one says about the gradient descent. Now, something which they don’t point out, but is important for our purposes:

Note that in any such a processing (using GD or any other iterative technique), you essentially are treating the \beta_m terms as variables, not as constants. In other words:

The process of regression—i.e., the process of model-building (as in contrast to using an already built model)—treats the unknown coefficients \beta_m‘s as variables, not constants.

The values of \beta_m‘s then iteratively converge towards some stable set of values which, within a tolerance, you can accept as being the final solution.

The values of \beta_m‘s so obtained are, thus, regarded as being essentially constant (within some specified tolerance band).


7. How statisticians look at the word “regression”:

Statisticians take the word “regression” primarily in the sense of the process of building a regression model.

Taken in this sense, the word “regression” is not a process of using the final model. It is not a process of taking some value on x-axis, locating the point on the model curve, and reading out the \hat{y} from it.

To statisticians,

Regression is, primarily, a process of regressing from a set of specific (y, x_1, x_2, \dots) tuples in a given dataset, to the corresponding estimated mean values \hat{y}s.

Thus, statisticians regard regression as the shifting, turning, and twisting of a tentatively picked up curve of \hat{y} vs. x_1, so that it comes to fit the totality of the dataset in some best possible sense. [With more than one parameter or coefficient, it’s a cost function surface.]

To repeat: The process of regressing from the randomly distributed and concretely given y values to the regressed mean values \hat{y}, while following an assumed function-al form for the model, necessarily treats \beta_m‘s as variables.

Thus, \hat{y}, and therefore C (the cost function used in GD) is a function not just of x_{ij}‘s but also of \beta_m‘s.

So, the regression process is not just:

\hat{y} = f( x_1, x_2, \dots).

It actually is:

\hat{y} = f( x_1, x_2, \dots; \beta_0, \beta_1, \beta_2, \dots).

Now, since (x_1, x_2, x_3,\dots) remain constants, it is \beta_m‘s which truly become the king—it is they which truly determine the row-wise \hat{y}‘s, and hence, the total cost function C = C( x_1, x_2, \dots; \beta_0, \beta_1, \beta_2, \dots).

So, on to the most crucial observation:

Since \hat{y} is a function of \beta_ms, and since \beta_ms are regarded as changeable, the algorithm’s behaviour depends on what kind of a function-al dependency \hat{y} has, in the assumed model, on the \beta_m coefficients.

In regression with Eq. 1, \hat{y} is a linear function of \beta_m‘s. Hence, the evolution of the \beta_m values during a run of the algorithm would be a linear evolution. Hence this regression—this process of updating \beta_m values—is linear in nature.

However, in Eq. 2, \hat{y} depends on \beta_1 quadratically, so the updates to the \beta_m‘s are no longer linear in the \beta_m‘s. Hence, it is a non-linear regression.
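The contrast can be made explicit with a short derivation. This is only a sketch, assuming the mean-squared-error cost and the plain gradient descent update with a learning rate \alpha; the symbols \phi_m (for the terms 1, x_1, x_2, x_1 x_2, x_1^2, x_2^2 of Eq. 1) and \alpha are introduced here for illustration.

```latex
% Eq. 1 written as a sum over fixed functions of the data:
\hat{y}_i = \sum_{m} \beta_m \, \phi_m(x_{i1}, x_{i2}),
\qquad
C(\beta) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 .

% Gradient for Eq. 1: each component is linear (affine) in the \beta's
\frac{\partial C}{\partial \beta_k}
  = -\frac{2}{N} \sum_{i=1}^{N} \phi_k(x_{i1}, x_{i2})
    \left( y_i - \sum_{m} \beta_m \, \phi_m(x_{i1}, x_{i2}) \right) .

% Hence the update is an affine map of the current \beta's:
\beta_k \leftarrow \beta_k - \alpha \, \frac{\partial C}{\partial \beta_k} .

% Gradient for Eq. 2, where \hat{y}_i = \beta_0 + \beta_1^2 x_{i1}:
\frac{\partial C}{\partial \beta_1}
  = -\frac{4}{N} \sum_{i=1}^{N} \beta_1 \, x_{i1}
    \left( y_i - \beta_0 - \beta_1^2 x_{i1} \right) ,
% which is cubic in \beta_1, so the update is no longer affine in the \beta's.
```

In both cases the x_{ij} and y_i values enter only as fixed numbers; the difference lies entirely in how the coefficients appear.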

Overall,

The end-result of a polynomial regression is a mathematical function which, in general, is nonlinear in {x_j}‘s.

However, the process of regression (the evolution) itself is linear in \beta_m‘s for Eq. 1.

Further, no regression algorithm or process ever changes any of the given (x_1, x_2, \dots) or y values.

Statisticians therefore say that the polynomial regression is a subclass of linear regression—even if it is a polynomial in x_j‘s.


8. Homework:

Homework 1:

Argue back with the statisticians.

Tell them that in using Eq. 1 above, the regression process does evolve linearly, but each evolution occurs with respect to another x-axis, say a x'-axis, where x' has been scaled quadratically with respect to the original x. It’s only after this quadratic scaling (or mapping) that we can at all get a straight line in the mapped space, and not in the original space.
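If you want something concrete to argue with, the mapping idea can be sketched as below. This assumes NumPy and a made-up, noise-free relation y = 1 + 2 x^2; the variable name x_prime stands for the quadratically scaled axis x'. A degree-1 (straight-line) fit in the mapped variable recovers the relation, even though it is a parabola in the original x.

```python
import numpy as np

# Made-up, noise-free data: a parabola in the original x
x = np.linspace(-3.0, 3.0, 13)
y = 1.0 + 2.0 * x**2

# Quadratic mapping of the axis: x' = x^2
x_prime = x**2

# Straight-line fit of y against x'; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x_prime, y, 1)
print(slope, intercept)  # about 2.0 and 1.0
```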

Hence, argue that Eq. 1 must be regarded as something like “semi-linear” or “linear-quadratic” regression.

Just argue, and see what they say. Do it. Even if only for fun.

I anticipate that they will not give you a direct answer. They will keep it vague. The reason is this:

Statisticians are, basically, mathematicians. They would always love to base their most basic definitions not in the epistemologically lower-level facts or abstractions, but in as higher-level abstractions as is possible for them to do. (Basically they all are Platonists at heart, in short, whether explicitly or implicitly).

That’s why, when you argue with them in the preceding manner, they will simply come to spread a large vague smile on their own face, and point out to you the irrefutable fact that the cost-function surface makes a direct reference only to the parameters-space (i.e. a space spanned by the variations in \beta_m‘s). The smile would be vague, because they do see your point, but even if they do, they also know that answering this way, they would succeed in having silenced you. Such an outcome, in their rule-book of the intellectuals, stands for victory—theirs. It makes them feel superior. That’s what they really are longing for.

It’s my statistical prediction that most statisticians would answer thusly with you. (Also most any Indian “intellectual”. [Indians are highly casteist a people—statistically speaking. No, don’t go by my word for it; go ask any psephologist worth his salt. So, the word should be: caste-intellectuals. But I look for the “outliers” here, and so, I don’t use that term—scare-quotes are enough.])

In the case this most probable event occurs, just leave them alone, come back, and enjoy some songs. …

Later on, who knows, you might come to realize that even if Platonism diminishes their discipline and personae, what they have developed as a profession and given you despite their Platonism, had enough of good elements which you could use practically. (Not because Platonism gives good things, but because this field is science, ultimately.)

…Why, in a more charitable mood, you might even want to thank them for having done that—for having given you some useful tools, even if they never clarified all their crucial concepts well enough to you. They could not have—given their Platonism and/or mysticism. So, when you are in a sufficiently good mood, just thank them, and leave them alone. … “c’est la vie…”

Homework 2:

Write a brief (one page, 200–300 words) summary for this post. Include all the essentials. Then, if possible, also make a flash card or two out of it. For neat examples, check out Chris Albon’s cards [^].


A song I like:

(Hindi) “gaataa rahe meraa dil…”
Music: S. D. Burman (with R.D. and team’s assistance)
Singers: Kishore Kumar, Lata Mangeshkar
Lyrics: Shailendra


History:
— First Published: 2019.11.29 10:52 IST.

PS: Typos to be corrected later today.

 

Ontologies in physics—10: Objects in QM. Aetherial fields in QM. Particle-in-a-box.

0. Prologue:

The last time we saw the context for, and the scheme of the inductive derivation of, the Schrodinger equation. In this post, we will see the ontology which it demands—the kind of ontological objects there have to be, so that the physical meaning of the Schrodinger equation can be understood correctly.

I wrote down at least 2 or 3 different ways of presentations of the topics for this post. However, either the points weren’t clear enough, or the discussion was going too far away, and I was losing the focus on ontology per se.

That’s why, I have decided to first present the ontology of QM without any justification, and only then to explain why assuming this particular ontology, rather than any other, makes sense. In justifying this ontology, we will have to note the salient peculiarities regarding the mathematical nature of Schrodinger’s equation, as also many relevant quantum mechanical features.

In this post, we will deal with only one-particle quantum systems.

So, let’s get going with the actual ontology first.


1. Our overall view of the QM ontology:

1.1. Introductory remarks:

To specify an ontology of physics is to state the basic types of objects there have to exist in the physical reality, and the basic ways in which they interact, so that the given theory of physics makes sense—the physical phenomena the theory subsumes are identified with appropriate concepts, causal relations, laws, and so, an understanding can be developed for applications, for building new systems that make use of the subsumed phenomena. The basic purpose of physics is to develop understanding so that it can be put to use to build better systems—structures, engines, machines, circuits, devices, gadgets, etc.

Accordingly, we will first give a list of the type of objects that must exist in the physical world so that the quantum mechanical phenomena can be completely described using them. The theory we will assume is Schrodinger’s non-relativistic quantum mechanics of multiple particles, including phenomena like entanglement, but without including the quantum mechanical spin. However, in this post, we will cover those aspects that can be understood (or at least touched upon) using only the single-particle quantum systems.

1.2. The list of objects in our QM ontology:

The list of our QM ontological objects is this:

  • The EC Objects of electrons and protons.
  • A special category of objects called neutrons.
  • The aether filling all of the 3D space where other objects are not, and certain field-conditions present in it; the all-connecting aspect of the physical universe.
  • The photon as a certain kind of a transient condition in the aether, i.e., a virtual object.

Let’s see all of them in detail, one by one, but beginning with the aether first.


2. The aether:

Explaining the concept of the aether and its necessity are full-fledged topics by themselves, and we have already said a lot about the ontology of this background object in the previous posts. So, we will note just a few indicative characteristics of the aether here.

Our idea of the QM aether is exactly the same as that of the EM aether of Lorentz. The only difference is that the aether, when used in QM, is seen as supporting not only the electrostatic fields but also one more type of a field, the complex-valued quantum mechanical field.

To note some salient points about the aether:

  • The aether has no such inertia that it shows up in the electrostatic or quantum-mechanical phenomena. So, in this sense, the aether is non-inertial in nature.
  • It exists in all parts of space where the other QM ontological objects (of electrons, protons and neutrons) are not.
  • It exchanges electrostatic as well as additional quantum-mechanical forces with the electrons and protons, but always by direct contact alone.
  • Apart from the electrostatic and quantum-mechanical forces, there are no other forces that enter into our ontological description. Thus, there is no drag-force exerted by the aether on the electrons, protons or neutrons (basically because the Lorentz aether is not a mechanical aether; it is not an NM-Ontological object). In the non-relativistic QM, we also ignore fields like magnetic, gravitational, etc.
  • All parts of the aether always remain stationary, i.e., no CV of itself translates in space at any time. Even if there is any actual translation going on in the aether, the quantum mechanical phenomena are unable to capture it, and so, a capacity to translate does not enter our ontology.
  • However, unlike in the EM theory, when it comes to QM, we have to assume that there are other motions in aether. In QM, the aether does come to carry a kinetic energy too, whereas in EM, the kinetic energy is a feature of only the massive EC Objects. So, the aether is stationary—but that’s only translation-wise. Yet, even in the absence of net displacements, it does force (and is forced by) the elementary charged objects of the electrons and protons.

We will note further details regarding the fields in the aether as we progress.


3. Electrons and protons:

The view of electrons and protons which we take in the QM ontology is exactly the same as that in the ontology of electrostatics; so see the previous posts in this series for details not again repeated here.

Electrons and protons are seen as elementary point-particles having, within the algebraic sign, the same amount of electrostatic charge e. They set up certain 3D field conditions in the non-inertial aether, but acting in pairs. We may sometimes informally call them point-charges, but it is to be kept in mind that, strictly speaking, in our view, we do not regard the charge to be an attribute of the point-particle, but only of the aether.

For two arbitrary EC objects (electrons or protons) q_i and q_j forming a pair, there are two fields which simultaneously exist in the 3D aether. None can exist without the other. These fields may be characterized as force-fields or as potential energy fields.

In the interest of clarity in the multi-particle situations, we will now expand on the notation presented earlier in this series. Accordingly,

\vec{\mathcal{F}}(q_i|q_j) is the 3D force field which exists everywhere in the aether. It gives the Coulomb force that q_j experiences from the aether at its instantaneous position \vec{r}_j via direct contact (between the aether and itself). Thus, in this notation, q_j is the forced charge, and q_i is the field-producing charge. Quantitatively, this force-field is given by Coulomb’s law:

\vec{\mathcal{F}}(q_i|q_j) = \dfrac{1}{4\,\pi\,\epsilon_0}\dfrac{q_i q_A}{r_{iA}^2} \hat{r}_{iA}, where q_A = q_j.

Similarly, \vec{\mathcal{F}}(q_j|q_i) is the aetherial force-field set up by q_j and felt by q_i in the same pair, and is given as:

\vec{\mathcal{F}}(q_j|q_i) = \dfrac{1}{4\,\pi\,\epsilon_0}\dfrac{q_j q_A}{r_{jA}^2} \hat{r}_{jA}, where q_A = q_i.

The fields are singular at the location of the forcing charge, but not at the location of the forced charge. Due to the divergence theorem, a given charge does not experience its own field.

There is no self-interaction problem either, because the EC Object (the point-charge) is ontologically a different object from both the aether and the NM objects. Only an NM Object could possibly explode under the self-field, primarily, because an NM Object is a composite. However, an EC Object (of an electron or a proton) is not an NM Object—it is elementary, not composite.

Notice that the specific forces at the positions of the q_i and q_j are equal in magnitude and opposite in directions. However, these two vectors act on two different objects, and therefore they don’t cancel each other. The two vectors also act at two different locations. In any case, in going from these two vectors to the two vector fields, it’s misleading to keep thinking in terms of one force-field as being the opposite of the other! Their respective anchoring locations (i.e. the two singularities) themselves are different, and they have the same signs too!! They are the same 1/(r^2) fields, but spatially shifted so as to anchor into the two charges of a pair.

When there are N number of elementary charged particles in a system, then a given charge q_j will experience the force fields produced by all the other (N-1) number of charges at its position. We can list them all before the pipe | symbol. For instance, \vec{\mathcal{F}}(q_1, q_3, q_4|q_2) is the net field that q_2 feels at its position \vec{r}_2; it equals the sum of the three force-fields produced by the other three charges because of the three pairs in which they act:
\vec{\mathcal{F}}(q_1, q_3, q_4|q_2) = \vec{\mathcal{F}}(q_1|q_2) + \vec{\mathcal{F}}(q_3|q_2) + \vec{\mathcal{F}}(q_4|q_2).
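The pair-wise force formula and the superposition over several field-producing charges can be sketched numerically. This is only an illustration assuming NumPy and SI units; the function names and the positions of the four charges are made up, and the constants are the standard values of 1/(4 \pi \epsilon_0) and e.

```python
import numpy as np

K = 8.9875517923e9          # 1 / (4 pi epsilon_0), in N m^2 / C^2
E_CHARGE = 1.602176634e-19  # elementary charge e, in C

def coulomb_force(q_i, r_i, q_j, r_j):
    """F(q_i|q_j): force felt by q_j at r_j from the field anchored at q_i."""
    sep = r_j - r_i                    # vector from q_i to q_j
    dist = np.linalg.norm(sep)
    return K * q_i * q_j / dist**2 * (sep / dist)

def net_force(sources, q_j, r_j):
    """F(q_1, q_3, ...|q_j): sum of the pair-wise forces on q_j."""
    return sum(coulomb_force(q_i, r_i, q_j, r_j) for q_i, r_i in sources)

# Made-up configuration: q_2 is an electron; q_1, q_3, q_4 are the others
angstrom = 1e-10
q2, r2 = -E_CHARGE, np.array([0.0, 0.0, 0.0])
q1, r1 = +E_CHARGE, np.array([1.0, 0.0, 0.0]) * angstrom
q3, r3 = +E_CHARGE, np.array([0.0, 2.0, 0.0]) * angstrom
q4, r4 = -E_CHARGE, np.array([0.0, 0.0, 1.5]) * angstrom

f_12 = coulomb_force(q1, r1, q2, r2)   # F(q_1|q_2)
f_21 = coulomb_force(q2, r2, q1, r1)   # F(q_2|q_1)
f_net = net_force([(q1, r1), (q3, r3), (q4, r4)], q2, r2)

print(f_12, f_21)   # equal in magnitude, opposite in direction
print(f_net)        # F(q_1, q_3, q_4|q_2)
```

The two forces of a pair act on different objects at different locations, so they are compared here only as vectors, not added.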

The charges always act pairs-wise; hence there always are pairs of fields; a single field cannot exist. Therefore, any analysis that has only one field (e.g., as in the quantum harmonic oscillator problem or the H atom problem), must be regarded as only a mathematical abstraction, not an existent.

The two fields of a given specific pair both are of the same algebraic sign: both + or both -. However, a given charge q_j may come to experience fields of arbitrary signs—depending on the signs of the other q_i‘s forming those particular pairs.

The electrons and protons thus affect each other via the intervening aether.

In electrostatics as well as in non-relativistic QM, the interactions between charges are via direct contact. However, the two fields of any arbitrary pair of charges shift instantaneously in space: the entirety of a field “moves” when the singular point where it is anchored, moves. Thus, there is no action-at-a-distance in this ontology. However, there are instantaneous changes everywhere in space.

A relativistic theory of QM would include magnetic fields and their interactions with the electric fields. It is these interactions which together impose the relativistic speed limit of v < c for all material particles. However, such speed-limiting interactions are absent in the non-relativistic QM theory.

Electrons and protons have the same magnitude of charge, but different masses.

The Coulombic force should result in accelerations of both the charges in a pair. However, since the proton is approx. 1836 times more massive than the electron, the actual accelerations (and hence net displacements over a finite time interval) undergone by them are vastly different.
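A quick back-of-the-envelope check of the mass ratio and of the resulting accelerations, as a sketch only: the masses are the standard CODATA values, while the force magnitude is an arbitrary illustrative number (roughly the Coulomb force between an electron and a proton about 1 angstrom apart).

```python
# Standard (CODATA) rest masses, in kg
M_PROTON = 1.67262192369e-27
M_ELECTRON = 9.1093837015e-31

# Illustrative force magnitude in N; both charges of a pair feel the same magnitude
force = 2.3e-8

a_electron = force / M_ELECTRON   # Newton's second law, a = F / m
a_proton = force / M_PROTON

mass_ratio = M_PROTON / M_ELECTRON
print(mass_ratio)                 # about 1836
print(a_electron / a_proton)      # same ratio: the electron accelerates far more
```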

There is a viewpoint (originally put forth by Lorentz, I guess) which says that since the entire interaction proceeds through the aether, there is no need to have massive particles of charge at all. This argument in essence says: We took the attribute of the electric charge away from the particle and re-attributed it to the aether. Why not do the same for the mass?

Here we observe that mass can be regarded as an attribute of the interactions of two *singular* fields in the aether. We tentatively choose to keep the instantaneous location of the attribute of the mass only at the distinguished point of the singularity. In short, we have both particles and the aether. If the need be, we will revisit this aspect of our ontology later on.

The electrostatic aetherial fields can also be expressed via two physically equivalent but mathematically different formulations: vector force-fields, and scalar energy-fields—also called the “potential” energy fields in the Schrodinger QM.

Notation: The potential energy field seen by q_j due to q_i is from now on denoted, and given, as:

V(q_i|q_j) = \dfrac{1}{4\,\pi\,\epsilon_0}\dfrac{q_i\,q_A}{r_{iA}},

where q_A = q_j, and similarly for the other field of the pair, viz., V(q_j|q_i)

See the previous posts from this series for a certain reservation we have for calling them the potential energy fields (and not just internal energy fields). In effect, what we seem to have here is an interesting scenario:

When we have a pair of charges in the physical 3D space (say an infinite domain), then we have two singular fields existing simultaneously, as noted above. Moving the two charges from their definite positions “to” infinity makes the system devoid of all energy. When they are present at definite positions, their singular fields of V noted above imply an infinite amount of energy within the volume of the system. However, since the system-boundaries for a system of charged point-particles can be defined only at the point-locations where they are present, the work that can be extracted from the system is finite—even if the total energy content is infinite. In short, we have a situation in which the addition of two infinities results in a finite quantity.

Does this way of looking at the things provide a clue to solve the problem of cancelling infinities in the re-normalization problem? If yes, and if none has put forth a comparably clear view, please cite this work.


4. Neutrons:

Neutrons are massive objects that do not participate in electrostatic interactions.

From a very basic, ontological viewpoint, they could have presented very tricky situations to deal with.

For instance: When an EC Object (i.e., an electron or a proton) moves through the aether, there is no force over and above the one exerted by the Coulombic field on it. But EC Objects are massive particles. So, a tempting conclusion might be to say that the aether exerts no drag force at all on any massive object, and hence, there should be no drag force on the motion of a free neutron either.

I am not clear on such points. But I have certain reservations and apprehensions about them.

It is clear that the aforementioned tempting conclusion does not carry over. It is known that the aether does not exert drag on the EC Objects. But an EC Object is different from a chargeless object like the neutron. Even a forced EC Object still has a field singularly anchored in its own position; it is just that in experiencing the forces by the field, the component of its own singular field plays no part (due to the divergence theorem). But the neutron, being a chargeless object, has no singular field anchored in its position at all. It doesn’t have a field that is “silent” for its own motions. Since for a forced particle, the forces are exerted by the aether in its vicinity, I am not clear if the neutron should behave the same. Maybe, we could associate a pair of equal and opposite (positive and negative) fields anchored in the neutron’s position (of arbitrary q_N strength, not elementary), so that it again is chargeless, but can be seen to be interacting with the aether. If so, then the neutron could be seen as a special kind of an EC Object—one which has two equal and opposite aetherial-fields associated with it. In that case, we can be consistent and say that the neutron will not experience a drag force from the aether for the same reason the electron or the proton does not. I am not clear if I should be adopting such a position. I have to think further about it.

So, overall, my choice is to ignore all such issues altogether, and regard the neutrons, in the non-relativistic QM, as being present only in the atomic nucleus at all times. The nucleus itself is regarded, abstractly, as a charged point-particle in its own right.

Thus, effectively, we come to regard the nuclear neutrons as just additions of constant masses to the total mass of the protons, and consider this extra-massive positively charged composite as the point-particle of the nucleus.


5. In QM, there is an aetherial field for the kinetic energy:

As stated previously, in addition to the electrostatic fields (mathematically expressed as force-fields or as energy-fields), in QM, the aether also comes to carry a certain time-varying field. The energy associated with these fields is kinetic in nature. That is to say, there should be some motion within the aether which corresponds to this part of the total energy.

We will come to characterize these motions with the complex-valued \Psi(x,t) field. However, as the discussion below will clarify, the wavefunction is only a mathematically isolated attribute of the physically existing kinetic energy field.

We will see that the motion associated with the quantum mechanical kinetic energy does not result in the net displacement of a CV. (It may be regarded as the motion of time-varying strain-fields.)

In our ontology, the kinetic energy field (and hence the field that is the wavefunction) primarily “lives” in the physical 3D space.

However, when the same physics is seen from a higher-level, abstract, mathematical viewpoint, the same field may also be seen as “living” in an abstract 3N-dimensional configuration space. Adopting such an abstract view has its advantages in simplifying some of the mathematical manipulations at a more abstract level. However, make a note that doing so also risks losing the richness of the concept of the physical fields, and with it, the opportunity to tackle the unusual features of the quantum mechanical theory right.


6. Photon:

In our view, the photon is neither a spatially discrete particle nor even a condition that is permanently present in the aether.

A photon represents a specific kind of a transient condition in the aetherial quantum mechanical fields which comes to exist only for some finite interval of time.

In particular, it refers to the difference in the two field-conditions corresponding to a change in the energy eigenstates (of the same particle).

In the last sentence, we should have added: “of the same particle” without parentheses; however, doing so requires us to identify what exactly is a particle when the reference is squarely being made to field conditions. A proper discussion of photons cannot actually be undertaken until a good amount of physics preceding it is understood. So, we will develop the understanding of this “particle” only slowly.

For the time being, however, make a note of the fact that:

In our view, all photons always are “virtual” particles.

Photons are attributes of real conditions in the aether, and in this sense, they are not virtual. But they are not spatially discrete particles. They always refer to continuous changes in the field conditions with time. Since these changes are anchored into the positions of the positively charged protons in the atomic nuclei, and since the protons are point-particles, therefore, a photon also has at least one singularity in the electrostatic fields to which its definition refers. (I am still not clear whether we need just one singularity or at least two.) In short, photon does have point-position(s) as the reference points. Its emission/absorption events cannot be specified without making reference to definite points. In this sense, it does have a particle character.

Finally, one more point about photons:

Not all transient changes in the fields refer to photons. The separation vectors between charges are always changing, and they are always therefore causing transient changes in the system wavefunction. But not all such changes result in a change of energy eigenstates. So, not all transient field changes in the aether are photons. Any view of QM that seeks to represent every change in a quantum system via an exchange of photons is deeply suspect, to say the least. Such a view is not justified on the basis of the inductive context or nature of the Schrodinger equation.

We will now develop the context required to identify the exact ontological nature of the quantum mechanical kinetic energy fields.


7. The form of Schrodinger’s equation points to an oscillatory phenomenon:

Schrodinger’s equation (SE) in 1D formulation reads:

i\,\hbar \dfrac{\partial \Psi(x,t)}{\partial t} =\ -\, \dfrac{\hbar^2}{2m}\dfrac{\partial^2\Psi(x,t)}{\partial x^2} + V(x,t)\Psi(x,t)

BTW, when we say SE, we always mean TDSE (time-dependent Schrodinger’s equation). When we want to refer specifically to the time-independent Schrodinger’s equation, we will call it by the short form TISE. In short, TISE is not SE!

Setting the constants to unity (i.e., taking \hbar = 1 and 2m = 1), the SE shows this form:
i\,\dfrac{\partial \Psi(x,t)}{\partial t} =\ -\, \dfrac{\partial^2\Psi(x,t)}{\partial x^2} + V(x,t)\Psi(x,t).

Its form is partly comparable to the following two real-valued PDEs:

heat-diffusion equation with internal heat generation:
\dfrac{\partial T(x,t)}{\partial t} =\ \dfrac{\partial^2 T(x,t)}{\partial x^2} + \dot{Q}(x,t),

and the wave equation:
\dfrac{\partial^2 u(x,t)}{\partial t^2} =\ \dfrac{\partial^2 u(x,t)}{\partial x^2} + V(x,t)u(x,t),

Yet, the SE is different from both.

  • Unlike the diffusion equation, the SE has the i sticking out on the left-hand side, and a negative sign (think of it as (i)(i)) on the first term on the right-hand side. That makes the solution of SE complex—literally. For quite a long time (years), I pursued this idea, well known to the Monte Carlo Quantum Chemistry community, that the SE is the diffusion equation but in imaginary time. It turns out that this idea, while useful in simplifying simulation techniques for problems like determining the bonding energy of molecules, doesn’t really help throw much light on the ontology of QM. Indeed, it serves to make getting at the right ontology more difficult.
  • As to the wave equation, it too has only a partial similarity to SE. We mentioned the last time the main difference: In the wave PDE, the time differential is to the second order, whereas in the SE, it is to the first order.

The crucial thing to understand here is (and I got it from Lubos Motl’s blog or replies on StackExchange or so) that even if the time-differential is to the first-order, you still get solutions that oscillate in time—if the wave variable is regarded as being full-fledged complex-valued.
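A minimal sketch of this point, assuming the simplest possible first-order equation i dψ/dt = ω ψ (a single mode with a hypothetical angular frequency ω, not the full SE): writing ψ = a + i b turns it into two coupled real-valued first-order equations, da/dt = ω b and db/dt = -ω a, which together imply d²a/dt² = -ω² a. The code integrates the coupled pair with a hand-written RK4 stepper and compares the result against cos(ωt) and -sin(ωt).

```python
import numpy as np


def rhs(state, omega):
    """Right-hand side of the coupled real equations for psi = a + i b."""
    a, b = state
    return np.array([omega * b, -omega * a])


def integrate_coupled(omega, t_end, n_steps):
    """Classical fourth-order Runge-Kutta integration from psi(0) = 1."""
    dt = t_end / n_steps
    state = np.array([1.0, 0.0])  # a(0) = 1, b(0) = 0
    history = [state.copy()]
    for _ in range(n_steps):
        k1 = rhs(state, omega)
        k2 = rhs(state + 0.5 * dt * k1, omega)
        k3 = rhs(state + 0.5 * dt * k2, omega)
        k4 = rhs(state + dt * k3, omega)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        history.append(state.copy())
    times = np.linspace(0.0, t_end, n_steps + 1)
    return times, np.array(history)


omega = 2.0 * np.pi  # illustrative value: one full cycle per unit time
times, history = integrate_coupled(omega, t_end=2.0, n_steps=2000)

# The real part follows cos(omega t), the imaginary part follows -sin(omega t).
max_err_real = np.max(np.abs(history[:, 0] - np.cos(omega * times)))
max_err_imag = np.max(np.abs(history[:, 1] + np.sin(omega * times)))
print(max_err_real, max_err_imag)
```

Each real component alone obeys a second-order oscillator equation, which is why a first-order equation in a complex variable can still oscillate.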

The important lesson to be drawn: The Schrodinger equation gives the maths of some kind of a vibratory/oscillatory system. The term “wavefunction” is not a misnomer. (Under the diffusion equation analogy, for some time, I had wondered if it shouldn’t be called “diffusionfunction”. That way of looking at it is wrong, misleading, etc.)

So, to understand the physics and ontology of the SE better, we need to understand vibrations/oscillations/waves better. I don’t have the time to do it here, so I refer you to David Morin’s online draft book on waves as your best free resource. Another good book seems to be Walter Fox Smith’s “Waves and Oscillations, a Prelude to QM”, though I haven’t gone through all its parts (but what exactly is his last name?). A slightly “harder” but excellent book, at the UG level, and free, comes from Howard Georgi. Mechanical engineers could equally well open their books on vibrations and FEM analysis of the same. For real quick notes, see Allan Bower’s UG course notes on this topic as a part of his dynamics course at Brown University.


8. Ontology of the quantum mechanical fields:

8.1. Schrodinger’s equation has complex-valued fields of energies:

OK. To go back to Schrodinger’s equation:

i\,\hbar \dfrac{\partial \Psi(x,t)}{\partial t} =\ -\, \dfrac{\hbar^2}{2m} \dfrac{\partial^2\Psi(x,t)}{\partial x^2} + V(x,t)\Psi(x,t) = (\text{a real-valued constant}) \Psi(x,t).

As seen in the last post, the scheme of derivation of the SE makes it clear that these terms have come from: the total internal energy, the kinetic energy, and the potential energy, respectively. Informally, we may refer to them as such. However, notice that whereas V(x,t) by itself is a field, what appears in the SE is the term of V(x,t) multiplied by \Psi(x,t), which makes all the energies complex-valued. Further, since \Psi(x,t) is a field, all energies in the SE also are fields.

If you wish to have real-valued fields of energies, then you have no choice but to divide all the terms in the SE by \Psi(x,t). That’s what we indicated in the last post too. However, note, complex-valued fields cannot still be got rid of; they still enter the calculations.

8.2. Potential energy fields only come from the elementary point-charges:

The V(x,t) field itself is the same as in the electrostatics:

V(x,t) = \dfrac{1}{2} \dfrac{1}{4\,\pi\,\epsilon_0} \sum\limits_{i=1}^{N}\sum\limits_{j\neq i; j=1}^{N} \dfrac{q_i\,q_j}{r_{ij}},
where |q_i| = |q_j| = e, with e being the magnitude of the fundamental electronic charge.
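The double sum can be sketched in code as follows. This is only an illustration that evaluates the sum for one fixed configuration of point-charges; the function name `total_potential_energy` and the sample configuration (a proton and an electron one Bohr radius apart) are assumptions made for the example.

```python
import numpy as np

K_COULOMB = 8.9875517923e9   # 1 / (4 pi eps0), in N m^2 / C^2
E_CHARGE = 1.602176634e-19   # magnitude of the elementary charge, in C


def total_potential_energy(charges, positions):
    """Evaluate (1/2) * k * sum_i sum_{j != i} q_i q_j / r_ij, in joules."""
    charges = np.asarray(charges, dtype=float)
    positions = np.asarray(positions, dtype=float)
    total = 0.0
    for i in range(len(charges)):
        for j in range(len(charges)):
            if j == i:
                continue  # the j != i restriction: no self-interaction term
            r_ij = np.linalg.norm(positions[i] - positions[j])
            total += charges[i] * charges[j] / r_ij
    # Every pair was visited twice, as (i, j) and as (j, i); hence the 1/2.
    return 0.5 * K_COULOMB * total


# Illustrative configuration: proton at the origin, electron one Bohr radius away.
bohr_radius = 5.29177210903e-11
energy_joule = total_potential_energy(
    [+E_CHARGE, -E_CHARGE],
    [[0.0, 0.0, 0.0], [bohr_radius, 0.0, 0.0]],
)
energy_ev = energy_joule / E_CHARGE
print(f"{energy_ev:.3f} eV")
```

The factor of 1/2 is what keeps each pair from being counted twice; for the two-charge case the result is the single familiar Coulomb term.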

In our QM ontology we postulate that the above equation is logically complete as far as the potential energy field of QM is concerned.

That is to say, in the basic ontological description of QM, we do not entertain any other sources of potentials (such as gravity or magnetism). Equally important, we also do not entertain arbitrarily specified values for potentials (such as the parabolic potential well of the quantum harmonic oscillator, or the well with the sharply vertical walls of the particle-in-a-box model). Arbitrary potentials are mere mathematical abstractions—approximate models—that help us gain insight into some aspects of the physical phenomena; they do not describe the quantum mechanical reality in full. Only the electrostatic potential that is singularly anchored into elementary charge positions, does.

At least in the basic non-relativistic quantum mechanics, there is no scope to accommodate magnetism. The gravity, being too weak, also is best neglected. Thus, the only potentials allowed are the singular electrostatic ones.

We shall revisit this issue of the potentials after we solve the measurement problem. From our viewpoint, the mainstream QM’s use of arbitrary potentials of arbitrary sources is fine too, as the linear formulation of the mainstream QM turns out to be a limiting case of our nonlinear formulation.

8.3. What physically exists is only the complex-valued internal energy field:

Notice that according to our QM ontology, what physically exists is only the single field of the complex-valued total internal energy field.

The different fields isolated from it, like the potential energy field, the kinetic energy field, the momentum field, or the wavefunction field, etc., are all mathematically isolated quantities. These fields do have certain direct physical referents, but only as aspects or attributes of the total internal energy field. They do have a physical existence, but their existence is not independent of the total internal energy field.

Finally, note that the total internal energy field itself exists only as a field condition in the aether; it is an attribute of the aether; it cannot exist without the aether.


9. Implications of the complex-valued nature of the internal energy field:

9.1. System-level attributes to spatial fields—real- vs. complex-valued functions:

Consider an isolated system—say the physical universe. In our notation, E denotes the aspatial global attribute of its internal energy. Think of a perfectly isolated box for a system. Then E is like a label identifying a certain number of joules slapped on to it. It has no spatial existence inside the box—nor outside it. It’s just a device of book-keeping.

To convert E into a spatially identifiable object, we multiply it by some field, say F(x,t). Then, E F(x,t) becomes a field.

If F(x,t) is real-valued, then \int\limits_{\Omega_\text{small CV}} \text{d}\Omega_\text{small CV}\, E\,F(x,t) gives you the amount of E present in a small CV (which is just a part of the system, not the whole). To fix ideas, suppose you have a stereo boom-box with two detachable speakers. Then, the volume of the overall boombox is a sum of the volumes of each of its three parts. The volume is a real-valued number, and so, the total volume is the simple sum of its parts V = V_1 + V_2 + V_3. Ditto for the weights of these parts. Ditto, for the energy in a volumetric part of a system if the energy forms a real-valued field.

Now, when the field is complex-valued, say denoted as \tilde{F}(x,t), then the volume integral still applies. \int\limits_{\Omega_\text{small CV}} \text{d}\Omega_\text{small CV}\, E\,\tilde{F}(x,t) still gives you the amount of the complex-valued quantity E\tilde{F}(x,t) present in the CV. But the fact that \tilde{F} is complex-valued means that there actually are two fields of E inside that small CV. Expressing \tilde{F}(x,t) = a(x,t) + i b(x,t), there are two real-valued fields, a(x,t) and b(x,t). So, the energy inside the small CV also has two energy components: E_R = E a(x,t) and E_I = E b(x,t), which we call “real” and “imaginary”. Actually, physically, they both are real-valued. However, the magnitude of their net effect |E \tilde{F}(x,t)| \neq E_R + E_I. Instead, it follows the Pythagorean theorem all the way to the positive sign: |E \tilde{F}| = |\sqrt{E_R^2 + E_I^2}|. (Aren’t you glad you learnt that theorem!)
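A small numerical illustration of the difference between the naive sum and the Pythagorean combination, using made-up values for E and for \tilde{F} at a single point (both numbers are purely illustrative assumptions):

```python
import math

E = 10.0               # aspatial energy label, arbitrary units (illustrative)
F_tilde = 0.6 + 0.8j   # value of the complex-valued field at one point (illustrative)

E_R = E * F_tilde.real   # the "real" energy component
E_I = E * F_tilde.imag   # the "imaginary" energy component

naive_sum = E_R + E_I                 # simple addition of the two components
magnitude = abs(E * F_tilde)          # |E F~| computed from the complex product
pythagorean = math.hypot(E_R, E_I)    # sqrt(E_R^2 + E_I^2)

print(naive_sum, magnitude, pythagorean)
```

With these values the two components add up to about 14, whereas the magnitude of the complex quantity is 10, which is what the Pythagorean combination gives.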

If you take it in a naive-minded way, then E can be greater or smaller than E_R + E_I, and so things won’t sum up to |E \tilde{F}|—conservation seems to fail.

But in fact, energy conservation does hold. It’s just that it follows a further detailed law of combining the two field components within a given CV (or the entire system).

In QM, the wavefunction \Psi(x,t) plays the role of \tilde{F} given above. It brings the aspatial energy E from its Platonic mathematical “heaven” and, further, being a field itself, also distributes it in space—thereby giving a complex-valued field of E.

We do not know the physical mechanism which manipulates the real and imaginary parts \Psi_R(x,t) and \Psi_I(x,t) so that they come to obey the Pythagorean theorem. But we know that unless we have \Psi(x,t) as complex-valued, the book-keeping of the system’s energy does not come out right—in QM, that is.

Since the product E_{\text{sys}}\Psi(x,t) can come up any time, and since what ontologically exists is a single quantity, not a product of two, it’s better to have a different notation for it. Accordingly, define:

\tilde{E}(x,t) = E_{\text{sys}}\,\Psi(x,t)

9.2. In QM, the conserved quantity itself is complex-valued:

Note an important difference between pre-quantum mechanics and QM:

The energy conservation principle for the classical (pre-quantum) mechanics says that E_{\text{sys}} = \int\limits_{\Omega} \text{d}\Omega E(x,t) is conserved.
The energy conservation principle for quantum mechanics is that \tilde{E}_{\text{sys}} = \int\limits_{\Omega} \text{d}\Omega \tilde{E}(x,t) is conserved.

No one says it. But it is there, right in the context (the scheme of derivation) of the Schrodinger equation!

For the cyclic change, we started from the classical conservation statement:
\oint \text{d}E_{\text{sys}} = 0 = \oint \text{d}T_{\text{sys}} + \oint \text{d}\Pi_{\text{sys}}

Or, in differential terms (for an arbitrary change, not cyclic):
\text{d}E_{\text{sys}} = 0 = \text{d}T_{\text{sys}} + \text{d}\Pi_{\text{sys}}.

Or, integrating over the end-points of an arbitrary process,
E_{\text{sys}} = \text{a constant (real-valued) number}.

We then multiplied both sides by \Psi(x,t) (remember the quizzical-looking multiplication from the last post?), and only then got to Schrodinger’s equation. In effect, we did:
\text{d}E_{\text{sys}}\Psi(x,t) = 0 = \text{d}T_{\text{sys}}\Psi(x,t) + \text{d}\Pi_{\text{sys}}\Psi(x,t).

That’s nothing but saying, using the notation introduced just above, that:
\text{d}\tilde{E}(x,t) = 0 = \text{d}\tilde{T}(x,t) + \text{d}\tilde{\Pi}(x,t).

Or, integrating over the end-points of an arbitrary process and over the system volume,
\tilde{E}_{\text{sys}} = \text{a constant complex number}.

So, what’s conserved is not E but \tilde{E}.

The aspatial, global, thermodynamic number for the total internal energy is the complex number \tilde{E}_{\text{sys}} in QM. QM by postulation comes with two coupled real-valued fields together obeying the algebra of complex numbers.


10. Consequences of conservation of complex-valued energy of the universe:

10.1. There is a real-valued measure of quantum-mechanical energy which is conserved too:

In QM, is there a real-valued number that gets conserved too? If not by postulate then at least by consequence?

Answer: Well, yes, there is. But it loses the richness of the physics of complex-numbers.

To obtain the conserved real-valued number, we follow the same procedure as for “converting” a complex number to a real number, i.e., extracting a real-valued and essential feature of a complex number. We take its absolute magnitude. If \tilde{E}_{\text{sys}} is a constant complex number, then obviously, |\tilde{E}_{\text{sys}}| is a constant number too. Accordingly,

|\tilde{E}_{\text{sys}}| = |\sqrt{\tilde{E}_{\text{sys}}\,\tilde{E}_{\text{sys}}^{*}}| = \text{another, real-valued, constant}.

But obviously, a statement of this kind of a constancy has lost all the richness of QM.

10.2. The normalization condition has its basis in the energy conservation:

Another implication:

Since |\tilde{E}_{\text{sys}}| itself is conserved, so is |\tilde{E}_{\text{sys}}|^2 too.

[An aside to experts: I think we thus have solved the curious problem of the arbitrary phase factors in quantum mechanics, too. Let me know if you disagree.]

It then follows, by definitions of \tilde{E}_{\text{sys}}, \tilde{E} and \Psi(x,t), that

\int\limits_{\Omega}\text{d}\Omega\,\Psi(x,t)\Psi^{*}(x,t) = 1
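As a numerical aside, here is a minimal sketch of what imposing this condition looks like on a sampled complex-valued field in 1D. The Gaussian wave-packet, the grid, and the helper name `trapezoid` are all illustrative assumptions; the sketch only shows the mechanics of square-normalization, not where the condition comes from.

```python
import numpy as np


def trapezoid(values, x):
    """Trapezoidal-rule integral of sampled values over the grid x."""
    return np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(x))


# An un-normalized complex-valued sample field (illustrative choice).
x = np.linspace(-10.0, 10.0, 4001)
psi_raw = np.exp(-x**2 / 2.0) * np.exp(1j * 1.5 * x)

# Integral of Psi Psi* before normalization; for this Gaussian it is sqrt(pi).
norm_sq = trapezoid((psi_raw * np.conj(psi_raw)).real, x)

# Rescale so that the integral of Psi Psi* over the domain equals 1.
psi = psi_raw / np.sqrt(norm_sq)
total = trapezoid((psi * np.conj(psi)).real, x)

# Multiplying by a constant phase factor leaves the integral unchanged.
psi_shifted = psi * np.exp(1j * 0.7)
total_shifted = trapezoid((psi_shifted * np.conj(psi_shifted)).real, x)

print(norm_sq, total, total_shifted)
```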

Thus, the square-normalization condition follows from the energy conservation principle.

We believe this view places the normalization condition on firm grounds.

The mainstream QM (at least as presented in textbooks) makes reference to (i) Born’s postulate for the probability of finding a particle in an elemental volume, and (ii) conservation of mass for the system (“the electron has to be somewhere in the system”).

In our view, the normalization condition arises because of conservation of energy alone. Conservation of mass is a separate principle, in our opinion. It applies to the attribute of mass of the EC Object of elementary charges. But not to the aetherial field of \Psi. Ontologically, the massive EC Objects and the aether are different entities. Finally, the probabilistic notions of particle position have no relevance in deriving the normalization condition. You don’t have to insert the measurement theory before imposing the normalization condition. Indeed, the measurement postulate comes way later.

Notice that the total complex-valued number for the energy of the universe remains constant. However, the time-dependence of \Psi(x,t) implies that the aether, and hence the universe forever remains in a state of oscillatory motions. (In the nonlinear theory, the system remains oscillatory, but the state evolutions are not periodic. Mark the difference between these two ideas.)

10.3. The wavefunction of the universe is always in energy eigenstates:

Another interesting consequence of the energy conservation principle is this:

Consider these two conclusions: (i) The universe is an isolated system; hence, its energy is conserved. (ii) There is only one aether object in the universe; hence, there is only one universal wavefunction.

A direct consequence therefore is this:

For an isolated system, the system wavefunction always remains in energy eigenstates. Hence, every state assumed by the universal wavefunction is an energy eigenstate.

Take a pause to note a few peculiarities about the preceding statement.

No, this statement does not at all reinforce misconceptions (see Dan Styer’s paper, here: [^][Preprint PDF ^])

The statement refers to isolated systems, including the universe. It does not refer to closed or open systems. When matter and/or energy can cross system boundaries, a mainstream-supposed “wavefunction” of the system itself may not remain in an energy eigenstate. Yet, the universe (system plus environment) always remains in some or the other energy eigenstate.

However, the fact that the universal wavefunction is always in an energy eigenstate does not mean that the universe always remains in a stationary state. Notice that the V(x,t) itself is time-dependent. So, the time-changes in it compel the \Psi to change in time too. (In the language of mainstream QM: The Hamiltonian operator is time-dependent, and yet, at any instant, the state of the universe must be an energy eigenstate.)

In our view, due to nonlinearity, V(x,t) also is an indirect function of the instantaneous \Psi(x,t). Will cover the nonlinearity and the measurement problem the next time. (Yes, I am extending this series by one post.)

Of course, at any instant, the integral over the domain of the algebraic sum of the kinetic and the potential energy fields is always going to come to the single number which is: the aspatial attribute of the total internal energy number for the isolated system.

10.4. The wavefunction \Psi(x,t) is ontic, but only indirectly so—it’s an attribute of the energy field, and hence of the aether, which is ontic:

So, is the wavefunction ontic or epistemic? It is ontic.

An attribute does not have a physical existence independent of, or as apart from, the object whose attribute it is. However, this does not mean that an attribute does not have any physical existence at all. Saying so would be a ridiculously simple error. Objects exist, and they exist as identities. The identity of an object refers to all its attributes—known and unknown. So, to say that an object exists is also to say that all its attributes exist (with all their metaphysically existing sizes too). It is true that blueness does not exist without there being a blue object. But if a blue object exists, obviously, its blueness exists in the reality out there too—it exists with all the blue objects. So, “things” such as blueness are part of existence. Accordingly, the wavefunction is ontic.

Yet, the isolation (i.e. identification) of the wavefunction as an attribute of the aether does require a complex chain of reasoning. Ummm… Yes, literally complex too, because it does involve the complex-valued SE.

The aether is a single object. There are no two or more aethers in the universe—or zero. Hence, there is only a single complex-valued field of energy, that of the total internal energy. For this reason, there is only one wavefunction field in the universe—regardless of the number of particles there might be in it. However, the system wavefunction can always be mathematically decomposed into certain components particular to each particle. We will revisit this point when we cover multi-particle quantum systems.

10.5. The wavefunction \Psi(x,t) itself is dimensionless:

In our view, the wavefunction, i.e., \Psi(x,t) itself is dimensionless. We base this conclusion on the fact that while deriving the Schrodinger equation, where \Psi(x,t) gets introduced, each term of the equation is regarded as an energy term. Since each term has \Psi(x,t) also appearing in it (and you cannot get rid of the complex nature of the Schrodinger equation merely by dividing all terms by it), obviously, the multiplying factor of \Psi(x,t) must be taken as being dimensionless. That’s how we in fact have proceeded.

The mainstream view is to assign the dimensions of \dfrac{1}{\sqrt{\text{(length)}^d}}, where d is the dimensionality of the embedding space. This interpretation is based on Born’s rule and conservation of matter; for instance, see here [^].

However, as explained in the sub-section 10.2., we arrive at the normalization condition from the energy conservation principle, and not in reference to Born’s postulate at all.

All in all, \Psi(x,t) is dimensionless. It appears in theory only for mathematical convenience. However, once defined, it can be seen as an attribute (aspect) of the complex-valued internal energy field (and its two components, viz. the complex-valued kinetic- and potential-energy fields). In this sense, it is ontic—as explained in the preceding sub-section.


11. Visualizing the wavefunction and the single particle in the PIB model:

11.1. Introductory remarks:

What we will be doing in this section is not ontology, strictly speaking, but only physics and visualization. PIB stands for: Particle-In-a-Box. Study this model from any textbook and only then read further.

The PIB model is unrealistic, but pedagogically useful. It is unrealistic because it uses a potential energy distribution that is not singularly anchored into point-particle positions. So, the potential energy distribution must be seen as a mathematically convenient abstraction. PIB is not real QM, in short. It’s the QM of the moron, in a way—the electron has no “potential” inside the well.

11.2. The potential energy function used in the model:

The model says that there is just one particle in a finite interval of space, and its V(x,t) always stays the same at all times. So, it uses V(x) in place of V(x,t).

The V(x) is defined to be zero everywhere in the domain except at the boundary-points, where the particle is supposed to suddenly acquire an infinite potential energy. Yes, the infinitely tall walls are inside the system, not outside it. The potential energy field is the potential energy of a point-particle, and unless it were to experience an infinity of potential energy while staying within the finite control volume of the system, no non-trivial solution would at all be possible. (The trivial solution for the SE when V(x) = 0 is that \Psi(x,t) = 0—whether the domain is finite or infinite.) In short, the “side-walls” are included in the shipped package.

If the particle is imagined to be an electron, then why does its singular field not come into picture? Simple: There is only one electron, and a given EC Object (an elementary point-charge) never comes to experience its own field. Thus, the PIB model is unrealistic on another ground: In reality, force-fields due to charges always come in pairs. However, since we consider only one particle in PIB, there are no singular force-fields anchored into a moving particle’s position, in it, at all.

Yes, forces do act on the particle, but only at the side-walls. At the boundary points, it is a forced particle. Everywhere else, it is a free particle. Peculiar.

The domain of the system remains fixed at all times. So, the potential walls remain fixed in space—before, during, and after the particle collides with them.

The impulse exerted on the particle at the time of collision at the boundary is theoretically infinite. But it lasts only for an infinitesimally small patch of space (which is represented as the point of the boundary). Hence, it cannot impart an infinity of velocity or displacement. (An infinitely large force would have to act over a finite interval of space and time before it could possibly result in an infinitely large velocity or displacement.)

OK. Enough about analysis in terms of forces. To arrive at the particular solution of this problem using analytical methods (as with most any other advanced problem), energy-analytical methods are superior. So, we go back to the energy-based analysis, and Schrodinger’s equation.
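For readers who want to see the energy-based analysis in numbers, here is a minimal finite-difference sketch of the time-independent problem for the PIB. It assumes \hbar = m = 1 and a box of unit length, imposes the walls in the usual textbook way (the wavefunction vanishes at the two boundary points), and compares the lowest eigenvalues with the textbook formula E_n = n^2 \pi^2 / 2 in these units. The function name `pib_energies` and the grid size are illustrative choices.

```python
import numpy as np


def pib_energies(n_interior=200, length=1.0, n_levels=3):
    """Lowest eigenvalues of -(1/2) d^2/dx^2 on (0, length) with Psi = 0 at both ends."""
    h = length / (n_interior + 1)  # grid spacing between neighbouring points
    # Central-difference second derivative: (psi[i+1] - 2 psi[i] + psi[i-1]) / h^2.
    # Multiplying by -1/2 gives +1/h^2 on the diagonal and -1/(2 h^2) off it.
    main = np.full(n_interior, 1.0 / h**2)
    off = np.full(n_interior - 1, -0.5 / h**2)
    hamiltonian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigvalsh(hamiltonian)[:n_levels]


numeric = pib_energies()
analytic = np.array([n**2 * np.pi**2 / 2.0 for n in (1, 2, 3)])
print(numeric)
print(analytic)
```

The boundary points themselves are left out of the matrix; dropping them is how the condition that the wavefunction vanishes at the walls enters the discretization.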

11.3. TDSE as a continuous sequence of TISE’s:

Note that you can always apply the product ansatz to \Psi(x,t), and thereby split it into two functions:

\Psi(x,t) = \chi(x)\tau(t),

where \chi(x) is the space-dependent part and \tau(t) is the time-dependent part.

No one tells you, but it is true that:

Even when the Hamiltonian operator is time-dependent, you can still use the product ansatz separately at every instant.

It is just that doing so is not very useful in analytical solution procedures, because both the \chi(x) and \tau(t) themselves change in time. Therefore, you cannot take a single time-dependent function \tau(t) as applying at all times, and thereby simplify the differential equation. You would have to progress the solution in time—somehow—and then again apply the product ansatz to obtain new functions of \chi(x) and \tau(t) which would be valid only for the next instant in the continuous progression of such changes.

So, analytical solution procedures do not at all benefit from the product ansatz when the Hamiltonian operator is time-dependent.

However, when you use numerical approaches, you can always progress the solution in time using suitable methods, and then, whatever \Psi(x,t)\big|_{t_n} you get for the current time t_n, you can regard it as if it were solving a TISE which was valid for that instant alone.

In other words, the TDSE is seen as being a continuous progression of different instantaneous TISE’s. Seen this way, each \Psi(x,t)\big|_{t_n} can be viewed as representing an energy eigenstate at every instant.

Not just that, but since there is no heat in QM, the adiabatic approximation always applies. So, for an isolated system or the physical universe:

For an isolated system or the physical universe, the time-dependent part \tau(t) of \Psi(x,t) may not be the same function at all times. Yet, \Psi(x,t) always progresses through a continuous succession of different \chi(x)’s and \tau(t)’s.

We saw in the sub-section 10.3. that the universal wavefunction must always be in energy eigenstates. We had reached that conclusion in reference to energy conservation principle and the uniqueness of the aether in the universe. Now, in this sub-section, we saw a more detailed meaning of it.

11.4. PIB anyway uses time-independent potential energy function, and hence, time-independent Hamiltonian:

When V(x) is time-independent, the time-dependent part \tau(t) stays the same for all times. Using this fact, the SE reduces to one and the same pair of \chi(x) and \tau(t). So, the TISE in this case is very simple to solve. See your textbooks on how to solve the TISE for the PIB problem.
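As a reminder of what the textbooks do at this step, here is a brief sketch of the separation. It assumes the standard 1D form of the SE with a time-independent V(x); the particle mass m and the separation constant E are the usual textbook symbols, not something fixed earlier in this post.

```latex
i\hbar\,\dfrac{\partial \Psi}{\partial t} = -\dfrac{\hbar^2}{2m}\dfrac{\partial^2 \Psi}{\partial x^2} + V(x)\,\Psi,
\qquad \Psi(x,t) = \chi(x)\,\tau(t)

\Rightarrow\quad
i\hbar\,\dfrac{1}{\tau}\dfrac{d\tau}{dt}
= \dfrac{1}{\chi}\left[ -\dfrac{\hbar^2}{2m}\dfrac{d^2\chi}{dx^2} + V(x)\,\chi \right]
= E

\Rightarrow\quad
\tau(t) = e^{-iEt/\hbar} = e^{-i\omega t},
\qquad
-\dfrac{\hbar^2}{2m}\dfrac{d^2\chi}{dx^2} + V(x)\,\chi = E\,\chi
```

The left side depends only on t and the right side only on x, which is why both must equal one and the same constant E = \hbar\omega.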

However, make sure to

work through any solution using only the full-fledged complex variables.

The solutions given in most text-books will prove insufficient for our purposes. For instance, if \tau(t) is the time-dependent part of the separated solution of the SE, then don’t substitute \tau(t) = \cos \omega t in place of the full-fledged \tau(t) = e^{-i\omega t}.

Let the \tau(t) acquire imaginary parts too, as it evolves in time.

The reason for this insistence on the full complex numbers will soon become apparent.
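If you want a quick numerical check of why the real-valued substitute will not do, here is a minimal sketch. It assumes natural units (\hbar = 1) and an arbitrary illustrative \omega; the function names are mine, made up only for this illustration. It measures how badly each candidate \tau(t) violates the time part of the separated SE, i\hbar\, d\tau/dt = E\tau.

```python
import cmath
import math

HBAR = 1.0   # assumed natural units (hbar = 1), purely for illustration
OMEGA = 2.0  # assumed angular frequency; the energy is E = hbar * omega


def tau_complex(t):
    """Full complex time factor tau(t) = exp(-i omega t)."""
    return cmath.exp(-1j * OMEGA * t)


def tau_real(t):
    """Real-only stand-in cos(omega t), which drops the imaginary part."""
    return complex(math.cos(OMEGA * t), 0.0)


def residual(tau, t, dt=1e-6):
    """|i hbar dtau/dt - E tau| at time t, using a central difference in t."""
    dtau_dt = (tau(t + dt) - tau(t - dt)) / (2.0 * dt)
    energy = HBAR * OMEGA
    return abs(1j * HBAR * dtau_dt - energy * tau(t))


t_sample = 0.3
res_complex = residual(tau_complex, t_sample)
res_real = residual(tau_real, t_sample)
print("residual with exp(-i omega t):", res_complex)
print("residual with cos(omega t):   ", res_real)
```

The complex exponential leaves only finite-difference noise, whereas the cosine leaves a residual of magnitude \hbar\omega at every instant.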

11.5. Use the full-fledged 3D physical space:

To visualize this solution, realize that as in EM so also in QM, even if the problem is advertised as being 1D, it still makes sense to see this one dimension as an aspect of the actually existing 3D physical space. (In EM, you need to go “up” to 3D because the curl demands it. In QM, the reason will become apparent if you do the homework given below.)

Accordingly, we imagine two infinitely large parallel planes for the system boundaries, and the aether filling the space in between them. (Draw a sketch. I won’t. I would have, in a real class-room, but don’t have the enthusiasm to draw pics while writing mere blog-posts. And, whatever happened to your interest in visualization rather than in “math”?) The planes remain fixed in space.

Now, pick up a line passing normally through the two parallel planes. This is our x-axis.

11.6. The aetherial momentum field:

Next, consider the aetherial momentum field, defined by:

\vec{p}(x,t) =\ i\,\hbar\,\nabla\Psi(x,t).

This definition for the complex-valued momentum field is suggested by the form of the complex-valued quantum mechanical kinetic energy field. It has been derived in analogy to the classical expression T = \dfrac{p^2}{2m}.

In our PIB model, this field exists not just on the chosen line of the x-axis, but also everywhere in the 3D space. It’s just that it has no variation along the y– and z-axes.
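To make the definition concrete, here is a rough sketch that evaluates this momentum field for a PIB stationary state. I am assuming the textbook eigenfunctions \chi_n(x) = \sqrt{2/L}\sin(n\pi x/L) with walls at x = 0 and x = L, and natural units (\hbar = m = L = 1); all the names in the code are hypothetical, chosen only for this illustration. Note that it keeps the sign convention of the definition given above.

```python
import cmath
import math

HBAR = 1.0    # assumed natural units
MASS = 1.0    # assumed particle mass
LENGTH = 1.0  # assumed box width, with the walls at x = 0 and x = LENGTH


def energy(n):
    """Textbook PIB energy eigenvalue E_n."""
    return (n * math.pi * HBAR / LENGTH) ** 2 / (2.0 * MASS)


def psi(x, t, n=1):
    """Stationary state Psi_n(x, t) = chi_n(x) * exp(-i E_n t / hbar)."""
    chi = math.sqrt(2.0 / LENGTH) * math.sin(n * math.pi * x / LENGTH)
    tau = cmath.exp(-1j * energy(n) * t / HBAR)
    return chi * tau


def momentum_field(x, t, n=1, dx=1e-5):
    """p(x, t) = i hbar dPsi/dx, using a central difference in x."""
    dpsi_dx = (psi(x + dx, t, n) - psi(x - dx, t, n)) / (2.0 * dx)
    return 1j * HBAR * dpsi_dx


def momentum_field_exact(x, t, n=1):
    """Closed form of the same field, for checking the finite difference."""
    k = n * math.pi / LENGTH
    dchi_dx = math.sqrt(2.0 / LENGTH) * k * math.cos(k * x)
    return 1j * HBAR * dchi_dx * cmath.exp(-1j * energy(n) * t / HBAR)


for x in (0.1, 0.25, 0.5, 0.75, 0.9):
    p = momentum_field(x, t=0.2)
    print(f"x = {x:4.2f}   p = {p.real:+.4f} {p.imag:+.4f}j")
```

For the ground state, the field vanishes at the mid-point of the box and is largest in magnitude near the walls, and it is complex-valued at a generic instant.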

11.7. Gaining physical clarity (“intuition”) with analysis in terms of forces, first:

In the PIB model, when the massive point-particle of the electron is at some point \vec{r}_j, then it experiences a zero potential force (except at the boundary points).

So, electrostatically speaking, the electron (i.e. the singularity at the EC Object’s position) should not move away from the point where it was placed as part of IC/BCs of the problem. However, the existence of the momentum field implies that it does move.

To see how this happens, consider the fact that \Psi(x,t) involves not just the space-dependent part \chi(x), but also the time-dependent part \tau(t). So,

The total wavefunction \Psi(\vec{r}_j, t) is time-dependent—it continuously changes in time. Even in stationary problems.

Naturally, there should be an aetherial force-field associated with the aetherial momentum field (i.e. the aetherial kinetic energy field) too. It is given by:

\vec{F}_{T}(x,t) = \dfrac{\partial}{\partial t} \vec{p}_{T}(x,t) = \dfrac{\partial}{\partial t} \left[ i\,\hbar\,\nabla\Psi(x,t) \right],

where the subscript T denotes the fact that these quantities refer to their conceptual origins in the kinetic energy field. These _T quantities are over and above those due to the electrostatic force-fields. So, if V were not to be zero in our model, then there would be a force-field due to the electrostatic interactions as well, which we might denote as \vec{F}_{V}, where the subscript _V denotes the origin in the potentials.

Anyway, here V(x) = 0 at all internal points, and so, only the quantity of force given by \vec{F}_{T}(\vec{r}_j,t) would act on our particle when it strays at the location \vec{r}_j. Naturally, it would get whacked! (Feel good?)

The instantaneous local acceleration for the elemental CV of the aether around the point \vec{r}_j is given by \vec{a}_{T}(\vec{r}_j,t) = \dfrac{1}{m} \dfrac{\partial \vec{p}_{T}(\vec{r}_j,t)}{\partial t}.

This acceleration should imply a velocity too. It’s easy to see that the velocity so implied is nothing but

\vec{v}_{T}(\vec{r}_j,t) = \dfrac{1}{m} \vec{p}_{T}(\vec{r}_j,t).

Yes, we went through a “circle,” because we basically had defined the force on the basis of momentum, and we had given the more basic definition of momentum itself on the basis of the kinetic energy fields.
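A worked instance may help here. Assuming the textbook stationary state for the PIB, \Psi_n(x,t) = \chi_n(x)\,e^{-i\omega_n t} with \chi_n(x) = \sqrt{2/L}\,\sin(n\pi x/L) and E_n = \hbar\omega_n (the box width L and the quantum number n are the usual textbook symbols, not fixed earlier in this post), the definitions above give, along the x-axis:

```latex
p_{T}(x,t) = i\,\hbar\,\dfrac{d\chi_n}{dx}\,e^{-i\omega_n t}
           = i\,\hbar\,\sqrt{\dfrac{2}{L}}\,\dfrac{n\pi}{L}\,\cos\!\left(\dfrac{n\pi x}{L}\right) e^{-i\omega_n t}

F_{T}(x,t) = \dfrac{\partial p_{T}}{\partial t}
           = (-i\,\omega_n)\,p_{T}(x,t)
           = \hbar\,\omega_n\,\dfrac{d\chi_n}{dx}\,e^{-i\omega_n t}
           = E_n\,\dfrac{\partial \Psi_n}{\partial x}

a_{T}(x,t) = \dfrac{1}{m}\,F_{T}(x,t),
\qquad
v_{T}(x,t) = \dfrac{1}{m}\,p_{T}(x,t),
\qquad
\dfrac{\partial v_{T}}{\partial t} = a_{T}
```

So, under these assumptions, the force phasor is the momentum phasor scaled by \omega_n and turned through a quarter of a cycle.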

11.8. Representing complex-valued fields as spatial entities is logically consistent with everything we know:

Notice that all the fields we considered in the force-based analysis: the momentum field, the force-field, the acceleration field, and the velocity field are complex-valued. This is where the 3D-ness of our PIB model comes in handy.

Think of any arbitrary yz-planes in the domain as representing the mathematical Argand-plane. Then, the \Psi(x,t) field at an arbitrary point \vec{r}_j would be a phasor of constant length, but rotating in the same yz-plane at a constant angular velocity, given by the time-dependent part \tau(t).

Homework: Write a Python simulation to show an animation of a few representative phasors for a few points in the domain, following the above convention.
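A possible starting point for this homework is sketched below. It only computes the phasor tips, frame by frame, and prints them; it does not draw anything (an animation would need a plotting library, which is left out here). It assumes the textbook PIB stationary state, natural units, and the convention stated above: the real part is laid along the y-axis and the imaginary part along the z-axis. All names are hypothetical.

```python
import cmath
import math

HBAR = 1.0    # assumed natural units
MASS = 1.0
LENGTH = 1.0  # assumed box width
N_STATE = 1   # which stationary state to show

# Angular frequency of the time factor, omega = E_n / hbar
OMEGA = (N_STATE * math.pi * HBAR / LENGTH) ** 2 / (2.0 * MASS * HBAR)


def chi(x):
    """Space-dependent part of the textbook PIB stationary state."""
    return math.sqrt(2.0 / LENGTH) * math.sin(N_STATE * math.pi * x / LENGTH)


def phasor_tip(x, t):
    """Return (y, z) = (Re Psi, Im Psi) for the phasor drawn at position x."""
    value = chi(x) * cmath.exp(-1j * OMEGA * t)
    return value.real, value.imag


def phasor_frames(x_points, n_frames=8):
    """One full period of tips: frames[k][j] is the tip at x_points[j]."""
    period = 2.0 * math.pi / OMEGA
    times = [k * period / n_frames for k in range(n_frames)]
    return times, [[phasor_tip(x, t) for x in x_points] for t in times]


x_points = [0.25, 0.5, 0.75]
times, frames = phasor_frames(x_points)
for t, frame in zip(times, frames):
    row = "  ".join(f"({y:+.3f}, {z:+.3f})" for y, z in frame)
    print(f"t = {t:.3f}  {row}")
```

Each phasor keeps the length |\chi(x)| set by its position, and all of them turn together at the same angular velocity; feeding the frames to any plotting tool of your choice would give the animation.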

11.9. Time evolution, and the spatial directions of the \Psi(x,t)-based vector fields:

Consider the changes in the \Psi(x,t) field, distributed in the physical 3D space.

Consider that as \tau(t) evolves in time, even if the IC had only a real-valued function like \cos t specified for it, considering the full-fledged complex-valued nature of \tau(t), it would soon enough (with the passage of an infinitesimal amount of time), acquire a so-called “imaginary” component.

Following our idea of representing the real- and imaginary-components in the y– and z-axes, the \Psi(x,t) field no longer remains confined to a variation along the x-axis alone. It also has variations along the plane normal to the x-axis.

Accordingly, the unit vectors for the grad operator, and hence for all the vector quantities (of momentum, velocity, force and acceleration) also acquire a definite orientation in the physical 3D space—without causing any discomfort to the “math” of the mainstream quantum mechanics.

Homework: Consider the case when \Psi(x,t) varies along all three spatial axes. An easy example would be that of the hydrogen atom wavefunction. Verify that the spatial representation of the vector fields (momentum, velocity, force or acceleration) proposed by us causes no harm to the “math” of the mainstream quantum mechanics.

If doing simulations, you can integrate in time (using a suitable time-stepping technique), and come to calculate the instantaneous displacements of the particle, too. Exercise left for the reader.

Homework: Perform both analytical and numerical integration for the PIB model. Verify that your simulation is correct.

Homework: Build an animation for the motion of the point-particle of the EC Object, together with the time-variations of all the complex-valued fields: \Psi(x,t), and all the complex-valued vector fields derived from it.

11.10. Too much of homework?

OK. I’ve been assigning so many pieces for the homework today. Have I completed any one of them for myself? Well, actually not. But read on, anyway.

The locus of all possible particle-positions would converge to a point only at the boundary points (because \Psi(x,t) = 0 there). At all the internal points in the domain, the particle-position should be away from the x-axis.

That’s my anticipation, but I have not checked it. In fact, I have not built even a single numerical simulation of the sort mentioned here.

So, take this chance to prove me wrong!

Please do the homework and let me know if I am going wrong. Thanks in advance. (I have to finish this series first, somehow!)


12. What the PIB model tells about the wave-particle duality:

What happened to the world-famous wave-particle duality? If you build the animations, you would know!

There is a point-particle of the electron (which we regard as the point of the singularity in the \vec{\mathcal{F}} field), and there is an actual, 3D field of the internal energy fields—and hence of \Psi(x,t). And, assuming our hypothesis of representing phasors of the complex numbers via a spatial representation, of all the complex-valued fields—including the vector fields like displacement.

The particle motion is governed by both the potential energy-forces and the kinetic energy-forces. That is, the aetherial wavefunction “guides” etc. the particle. In our view, the kinetic energy field too forces the particle.

“Ah, smart!,” you might object. “And what happened to the Born rule? If the wavefunction is a field, then there is a probability for finding the particle anywhere—not just at the position where it is, as predicted in this model. So, your model is obviously dumb!! It’s not quantum mechanics at all!!!”

Hmmm… We have not solved the measurement problem yet, have we?

We will need to cover the many-particle QM first, and then go to the nonlinearity implied by the kinetic energy field-forces, and only then would we be able to present our solution to the measurement problem. Since I got tired of typing (this post is already ~9,500 words), I will cover it in some other post. I will also try to touch on entanglement, because it would come in the flow of the coverage.

But in the meanwhile, try to play with something.

Homework: “Invert” the time-displacement function/relationship you obtain for the PIB model, and calculate the time spent by the particle in each infinitesimally small CV of the 3D domain, during a complete round-trip across the domain. Find its x-component. See if you can relate the motion, in any way, to the probability rule given by Born (i.e., try to anticipate our next development).

Do that. This way, you will stay prepared to spot if I have made any mistakes in this post, and also if I make any further mistakes in the next—and have made any mistakes in the last post as well.

Really. I could easily have made a mistake or two. … These matters still are quite new to me, and I really haven’t worked out the maths of everything ahead of writing these posts. That’s why I say so.


13. A preview of the things to come:

I had planned to finish this series in this post. In a sense, it is over.

The most crucial ontological aspects have already been given. Starting from the comprehensive list of the QM objects, we also saw that the quantum mechanical aetherial fields are all complex-valued; that there is an additional kinetic energy field too, not just potential; and also saw our new ideas concerning how to visualize the complex-valued fields by regarding the Argand plane as a mathematical abstraction of a real physical plane in 3D. We also saw how these QM ontological objects come together in a simple but fairly well illustrative problem of the PIB. We even touched on the wave-particle duality.

So, as far as ontology is concerned, even the QM ontology is now essentially over. There might be important repercussions of the ontological points we discussed here (and, also before, in this series). But as far as I can see, these should turn out to be mostly consequences, not any new fundamental points.

Of course, a lot of physics issues still remain to be clarified. I would like to address them too.

So, while I am at it, I would also like to say something about the following topics: (i) Multi-particle quantum systems. (ii) Issue of the 3D vs. 3ND nature of the wavefunction field. (iii) Physics of entanglement. (iv) Measurement problem.

All these topics use the same ontology as used here. But saying something about them would, I hope, help understand it better. Applications always serve to understand the exact scope and the nuances of a theory. In their absence, a theory, even if well specified, still runs the risk of being misunderstood.

That’s why I would like to pick up the above four topics.

No promises, but I will try to write an “extra” post in this series, and finish off everything needed to understand the points touched upon in the Outline document (which I had uploaded at iMechanica in February this year, see here [^]). Unlike until now, this next post would be mostly geared towards QM experts, and so, it would progress rapidly—even unevenly or in a seeming “broken” manner. (Experts would always be free to get in touch with me; none has, in the 8+ months since the uploading of the Outline document at iMechanica.)

I would like it if this planned post (on the four physics topics from QM) forms the next post on this blog, but then again, as I said, no promises. There might be an interruption with other topics in the meanwhile (though I would try to keep them at bay). Plus, I am plain tired and need a break too. So, no promises regarding the time-frame of when it might come.

OK.

So, do the homework, and think about the whole thing. Also, brush up on the topic of coupled oscillations, say from David Morin/Walter Fox Smith/Howard Georgi, or even as covered in the FEM modeling of idealized spring-mass systems. Do that, so that you are ready for the next post in this series—whenever it comes.

In the meanwhile, sure feel free to drop in a comment or email if you find that I am going wrong somewhere—especially in the maths of it or its implications. Thanks in advance.

Take care, and bye for now.


A song I like:

(Marathi) “aalee kuThoonashee kanee taaLa mrudungaachi dhoona”
Music and Singer: Vasant Ajgaonkar
Lyrics: Sopandev Chaudhari

 


History:
— First published: 2019.11.05 17:19 IST.
— Added the sub-section 10.5. and the songs section. Corrected LaTeX typos. the same day at 20:31 IST.
— Expanded the section 11. considerably, and also added sub-section titles to it. Revised also the sections 12. and 13. Overall, a further addition of approx. 1,500 words. Also corrected typos. Now, unless there is an acute need even for typo-corrections (i.e. if something goes blatantly in an opposite direction than the meaning I had in mind), I would leave this post in the shape in which it is. 2019.11.06 11:06 IST.

Ontologies in physics—9: Derivation of Schrodinger’s equation: context, and essential steps

Updates (corrections, additions, revisions) have been made by 2019.10.28 10:52 IST. Only one of them is explicitly noted, but the others are still there (too many to note separately). However, the essentials of the basic points are kept as they were.


1. The cavity-radiation spectrum:

The continuous spectral-intensity curve for the cavity radiation was established empirically.

Now, before you jump to the Rayleigh-Jeans efforts, or the counting of the EM normal modes in the abstract space, as modern (esp. American) textbooks are wont to do, please take a moment to note an opinion of mine.

1.1 “Cavity-radiation” as a far more informative term:

I believe that we must first come to appreciate the late 19th-century applied physicists (mostly in Germany) who rightly picked up the cavity radiation as the right phenomenon for understanding the matter-light interactions.

Their motivation in studying the cavity radiation was to have a good datum in theory, a good theoretical standard, so that more efficient incandescent light bulbs could be produced for better profits by private businesses. This motivation ultimately paved the way for the discovery of quantum mechanics.

As to the nomenclature of the object/phenomenon they standardized, in my opinion, the term “cavity radiation” is far more informative than the term typically used in modern textbooks, viz. “black-body radiation”. Two reasons:

(i) Only a negligibly small hole on the surface of the cavity acts as a black-body; the rest of the cavity surface does not. Any other approximate realization of the perfectly black body using a solid object alone is not as good a choice, because the spectrum a solid body produces depends on the material of the solid (cf. Kirchhoff). The cavity spectrum, however, is independent of the wall-material; it is dominated more by the effects due to the pure aether in the cavity than by those due to the solid wall-material.

(ii) The “cavity-ness” of the body helps in theoretical analysis too, because unlike a structure-less solid continuum, the cavity has easily demarcated regions for the matter and the aether, i.e., a region each for the material electric oscillators and the light-field. Since the spatial regions occupied by the participating phenomena are generally different, their roles can be idealized away easily. This is exactly how, in theory, we take away the mass of a mechanical spring as also the stresses in a finite ball, and reach the idealized mechanical system of a massless spring attached to a point-mass.

1.2 The problem for the theory:

The late 19th-century physicists did a lot of good experimental studies and arrived at the continuous spectrum of the cavity radiation.

The question now was how to explain this spectrum on the basis of the two most fundamental theories known at the time, viz., classical electrodynamics (relevant, because light was known to be EM waves), and thermodynamics including statistical mechanics (relevant, because the cavity was kept heated uniformly to the same temperature, so as to ensure thermal equilibrium between the light field and the cavity walls).

When classical electromagnetic theory was applied to this cavity radiation problem, it could not reproduce the empirically observed curve.

The theory, by Rayleigh and Jeans, led to the unrealistic prediction of the ultraviolet catastrophe. It predicted that as you go towards the higher-frequency side on the graph of the spectrum, the power being emitted at a given frequency (or in the infinitesimally small band of frequencies around it) would go on increasing without any upper bound.

Physically speaking, the thermal energy used in keeping the cavity walls heated would be drained by the radiation field in such a fantastic way that the absolute temperature of the internal surface of the cavity wall would have to approach zero. (Notice, the cavity wall is a part of the system here; the environment contains the heat source but not the wall.) No amount of heat supplied at the system surface would be enough to fill the hunger of the aether to convert more and more of the thermal energy of the wall into radiation within itself. Mind you, this circumstance was being proposed by an analysis that was based on a thermodynamic equilibrium. Given a finite supply of heat from the surroundings to the system, the total increase in the internal energy of the system would be finite during any time interval. But since all thermal energy of the wall is converted into light of ever high (even infinitely high) frequencies, none would be left at the internal surface of the wall. In short, a finite wall would develop an infinite temperature gradient at the internal boundary surface.

This was the background when Planck, an accomplished thermodynamicist, picked up this problem for his attention.


2. Planck’s hypothesis:

Note again, in analysis, cavity walls are regarded as being at a thermodynamic equilibrium with the surroundings (so that they maintain a constant and uniform temperature on the entirety of the system boundary just outside of the wall), and also with the light field contained within the cavity (so that an analysis of the electrical oscillators inside the metallic wall of the cavity can provide a clue for the unexpected spectrum of the light).

That’s why, the starting point of Planck’s theorization was not the light field itself but the electrical oscillators in the metal of the wall. The analysis would be conducted for the solid metal, even if, eventually, predictions would be made only for the light field.

On the basis of statistical mechanics, and some pretty abstract “curve-fitting” of sorts [he was not just a skilled “Data Scientist”; he was a gifted one], Planck found that if the energy of the material oscillators were to be, somehow, quantized using the relation:

E = h\nu = \hbar \omega,

then the resulting energy distribution over the various frequencies would be identical to what was observed experimentally. Here, h is the constant Planck used for his abstract “curve-fitting”, \nu is the frequency of the cavity light, \hbar = \dfrac{h}{2\pi} is the modified Planck constant (aka the “reduced” Planck constant because its value is smaller than h), and \omega = 2\pi\nu is the angular frequency of the cavity light.
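To see the difference the quantization makes, here is a small sketch comparing the classical (Rayleigh-Jeans) and the Planck spectral energy densities. Neither formula is written out in this post; I am taking both in their standard textbook forms, u_{RJ} = 8\pi\nu^2 k_B T/c^3 and u_P = (8\pi h\nu^3/c^3)/(e^{h\nu/k_B T} - 1). The wall temperature and the function names are assumptions made only for this illustration.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K
C = 299_792_458.0    # speed of light, m/s


def rayleigh_jeans(nu, temperature):
    """Classical spectral energy density, 8 pi nu^2 k_B T / c^3."""
    return 8.0 * math.pi * nu ** 2 * K_B * temperature / C ** 3


def planck(nu, temperature):
    """Planck spectral energy density, with oscillator energies in steps of h nu."""
    x = H * nu / (K_B * temperature)
    return (8.0 * math.pi * H * nu ** 3 / C ** 3) / math.expm1(x)


T_WALL = 1500.0  # assumed cavity-wall temperature, in kelvin
for nu in (1e11, 1e12, 1e13, 1e14, 1e15):
    u_rj = rayleigh_jeans(nu, T_WALL)
    u_p = planck(nu, T_WALL)
    print(f"nu = {nu:.0e} Hz   RJ = {u_rj:.3e}   Planck = {u_p:.3e}   ratio = {u_p / u_rj:.3e}")
```

At low frequencies the two agree closely; at high frequencies the classical expression keeps growing as \nu^2, whereas the Planck expression is cut off exponentially.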

Ontologically, the significant fact to be noted here is that this analysis deals with the material oscillators, but ends up making an assertion—a quantitative prediction—about the nature of light. The analysis can be justified because the wall and the aether are assumed to be in a thermodynamic equilibrium.

The fuller formula is E = (n+1/2) h \nu where n = 0, 1, 2, \dots; the 1/2 term is a later addition (Planck’s 1900 hypothesis had only the whole multiples n h \nu). However, in the interest of simplicity (of isolating the relevant ontological issues), we have for now set n = 1 and ignored the 1/2 constant.

Homework: Try to relate the ignored quantities with phenomena such as quantum superposition, zero-temperature energy, etc. Hint: Don’t worry about many particles, whether distinguishable or indistinguishable, or phenomena like entanglement specific to many-particle systems.


3. Photoelectric effect: Einstein’s relation for a monochromatic light:

3.1 The Einstein relation:

On the basis of Planck’s energy-quantization hypothesis, which was eventually regarded as governing the light phenomenon itself (and not just the energies of the electrical oscillators inside the cavity wall), Einstein derived the relation:

\vec{p} = \hbar \vec{k}

where \vec{p} is the momentum associated with a monochromatic light wave, and \vec{k} = \dfrac{2\pi}{\lambda}\hat{e} is the wavevector, \lambda is the wavelength, and \hat{e} is the unit vector in the direction in which the wave travels.

How did Einstein arrive at this relation?

3.2 Einstein postulates particles of light to explain photo-electric effect:

As seen above, what Planck had postulated (ca. 1900) was the quantization of energy for the cavity wall oscillators; hence for the wall-to-light field energy exchange; hence for the light-field itself. Yet, Planck did not propose particles of light in place of the continuous field of light in the cavity.

It was Einstein who postulated (ca. 1905) that the spatially continuous field of light be replaced by hypothetical, spatially discrete, particles of light. (It was G. N. Lewis, the then Dean of Chemistry at Berkeley, who, 21 years later, in 1926, coined the term “photon” for them.)

Einstein thought that a particulate nature of light was necessary in order to explain the existence of discrete steps in the phenomenon he studied, viz., the photo-electric effect. [This is not true; the photo-electric effect involves not just light per se, but energy transfers between light and matter; see our comment near the end of this section.]

Einstein then arrived at an expression for the momentum of a photon.

3.2 Momentum of the photon using the light particle postulate:

According to the theory of special relativity (i.e. the classical EM of Maxwell and Lorentz reformulated by Poincare et al. and published ca. 1905 also by Einstein), the energy for a free (unforced) relativistic massive particle is given by the so-called “E = mc^2” equation that even hippies know about; see, for instance, here [^] for clarification of the mass term involved in it:

E = mc^2

So,

E^2 = (m\,c^2)^2 = (\gamma\,m_0\,c^2)^2 = (p\,c)^2 + (m_0\,c^2)^2,

where E is the relativistic energy of a classical massive particle, m is its relativistic mass, c is the speed of light, \gamma = \dfrac{1}{\sqrt{1-(\dfrac{v}{c})^2}} is the Lorentz factor (which indicates physical phenomena like the Lorentz contraction and time dilation, the Lorentz boost, etc.), m_0 is the rest mass of the particle, and p = \gamma m_0\,v is the relativistic momentum of the particle.

According to Einstein, the theory of special relativity must apply to his light particle just as well as it does to the massive bodies. So, he would have the above-given equation govern his light particle’s dynamics, too. However, realizing that such a particle would have to be massless, Einstein set m_0 = 0 for the light particle. Thus, the above equation became, for his photon,

E^2 = (pc)^2, i.e.,

E = pc.
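A quick numerical check of the two steps above, as a minimal sketch: first that E = \gamma m_0 c^2 and E^2 = (pc)^2 + (m_0 c^2)^2 agree for a massive particle, and then that setting m_0 = 0 leaves E = pc. The electron mass, the chosen speed, and the function names are only illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def lorentz_gamma(v):
    """Lorentz factor for speed v < c."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def energy_from_gamma(m0, v):
    """E = gamma * m0 * c^2."""
    return lorentz_gamma(v) * m0 * C ** 2


def momentum(m0, v):
    """p = gamma * m0 * v."""
    return lorentz_gamma(v) * m0 * v


def energy_from_momentum(m0, p):
    """E = sqrt((p c)^2 + (m0 c^2)^2)."""
    return math.hypot(p * C, m0 * C ** 2)


m_e = 9.1093837015e-31  # electron rest mass in kg, used only as an example
v = 0.6 * C             # assumed speed

e_gamma = energy_from_gamma(m_e, v)
e_mom = energy_from_momentum(m_e, momentum(m_e, v))
print("massive particle:", e_gamma, e_mom)

p_light = 1.0e-27  # some assumed momentum for the massless case, kg m/s
print("massless case:   ", energy_from_momentum(0.0, p_light), p_light * C)
```

Both routes give the same energy for the massive particle, and in the massless case the general expression collapses to pc.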

3.3 Momentum of light waves using the Maxwellian EM:

Actually, the same equation can also be derived assuming the electromagnetic wave nature for light (using Poynting’s vector etc.). Then the distinctive character of the EM waves highlighted by this relation becomes apparent.

The classical NM-ontological waves (like the transverse waves on strings) do not result in a net transport of momentum, though they do transport energy. That’s because the scalar of energy varies as the square of the wave displacement, but the vector of momentum varies as the displacement vector itself. That’s in the classical NM-ontological waves.

In contrast, the “classical” EM-ontological waves transport momentum too, not just energy. That’s their distinctive feature. See David Morin’s online book on waves for explanations (I think chapter 8).

In short, the relation E = pc is basically mandated by the Maxwell’s theory itself.

The special relativistic relations are just a direct consequence of Maxwell’s theory. (The epistemological scope of the special relativity is identical to that of “classical” EM, not greater.)

All in all, you don’t have to assume a particle of light to have the energy-momentum relation for light, in short.

Einstein, however, took this relation to apply to the massless particle of the photon hypothesized by him, as detailed in the preceding discussion.

3.4. Einstein lifts the expression for energy of classical waves, and directly uses it for his particles of light—without any pause or explanation:

Now, from the classical wave theory, \omega = ck for any classical wave travelling at the speed c, where k = \dfrac{2\,\pi}{\lambda} is the wavenumber (the magnitude of the wavevector), and \lambda is the wavelength.

Notice that this relation applies only to the oscillations of the material oscillators in the cavity wall as also to aether-waves, but not to structureless particles of light. Did it bother Einstein? I think not.

Einstein did not put forth any argument to show why or how his light particle would obey the same relation. He gave no explanation for how \omega is to be interpreted in the context of his photons.

The fact of the matter is, if you assume a structure-less photon in an empty space, then you cannot explain how the frequency—a property of waves—can at all be an attribute of a point-particle of a photon. Mark my words: structure-less. Nature performs no local changes over time unless there is an internal structure to a spatially discrete particle. A solid body may have angular momentum, but each infinitesimal point-particle comprising it doesn’t. Something similar, for the photon. Einstein gave no description of the structure of the photon or the physical mechanism why it should carry a frequency attribute. … I should know, because I followed this Einstein-Feynman approach for too long, including during my PhD. The required maths of the wave-vector additions involved in a photon’s propagation through space won’t work unless you presume some internal structure to the photon, some device of keeping track of the net wave-vector by the photon.

Thus, Einstein had \omega = ck for his particles of light too—somehow.

3.5. Einstein reaches the momentum–wavevector relation known by his name:

Einstein then accepted Planck’s quantum hypothesis as being valid for his light-particles too, including the exact relation Planck had for the oscillatory phenomena (including waves):

E = \hbar \omega.

Einstein then substituted the relativistic light particle’s equation E = pc on the left-hand side of Planck’s hypothesis (even though in Planck’s theory, this E was for oscillations/waves), and the classical wave relation \omega = ck on the right-hand side. Accordingly, he got, for his particle of light:

pc = \hbar ck,

i.e., cancelling out c,

p = \hbar k.

This is called the Einstein relation in QM.
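For a feel of the numbers, here is a small sketch that walks through the same chain for a given wavelength: k = 2\pi/\lambda, \omega = ck, E = \hbar\omega, p = \hbar k. The wavelength of 500 nm (green light) and the function name are only illustrative assumptions.

```python
import math

H = 6.62607015e-34         # Planck constant, J s
HBAR = H / (2.0 * math.pi)  # reduced Planck constant
C = 299_792_458.0          # speed of light, m/s


def photon_quantities(wavelength):
    """Return (k, omega, energy, momentum) for light of the given wavelength."""
    k = 2.0 * math.pi / wavelength  # magnitude of the wavevector
    omega = C * k                   # classical wave relation
    energy = HBAR * omega           # Planck's relation E = hbar omega
    p = HBAR * k                    # the relation p = hbar k derived above
    return k, omega, energy, p


wavelength = 500e-9  # assumed: green light, in metres
k, omega, energy, p = photon_quantities(wavelength)
print(f"k = {k:.4e} 1/m, omega = {omega:.4e} rad/s")
print(f"E = {energy:.4e} J, p = {p:.4e} kg m/s, E/(pc) = {energy / (p * C):.6f}")
```

The ratio E/(pc) comes out as unity, and p is the same thing as h/\lambda, which is just the substitution carried out above done in reverse.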

3.6. Einstein as the physicist who introduced the wave-particle duality in physics:

Go through the subsections 3.4 and 3.5 again, and take a moment to realize the nature of what Einstein had done.

Einstein became the first man to put forth the wave-particle duality as an acceptable feature of a theory (and not just a conjecture). In effect, he put forth this duality as an essential feature of physics, because it was introduced at the most fundamental levels of theory. And, he did so without bothering to explain what he meant by it.

I gather that Einstein did not experience hesitation while doing so. (He was, you know Einstein! (That hair! That smile!! That very scientist-ness!!!))


4. Some comments on Einstein’s hypothesis of light particles:

4.1 Waves can explain the photo-electric effect; you don’t need Einstein’s particles to explain it:

To explain the photoelectric effect, it is enough to suppose that the light absorption process occurs in the following way: (1) Light is continuously spread in space as a field (as a “wave”), but its emission or absorption occurs only in spatially discrete regions—these processes occur at atoms. (2) An instance of an absorption process remains ongoing only for a finite period of time, but during this interval, it occurs continuously throughout. (3) The nature of the absorption process is such that it either goes to full completion, or it completely reverses as if no energy exchange had at all occurred.

To anticipate our development in the next post, and to give a caricature of the actual physics involved here: The energy is continuously transferred to an atom from the surrounding field. (Physically, this means that the infinitely spread field of light gets further concentrated at the nucleus of the atom, which serves as the reference point due to the singularity of fields at it.) After the process of the continuous energy transfer gets going, the process, for some reason to be supplied (by solving the measurement problem), snaps to one of the energy eigenvalues in the end (like an electrical switch snaps to either on or off position). With this snapping, the energy transfer process comes to a definite end. Hence the quantum nature of the eigenvalue-to-eigenvalue snapping-out–snapping-in process.

In this way, the absorption process, if it at all goes to completion, still results in only a certain quantum of energy being imparted to the absorber; an arbitrary amount of energy (say one-third of a quantum of energy) cannot be transferred in such a process.

In short, a quantized energy transfer process can still be realized without there being a spatially delimited particle of light travelling in space—as Einstein imagined.

4.2. No one highlighted Einstein’s error of introducing particles for light, because his theory had the same maths:

Notice that Einstein begins with the relativistic equation for a massive particle. In the cavity radiation set-up, this can only mean the electric charges (like the protons and electrons) in the cavity wall.

Even for waves in classical material media (like acoustic waves through air/metal), inertia is only a parameter, and not a variable of the wave dynamics. It co-determines the wave-speed in the medium, but beyond that, it has no other role to play. Inertia does not determine forces being exchanged by the wave phenomenon.

The aether anyway does not show any inertia in any electromagnetic phenomenon. So, Einstein’s assumption of the zero inertial effects in the expression for the energy of the photon is perfectly OK—if at all there is a particulate nature for light.

The existence of the thermodynamic equilibrium between the oscillating material charges in the wall and the waves in the aether implies that out of the total relativistic energy of a massive charge, only the pc part gets exchanged with the aether (in Einstein’s view, with the photon); the m_0\,c^2 part must remain with the massive charged particle in the wall.

The aether in the cavity has no other means to acquire energy except through an exchange with the EC Objects in the wall.

Therefore, the internal energy of the aether increases by a quantity that is numerically equal to the pc component of the energy lost by the massive charge.

Overall, Einstein’s assumption of a spatially discrete particle of light is not at all justified, even though the maths he proposed on that conceptual basis still makes perfect sense: it gives the same expression as that for light waves. See the Nobel laureate W. E. Lamb’s paper “Anti-photon” for a fascinating discussion [^].

And, yes, Einstein is the original inventor of the mystical idea of the wave-particle duality.


5. Some remarks on Bohr:

We will skip going into Bohr’s theory of the hydrogen atom, primarily because there are hardly any ontological remarks to be made in reference to this theory other than that it was a very ad-hoc kind of a model—though it did predict to great accuracy some of the most interesting and salient features of the hydrogen atom.

Bohr’s was not an ordinary achievement. What he built was a good theoretical model in place of, and to explain, the mere algebraic correlations of the atomic spectra series as given by those formulae by Balmer, Paschen, et al.

If Bohr’s contribution to QM were to end at his 1913 model, he would have made for an ontologically very uninteresting figure. Who remembers Jean Perrin when it comes to the ontological discussions of the continuum vs. particles-based viewpoints? Perrin won a physics Nobel for proving the atomic nature of matter, and yet, no one remembers him, because though Perrin did fundamental work, he didn’t raise controversies. The knowledge he created has been silently absorbed in the integrated view of physics. Unlike Bohr and Einstein.

That’s why, the fact that Bohr’s model does not invite too many ontological remarks (other than that it is a very tentative, ad-hoc kind of a model) precisely also is the reason why it is best to ignore Bohr at this stage.

Regardless of the physics issues clarified and raised by his Nobel-winning work, we can’t regard Bohr’s model itself as being irritating—certainly not from an ontological viewpoint.

But Bohr, qua a father figure of the mainstream QM as it happened to get developed, of course is very irritating! All in all, any irritation we experience because of him must be located in his other thoughts, not in his model of the hydrogen atom.


6. de Broglie’s hypothesis of matter-waves:

6.1. de Broglie postulates matter waves:

Light had long been thought (since the ca. 1801 experiment of interference of light-waves by Young) to have a wave nature—i.e., a spatially continuous phenomenon. So, following Einstein’s hypothesis, what was always a spatially continuous phenomenon now also acquired a spatially discrete character. Light always was waves, but also became particles.

The atomic nature of matter was well-established by now. Einstein was the leading physicist among those who must be credited with having helped this theory gain wide acceptance.

Bohr even had a theory for explaining emission / absorption spectra of the hydrogen atom—a model with spatially discrete nucleus and spatially discrete electrons. So, atoms were not just a hypothesis; they were an established fact. And, all parts of them were spatially discrete and finite in extent too.

Matter was particulate in nature. Discrete clumps of clay etc.

Then, following Einstein’s lead, a young Frenchman by name de Broglie put forth the hypothesis, in his PhD thesis, that what is regarded as particulate matter should also have a wave character. Accordingly, there should be waves of matter.

6.2. de Broglie supplies a physical explanation for the stability of the Bohr atom:

de Broglie went even further, and suggested that a massive particle like the electron in the Bohr atom must obey the same relations as are given by the Planck-Einstein relations for light. [Mark this point well; we will shortly make a comment on it. But to continue in the meanwhile…] He then proceeded to do calculations on this basis.

In Bohr’s theory, stationary orbits for the massive electron had been only postulated; they had not been explained on the basis of any physical principle or explanation that was more fundamental or wider in scope. Bohr’s orbits were stationary—by postulate. And, only Bohr’s orbits were stationary—by postulate.

On the basis of his matter-waves hypothesis, de Broglie could now explain the stability of the Bohr orbits (and of only the Bohr orbits). de Broglie pointed out that the orbits being closed circles, the matter-waves associated with an electron must form standing waves on them. But standing waves are possible only for certain values of radii, which means that only certain values of angular momenta or energies were allowed for the electrons.

de Broglie thus became the first physicist to employ the eigenvalue paradigm for the dynamics of the electron in a stable hydrogen atom.
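The standing-wave argument can be made concrete with a short Python sketch. It assumes the usual textbook set-up, which is not spelled out in this post: a whole number n of wavelengths fits on the circle (n \lambda = 2 \pi r, i.e., p r = n \hbar), and the Coulomb attraction supplies the centripetal force. The function name bohr_orbit is hypothetical.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J*s
M_E = 9.1093837015e-31      # electron mass, kg
Q_E = 1.602176634e-19       # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m

def bohr_orbit(n):
    """Radius, momentum and energy of the n-th orbit allowed by n*lambda = 2*pi*r."""
    coulomb = Q_E**2 / (4.0 * math.pi * EPS0)
    # p*r = n*hbar (standing wave) combined with p^2/(m r) = coulomb/r^2 (force balance)
    radius = n**2 * HBAR**2 / (M_E * coulomb)
    momentum = n * HBAR / radius
    energy = momentum**2 / (2.0 * M_E) - coulomb / radius   # E = T + Pi
    return radius, momentum, energy

for n in (1, 2, 3):
    radius, momentum, energy = bohr_orbit(n)
    wavelength = 2.0 * math.pi * HBAR / momentum   # lambda = 2*pi/k, with p = hbar k
    # Columns: n, radius in metres, energy in eV, number of wavelengths on the circle
    print(n, radius, energy / Q_E, 2.0 * math.pi * radius / wavelength)
```

Only the discrete radii survive, and the last column comes out as the integer n, which is the "eigenvalue" flavour of the argument.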

6.3. de Broglie’s limitations—mathematical, and ontological:

However, as the later theory of Schrodinger would show (which came within a year and a half), de Broglie’s analysis was too simple. de Broglie was wrong on two counts:

(i) The transverse matter-waves, according to de Broglie, existed with reference to a 1D curve (the Bohr circle) embedded in the 3D space as the reference neutral axis. He couldn’t think of filling the entire 3D space with his matter waves—which is what Schrodinger eventually did.

(ii) de Broglie also altered the ontological character of electrons from massive point-particles to the unexplained “hybrid” or “composite” of: massive point-particles and matter-waves.

From the ontological viewpoint, thus, de Broglie is the originator of the wave-particle duality for the massive particles of electrons, just the way Einstein was the originator of the wave-particle duality for the massless particles of light.

Einstein, of course, beat de Broglie by some 19 years in proposing any such a duality in the first place. (That hair! That smile!! That very scientist-ness!!!)

6.4. What no one notices about de Broglie’s relations:

de Broglie’s relations are nothing but the same old Planck- and Einstein-relations, but now seen as being applied to matter waves, not light. Thus the same equations

E = \hbar \omega and

p = \hbar k

are now known as de Broglie’s relations.

Notice the curious twist here: The equations de Broglie proposed for matter waves were actually derived for the massless phenomenon of light. The m_0-containing term was set to zero in deriving them.

I do not know if any one raised any objections on this basis or not, and whether or how de Broglie answered those objections. However, this issue sure is ontologically interesting. I will leave pursuing the questions it raises as a homework for you, at least for now. [I may cover it in a later post, if required.]

Homework: Why should the equations of a massless phenomenon (viz. the light) apply to the waves of matter?

(Hint: Look at our far too prolonged discussions and seemingly endless repetition of the fact that cavity radiation analysis applies at thermal equilibrium. Also refer to our preference for the term “cavity radiation,” and not “black-body radiation.” … That should be enough of a hint.)

…As a side remark: The quantum theory anyway had begun to get developed at a furious pace by 1924, and a relativistic theory for quantum mechanics would be given by Dirac just a few years later. Relativistic QM is out of the scope of our present series of posts.


7. Before getting into the derivation of Schrodinger’s equation:

7.1. The place of Schrodinger’s equation in the quantum theory:

We now come to the equation that has held sway over physicists’ imagination for almost a century (95 years, to be precise), viz., the linear partial differential equation (PDE) that was inductively derived by Schrodinger.

[Ignore Feynman here when he says that Schrodinger’s procedure is not a derivation. It is a derivation, but it is an inductive derivation, not a deductive one. Feynman artificially constrained the concept of derivation only to deduction. Expected of him.]

In terms of the commanding position of Schrodinger’s equation, every valid implication of QM, every non-intuitive feature of it, every interpretational issue about it, every debate in the QM history,… they all trace themselves back to some or the other term in this equation, or some or the other aspect of it, or some or the other fact assumed or implied by its analysis scheme—its overall nature.

If you have a nagging issue about QM, and if you can’t trace it back to Schrodinger’s equation (at least with a form of it, as in the relativistic QM), then the issue, we could even say, does not exist! All the empirical evidence we have so far points in that direction.

Either your issue is there explicitly in Schrodinger’s equation, or at least implicitly in its context, or in one of its concrete or abstract implications. Or, the issue simply isn’t there—physically.

Including the worst riddle of QM, viz., the measurement problem. This problem is a riddle precisely because of a mathematical nature of Schrodinger’s equation, viz., that it’s a linear PDE.

So, we want to highlight this fact:

Even if all that you want to do is to “just” solve the measurement problem, you still have to work with the Schrodinger equation—including its inductive context, the ontology it presupposes (at least implicitly), its mathematical structure and form, and all their implications.

The reason is: there is only one primary-unknown variable in the Schrodinger equation, viz., \Psi(x,t). And, there is only one more field, viz., V(x,t) in it. The rest are either constants or the space- and time-variables over which the fields are defined.

There is no place for any additional variable in QM, known or unknown. The reason for this, in turn, is: Schrodinger’s equation predicts all the known QM phenomena with astounding accuracy. That’s why there is no place for hidden variables either; the very idea itself is plain wrong. You don’t have to make an appeal to a detail like Bell’s theorem. The mathematical nature of Schrodinger’s equation, and the predictive success, together say that.

Therefore, solving the measurement problem must “only” require some ontological, physical, or mathematical reorganization involving the same old \Psi(x,t) and V(x,t) variables, and the same old constants (not to mention the same x and t variables over which the two fields are defined).

That’s why it is important to develop a good intuition about each term in this equation, about how the terms are put together, etc. To start developing such an intuition (which we will formalize in the ontology of Schrodinger’s QM in the next post), it is necessary to look into the logical scheme of its inductive derivation.

Without any loss of the essential physical meaning, (and perhaps with a greater clarity about physical meaning), we will use only the energy-based analysis in the derivation here, not the full-fledged variationally-based analysis which Schrodinger had originally performed in deriving his equation (by appeal to an analogy of mechanics with geometrical optics). We will more or less directly follow David Morin’s presentation. (It’s the best in the “town” for the purposes of a learner.)

7.1. Energy analysis with single numbers (or the aspatial, system-level, variables):

In energy analysis, the total energy content of an isolated mechanical system of objects that exert only conservative forces on each other, can be given as a sum of the kinetic and potential energies.

E = T + \Pi

where E denotes the total internal energy number of the system (here, not the magnitude of the electric force field), T is the kinetic energy number, and \Pi is the potential energy number.

Notice that as stated just above, and without any further addition to the equation, this is not a statement of the energy conservation principle; it is a statement that the internal energy for an isolated system having conservative forces consists of two and only two forms: kinetic and potential. (We ignore the heat, for instance.)

Speaking properly, a statement for energy conservation here would have been:

\oint \text{d}E = 0 = \oint \text{d}T + \oint \text{d}\Pi.

That is, a cyclic change for the total energy number for an isolated system is zero, and therefore, the sum of cyclic changes in its kinetic and potential energy numbers also must be zero—assuming that potentials are produced by conservative forces (as the electrostatic forces are). Now, for conservative forces, \oint \text{d}\Pi turns out to be zero, and so, the cyclic change in the kinetic energy too must be zero. Thus:

\oint \text{d}E = 0 = \oint \text{d}\Pi =  \oint \text{d}T.

However, notice, for non-cyclic changes, the most informative statement would be a differential equation; it would read:

\text{d}E = 0 = \text{d}T + \text{d}\Pi

This is because \text{d}E = 0 for any change in an isolated system. However, in general, notice that:

\text{d}T = - \text{d}\Pi \neq 0.

By integration between any two arbitrary states, we get

E = T + \Pi = \text{a constant},

which says that the E number stays the same for an isolated system (it is conserved) in any arbitrary change. Notice that this is a statement of energy conservation, due to the addition of the last equality. The addition of the last equality looks trivial, but it is in fact necessary to note it explicitly. We will work with this form of the equation.
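To see the difference between \text{d}T = -\text{d}\Pi \neq 0 and E = \text{a constant} in numbers, here is a minimal Python sketch. The mass-spring system, its parameter values and the function name energies are assumptions chosen only for illustration; any isolated system with conservative forces would do.

```python
import math

def energies(t, mass=1.0, stiffness=4.0, amplitude=0.5):
    """Kinetic and potential energy numbers of a 1D mass-spring system at time t."""
    omega = math.sqrt(stiffness / mass)            # natural angular frequency
    x = amplitude * math.cos(omega * t)            # exact position
    v = -amplitude * omega * math.sin(omega * t)   # exact velocity
    kinetic = 0.5 * mass * v**2
    potential = 0.5 * stiffness * x**2
    return kinetic, potential

t1, t2 = 0.1, 0.4                  # two arbitrary instants
T1, Pi1 = energies(t1)
T2, Pi2 = energies(t2)

print(T2 - T1, Pi2 - Pi1)          # the changes are non-zero, equal and opposite
print(T1 + Pi1, T2 + Pi2)          # the sum E is the same number at both instants
```

Both T and \Pi here are single numbers for the system as a whole; nothing in the calculation assigns them to points of space.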

Both T and \Pi are still aspatial numbers here. Since we have thrashed out this topic thoroughly in the previous posts of this series, we will not go into the distinction of the aspatial variables vs. the spatially defined quantities/fields once again. We will simply proceed to bring the aspatial variables down from their Platonic Lagrangian “heaven” to our analysis formulated in reference to the physical space (which is, in practice, 3D).

7.2. Energy analysis for the mechanics of a massive point-particle moving on a curve:

Now, in classical mechanics, for a massive point-particle,

T(x,t) = \dfrac{1}{2}mv(x,t)^2 = \dfrac{p(x,t)^2}{2m}.

So,

E = \dfrac{p(x,t)^2}{2m} + \Pi(x,t)

In particle mechanics, p(x,t)^2 = \vec{p}(x,t)\cdot\vec{p}(x,t) always lies with the instantaneous position x(t) of the massive particle. In the above equation, \Pi(x,t) should still remain an aspatial variable, but it’s common practice in the Variational/Lagrangian/Hamiltonian mechanics to assume that this function is specified without any particular reference to particle position, and therefore,

\Pi(x,t) not only is a known quantity, it also is independent of p(x,t).

Under this scope-narrowing assumption, it is OK to think of a 1D field for the potential energy function (e.g. a curved wire over which the bead slides under gravity with the time-dependent geometry of the wire not depending on the position or the kinetic energy function of the bead).

Accordingly the potential energy number \Pi(x,t) can be represented via a 1D field of V(x,t).

Thus we have:

E = \dfrac{p(x,t)^2}{2m} + V(x,t),

where  V(x,t) and p(x,t) are not functions of each other. Note,

In classical mechanics of point-particles (and their interactions with fields), though V(x,t) is a field, p(x,t) still remains a point-property at the particle’s position. So, E(x,t) may also be taken to be a point-property of the particle.

This may look like hair-splitting to most modern physicists. However, it is not. We went to such great lengths in identifying the conditions under which an energy can be regarded as an aspatial attribute of the system, and the conditions under which it can be regarded to have an identifiable existence in space (whether at the position of a point-particle or all over the domain as a field), because these matters have a crucial bearing on the kind of ontology that is assumed for the objects in the system.

In Schrodinger’s equation, it eventually turns out that V(x,t) is assumed to be a field, and not only that, but, effectively, also is the momentum function p(x,t). In developing Schrodinger’s equation, we also have to be careful not to directly assign the aspatial variable E to successive points of space—thus, hold on before you convert E to E(x,t). The reason is, Schrodinger’s equation deals with fields, not particles. The equation E = T +\Pi applies equally well to systems of particles as well as to systems of particles and fields, and to systems of only fields. Be careful. (I am correcting some of my own slightly misleading statements below.)


8. Specific steps comprising the essential scheme of Schrodinger’s derivation:

8.1. There should be a wave PDE for de Broglie’s matter-waves:

Schrodinger became intrigued by de Broglie’s theory, and within months, gave a seminar on it at his university. Debye (of the Debye-Scherrer camera fame, among other achievements) was in attendance, and casually remarked:

if the electron is a matter wave, then there must be a wave equation for it.

Debye actually meant a partial differential equation when he said “a wave equation.” A wave equation relates some spatio-temporal changes in the wave variable. The V is a field (following our EM ontology; see previous posts in this series), and the wave variable also must be a 3D field.

8.2. The wave ansatz:

The simplest ansatz to assume for a wave function (i.e. a field) in the 3D physical space, is the plane-wave:

\Psi(x,t) = A e^{i(kx - \omega t)}

With the negative sign put only on \omega but not on k, we get a (co)sinusoidal wave that travels to the right (i.e., in the direction of the positive x-axis). We will consider only the plane-wave traveling in the x-direction, for simplicity; however, realize, in a 3D physical space, two more plane-waves, one each in y- and z-directions, will be required.
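The direction of travel can be checked with a few lines of Python. This is only a sketch; the values of k and \omega and the function names are arbitrary choices for the illustration. It follows the crest whose phase kx - \omega t is zero.

```python
import cmath

def plane_wave(x, t, k=2.0, omega=3.0, amplitude=1.0):
    """The ansatz Psi(x,t) = A exp(i(kx - omega t))."""
    return amplitude * cmath.exp(1j * (k * x - omega * t))

def crest_position(t, k=2.0, omega=3.0):
    """Where the crest that sat at x = 0 at t = 0 is found at time t (kx - omega t = 0)."""
    return omega * t / k

x_start = crest_position(0.0)
x_later = crest_position(0.5)

# The crest has moved towards +x, and Psi still has its crest value there.
print(x_start, x_later, plane_wave(x_later, 0.5))
```

Flipping the sign in front of \omega in the exponent would make the same crest move towards the negative x-axis instead.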

8.3. The energy conservation equation that reflects the de Broglie relations:

We need to somehow relate k and \omega to the Planck-Einstein relations as used in de Broglie’s theory (i.e. as applying to the massive electron). To do so, take the specialized energy conservation statement explained above, viz.

E = \dfrac{p(x,t)^2}{2m} + V(x,t)

which is an equation for a particle at x(t), with its momentum also located at its position.

We then substitute de Broglie’s relations for matter waves in it—i.e., we use the mathematical equation of Planck’s for E on the left hand-side, and Einstein’s for p on the right hand-side. We thus get:

\hbar \omega = \dfrac{\hbar^2 k^2}{2m} + V(x,t).

Notice, we have brought in the wave-particle duality, implicit in the de Broglie relations, now into an equation which in classical mechanics was only for particles.

Update: As an after-thought, a better way to look at it is to begin with the aspatial-variables equation:

E = \dfrac{p^2}{2m} + \Pi = \text{a constant},

then make an electrostatic field for \Pi by substituting V(x,t) in its place, so as to arrive at:

E = \dfrac{p^2}{2m} + V(x,t) = \text{a constant},

and then, without worrying about whether E, T or p are defined in the physical space or not, to take this equation as applying to the system as a whole, and proceed to the next step. Accordingly, I am also slightly modifying the discussion below.

8.4. Making the energy conservation equation with the de Broglie terms, refer to the wave ansatz:

To relate the above equation to the plane-wave ansatz, multiply all the terms by \Psi(x,t), and get:

\hbar \omega \Psi(x,t) = \dfrac{\hbar^2 k^2}{2m}\Psi(x,t) + V(x,t)\Psi(x,t) = \text{a constant}\Psi(x,t).

The preceding step might look a bit quizzical, but doing so comes in handy soon enough to keep the mathematics sensible.

Physically, what the step does is to convert the first or the total term (total energy, which is conserved) from the aspatial variable E (or a system-attribute of \text{a constant}) to a spatially distributed entity—because of the multiplication by \Psi(x,t), a field. \Psi basically distributes the system-wide global variable (an aspatial variable) to all points in the physical space. BTW, this is the basic physical reason why \Psi(x,t) has to be normalized—we don’t want to change the value of the conserved quantity of the total energy.

Similarly, the same step also converts the second term (the kinetic energy, now expressed using the momentum) from the aspatial variable p^2/2m to a spatially distributed field.

The \Psi(x,t) variable refers to matter waves in 1D, but can be easily generalized to 3D—unlike de Broglie’s standing waves on the circles of the Bohr orbits.

Why do we not have to offer a physical mechanism for the multiplications by \Psi(x,t)? The answer is: it is its absence which is ontologically impossible to interpret. What comes as physically existing in the physical 3D space are the fields, and \Psi(x,t) helps in pinning their quantities. It is the aspatial variables/numbers that are devices of calculations, not the fields.

8.5. Transforming the energy conservation equation having the de Broglie terms into a partial differential equation:

Now, to get to the wave equation, we note the partial differentiations of the wave ansatz:

\dfrac{\partial \Psi}{\partial x} = ik\,A e^{i(kx - \omega t)} = ik\,\Psi(x,t),

and so,

\dfrac{\partial^2 \Psi}{\partial x^2} = -k^2\,\Psi

which implies that

k^2\Psi = - \dfrac{\partial^2\Psi}{\partial x^2}.

On the time-side, the first-order differential is enough, if eliminating \omega is our concern:

\dfrac{\partial \Psi}{\partial t} = -i\,\omega\,A e^{i(kx - \omega t)} = -i\,\omega\,\Psi

which implies that

\omega\Psi(x,t) = \dfrac{1}{-i}\dfrac{\partial \Psi}{\partial t} = i \dfrac{\partial \Psi}{\partial t}

Now, simple: Plug and chug! The energy conservation equation with the de Broglie terms goes from:

\hbar \omega \Psi(x,t) = \dfrac{\hbar^2 k^2}{2m}\Psi(x,t) + V(x,t)\Psi(x,t).

to

i\,\hbar \dfrac{\partial \Psi(x,t)}{\partial t} =\ -\, \dfrac{\hbar^2}{2m}\dfrac{\partial^2\Psi(x,t)}{\partial x^2} + V(x,t)\Psi(x,t)

That’s the most general (time-dependent) Schrodinger equation for you!
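As a sanity check on the plug-and-chug step, the following Python sketch verifies numerically that the plane-wave ansatz satisfies the resulting PDE when \omega and k obey \hbar \omega = \hbar^2 k^2/2m + V. The assumptions here are mine, for illustration only: natural units (\hbar = m = 1), a constant potential V0, and derivatives approximated by central finite differences.

```python
import cmath

HBAR = 1.0    # natural units, assumed for illustration
MASS = 1.0
K = 1.5       # arbitrary wavenumber
V0 = 0.7      # constant potential, assumed for illustration

# omega fixed by the energy equation with the de Broglie terms
OMEGA = (HBAR**2 * K**2 / (2.0 * MASS) + V0) / HBAR

def psi(x, t):
    """Plane-wave ansatz with unit amplitude."""
    return cmath.exp(1j * (K * x - OMEGA * t))

def schrodinger_residual(x, t, h=1e-4):
    """|i hbar dPsi/dt - (-(hbar^2/2m) d2Psi/dx2 + V Psi)| using central differences."""
    d_psi_dt = (psi(x, t + h) - psi(x, t - h)) / (2.0 * h)
    d2_psi_dx2 = (psi(x + h, t) - 2.0 * psi(x, t) + psi(x - h, t)) / h**2
    lhs = 1j * HBAR * d_psi_dt
    rhs = -HBAR**2 / (2.0 * MASS) * d2_psi_dx2 + V0 * psi(x, t)
    return abs(lhs - rhs)

residual = schrodinger_residual(0.3, 1.2)   # arbitrary point (x, t)
print(residual)   # tiny compared with |Psi| = 1; limited only by the finite differences
```

Choosing an \omega that does not satisfy the energy equation makes the residual of order one, which is a quick way to see that the PDE encodes exactly that relation.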

8.6. A few comments:

1. There is the imaginary unit, viz. i, on the left hand-side. So, the general solution must be complex-valued.

2. The PDE obtained has the space-derivative to the second order, but the time-derivative only to the first order.

In classical waves (i.e. the NM-ontological waves like the waves on strings, as well as in EM-waves), both the space- and time-derivatives are to the second order. That’s because the classical waves are real-valued. If you have complex-valued waves, then the first order derivative is enough to get oscillations in time. Complex-valued waves are mandated because we inserted de Broglie’s relations in the energy conservation equation. In classical NM-ontological mechanics, we would have kept the \Psi real-valued, and so, would have to take its second-order time differential. But then, none of the energy terms in the energy conservation equation would obey Planck’s hypothesis, hence Einstein’s relation, and hence, de Broglie’s relations. In short, the complex-valued nature of \Psi is mandated, ultimately, by Planck’s hypothesis and a wave ansatz.

3. For later reference, note that:

The kinetic energy is a field, given by:

T(x,t) =\ -\,\dfrac{\hbar^2}{2m} \dfrac{1}{\Psi}\dfrac{\partial^2\Psi}{\partial x^2},

which suggests the following definition for the momentum in the system:

\vec{p}(x,t) =\ -\,i\,\hbar \dfrac{1}{\Psi(x,t)} \nabla \Psi(x,t).

Thus, both kinetic energy and momentum are fields in the Schrodinger equation. The mainstream view regards them as fields defined on the abstract, 3ND configuration space.

In contrast, we take all fields of the Schrodinger equation: V(x,t), \Psi(x,t), and hence, T(x,t) as well as \vec{p}(x,t) as the 3D fields in aether. (The last two become fields because their terms involve \Psi).

The ontological and mathematical justification for our view that they are fields in the 3D physical space should be too simple and obvious by now (at this stage in this series of posts). The only thing to look into, now, is to justify that \Psi(x,t) field remains a 3D field even when there are two or more particles. We touch upon this issue in the next post, when we come to the ontology of QM.
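For the plane-wave ansatz, the two field definitions in comment 3 can be evaluated directly. The Python sketch below does so with finite differences, in natural units (\hbar = m = 1, an assumption for illustration), and with the sign of the momentum definition chosen so that a right-travelling wave gives +\hbar k, in line with p = \hbar k. The function names are hypothetical.

```python
import cmath

HBAR = 1.0    # natural units, assumed for illustration
MASS = 1.0
K = 1.5       # arbitrary wavenumber
OMEGA = 2.0   # arbitrary here; it drops out of both ratios

def psi(x, t):
    """Plane-wave ansatz with unit amplitude."""
    return cmath.exp(1j * (K * x - OMEGA * t))

def kinetic_energy_field(x, t, h=1e-4):
    """T(x,t) = -(hbar^2/2m) (1/Psi) d2Psi/dx2, by central differences."""
    d2_psi_dx2 = (psi(x + h, t) - 2.0 * psi(x, t) + psi(x - h, t)) / h**2
    return -HBAR**2 / (2.0 * MASS) * d2_psi_dx2 / psi(x, t)

def momentum_field(x, t, h=1e-4):
    """p(x,t) = -i hbar (1/Psi) dPsi/dx, by central differences."""
    d_psi_dx = (psi(x + h, t) - psi(x - h, t)) / (2.0 * h)
    return -1j * HBAR * d_psi_dx / psi(x, t)

# Evaluate at two arbitrary points; a plane wave gives the same values everywhere.
for x, t in ((0.0, 0.0), (0.8, 0.3)):
    print(kinetic_energy_field(x, t), momentum_field(x, t))
```

For this particular ansatz both fields come out uniform, equal to \hbar^2 k^2/2m and \hbar k up to the finite-difference error; for a general \Psi they would vary from point to point and could be complex-valued.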


9. Some comments on the development of Schrodinger’s equation, from our ontological viewpoint:

Notice very carefully the funny circling around going on here, with respect to light and matter, waves and particles, massless aether and massive objects. (We touched upon many of these points above and before, so they will get repeated, unfortunately! However, it’s important to put them together in one place for easy reference later on.)

Both Planck and Einstein began with the energy analysis of massive charged objects (roughly, the EC Objects of our EM ontology). They then ascribed the quantities of E = \hbar \omega and p = \hbar k to the massless aether. This procedure is justifiable because of the equality in the energy or momentum exchanges at the thermodynamic equilibrium.

As to the ontology, Planck had his own doubts about the “transfer” of quantization in the energy states of oscillators to the quantization of the EM fields. He regarded the quantization of energy of the radiation field as only a hypothesis. With the enormous benefit of hindsight (of more than a century), we can say that his hesitation for quantizing energy fields themselves was not justified. Schrodinger showed how they could be continuous (and continuously changing) entities and still obey the energy-eigenvalue equations for stationary states.

In contrast to Planck, Einstein was daring, actually brazen. He didn’t have any issue with quantization. In fact, he went much beyond, actually overboard in our opinion, and introduced also the spatial quantization to light, by introducing particles of light.

Notice, the sizes of attributes, i.e. the magnitudes of energy (or momentum), involved in the exchanges between the material point-charges and the aether are the same. However,

The fact that an exchange of energy is possible (that the sizes of the respective attributes can undergo changes in a mutually compensating way) does not alter the very ontological nature of the respective objects which enter into the interactions.

Ontologically, an EC object still remains a point-particle (more on this in the next post), and the aether still remains a spatially spread-out, non-mechanical, object—even when they interact, and therefore, even when their abstract measures like energies can change, and these changes be equated.

In our view, energy and momentum are point-properties when possessed by point-particles of EC Objects; they should be seen as moving in space with these objects (more on it, in the next post); they never exist at any other locations. In contrast, energy and momentum are field properties when possessed by the aether; they remain spread over all space at all times; they never concentrate in one place.

To equate the sizes of attributes is not to change the ontological character.

We do not blur the point-particle into a smear over space, neither do we collapse a field to a point. We simply say that from a more abstract, thermodynamic-systems perspective, the quantities of attributes called energy and momentum, in case of both types of objects, come to have the same magnitudes under equilibrium exchanges, that’s all.

In exchanging some quantity of steel for a piece of gold or vice versa, the respective physical objects remain the same; they only change hands, that’s all. The quantity of steel does not become golden, nor can a coin of gold be used in building a car, just because they got exchanged.

Einstein however confused this ontological issue and prescribed an exchange of not just quantities but also of the basic ontological characters. He put forth the idea of spatially discrete particles of light (later called photons).

de Broglie then entered the scene, compounded Einstein’s ontological error on the other side of the ontological division, and prescribed an ontological wave character to the matter particles. He in effect smeared out matter into space, and also made the smear dance everywhere as a wave. This was a “symmetrical” counterpart to how Einstein was the original “inventor” of “anti-smearing” fields in space to a point-object, and then, of making this point-object (of the photon) go everywhere while carrying a wave attribute with it, but without any explanation about the internal structure which might lead to its having the wave attribute.

In effect, Einstein and de Broglie were the initiators of the wave-particle duality. Both their works implied, in the absence of any satisfactory explanation coming forth from either of them for their ontological transgressions, the riddles: The riddle of how the wave field “collapses” to the wave-attribute of the photon in Einstein’s theory, and how the mass and charge smear out in de Broglie’s theory.

Both must have been influenced by bad elements of philosophy, including ontology. But the Copenhagen camp went further, much further.

The logical-positivistically minded Bohr, Heisenberg et al. of the Copenhagen camp then seized the moment, and formalized the measurement problem via the wavefunction “collapse,” the Complementarity Principle, etc. etc. etc. And despite all the “celebrated” debates, neither Einstein nor de Broglie ever realized that, as far as physics was concerned, it was they who had set the ball rolling in the first place!

Schrodinger didn’t think of questioning these ontological transgressions; neither did anyone else. He merely improved the maths of it, by generalizing the eigenvalue problem from the original de Broglie waves on 1D curves to a similar problem for his wavefunction \Psi, initially in the 3D space. Then, in the absence of sufficient clarity regarding the nature of Lagrangian abstractions (and their relation to the physical 3D space), Schrodinger even took \Psi (following Lorentz’s objection) to the abstract 3ND configuration space.

A quarter of a century later, John von Neumann used his formidable skills in mathematical abstractions, and, as might be expected, also equipped with a perfect carelessness about ontology, took all the QM-related confusions, and cast them all into the concrete, by situating the entire theoretical structure of QM on the “floating grounds” of an infinite-dimensional Hilbert space, with \Psi and V of course “living” in abstract 3ND configuration space.

Oh, BTW, regardless of his otherwise well-earned reputation, there were errors in von Neumann’s proofs too. It took decades before a non-mainstream non-American QM physicist, named John S. Bell, discovered an important one. Bell said:

“The proof of von Neumann is not merely false but foolish!” [^].

I am tempted to ask Bell:

“Why just von Neumann, John? Weren’t they all at least partly both?”

The only way to counter all their errors is to clearly understand all the aspects of all such issues—by and for yourself. You must understand the epistemology and ontology involved in the issues (yes, this one, first!), also physics (both “classical” and QM), and then, also the relation of mathematics to physics to ontology and epistemology in general. But once you do that, you find that all their silly errors and objections have evaporated away.


10. Operators are not ontologically important:

I do not know who began to emphasize operators in QM. Dirac? von Neumann? Still others?

But the notion has become entrenched in the mainstream QM. A lot of store is set by the idea that the classical variables must be represented, in theory, by operators, i.e., by objects that are, in Feynman’s memorable words, “hungry” forever. The operators for the momentum and energy, for instance, are respectively given as:

\hat{p} = -\,i\,\hbar \nabla

and

\hat{H} =\ -\ \dfrac{\hbar^2}{2m} \dfrac{\partial^2}{\partial x^2} + V(x,t).

To somehow have everything fit their operator-primary theory, they also carefully formulated the notion that the operator of a number, a variable, or a field function, when it “acts” on \Psi, results in just plain multiplication of that mathematical object with the \Psi—without any explanation or justification on the physical grounds. Why multiply when Nature does no multiplications—without there being a mechanism acting to that effect? Blank out.
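To make the two kinds of “action” concrete, here is a minimal numerical sketch in Python. It is only an illustration under stated assumptions, not anything taken from a textbook formulation: units with \hbar = 1, a 1D plane wave e^{ikx} as the test function, and a central-difference derivative. The names plane_wave, momentum_op and potential_op are hypothetical helpers introduced only for this sketch. It shows that the momentum operator “eats” the function and differentiates it, whereas the operator of a plain field function like V merely multiplies.

```python
import cmath

HBAR = 1.0  # assumption: natural units, chosen only for illustration


def plane_wave(k):
    """Return psi(x) = exp(i k x), a 1D plane wave with wavenumber k."""
    return lambda x: cmath.exp(1j * k * x)


def momentum_op(psi, x, h=1e-5):
    """Apply p = -i*hbar d/dx to psi at x, using a central difference."""
    dpsi_dx = (psi(x + h) - psi(x - h)) / (2.0 * h)
    return -1j * HBAR * dpsi_dx


def potential_op(V, psi, x):
    """The 'operator' of a field function V acts by plain multiplication."""
    return V(x) * psi(x)


k = 2.0
psi = plane_wave(k)
x0 = 0.3

# For a plane wave, (p psi)/psi should come out as hbar*k (up to numerical error).
p_over_psi = momentum_op(psi, x0) / psi(x0)

# A constant potential simply rescales psi at the point.
v_psi = potential_op(lambda x: 3.0, psi, x0)
```

With the sign of \hat{p} taken as -i\hbar\nabla, the ratio p_over_psi comes out close to +\hbar k for a wave travelling in the +x direction.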

It all is more than just a bit weird, but it’s there—the operator-primacy theory of formulating QM. And I am sure that it has some carefully crafted and elegant-looking mathematical basis intelligently created for it too—complete with carefully noted notations, definitions, lemmas, theorems, proofs, etc. All forming such a hugely abstract and obfuscating structure that errors in proofs are kept well hidden for decades.

For now, just note that the very notion of operators itself is not very important when it comes to ontological discussions. Naturally, the finer distinctions about it like the linear operators, Hermitian operators, etc., also are not at all important. Just my personal opinion. But it’s been reached with a good ontological understanding of the issues, I think.


11. A preview of the things to come next:

OK. In this post, we touched on many of the finer points having ontological implications. The next time, we will provide our answer regarding the proper ontology of QM.

We will refer to the physics of only the simplest quantum system, viz. the hydrogen atom (and comparable quantum models, notably, the particle in the box (PIB), the quantum harmonic oscillator, and similar 1-particle quantum systems). We will make a formal list of all the objects used in the QM ontology, and also indicate the kind of 3D aetherial field there has to be, for the system wavefunction. We will also discuss some analogies that help understand the nature of the \Psi field. For instance, we will point out the fact that \Psi exists even in the PIB model, i.e., even at places in the domain where V is zero. There are some interesting repercussions arising out of this fact.

We will also touch upon the fact that the action-at-a-distance is absent in our EM ontology, and hence, it should also be absent in our QM ontology. However, the presence of the direct-action does not mean that there cannot be changes that occur simultaneously everywhere in the aether. The two phenomena are slightly different, and we will delineate them. The non-relativistic QM theory, in particular, requires the latter.

We will, however, not touch upon the measurement problem. Understanding the measurement problem requires understanding two new physics topics to a certain depth and with sufficient scope: QM of many-particle systems, and the physics of the nonlinear differential equations. Both are vast topics in themselves. Further, tackling the measurement problem doesn’t change the list of different ontological objects that are involved in QM or their basic nature. In fact, the measurement problem is rather a specifically detailed physics problem. Solving it, IMHO, does require a very good clarity on the QM ontology, but the basic ontological scheme remains the same as for the single hydrogen atom. That’s why we won’t be touching on that topic. Experts in QM may refer to the Outline document I have already put out, earlier this year, at iMechanica [^]. All the rest: Well, you have to wait, or ask the experts, what else?

One particular aspect of the many-particle quantum systems which is very much in vogue these days is the entanglement. Since we won’t be covering the many-particle systems in this series, we also wouldn’t be touching on the physics of quantum mechanical entanglement. However, at least as of today, I do not very clearly see if the phenomenon of entanglement requires us to make any substantial changes in the ontology of QM. In fact, I think not. Entanglement complicates only the physics of QM, but not its ontology. Hence the planned omission of entanglement from this series.

So, all in all, our description of the QM ontology itself would get completed right in the next post. And, with it, also this ontological series would come to an end.

Of course, my blogging would continue, as usual. So, I might write occasional posts on these topics: many-particle QM systems, nonlinearity proposed by me in the Schrodinger equation, the measurement problem, the quantum entanglement, and then, perhaps, also the quantum spin. However, there won’t be a continuously executed project of a series of posts as such, to cover these topics. I will simply write on these topics on a more or less “random” and occasional basis—whenever I feel like.

So there. Check out the next—i.e. the last—post in this series, when it comes, say in a week’s time or so. In the meanwhile, go through the previous posts if you have joined late.

Also, have a happy Diwali!

Alright, take care, and bye for now…


A song I like:

(Hindi) “dheere dheere machal ae dil-e-beqarar”
Music: Hemant Kumar
Singer: Lata Mangeshkar
Lyrics: Kaifi Aazmi
[Credits listed in a random order]


History:

— First published: 2019.10.26 18:37 IST
— Corrected typos, added sub-section headings, revised some contents (without touching the points), added a few explanations, etc., by 2019.10.27 11:28 IST. Will now leave this post (~7,500 words!) as is. At least until this series gets over.
— Update on 2019.10.28 10:50 IST: Still corrected some misleading passages, added notes for better clarification, corrected typos, etc. (~8,825 words!). Let’s leave it. I need to really turn to writing the next post.

Ontologies in physics—8: Correct view of the EM “V” in the Schrodinger equation. Necessity of aether.

0. Prologue:

The EM textbook view of the electric vector field (\vec{E}, in volt per meter or newton per coulomb) of a charge, and so of the electric potential field (P, in volt), is an ontologically misleading construct, even if it happens to be mathematically consistent with the physics of EM. In this post, we will point out the ontologically correct view to take of the electrostatic phenomena. Corrections are implied at suitable places throughout our earlier description of the EM ontology.

In developing our new ontological view of EM, we will also be using the evidence which came to light after the Maxwellian EM was already formulated and recast by people like Lorentz et al.

A basic point to keep in mind for this post is what we discovered on the fly, right in this series: Nature does no multiplications (without there being some physical mechanism for it).

Let’s get going.


1. Some basic facts pertaining to the EM physics:

The first basic fact we note is the following:

The total amount of charge contained in the universe is zero.

We reach this ontologically important conclusion by generalizing from the available physical evidence, not from “purely” mathematical considerations such as “symmetry”.

One direct outcome: We can’t associate an ontology based on a preferential direction for some imaginary polarity-conversion process. In fact, going by another bit of empirical evidence, we don’t even have to entertain any polarity-conversion process in the first place. The polarity of a charge is immutable.

The second basic fact to be noted is this:

Elementary charges have the same absolute magnitude; they differ only in their signs.

Putting these two general facts together, we can say, as a general inference, that

The number of elementary charges is equally split into the two polarities: positive and negative.

We don’t know any particular count for the number of charges in the universe. (And, we don’t have to know, given the 1/r^2 nature of their forces, which decay rather rapidly, so that even the spatial extent of the already known universe itself is far too big for an electron’s force-field to stay numerically relevant at such large scales.) But, yes, we do know the fact that there are as many positive charges as there are negative charges.

Hence, if a system description is to be a good representation of the behaviour of the universe, it should be regarded as having an equal number of positive and negative charges.

So, as the first quantitative implication:

For an arbitrary system having a total N number of charges, the number N should always be so selected as to be an even number. In such a case, there will be N/2 number of positive charges, and N/2 negative charges. Thus, there will be N(N-1)/2 number of pairs of interacting charges in all, and N(N-1) number of different separation vectors.
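The two counts just stated can be checked by brute-force enumeration. The following is a minimal Python sketch; the function name count_pairs_and_vectors is a hypothetical helper, and treating the separation vector from i to j as different from the one from j to i is my reading of “different separation vectors”. It deliberately does not split the counts by polarity, so the homework below remains yours to do.

```python
from itertools import combinations, permutations


def count_pairs_and_vectors(n):
    """Count interacting pairs and separation vectors for n point-charges.

    Assumes n is even (N/2 positive and N/2 negative charges).
    Pairs are unordered: (i, j) is the same pair as (j, i).
    Separation vectors are ordered: i -> j differs from j -> i.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("n should be a positive even number")
    labels = range(n)
    n_pairs = sum(1 for _ in combinations(labels, 2))    # N(N-1)/2
    n_vectors = sum(1 for _ in permutations(labels, 2))  # N(N-1)
    return n_pairs, n_vectors


pairs_4, vectors_4 = count_pairs_and_vectors(4)
```

For N = 4, the enumeration gives 6 pairs and 12 separation vectors, in agreement with N(N-1)/2 and N(N-1).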

Homework: Given such a system, find the total number of the different possible separation vectors that connect: (i) two unlike charges, (ii) two positive charges, (iii) two negative charges.

As another basic fact, we note that:

The electrostatic interaction forces being conservative in nature, they can be superposed.

Therefore, the forces in a system can be quite generally analyzed in reference to just a single arbitrary pair of charges of arbitrary polarities.


2. The three steps to reach the correct ontological view of the electrostatic fields:

Take two elementary charges of arbitrary polarities q_i and q_j respectively at \vec{r}_i and \vec{r}_j.

2.1. Step 1: The empirical context:

Start with Coulomb’s law which gives the two forces, respectively acting on the two charges, with each acting in the direction of a separation vector:

\vec{f}_{ij} = \dfrac{q_i\,q_j}{r_{ij}^2} \hat{r}_{ij}
and
\vec{f}_{ji} = \dfrac{q_j\,q_i}{r_{ji}^2} \hat{r}_{ji}

In ontology of physics, it’s always very sensible and fully valid—and also equally lovely—to begin with forces, and not with the Lagrangian or the Hamiltonian. Forces “force” you to think of the individual objects that do the forcing, or of the changes which are made to the dynamical states of individual objects due to the forcing. So, you just can’t escape identifying the actual physical objects involved in forceful interactions. If you start with forces, you just can’t escape into some abstract system-wide defined numbers, and then find it easy to cut your tie from reality. That’s why. (As to any possible non-forceful interactions, tell me, who really worries about them in physics? Certainly not Noether’s theorem.)

From this point on, I will work with just one of the forces, viz. \vec{f}_{ij}, and leave the other one for you as homework.
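Before moving on, here is a small Python sketch of the pair of Coulomb forces written above. It rests on a few assumptions of mine which you should check against your own conventions: no constant of proportionality (as in the formulas above), \vec{f}_{ij} taken as the force on the charge at \vec{r}_j due to the charge at \vec{r}_i, and \hat{r}_{ij} taken as pointing from \vec{r}_i to \vec{r}_j. The function name coulomb_force is a hypothetical helper.

```python
import math


def coulomb_force(q_i, q_j, r_i, r_j):
    """Force on the charge at r_j due to the charge at r_i.

    Assumed conventions: no constant of proportionality, and the
    unit separation vector points from r_i to r_j.
    """
    sep = [b - a for a, b in zip(r_i, r_j)]      # vector from i to j
    dist = math.sqrt(sum(c * c for c in sep))
    if dist == 0.0:
        raise ValueError("the two charges cannot be at the same point")
    scale = q_i * q_j / dist**3                  # (q_i q_j / r^2) * (1 / r)
    return [scale * c for c in sep]


# Two unlike unit charges, two length-units apart along the x-axis.
f_ij = coulomb_force(+1.0, -1.0, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
f_ji = coulomb_force(-1.0, +1.0, (2.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```

With these conventions, the two forces come out equal and opposite, and for unlike charges each force points towards the other charge.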

2.2. Step 2: Assign to the aether the role played by the attribute of the charge of the EC Object:

Following the textbook treatment of EM, the ontology for the EC Object which we developed has been the following: It was essentially the NM Object now with an additional attribute of the electric charge. Thus, an EC Object is a massive point-particle that carries an elementary charge as its additional attribute. Qua attributes of a point-existent, both mass and charge must be seen as being located at all times at the same point where the EC Object is—and nowhere else.

We must now modify a part of this notion.

We realize that the idea of the electric charge is helpful only inasmuch as it helps in formulating the quantitative force law of Coulomb’s.

The charge, in the text-book treatment, is associated with a massive particle. But there is no direct empirical evidence to the effect that a quantity which captures the forcing effects arising due to the attribute of the charge, therefore, has to be a property of the EC Object itself.

Only a physics / ontology which says that there has to be an absolutely “empty space” in the universe (devoid of any existent in it), can require the charge to be attributed to the massive point-particles. In our new view, this is an instance of misattribution.

Accordingly, as our second step,

Remove the charge-attribute from the EC Object at \vec{r}_j and re-assign it to the elemental CV of the aether around it.

To get rid of any possible notational confusion, introduce q_A in place of q_j into the statement of Coulomb’s law. Here, q_A is the magnitude of that quality or attribute of the aether which allows the aether itself to electrostatically interact with the first charge q_i and to experience \vec{f}_{ij} at the specific point \vec{r}_j. The subscript _A serves as a reminder of the Aether.

Accordingly,

\vec{f}_{ij} = \dfrac{q_i\,q_A}{r_{ij}^2} \hat{r}_{ij}

Notice, the maths has remained the same. However, the ontology has changed for both the EC Object, as well as for the single elementary CV around the point \vec{r}_j.

The EC Object now has no classical q_j charge on it. The elementary CV too is without a point-charge—if the term is taken in the text-book sense of the term. That is to say, two adjacent elementary CVs do not exert very high electrostatic forces on each other as if they were sources of Coulombic forces. q_A captures a charge-like attribute only on the receiving side.

Instead of the EC Object, now, the aether itself is seen to carry some attribute whereby it experiences the same electrostatic force as a point-charge q_j of the textbook description would. From the viewpoint of Coulomb’s law, the relevant measure of this attribute therefore is q_A = q_j.

2.3. Step 3: Generalize to all space:

Now, as the third step, generalize.

Since q_A now is an attribute of the aether, all parts of it must possess the same attribute too. After all, the entire aether is ontologically a single object. The idea of the aether as a single object is valid because in places where the aether is not, we would have to have some still other physical object present there. Further, there is no evidence which says that the force-producing condition be present only at \vec{r}_j but at no other parts of the aether.

Accordingly, generalize the above equation from \vec{r}_j to any arbitrary location in the aether \vec{r}_A:

\vec{\mathcal{F}}_{iA} = \dfrac{q_i\,q_A}{r_{iA}^2} \hat{r}_{iA}

where \vec{r}_A is a variable that at once applies to all space.

With this generalization, we have obtained a field of q_A in the aether, one that is uniform everywhere, being numerically equal to q_j (complete with the latter’s sign). As a consequence, a local force is produced by q_i at every arbitrary elemental CV of the aether at \vec{r}_A. Accordingly, there is a field of local forces too.

Since this is a big ontological change, we have changed the symbol on the left hand-side too. Thus, \vec{\mathcal{F}}_{iA} represents a field of force whereas \vec{f}_{ij}, which appeared in the original Coulomb’s law, has been just a point-force at one specific location.

We call \vec{\mathcal{F}}_{iA} the aetherial force-field. It can be used to yield the force on the second EC Object when it is present at any arbitrary position \vec{r}_j.
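The step from a point-force to a force-field can be sketched in Python as follows. This is only an illustration under my own assumptions: no constant of proportionality, the separation vector taken from \vec{r}_i to \vec{r}_A, and a hypothetical helper name make_aether_force_field. The forcing charge q_i and the uniform aetherial attribute q_A are fixed once; the resulting function can then be evaluated at any location \vec{r}_A.

```python
import math


def make_aether_force_field(q_i, r_i, q_A):
    """Return a function giving the aetherial force at any location r_A.

    Assumed conventions: no constant of proportionality; the separation
    vector points from the forcing charge at r_i to the location r_A.
    """
    def force_at(r_A):
        sep = [a - s for s, a in zip(r_i, r_A)]
        dist = math.sqrt(sum(c * c for c in sep))
        if dist == 0.0:
            raise ValueError("the field is singular at the forcing charge")
        scale = q_i * q_A / dist**3
        return [scale * c for c in sep]
    return force_at


# Forcing charge +1 at the origin; q_A numerically equal to q_j = -1.
F_iA = make_aether_force_field(q_i=1.0, r_i=(0.0, 0.0, 0.0), q_A=-1.0)

r_j = (2.0, 0.0, 0.0)
force_on_j = F_iA(r_j)                    # same as the point-force of Coulomb's law
force_elsewhere = F_iA((0.0, 4.0, 0.0))   # the field exists where no EC Object is
```

Evaluated at \vec{r}_j, the field reproduces the original point-force; evaluated anywhere else, it still gives a definite local force.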


3. Implications for the ontologies of EC Objects and the aether:

With that change, what is now left of the original EC Object q_j?

3.1. An EC Object suffers force not due to its own charge but because of the electric aether:

Well, metaphorically speaking, the EC Object now realizes that its very near (and possibly dear) charge has left it. However, it also realizes that it is still being forced just the same way.

Earlier, in the textbook EM, the poor little chappie of the EC Object q_j was a silent sufferer of a force; it still remains so. But the force it receives now is not due to a point-concentration of charge with it, but due to a force imparted to it by the aether—which now has that extra attribute which measures to q_A.

All in all, the EC Object (or the classical “charge”) now comes to better understand itself and its position in the world. It now realizes that the causal agent of its misery was not a part of its own nature, of its own identity; it always was that (“goddamn”) portion of the aether in direct contact with it.

So, from now on, it will never forget the aether. It has grown up.

3.2. An EC Object causes a force-field to come into the aether too:

But not everything is lost. An EC Object is not all that miserable, really speaking. The force field \vec{\mathcal{F}}_{iA} was anyway created by an EC Object—the one at q_i.

Thus, the same EC Object fulfills two roles: as the creator of a force-field, and as a sufferer of the force-fields created by all other EC Objects.

3.3. Every EC Object still remains massive:

Every EC Object still gets to retain its mass just as before. So, if unhindered, it can even accelerate in space just as before. Hey, no one has cut down on its travelling allowance, alright?

So, regardless of this revision in the ontology of the EC Object, it still remains a massive particle. Being a “charge-less” object, in fact, makes it more consistent from an ontological perspective: the EC Object now interacts only with the aether, not directly with the other charge(s) (through action-at-a-distance), as we’ve always wanted.

3.4. The aether remains without inertia:

As to the aether: Though having a quality of charge, it still remains without any inertia. At least, it doesn’t have that inertia which comes “up” in its electrostatic interactions.

Hence, even though it is the one that primarily experiences the force \vec{\mathcal{F}}_{iA} at \vec{r}_A, this force does not translate into its own acceleration, velocity, or displacement. It just stays put where it is. But if a massive particle strays into its location, then that particular aetherial CV has no option but to pass along this force, by direct contact, to that massive particle.

3.5. The aether allows the EC Object to pass through it:

As we shall see in the section below, the aether poses no drag force to the passage of an EC Object. The aether also is all pervading, and no part of it undergoes displacements; that is, it does not move away to make a way for the EC Object to go through (the way the public makes way for a minister’s caravan, in India).

We might not be too mistaken if we believe that the reason for this fact is that the aether has no inertia coming into picture in its electric interactions.

Thus, we have to revise our entire ontology of what exactly an EC Object is, what we mean by charge, and what exactly the aether is.


4. Ontological implications arising out of the divergence of force fields:

4.1. Zero divergence everywhere except at the location of the forcing charge:

It can be shown that an inverse-square force-field like \vec{\mathcal{F}}_{iA} (or \vec{\mathcal{F}}_{jA}) has zero divergence everywhere, except at the point \vec{r}_i (or \vec{r}_j respectively) where the field is singular. There, the divergence is concentrated as a point source whose strength is proportional to the charge q_i (or q_j, respectively).

Notice, the \vec{\mathcal{F}}_{iA} field has a zero divergence even around the forced object \vec{r}_j—which was used in defining it. Make sure you understand it. The other field, viz., \vec{\mathcal{F}}_{jA} does have a singularity at \vec{r}_j and a divergence equal to q_j. But \vec{\mathcal{F}}_{iA} doesn’t—not at \vec{r}_j. Similarly for the other field.

4.2. Non-zero forces everywhere:

However, notice that the elemental CV at the location \vec{r}_j of the forced charge still carries a finite force at that point—exactly as everywhere else in the aether.

Remember: Divergence is about how a force-field changes in the infinitesimal neighbourhood of a given CV; not about what force-field is present in that CV. It is about certain kind of a spatial change, not about the very quantity whose change it represents.
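If you want to check these two statements numerically, here is a rough Python sketch. It is not a proof, only a finite-difference check under my own assumptions: a unit inverse-square field with no constant of proportionality, a central-difference estimate of the divergence, and the hypothetical helper names inverse_square_field and divergence. It evaluates both the force and the divergence at a point away from the singularity.

```python
import math


def inverse_square_field(q, src, p):
    """Inverse-square field of strength q centred at src, evaluated at p."""
    sep = [b - a for a, b in zip(src, p)]
    dist = math.sqrt(sum(c * c for c in sep))
    return [q * c / dist**3 for c in sep]


def divergence(q, src, p, h=1e-4):
    """Central-difference estimate of the divergence of the field at p."""
    total = 0.0
    for axis in range(3):
        plus, minus = list(p), list(p)
        plus[axis] += h
        minus[axis] -= h
        f_plus = inverse_square_field(q, src, plus)[axis]
        f_minus = inverse_square_field(q, src, minus)[axis]
        total += (f_plus - f_minus) / (2.0 * h)
    return total


src = (0.0, 0.0, 0.0)    # location of the forcing charge (the singularity)
p = (1.0, 0.5, -0.7)     # an arbitrary location away from the singularity

force_here = inverse_square_field(1.0, src, p)
force_magnitude = math.sqrt(sum(c * c for c in force_here))
div_here = divergence(1.0, src, p)
```

The force at p is clearly non-zero, whereas the estimated divergence at the same point is zero to within the numerical error of the finite differences.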

4.3. Static equilibrium everywhere:

Since the divergence of the force field \vec{\mathcal{F}}_{iA} is zero everywhere in the aether (excepting for the single point of the singularity at \vec{r}_i), no CV in the aether, finite or infinitesimal, exchanges a net surface-force with a CV completely enclosing it. (A seed of a fruit is completely enclosed by the fruit.) Thus, every aetherial CV is in static equilibrium with its neighbours. The static equilibrium internal to the aether always prevails, regardless of how the EC Objects at \vec{r}_i or \vec{r}_j move, i.e., regardless of how the fields \vec{\mathcal{F}}_{iA} or \vec{\mathcal{F}}_{jA} move. No finite change of local force conditions is able to disturb the prevailing static equilibrium internal to the aether.

However, a pair of equal and opposite local surface-intensities of forces do come to exist at every internal surface in the aether. Hence, a state of stress may be associated with the aether. These stresses are to be taken by way of analogy alone. Their nature is different from the stresses in the NM Ontological continuous media.

Note again, the force-field is non-zero and non-uniform (it varies as the inverse-square of separation). So there still is a non-zero force being exerted on an aetherial CV, even if there is no net force on any surface between any two adjacent CVs.

4.4. The direct contact governs the force exchanges internal to the aether:

A direct consequence of the inverse-square law and the divergence theorem also is that the force field must be seen as arising due purely to a direct contact between the neighbouring aetherial CVs.

There is no transfer of momentum from one CV to another distant CV via any action-at-a-distance directly between the two—by “jumping the queue” of the other parts of the intervening aether, so to speak.

4.5. The direct contact governs the force exchanges between the aether and the forced EC Object:

With the presence of a non-zero \vec{\mathcal{F}}_{iA} force acting on it, an aetherial CV which is in direct contact with a massive EC object, transmits a surface-force to the EC Object via the internal surface common to them.

So, while the CV itself does not move, it does force the EC Object.

4.6. Inertia-less aether implies no drag force on the EC Object:

The aetherial force-fields are conservative, and the description provided by Coulomb’s law is logically complete for electrostatics. Given these two premises, the aether must act as a drag-free medium for the passage of an EC Object.

An aetherial CV does not exert any resistive or assistive forces, over and above the forces of the \vec{\mathcal{F}}_{iA} field, on the massive EC Object at q_j.

There is a force through direct contact between the aether and an EC Object too, just as in NM ontology. However, quite unlike in NM ontology, there is no drag force opposing the passage of an EC Object through the aether. (Can this be explained because the aether has no inertia to show at the level of electric phenomena?)

All in all, the only difference between the forces at two neighbouring points in space is that between the two local point-forces of the field. Hence, if the \vec{\mathcal{F}}_{iA} field is non-uniform (and the Coulombic fields anyway are non-uniform), the massive point-particle of the forced EC Object (the one at \vec{r}_j) always slides “down-hill” of \vec{\mathcal{F}}_{iA}.

Note, our description here differs from the textbook description. The textbook description implies that a negative charge always climbs up the hill of a positive \vec{E} field, whereas a positive charge climbs down the same hill. In our description, we use \vec{\mathcal{F}}_{iA} in place of \vec{E}, and the motion of the EC object always goes downhill.

4.7. The electric aether as the unmoved (or unmovable) mover:

Since an aetherial CV surrounding a given EC Object forces it, but doesn’t move itself, we may call the EM aether the unmoved (or unmovable) mover.

Aristotle, it would seem, had an idea or two right at the fundamental levels also of physics—not just in metaphysics or logic. This idea also seems to match well with certain, even more ancient, Upanishadic (Sanskrit: उपनिषदीय) passages as well.

But all that is strictly as side remarks. We couldn’t possibly have started with those ancient passages and come to build the force-fields of the precisely required divergence properties, without detailed investigations into the physical phenomena. Mystically oriented physicists and philosophers are welcome to stake a claim to the next Nobel in physics, if they want. But they wouldn’t actually get a physics Nobel, because the alleged method simply doesn’t work for physics.

For building theoretical contents of physics, philosophical passages can be suggestive at best. The actual function of philosophy in physics is to provide broad truths, and guidelines. For instance, consider the fact that there has to be an ontology, at least just an implied one, for every valid theory of physics. This piece of truth itself comes from, and is established in, only philosophy—not in physics. So, philosophy is logically required. It can also be useful in being suggestive of metaphors. But even then, physics is a special science that refers to scientific observations, and uses experimental method, and quantitative laws.


5. No discontinuity in the \vec{\mathcal{F}}_{iA} field around \vec{r}_j:

Oh, BTW, did you notice that the force-field \vec{\mathcal{F}}_{iA} is continuous everywhere, including at the location \vec{r}_j of the forced EC Object? Looks like our problem from the last post has got solved, doesn’t it? Well, yes, it has!

Even if \vec{\mathcal{F}}_{iA} is discontinuous at its singularity, this singular point happens to be at the other (forcing) EC Object’s location \vec{r}_i. We can ignore it from our analysis because the maths of differential equations anyway excludes any singular point. Our difficulty, as noted in the last post, was not at the proton’s position (i.e. the singularity at \vec{r}_i) but with the discontinuity at the electron’s position (i.e. \vec{r}_j).

Now we can see that, ignoring the singularity of \vec{\mathcal{F}}_{iA} at \vec{r}_i, this aetherial force field keeps on forcing the EC Object q_j continuously everywhere in the field.

No matter where the j-th EC Object goes, it can’t hide from \vec{\mathcal{F}}_{iA}, and must get forced from that position too. Similarly, no matter where the i-th EC Object goes, it can’t hide from \vec{\mathcal{F}}_{jA}, and must get forced from that position too. That is, following electrostatics alone.

(EM Dynamics keeps the electrostatic description as is, but also adds the force of magnetism, which complicates the whole thing. QM keeps the electrostatic description as is, removes the magnetic fields, but introduces the \Psi field, which raises such issues that we ended up writing this very lengthy series on just the ontologically important parts of them!)


6. Overall framework: Pairs of charges, and hence of force-fields:

An isolated charge by itself does not exist in the universe. There always are pairs of them.

Mathematically, a \vec{\mathcal{F}}_{iA} field acquires its particular sign, which depends on the specific polarities of the respective charges forming a pair in question, right from the time the two are “brought” “from” infinity. Ontologically, this is a big difference between our view and that of the textbook EM.

The textbook EM captures interactions via \vec{E} field which is found in reference to a positive test charge of unit magnitude. The force-field is thus severed from the sign of the forced charge; it reflects only the forcing charge. In our view, both the charges have equal say in determining both \vec{\mathcal{F}}_{iA} and \vec{\mathcal{F}}_{jA} fields.

Physically, both these fields from a pair of charges have always existed, exist, and will always exist, at all times. There is no way to annihilate only one of the two.

This feature of the EM physical reality is remarkably similar to the necessity of there physically being only a pair of forces, and not an isolated physical force in the NM ontology, following Newton’s third law.

If there are two charges in an isolated system, there are two force fields. If there are three charges, there are six force-fields. The number of force-fields equals the number of separation vectors. See the homework above.

Due to the conservative nature of the Coulombic forces, all the force-fields superpose at every point in the aether.

The massive particles of EC Objects merely accelerate under the action of the net field present at their respective positions. That’s on the acceleration side, i.e., the role that a given EC Object plays as a forced charge. However, the same EC Object also plays a role as a forcing charge. The locus of this role moves too.

Thus,

All singularities in all the force-fields also move when the EC Objects where they are present, move.
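The superposition described in this section can be sketched in Python as below. Again, this is only an illustration under my own assumptions: no constant of proportionality, the separation vector taken from the forcing charge to the forced charge, and hypothetical helper names pair_force and net_force_on. For a given forced charge j, the sketch adds up the N-1 forces arising from the fields of all the other charges.

```python
import math


def pair_force(q_i, r_i, q_j, r_j):
    """Force at r_j (forced charge q_j) due to the field of q_i at r_i."""
    sep = [b - a for a, b in zip(r_i, r_j)]
    dist = math.sqrt(sum(c * c for c in sep))
    scale = q_i * q_j / dist**3
    return [scale * c for c in sep]


def net_force_on(j, charges, positions):
    """Superpose the N-1 forces acting on the j-th EC Object."""
    total = [0.0, 0.0, 0.0]
    for i, (q_i, r_i) in enumerate(zip(charges, positions)):
        if i == j:
            continue  # an EC Object is not forced by its own field
        f = pair_force(q_i, r_i, charges[j], positions[j])
        total = [t + c for t, c in zip(total, f)]
    return total


# Two positive charges on the x-axis, one negative charge on the y-axis.
charges = [+1.0, +1.0, -1.0]
positions = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

net_on_negative = net_force_on(2, charges, positions)
```

In this symmetric arrangement, the x-components of the two forces on the negative charge cancel out, and the net force pulls it towards the mid-point of the two positive charges. If any EC Object is moved, only the positions list changes; the singularities of the fields move along with the objects.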


7. An essentially pair-wise description of the fields still remains objective:

Each of the two force-fields \vec{\mathcal{F}}_{iA} and \vec{\mathcal{F}}_{jA} represents an interaction between two objects, sure. However, this fact does not make the description devoid of objectivity; it does not make it inherently relativistic, Machian, or anything of that sort.

The force-fields are due to interaction between pairs of charges, and not due to mere presence of the individual charges. Yet,

The individual charges still retain the primary ontological status.

Force-fields do not have a primary standing. Their ontological standing is: (i) as attributes of the aether—which is a primary existent, and (ii) as effects produced by the EC Objects—which again are the primary existents.

Force fields acquire signs as per the properties of the two EC Objects taken together. But the same EC Object contributes exactly the same sign in every conceivable pair it forms with the other charges in the universe. Thus, the sign is an objective property of a single EC Object, not of a pair of them.

Each singularity resides in a point-particle of the EC Object. Given the same forced charge q_j, there are N-1 number of \vec{\mathcal{F}}_{iA} force-fields pulling or pushing q_j in different directions. Each of these N-1 singularities resides at the specific point-position \vec{r}_i of the corresponding forcing EC Object. Further, each forced charge also acts as a forcing EC Object in the same pair. Thus, the two force-fields \vec{\mathcal{F}}_{iA} and \vec{\mathcal{F}}_{jA} imply two point-phenomena: the two singularities.

Each singularity at the forcing charge q_i also moves when the object at its location moves under the action of the other force field \vec{\mathcal{F}}_{jA} acting on it.

The ontologically mandated simultaneous existence of the two force-fields is akin to the simultaneous existence of the action-reaction pair from the good old Newtonian mechanics. The fact that forces due to direct contact come only in pairs does not imply that “everything is relative,” or properties that can be objectively isolated and attributed to individual objects cease to exist just because two objects participate in an interaction. For more on what causality means and what interaction means, see my earlier post [^] in this series.


8. Potential energy numbers as aspatial attributes of a system:

8.1. A note on the notation:

I would have liked to have left this topic for homework, but there is a new notation to be introduced here, too. So, let’s cover this topic, although as fast as possible.

So, first, a bit about the notation we will adopt here, and henceforth.

Since we are changing the ontology of the EM physics, we should ideally make changes to the notation used for the potential energies too. However, I want to minimize the changes in notation when it comes to writing down the Schrodinger equation.

So, I will make the appropriate changes in the discussion of the energy analysis that precedes the Schrodinger equation, but I will keep the notation of the Schrodinger equation intact. (I don’t want reviewers to glance at my version of the Schrodinger equation, and throw it in the dust-bin because it doesn’t follow the standard textbook usage.)

So, let’s get going. We will make the notations as we go along.

8.2. The single number of potential energy of two point-charges as an aspatial attribute:

Let \Pi( q_i, q_j) be the potential energy of the system due to two charges q_i and q_j being present at \vec{r}_i and \vec{r}_j, respectively.

This is a system-wide global number, a single number that changes as either q_i, or q_j, or both, are shifted. It is numerically equal to the work done on the system in variationally shifting the two EC Objects from infinity to their stated positions. Using Coulomb’s law, and the datum of zero potential energy “at” infinity, it can be shown that this quantity is given by:

\Pi( q_i, q_j) = \dfrac{1}{4\,\pi\,\epsilon_0}\dfrac{q_i\,q_j}{r_{ij}}.

Even if the above formula makes reference to the magnitude of the separation vector, there is nothing in it which will allow us to locate it at any one point of space. So, \Pi( q_i, q_j) is an aspatial attribute. You can’t point out a specific location for it in space. It is a device of analysis alone.
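
As a quick numerical illustration (a minimal sketch; the function name pair_potential_energy and the sample positions are my own illustrative choices, not part of the theory), the formula returns one number for the pair, and that number depends only on the separation r_{ij}, not on where in space the pair happens to sit:

```python
import math

K_COULOMB = 8.9875517923e9    # 1/(4*pi*epsilon_0), in N m^2 / C^2
E_CHARGE = 1.602176634e-19    # elementary charge, in C

def pair_potential_energy(q_i, q_j, r_i, r_j):
    """Single system-wide number Pi(q_i, q_j), in joule."""
    r_ij = math.dist(r_i, r_j)          # magnitude of the separation vector
    return K_COULOMB * q_i * q_j / r_ij

# Illustrative positions (metres): a proton and an electron 1 angstrom apart.
r_i = (0.0, 0.0, 0.0)
r_j = (1.0e-10, 0.0, 0.0)
pi_here = pair_potential_energy(E_CHARGE, -E_CHARGE, r_i, r_j)

# Translate the whole pair rigidly: the separation, and hence Pi, is unchanged
# (up to floating-point roundoff).
shift = (5.0e-10, -3.0e-10, 2.0e-10)
r_i_moved = tuple(a + s for a, s in zip(r_i, shift))
r_j_moved = tuple(a + s for a, s in zip(r_j, shift))
pi_there = pair_potential_energy(E_CHARGE, -E_CHARGE, r_i_moved, r_j_moved)

print(pi_here, pi_there)
```

The two printed numbers agree, which is the computational face of the statement that \Pi( q_i, q_j) cannot be pinned to any one location.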

8.3 Potential energy numbers obtained by keeping one charge fixed:

Let \Pi(q_j) be the potential energy (a single number) imparted to the system due to the work done on the system in variationally shifting q_j from infinity to its current position \vec{r}_j, while keeping q_i fixed at \vec{r}_i.

Notice, the charge in the parentheses is the movable charge. When only one charge is listed, the other one is assumed fixed. Since here q_i is fixed, there is no work done on the system at \vec{r}_i. Hence, the single number that is the system potential energy, increases only due to a variational shifting of q_j alone.

Similarly, let \Pi(q_i) be the potential energy (a single number) imparted to the system due to the work done on the system in variationally shifting q_i from infinity to its current position \vec{r}_i, while keeping q_j fixed at \vec{r}_j.

(It’s fun to note that it doesn’t matter whether you bring any of the charges from +\infty or -\infty. Set up the integrals, evaluate them, and convince yourself. You can also take a short-cut via the path-independence property of the conservative forces.)
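
If you would rather let a computer set up the integrals, here is a rough sketch under my own assumptions: rescaled units in which q_i q_j/(4\pi\epsilon_0) = -1, a large but finite distance standing in for infinity, and a simple midpoint rule. The names external_work and incoming_path are hypothetical helpers. The charge q_i sits at the origin, and q_j is brought in along a straight line, once from the +y side and once from the -y side:

```python
import math

def external_work(path, kqq):
    """Work done on the system by an external agent moving q_j quasi-statically
    along `path` (a list of 3D points), with q_i fixed at the origin.
    `kqq` stands for q_i*q_j/(4*pi*epsilon_0)."""
    work = 0.0
    for a, b in zip(path, path[1:]):
        mid = [(u + v) / 2.0 for u, v in zip(a, b)]
        dist = math.sqrt(sum(c * c for c in mid))
        # Coulomb force on q_j, evaluated at the segment midpoint.
        force = [kqq * c / dist**3 for c in mid]
        step = [v - u for u, v in zip(a, b)]
        # The agent pushes against the Coulomb force, hence the minus sign.
        work -= sum(f * s for f, s in zip(force, step))
    return work

def incoming_path(r, big, side, n=20000):
    """Straight line from (r, side*big, 0) to (r, 0, 0), sampled uniformly
    in the angle subtended at the origin."""
    theta_max = math.atan(big / r)
    thetas = [theta_max * (1.0 - k / n) for k in range(n + 1)]
    return [(r, side * r * math.tan(t), 0.0) for t in thetas]

kqq, r, big = -1.0, 1.0, 1.0e3     # rescaled units; `big` stands in for infinity
w_plus = external_work(incoming_path(r, big, +1), kqq)
w_minus = external_work(incoming_path(r, big, -1), kqq)
closed_form = kqq / r              # the formula for Pi, in the same units

print(w_plus, w_minus, closed_form)
```

Both numerical values should land close to the closed-form number, differing from it only by the truncation at a finite starting distance and the discretization error.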


9. Obtaining a spatial field for the “potential” energy of Schrodinger’s equation:

It can be shown that:
\Pi( q_i, q_j ) = \Pi( q_i) = \Pi(q_j).

Consider now the problem of the last post, viz., the hydrogen atom.

OK. It is obvious that:
\Pi(q_j) = \dfrac{1}{4\,\pi\,\epsilon_0}\dfrac{q_i\,q_j}{r_{ij}}

Follow the earlier procedure of ontologically re-assigning the effects due to a point-charge to the local elemental CV of the aether at q_j, thereby introducing q_A in place of q_j; and then generalizing from \vec{r}_j to \vec{r}_A, we get to a certain field of an internal energy. Let’s give it the symbol V(q_j).

Thus,
V(q_j) = \dfrac{1}{4\,\pi\,\epsilon_0}\dfrac{q_i\,q_A}{r_{iA}}

We thus got the “potential” energy field we wanted for use in Schrodinger’s equation. It’s continuous even at \vec{r}_j.

Following the mainstream QM, we will continue calling the V(q_j) field function the “potential” energy field for q_j.

However, as mentioned in a previous post, the field denoted by V(q_j) means the same as the system’s potential energy only when the 3D field is concretized to its value for a specific point \vec{r}_j. But taken in its entirety, what V(q_j) denotes is an internal energy content that is infinitely larger than that part which can be converted into work and hence stands to be properly called a potential energy.

It is true that the entire internal energy content moves out of the system when both the charges are taken to infinity. However, such a passage of the energy out of the system does not imply that all of it gets exchanged at the moving boundaries, because the boundaries here are point positions.

So, strictly speaking, the V(q_j) field does not qualify to be called a potential energy field. Yet, to avert confusions from an already skeptical physicist community, we will keep this technical objection of ours aside, and call V the potential energy field.

If both the proton and the electron are to be regarded as movable, then we have to follow a procedure of splitting up the total, as shown below:

Split up \Pi( q_i, q_j) into two equal halves:

\Pi( q_i, q_j) = \dfrac{1}{2} \Pi( q_i, q_j ) + \dfrac{1}{2} \Pi( q_i, q_j )

Substitute on the right hand-side the two single-movable-charge terms:
\Pi( q_i, q_j) = \dfrac{1}{2} \Pi( q_j ) + \dfrac{1}{2} \Pi( q_i )

Now first aetherize and then generalize \Pi( q_j ) to V(q_j), and similarly go from \Pi( q_i) to V(q_i).

We thus get:
V( q_i, q_j) = \dfrac{1}{2} \dfrac{1}{4\,\pi\,\epsilon_0}\dfrac{q_i\,q_A}{r_{iA}}+ \dfrac{1}{2} \dfrac{1}{4\,\pi\,\epsilon_0}\dfrac{q_j\,q_A}{r_{jA}}

In the short and sweet form:
V( q_i, q_j) = \dfrac{1}{2} V( q_j ) + \dfrac{1}{2} V( q_i )
where all Vs are field-quantities (in joule, not volt).
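
Here is a small numerical sketch of the split. It rests on assumptions that I am adding only for illustration: SI units, sample positions of my own choosing, and q_A taken to be numerically equal to the movable charge whose field is being built. The helper names v_field, v_of_qj and v_of_qi are hypothetical:

```python
import math

K = 8.9875517923e9            # 1/(4*pi*epsilon_0), SI units
E = 1.602176634e-19           # elementary charge, in C

def v_field(q_forcing, r_forcing, q_A, r_A):
    """Aetherial 'potential' energy field (joule) at the point r_A due to a
    forcing charge at r_forcing; q_A is the charge attribute of the aether
    (assumed here to equal the movable charge numerically)."""
    return K * q_forcing * q_A / math.dist(r_forcing, r_A)

q_i, r_i = +E, (0.0, 0.0, 0.0)         # proton (illustrative position)
q_j, r_j = -E, (0.5e-10, 0.0, 0.0)     # electron (illustrative position)

def v_of_qj(r_A):
    # V(q_j): singular at r_i, continuous at r_j.
    return v_field(q_i, r_i, q_j, r_A)

def v_of_qi(r_A):
    # V(q_i): singular at r_j, continuous at r_i.
    return v_field(q_j, r_j, q_i, r_A)

pi_pair = K * q_i * q_j / math.dist(r_i, r_j)
# Concretize each half-field at the position of its own movable charge.
pi_from_fields = 0.5 * v_of_qj(r_j) + 0.5 * v_of_qi(r_i)

print(pi_pair, pi_from_fields)
```

The two numbers coincide: the single aspatial number \Pi( q_i, q_j) is recovered from the two half-weighted 3D fields once each is evaluated at the position of its movable charge.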

We thus have solved the problem of discontinuity in the potential energy fields.


10. A detailed comment on \vec{E}(q_i) vs. \vec{\mathcal{F}}_{iA}:

Mathematically (in electrostatics), Lorentz’ law says:
\vec{\mathcal{F}}_{iA} = q_j \vec{E}(q_i)

But ontologically, there is no consistent interpretation for the textbook EM term of \vec{E}(q_i) (the electric vector field). It is supposed, in the textbook EM, to be a property of q_i alone. But such a thing is ontologically impossible to interpret. The only physically consistent way in which it can be interpreted is to regard \vec{E}(q_i) as that hypothetical force-field which would arise due to q_i if a positive unit charge q_j were to be present, one at a time, at each point in the universe. We would then add these point-forces defined at all different points together and treat it as a field. So, the best physical interpretation for it is a hypothetical one. This also is a reason why such a formulation definitely implants a seed of a doubt regarding the very physical-ness of the fields idea.

As a hypothetical field, \vec{E} tries to give the “independent” force-field of q_i and so, its sign depends on the sign of q_i. Which leads to the trouble of discontinuity with the associated potential energy field too.
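
To make the contrast concrete, here is a minimal sketch (SI units; the positions and the function names e_field and force_on are my own illustrative choices). The textbook \vec{E}(q_i) is computed without any reference to q_j, and the force appears only after the multiplication by the signed q_j:

```python
import math

K = 8.9875517923e9            # 1/(4*pi*epsilon_0), SI units
E = 1.602176634e-19           # elementary charge, in C

def e_field(q_i, r_i, r):
    """Textbook electric field of q_i (located at r_i), evaluated at r."""
    d = [a - b for a, b in zip(r, r_i)]
    dist = math.sqrt(sum(c * c for c in d))
    return [K * q_i * c / dist**3 for c in d]

def force_on(q_j, q_i, r_i, r_j):
    """Force on q_j at r_j: the field of q_i multiplied by the signed q_j."""
    return [q_j * c for c in e_field(q_i, r_i, r_j)]

r_i = (0.0, 0.0, 0.0)          # forcing charge: a proton at the origin
r_j = (1.0e-10, 0.0, 0.0)      # where the forced charge is placed

field = e_field(+E, r_i, r_j)                   # same for any forced charge
force_on_electron = force_on(-E, +E, r_i, r_j)  # points toward the proton
force_on_proton = force_on(+E, +E, r_i, r_j)    # points away from the proton

print(field[0], force_on_electron[0], force_on_proton[0])
```

The field value is the same in both cases; only the product with q_j decides which way the force points.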

In contrast, \vec{\mathcal{F}}_{iA} and \vec{\mathcal{F}}_{jA} are easy to interpret ontologically. They are the two aetherial fields which simultaneously exist due to the existence of q_i and q_j, and vice-versa. \vec{\mathcal{F}}_{iA} doesn’t exist without there also being \vec{\mathcal{F}}_{jA} and vice-versa.

The aetherial field \vec{\mathcal{F}}_{iA} has a singularity at q_i. So, it forces q_j, but not q_i, due to symmetry of the field around the latter. Similarly, the aetherial field \vec{\mathcal{F}}_{jA} has a singularity at q_j. It forces q_i but not q_j due to symmetry of the field around the latter.

Based on these observations, similar statements can be made for V(q_i) and V(q_j) as well.

The total aetherial field due to a pair of charges, V(q_i, q_j) has two singularities, one each at \vec{r}_i and \vec{r}_j. It changes with any change in the position of either of the two charges. It too ontologically exists. However, in calculations, we almost never have the occasion to take the total field. It’s always the force on, or the partial potential energy due to, a single charge at a time.


11. The V field “lives” in the ordinary 3D space for any arbitrary system of N charges:

Oh, BTW, did you notice one thing that might have gone almost unnoticed?

Even if \Pi(q_i, q_j) is defined on the space of separation vectors, and even if these separation vectors are defined only on the six-dimensional configuration space existing in the mystical Platonic “heaven”, we have already brought it down to our usual 3D world. We did it by expressing \Pi(q_i, q_j) as (i) a concretization at \vec{r}_i and \vec{r}_j of (ii) a sum of two instantaneous 3D fields (each having the right sign, and each without any point-sharp discontinuity).

So, given a pair of charges, what ontologically exist are the two 3D fields: \vec{\mathcal{F}}_{iA} + \vec{\mathcal{F}}_{jA}. They move when \vec{r}_i and \vec{r}_j move, respectively.

To think of a 6D configuration space of all possible values for \vec{r}_i and \vec{r}_j is to think of the set of all possible locations for the two singularities of the two 3D fields.

Just the fact that each singularity can physically be in any arbitrary place does not imply that there is no 3D field associated with it.

The two descriptions (one on the configuration space and the other using 3D fields) are equivalent in the sense that they reproduce the same consequences. But our description is also richer in ontological clarity: it speaks of 3D fields each part of which can interact with another 3D field, namely \Psi(\vec{r},t), thereby forming a conceptual bridge to the otherwise floating abstractions of the mainstream QM too.


12. The aether as a necessity in the physics of EM, and hence, also of QM:

One last point before we close for today.

In making the generalization from \vec{r}_A defined only at the elemental CV in the aether surrounding q_j‘s position \vec{r}_j to the entire space, we did not justify the generalization procedure itself on the ontological grounds.

The relevant physics fact is simple: The aether (the non-inertial one, as first put forth by Lorentz, and also one that appeared very naturally in our independent development) is all pervading.

12.1 Philosophical reason for the necessity of the aether:

Philosophically, the aether, qua a physical existent, replaces the notion of the perfectly “empty” space—i.e. a notion of the physical space that, despite being physical, amounts to a literal nothing. Such a theory of physics, thereby, elevates the nothing—the zero, the naught—to the same level as that of something, anything, everything, the existence as such.

But following ancient reasoning, a nothing is only a higher-level abstraction, not a physical existent, that denotes the absence of something. It has no epistemological status as apart from that whose absence it denotes. It certainly cannot be regarded as if it were a special something. If it were to be a special something, in physics theory, it would be an ontological object.

So, something has to be there where the point-phenomena of massive particles of EC Objects are not.

That’s the philosophical justification.

12.2 Mathematical considerations supporting the idea of the aether:

Now let’s look at a few mathematical considerations which point to the same conclusion.

12.2.i. Discontinuity in force-field if the aether is not there:

As seen right in this post, if the aether is removed, then both the force-experiencing and the force-producing aspects of an EC Object (viz. the charge q_j) have to be attributed to the EC Object itself.

However, this introduces the discontinuity of the kind we discussed at length in the last post. The only way to eliminate such a discontinuity is to have a field phenomenon, viz. q_A, in place of the point-phenomenon, viz. q_j.

12.2.ii. Nature of the spatial differential operators (grad, div, curl):

The electromagnetic theory in general uses differential operators of grad, div, and curl. Even just in electrostatics, we had to have a differential element at \vec{r}_j to be able to calculate the work done in moving a charge. The \text{d}r is an infinitesimal element of a separation vector, not a zero. Similarly, you need the grad operator for going from the aetherial potential energy V(q_j) field to the aetherial force field \vec{\mathcal{F}}_{iA}. Similarly, for the div and the curl (in dynamical theory of EM). In QM, we use the Laplacian too.

All these are spatial differential operators. They refer to two different values of a field variable at two different points, and then take an appropriate ratio in the limit of the vanishing size.
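
A small sketch of that limiting process, in one radial dimension and rescaled units (my assumptions: q_i q_A/(4\pi\epsilon_0) = -1, and a central-difference ratio standing in for grad; the names V and central_difference are illustrative). Two field values at two distinct points, a ratio, and then a shrinking separation:

```python
def V(r, kqq=-1.0):
    """Radial 'potential' energy field in rescaled units."""
    return kqq / r

def central_difference(f, r, h):
    """Ratio built from two field values at two distinct points, r-h and r+h."""
    return (f(r + h) - f(r - h)) / (2.0 * h)

r0 = 2.0
exact_force = -1.0 / r0**2          # analytic -dV/dr for kqq = -1

errors = []
for h in (1e-1, 1e-2, 1e-3):
    estimate = -central_difference(V, r0, h)    # force = -grad V
    errors.append(abs(estimate - exact_force))
    print(h, estimate, errors[-1])
```

The error shrinks as the two points are brought closer, but at every stage the ratio needs field values on both sides of r0; that is, it needs the field to exist in a neighbourhood, not just at a point.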

Now the important characteristic to bear in mind here comes from a basic appreciation of calculus. An infinitesimal of something does not denote a definite change. The infinitesimal arises in the context of two definite quantities at two distinctly identifiable points, to which a limiting process is then applied. So, the proper order of knowledge development is: two distinct points lying at the endpoints of a continuous interval; quantities (variables) defined at the two points; then a limiting process.

This context also gets reflected in the purpose of having infinitesimally defined quantities in the first place. The only purpose of a derivative is to have the ability to integrate in the abstract, and vice versa (via the fundamental theorem of calculus of derivatives and anti-derivatives). Sans ability to integrate, a differential equation serves no purpose; it has no place even in proper mathematics let alone in proper physics. Ontology of calculus demands so.

If you want to apply grad to the V(q_j) field, for instance, and if you insist on a formulation of an empty space and a point-particle for an EC Object (a point-particle itself being an abstraction from a finite particle, but let’s leave that context-drop aside), then you have to assume that the charge-attribute operates not just at q_j but also at infinitesimally small distances away from it. Naturally, at least an infinitesimally small distance around the \vec{r}_j point cannot be bereft of everything; there must be something to it—something which is not covered in the abstraction of the EC Object itself. What is it? Blank out.

An infinitesimal is not a definite quantity; its only purpose is to be able to formulate integral expressions which, when concretized between definite limits, yield calculations of the unknown variables. So, an infinitesimal can be specified only in such a context where definite end-points can be specified.

If so, what is the definite point in the physical space up to which the infinitesimal abstraction, say of \nabla V(q_j) remains valid? How do you pick out such a point? Blank out, again!

The straight-forward solution is to say that the infinitesimal expresses a continuous variation over all finite portions of space away from \vec{r}_j, and that this finite portion is indefinitely large. The reason is, the governing law itself is such that the abstraction of infinitely large separations is required before you can at all reach a meaningful datum of V(q_j)_f approaching V(q_j)_i.

Naturally, the nature of the mathematics, as used in the usual EM physics, itself demands that whatever it is which is present in the infinitesimal neighbourhood of q_j must also be present everywhere.

That’s your mathematically based argument for an all pervading aether, in other words.

12.3. Polemics: No, “nature abhors vacuum” is not the reason we have put forth:

The line that physicists opposed to the idea of the aether love to quote most, is: “nature abhors vacuum.” As if the physical nature had some intrinsic, mystical, purpose; a consciousness in some guise that also had some refined tastes developed with it.

They choose to reproduce this line because it is the weakest expression of the original formulation (which, in the Western philosophy, is due to Parmenides).

To obfuscate the issue is to ignore his presence in the context altogether; pretend that Aristotle’s estimated three quarters of extinct material might have had something important or better to offer—strictly in reference to the one quarter extant material (these estimates being merely estimates); to ignore every other formulation of the simple underlying truth that the nothing cannot have any valid epistemological status; and try to win the argument via the method of intimidation.

Trying to play turf-battles by using ideas from mysticism, and thence mystifying everything.

Idiots. And, anti-physics zealots.

Anyway, to end this section on a better, something-encapsulating way:

12.4 Polemics: Aether is not an invisible medium; it is not the thing that does nothing:

It’s wrong to think that the EC Object is what you do see and the aether what you don’t. If you have that idea of the aether, know that you are wrong: you have reduced the elementary EC Object to a non-elementary NM-ontological one, and the aether to the Newtonian absolute space. The fact of the matter is: it is both the EC Objects and the aether which together act to make any such thing as seeing possible.

In EM, they both lie at the most fundamental level. Hence, they both serve as the basis for every physical mechanism, including that involved in perception (light and its absorption in the eye, the neural signals, etc.) Drop any one of them and you won’t be able to see at all, whether the NM Objects or the “empty” space. In short, the aether is not a philosophical slap on just to make Aristotle happy!

Both the aether and the massive point-particle of EC Objects are abstractions from some underlying physical reality; they both are at the same ontological standing in the EM (and QM) theory. You can’t have just one without the other. Any attempt to do so is bound to fail. Even if you accept Lorentz’ idea (I only vaguely know it via a recent skimming) that all there is, is just one single object of the aether, to build any meaningful quantitative laws, you still have to make a reference to the singularity conditions in the aetherial fields. That’s nothing but your plain EC Object, now coming in via another back-door entry. Why not keep them both?

We can always take the mass of the spring and dump it with the point-mass of a ball, and take the stresses in the finite ball and dump them in the spring. That’s how we get the mass-spring system. Funny, people never try to have only the spring or only the mass. But they mostly insist on having only the EC Objects but not the aether. Or, otherwise, advocate the position that everything is the aether, its modern “avataar” being: “everything is fields, dude!”

Oh.

OK. Just earmark to keep them both. Matter cannot act where it is not.


13. A preview of the things to come:

The next time, we will take a look at some of the essence of the derivation of Schrodinger’s equation. The assigned reading for the next post is David Morin’s chapter on quantum mechanics, here (PDF, 495 kB) [^].

Also, brush up a bit on the simple harmonic motion and all, preferably from Resnick and Halliday.

No, contrary to a certain gossip among physicists, physics is not a study of the pendulum in various disguises. It is: the study of the mass-spring system!

A pendulum displays only the dynamical behaviour, not static (and it does best only for the second-order relations). On the other hand, a spring-mass system shows also the fact that forces can be defined via a zeroth-order relation: \vec{f} = - k (\vec{x} - \vec{x}_0). So, it works for purely static equilibria too. That is, apart from reproducing your usual pendulum behaviour anyway. It in fact has a very neat visual separation of the place where the potential energy is stored and the place that does the to and fro. So, it’s even easier to see the potential and kinetic energies trying to best each other. And it all rests on a restoring force that cannot be defined via \vec{f} = \dfrac{\text{d}\vec{p}}{\text{d}t}. The entire continuum mechanics begins with the spring-mass system. If I asked you to explain to me the pendulum of the static equilibrium, could you? So there. Just develop a habit of ignoring what physicists say, as always. After all, you want to study physics, don’t you?
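
For those who like to poke at such things numerically, here is a minimal 1D sketch of both faces of the spring-mass system. The parameter values, the function names, and the choice of the velocity-Verlet stepping scheme are all my own assumptions for illustration:

```python
def spring_force(x, k, x0):
    """Zeroth-order relation: f = -k (x - x0)."""
    return -k * (x - x0)

def static_equilibrium(k, x0, load):
    """Position where the spring force balances a constant external load."""
    return x0 + load / k

def simulate(k, m, x0, x_start, dt=1e-3, steps=5000):
    """Velocity-Verlet integration of the free oscillation, starting at rest;
    returns the total (kinetic + potential) energy after each step."""
    x, v = x_start, 0.0
    a = spring_force(x, k, x0) / m
    energies = []
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = spring_force(x, k, x0) / m
        v += 0.5 * (a + a_new) * dt
        a = a_new
        energies.append(0.5 * m * v * v + 0.5 * k * (x - x0) ** 2)
    return energies

k, m, x0 = 4.0, 1.0, 1.0
x_eq = static_equilibrium(k, x0, load=2.0)      # purely static case
energies = simulate(k, m, x0, x_start=1.5)      # dynamical case

print(x_eq, min(energies), max(energies))
```

The static part needs no time derivative at all, while in the dynamical part the kinetic and potential energies trade places with the total staying nearly constant.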

Alright, go through the assigned reading, come prepared the next time (maybe after a week), take care in the meanwhile, and bye for now.


A song I like:

(Hindi) “humsafar mere humsafar, pankh tum…”
Singers: Lata Mangeshkar and Mukesh
Lyrics: Gulzaar
Music: Kalyanji-Anandji

 

Ontologies in physics—7: To understand QM, you have to first solve yet another problem with the EM

In the last post, I mentioned the difficulty introduced by (i) the higher-dimensional nature of \Psi, and (ii) the definition of the electrostatic V on the separation vectors rather than position vectors.

Turns out that while writing the next post to address the issue, I spotted yet another issue. Its maths is straight-forward (the X–XII standard one). But its ontology is not at all so easy to figure out. So, let me mention it. (This upsets the entire planning I had in mind for QM, and needs small but extensive revisions all over the earlier published EM ontology as well. Anyway, read on.)


In QM, we do use a classical EM quantity, namely, the electrostatic potential energy field. It turns out that the understanding of EM which we have so painfully developed over some 4 very long posts, still may not be adequate enough.

To repeat, the overall maths remains the same. But it’s the physics—rather, the detailed ontological description—which has to be changed. And with it, some small mathematical details would change as well.

I will mention the problem here in this post, but not the solution I have in mind; else this post will become huuuuuge. (Explaining the problem itself is going to take a bit.)


1. Potential energy function for hydrogen atom:

Consider a hydrogen atom in an otherwise “empty” infinite space, as our quantum system.

The proton and the electron interact with each other. To spare me too much typing, let’s approximate the proton as being fixed in space, say at the origin, and let’s also assume a 1D atom.

The Coulomb force by the proton (particle 1) on the electron (particle 2) is given by \vec{f}_{12} = \dfrac{(e)(-e)}{r^2} \hat{r}_{12}. (I have rescaled the equations to make the constants “disappear” from the equation, though physically they are there, through the multiplication by 1 in appropriate units.) The potential energy of the electron due to the proton is given by: V_{12} = \dfrac{(e)(-e)}{r}. (There is no prefactor of 1/2 because the proton is fixed.)

The potential energy profile here is in the form of a well, vaguely looking like the letter ‘V’; it goes infinitely deep at the origin (at the proton’s position), and its wings asymptotically approach zero at \pm \infty.
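
If you don’t feel like drawing the graph by hand, here is a tiny sketch that tabulates the well in the same rescaled units (e = 1). The function name v12 and the sample points are just my illustrative choices:

```python
def v12(r, e=1.0):
    """Rescaled 1D potential energy of the electron; proton fixed at the origin."""
    return (e) * (-e) / abs(r)

# Sample points spanning six decades on either side of the proton.
rs = [10.0**k for k in range(-3, 4)]
for r in rs:
    print(r, v12(r), v12(-r))

# A classical electron sits at one point only, so it picks out one value.
electron_position = 0.53          # hypothetical position, rescaled units
print(v12(electron_position))
```

The table shows the two symmetric wings rising toward zero and the depth growing without bound near the origin, whereas the last line gives the single value that the classical picture has at any one instant.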

If you draw a graph, the electron will occupy a point-position at one and only one point on the r-axis at any instant; it won’t go all over the space. Remember, the graph is for V, which is expressed using the classical law of Coulomb’s.


2. QM requires the entire V function at every instant:

In QM, the measured position of the electron could be anywhere; it is given using Born’s rule on the wavefunction \Psi(r,t).

So, we have two notions of positions for the supposedly same existent: the electron.

One notion refers to the classical point-position. We use this notion even in QM calculations, else we could not get to the V function. In the classical view, the electronic position can be variable; it can go over the entire infinite domain; but it must refer to one and only one point at any given instant.

The measured position of the electron refers to the \Psi, which is a function of all infinite space. The Schrodinger evolution occurs at all points of space at any instant. So, the electron’s measured position could be found at any location—anywhere in the infinite space. Once measured, the position “comes down” to a single point. But before measurement, \Psi sure is (nonuniformly) spread all over the infinite domain.

Schrodinger’s solution for the hydrogen atom uses \Psi as the unknown variable, and V as a known variable. Given the form of this equation, you have no choice but to take the entire graph of the potential energy (V) function into account at every instant in the Schrodinger evolution. Any eigenvalue problem of any operator requires the entire function of V; a single value at a time won’t do.
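
To see this in a computation, here is a rough finite-difference sketch. Everything specific in it is my own assumption for illustration: units with \hbar = m = e = 1, a finite box, a softened potential -1/\sqrt{x^2 + a^2} standing in for the singular 1D Coulomb well, and a Sturm-sequence bisection as the eigenvalue method. Look at the line that builds the diagonal: it needs V at every grid point.

```python
import math

def count_eigenvalues_below(diag, off, x):
    """Sturm-sequence count of eigenvalues below x for a symmetric
    tridiagonal matrix with diagonal `diag` and constant off-diagonal `off`."""
    count, q = 0, 1.0
    for i, d in enumerate(diag):
        q = d - x - (off * off / q if i > 0 else 0.0)
        if q == 0.0:
            q = 1e-30            # nudge off an exact zero pivot
        if q < 0.0:
            count += 1
    return count

def ground_state_energy(potential, half_width=20.0, n=401):
    """Lowest eigenvalue of H = -(1/2) d^2/dx^2 + V(x) on a uniform grid,
    with the wavefunction pinned to zero at the box walls."""
    dx = 2.0 * half_width / (n + 1)
    xs = [-half_width + (k + 1) * dx for k in range(n)]
    # The diagonal needs V at EVERY grid point, not just at one position.
    diag = [1.0 / dx**2 + potential(x) for x in xs]
    off = -0.5 / dx**2
    # Gershgorin bounds bracket the whole spectrum.
    lo, hi = min(diag) - 2.0 * abs(off), max(diag) + 2.0 * abs(off)
    for _ in range(80):                      # bisection on the Sturm count
        mid = 0.5 * (lo + hi)
        if count_eigenvalues_below(diag, off, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def soft_coulomb(x, a=1.0):
    """Regularized 1D stand-in for -e^2/|r| (an assumption, not the real well)."""
    return -1.0 / math.sqrt(x * x + a * a)

print(ground_state_energy(soft_coulomb))
```

Feed the same routine a harmonic well, lambda x: 0.5 * x * x, and it returns a value close to the textbook 1/2, which is a handy sanity check on the sketch.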

Just a point-value of V at the instantaneous position of the classical electron simply won’t do—you couldn’t solve Schrodinger’s equation then.

If we have to bring the \Psi from its Platonic “heaven” to our 1D space, we have to treat the entire graph of V as a physically existing (infinitely spread) field. Only then could we possibly say that \Psi too is a 1D field. (Even if you don’t have this motivation, read on, anyway. You are bound to find something interesting.)

Now an issue arises.


3. In Coulomb’s law, there is only one value for V at any instant:

The proton is fixed. So, the electron must be movable—else, despite being a point-particle, it is hard to think of a mechanism which can generate the whole V graph for its local potential energies.

But if the electron is movable, there is a certain trouble regarding what kind of a kinematics we might ascribe to the electron so that it generates the whole V field required by the Schrodinger equation. Remember, V is the potential energy of the electron, not of proton.

By classical EM, V at any instant must be a point-property, not a field. But Schrodinger’s equation requires a field for V.

So, the only imaginable solutions are weird: an infinitely fast electron running all over the domain but lawfully (i.e. following the laws at every definite point). Or something similarly weird.

So, the problem (how to explain the physical origin of the entire V function used in Schrodinger’s equation) still remains.


4. Textbook treatment of EM has fields, but no physics for multiplication by signs:

In the textbook treatment of EM (and I said EM, not QM), the proton does create its own force-field, which remains fixed in space (for a spatially fixed proton). The proton’s \vec{E} field is spread all over the infinite space, at any instant. So, why not exploit this fact? Why not try to get the electron’s V from the proton’s \vec{E}?

The potential field (in volt) of a proton is denoted as V in EM texts. So, to avoid confusion with the potential energy function (in joule) of the electron, let’s denote the proton’s potential (in volt) using the symbol P.

The potential field P does remain fixed and spread all over the space at any instant, as desired. But the trouble is this:

It is also positive everywhere. Its graph is not a well, it is a peak—infinitely tall peak at the proton’s position, asymptotically approaching zero at \pm \infty, and positive (above the zero-line) everywhere.

Therefore, you have to multiply this P field by the negative charge of the electron, -e, so that P turns into the required V field of the electron.

But nature does no multiplications—not unless there is a definite physical mechanism to “convert” the quantities appropriately.

For multiplications with signed quantities, a mechanism like the mechanical lever could be handy. One small side goes down; the other big side goes  up but to a different extent; etc. Unfortunately, there is no place for a lever in the EM ontology—it’s all point charges and the “empty” space, which we now call the aether.

Now, if multiplication of constant magnitudes alone were to be a problem, we could have always taken care of it by suitably redefining P.

But the trouble caused by the differing sign still remains!

And that’s where the real trouble is. Let me show you how.

If a proton has to have its own P field, then its role has to stay the same regardless of the interactions that the proton enters into. Whether a given proton interacts with an electron (negatively charged), or with another proton (positively charged), the given proton’s own field still has to stay the same at all times, in any system—else it will not be its own field but one of interactions. It also has to remain positive by sign—even if P is rescaled to avoid multiplications.

But if V has to be negative when an electron interacts with it, and if V also has to be positive when another proton interacts with it, then a multiplication by the signs (by \pm 1) must occur. You just can’t avoid multiplications.

But there is no mechanism for the multiplications mandated by the sign conversions.
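
In a program, of course, the sign conversion is a single trivial line, which is exactly what makes it so easy to overlook. A minimal sketch in the rescaled 1D units used above (e = 1; the function names and the sample positions are my own illustrative choices):

```python
def proton_potential(r, e=1.0):
    """P(r): the proton's own potential, proton fixed at the origin."""
    return e / abs(r)

def potential_energy(q, r):
    """V(r) = q * P(r): the multiplication by the signed charge q."""
    return q * proton_potential(r)

positions = [-4.0, -1.0, -0.25, 0.25, 1.0, 4.0]
for r in positions:
    p = proton_potential(r)
    v_electron = potential_energy(-1.0, r)     # second particle: an electron
    v_proton = potential_energy(+1.0, r)       # second particle: another proton
    print(r, p, v_electron, v_proton)
```

The same positive P yields a well for the electron and a peak for a second proton, and the only thing separating the two is the factor of \pm 1 inside potential_energy.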

How do we resolve this issue?


5. The weakness of a proposed solution:

Here is one way out that we might think of.

We say that a proton’s P field stays just the same (positive, fixed) at all times. However, when the second particle is positively charged, then it moves away from the proton; when the second particle is negatively charged, then it moves towards the proton. Since V does require a dot product of a force with a displacement vector, and since the displacement vector does change signs in this procedure, the problem seems to have been solved.

So, the proposed solution becomes: the direction of the motion of the forced particle is not determined only by the field (which is always positive here), but also by the polarity of that particle itself. And, it’s a simple change, you might argue. There is some unknown physics to the very abstraction of the point charge itself, you could say, which propels it this way instead of that way, depending on its own sign.

Thus, charges of opposing polarities go in opposite directions while interacting with the same proton. That’s just how charges interact with fields. By definition. You could say that.

What could possibly be wrong with that view?

Well, the wrong thing is this:

If you imagine a classical point-particle of an electron as going towards the proton at a point, then a funny situation ensues while using it in QM.

The arrows depicting the force-field of the proton always point away from it—except for the one distinguished position, viz., that of the electron, where a single arrow would be found pointing towards the proton (following the above suggestion).

So, the action of the point-particle of the electron introduces an infinitely sharp discontinuity in the force-field of the proton, which then must also seep into its V field.

But a discontinuity like that is not basically compatible with Schrodinger’s equation. It will therefore lead to one of the following two consequences:

It might make the solution impossible or ill-defined. I don’t know enough about maths to tell if this could be true.  But what I can tell is this: Even if a solution is possible (including solutions that possibly may be asymptotic, or are approximate but good enough) then the presence of the discontinuity will sure have an impact on the nature of the solution. The calculated \Psi wouldn’t be the same as that for a V without the discontinuity. That’s inevitable.

But why can’t we ignore the classical point-position of the electron? Well, the answer is that in a more general theory which keeps both particles movable, we have to calculate the proton’s potential energy too. To do that, we have to take the electric potential (in volts) P of the electron, and multiply it by the charge of the proton. The trouble is: The electric potential field of the electron has a singularity at its classical position. So, classical positions cannot be dropped out of the calculations. The classical position of a given particle is necessary for calculating the V field of the other particle, and, vice-versa.

In short, to ensure consistency in the two ways of the interaction, we must regard the singularities as still being present where they are.

And with that consideration, essentially, we have once again come back to a repercussion of the idea that the classical electron has a point position, but its potential energy field in the electrostatic interaction with the proton is spread everywhere.

To fulfill our desire of having a 3D field for \Psi, we have to have a certain kind of a field for V. But V should not change its value in just one isolated place, just in order to allow multiplication by -1, because doing so introduces a very bad discontinuity. It should remain the same smooth V that we have always seen in the textbooks on QM.


6. The problem statement, in a nutshell:

So, here is the problem statement:

To find a physically realizable way such that: even if we use the classical EM properties of the electron while calculating V, and even if the electron is classically a point-particle, its V function (in joules) should still turn out to be negative everywhere—even if the proton has its own potential field (P, in volts) that is positive everywhere in the classical EM.

In short, we have to change the way we look at the physics of the EM fields, and then also make the required changes to any maths, as necessary. Without disturbing the routine calculations either in EM or in QM.
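
For reference, the routine textbook calculation that has to survive unchanged can be sketched in a few lines of Python. This is only an illustration of the standard formulas P = k e / r and V = (-e) P, not of the solution hinted at above; the function names are made up here, and the constants are the usual SI values.

```python
# Standard SI constants (assumed here; not taken from the post itself).
K = 8.9875517923e9      # Coulomb constant, N m^2 / C^2
E = 1.602176634e-19     # elementary charge, C


def proton_potential(r):
    """Electric potential P (in volts) of a proton at distance r (in metres)."""
    return K * E / r


def electron_energy(r):
    """Potential energy V (in joules) of an electron at distance r: V = (-e) * P."""
    return -E * proton_potential(r)


# A few sample distances, including roughly one Bohr radius.
radii = [1e-11, 5.29177e-11, 1e-10, 1e-9]

for r in radii:
    # P is positive at every r, V is negative at every r.
    print(r, proton_potential(r), electron_energy(r))
```

The multiplication by the negative charge is a single line of arithmetic here. The problem statement above asks what physical mechanism stands behind that line.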

Can it be done? Well, I think the answer is “yes.”


7. A personal note:

While I’ve been having some vague sense of there being some issue “to be looked into later on” for quite some time (months, at least), it was only over the last week, especially over the last couple of days (since the publication of the last post), that this problem became really acute. I always used to skip over this ontology/physics issue and go directly over to using the EM maths involved in the QM. I used to think that the ontology of such EM as is used in the QM would be pretty easy to explain, at least as compared to the ontology of QM. Looks like despite spending thousands of words (some 4–5 posts with a total of maybe 15–20 K words) there still wasn’t enough clarity about EM.

Not if we adopt the principle, which I discovered on the fly, right while in the middle of writing this series, that nature does no multiplications without there being a physical mechanism for it.

Fortunately, the problem did become clear. Clear enough that, I think, I also found a satisfactory enough solution to it. Right today (on 2019.10.15 evening IST).

Would you like to give it a try? (I anyway need a break. So, take about a week’s time or so, if you wish.)


Bye for now, take care, and see you the next time.


A song I like:

(Hindi) “jaani o jaani”
Singer: Kishore Kumar
Music: Laxmikant-Pyarelal
Lyrics: Anand Bakshi


History:

— Originally published: 2019.10.16 01:21 IST
— One or two typos corrected, section names added, and a few explanatory sentences added inline: 2019.10.17 22:09 IST. Let’s leave this post right in this form.

 

Ontologies in physics—6: A basic problem: How the mainstream QM views the variables in Schrodinger’s equation

1. Prologue:

From this post, at last, we begin tackling quantum mechanics! We will be covering those topics from the physics and maths of it which are absolutely necessary for developing our own ontological viewpoint.

We will first have a look at the most comprehensive version of the non-relativistic Schrodinger equation. (Our approach so far has addressed only the non-relativistic version of QM.)

We will then note a few points concerning the way the mainstream physics (MSQM) de facto approaches it, which is remarkably different from how engineers regard their partial differential equations.

In the process, we will come to isolate and pin down a basic issue concerning how the two variables \Psi and V from Schrodinger’s equation are to be seen.

We regard this issue as a problem to be resolved, and not as just an unfamiliar kind of maths that needs no further explanation or development.

OK. Let’s get going.


2. The N-particle Schrodinger’s equation:

Consider an isolated system having 3D infinite space in it. Introduce N number of charged particles (EC Objects in our ontological view) in it. (Anytime you take an arbitrary number of elementary charges, it’s helpful to think of them as being evenly spread between positive and negative polarities, because the net charge of the universe is zero.) All the particles are elementary charges. Thus, |q_i| = e for all the particles. We will not worry about any differences in their masses, for now.

Following the mainstream QM, we also imagine the existence of something in the system such that its effect is the availability of a potential energy V.

The multi-particle time-dependent Schrodinger equation now reads:

i\,\hbar \dfrac{\partial \Psi(\vec{R},t)}{\partial t} = - \dfrac{\hbar^2}{2m} \nabla^2 \Psi(\vec{R},t) + V(\vec{R},t)\Psi(\vec{R},t)

Here, \vec{R} denotes a set of particle positions, i.e., \vec{R} = \lbrace \vec{r}_1, \vec{r}_2, \vec{r}_3, \dots, \vec{r}_N \rbrace. The rest of the notation is standard.
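
As a reading aid, here is one way to spell out the kinetic term. It assumes the usual convention that \nabla^2 acts on all the 3N coordinates collected in \vec{R}, together with the single common mass m adopted above; the subscripted symbols \nabla_i^2, x_i, y_i and z_i are introduced here only for illustration:

\nabla^2 = \sum\limits_{i=1}^{N} \nabla_i^2, \qquad \nabla_i^2 = \dfrac{\partial^2}{\partial x_i^2} + \dfrac{\partial^2}{\partial y_i^2} + \dfrac{\partial^2}{\partial z_i^2}

where (x_i, y_i, z_i) are the Cartesian components of \vec{r}_i. For N = 1 this reduces to the familiar single-particle equation in the physical 3D space.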


3. The mainstream view of the wavefunction:

The mainstream QM (MSQM) says that the wavefunction \Psi(\vec{R},t) exists not in the physical 3-dimensional space, but in a much bigger, abstract, 3N-dimensional configuration space. What do they mean by this?

According to MSQM, a particle’s position is not definite until it is measured. Upon a measurement for the position, however, we do get a definite 3D point in the physical space for its position. This point could have been anywhere in the physical 3D space spanned by the system. However, the measurement process “selects” one and only one point for this particle, at random, in any given measurement. … Repeat for all other particles. Notice, the measured positions are in the physical 3D.

Suppose we measure the positions of all the particles in the system. (Actually, speaking in more general terms, the argument applies also to position variables before measurement concretizes them to certain values.)

Suppose we now associate the measured positions via the set \vec{R} = \lbrace \vec{r}_1, \vec{r}_2, \vec{r}_3, \dots, \vec{r}_N \rbrace, where each \vec{r}_i refers to a position in the physical 3D space.

We will not delve into the issue of what measurement means, right away. We will simply try to understand the form of the equation. There is a certain issue associated with its form, but it may not become immediately apparent, esp. if you come from an engineering background. So, let’s make sure to know what that issue is:

Following the mainstream QM, the meaning of the wavefunction \Psi is this: It is a complex-valued function defined over an abstract 3N-dimensional configuration space (which has 3 coordinates for each of the N number of particles).

The meaning of any function defined over an abstract 3ND configuration space is this:

If you take the set of all the particle positions \vec{R} and plug them into such a function, then it evaluates to some single number. In case of the wavefunction, this number happens to be a complex number, in general. (Remember, all real numbers anyway are complex numbers, but not vice-versa.) (Using C++ programming terms: if you take real-valued 3D positions, pack them in an STL vector of size N, and send the vector into the function as an argument, then it returns just one specific complex number.)

All the input arguments (the N-number of 3D positions) are necessary; all of them, taken at once, produce the value of the function, the single number. Vary any Cartesian component (x, y, or z) for any particle position, and \Psi will, in general, give you another complex number.
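
The “N positions in, one complex number out” idea can be sketched in code. The snippet below uses Python rather than C++, purely for brevity; the function psi is a made-up toy with the right signature, not a solution of Schrodinger’s equation and not normalized.

```python
import cmath
import math


def psi(positions):
    """Toy stand-in for a wavefunction: takes N 3D positions, returns ONE complex number."""
    envelope = 1.0
    phase = 0.0
    for (x, y, z) in positions:
        envelope *= math.exp(-(x * x + y * y + z * z))  # arbitrary real amplitude
        phase += x + 2.0 * y + 3.0 * z                  # arbitrary phase
    return envelope * cmath.exp(1j * phase)


# N = 3 particles: all nine coordinates are needed to get a single value.
R = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (0.0, 0.0, 0.3)]
value = psi(R)

# Vary just one Cartesian component (y of the second particle).
R_shifted = [(0.1, 0.0, 0.0), (0.0, 0.25, 0.0), (0.0, 0.0, 0.3)]
value_shifted = psi(R_shifted)

print(value, value_shifted)
```

Note that psi cannot be evaluated at a single 3D point: hand it fewer positions, and you are evaluating a different (smaller-N) function.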

Since a 3D space can accommodate only 3 independent coordinates, whereas all 3N components are required to know a single \Psi value, \Psi can only be an abstract entity.

Got the argument?

Alright. What about the term V?


4. The mainstream view of V in the Schrodinger equation:

In the mainstream QM, the V term need not always have its origin in the electrostatic interactions of elementary point-charges.

It could be any arbitrary source that imparts a potential energy to the system. Thus, in the mainstream QM, the source of V could also be gravitational, magnetic, etc. Further, in the mainstream QM, V could be any arbitrary function; it doesn’t have to be singularly anchored into any kind of point-particles.

In the context of discussions of foundations of QM—of QM Ontology—we reject such an interpretation. We instead take the view that V arises only from the electrostatic interactions of charges. The following discussion is written from this viewpoint.

It turns out that, speaking in the most fundamental and general terms, and following the mainstream QM’s logic, the V function too must be seen as a function that “lives” in an abstract 3ND configuration space. Let’s try to understand a certain peculiarity of the electrostatic V function better.

Consider an electrostatic system of two point-charges. The potential energy of the system now depends on their separation: V = V(\vec{r}_2 - \vec{r}_1) \propto q_1q_2/|\vec{r}_2 - \vec{r}_1|. But a separation is not the same as a position.

For simplicity, assume unit positive charges in a 1D space, and the constant of proportionality also to be 1 in suitable units. Suppose now you keep \vec{r}_1 fixed, say at x = 0.0, and vary only \vec{r}_2, say to x = 1.0, 2.0, 3.0, \dots, then you will get a certain series of V values, 1.0, 0.5, 0.33\dots, \dots.

You might therefore be tempted to imagine a 1D function for V, because there is a clear-cut mapping here, being given by the ordered pairs of \vec{r}_2 \Rightarrow V values like: (1.0, 1.0), (2.0, 0.5), (3.0, 0.33\dots), \dots. So, it seems that V can be described as a function of \vec{r}_2.

But this conclusion would be wrong because the first charge has been kept fixed all along in this procedure. However, its position can be varied too. If you now begin moving the first charge too, then using the same \vec{r}_2 value will give you different values for V. Thus, V can be defined only as a function on the separation space, i.e., as a function of the separation vector \vec{s} = \vec{r}_2 - \vec{r}_1.
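
The 1D procedure just described is easy to replay numerically. Here is a minimal Python sketch, under the same simplifications as in the text (unit positive charges, constant of proportionality equal to 1); the function name coulomb_1d is made up for this illustration.

```python
def coulomb_1d(x1, x2, q1=1.0, q2=1.0):
    """Potential energy of two point charges on a line (proportionality constant = 1)."""
    return q1 * q2 / abs(x2 - x1)


# Step 1: first charge fixed at x = 0.0, second charge moved to 1.0, 2.0, 3.0.
series = [coulomb_1d(0.0, x2) for x2 in (1.0, 2.0, 3.0)]

# Step 2: second charge held at x = 3.0, first charge moved instead.
# The same x2 now gives a different V each time.
same_x2 = [coulomb_1d(x1, 3.0) for x1 in (0.0, 1.0, 2.0)]

# Step 3: different positions, same separation: the V values agree.
same_separation = [coulomb_1d(0.0, 1.0), coulomb_1d(5.0, 6.0)]

print(series)
print(same_x2)
print(same_separation)
```

Step 2 is what breaks the tempting “V is a function of \vec{r}_2” reading; step 3 is what the separation variable restores.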

If there are more than two particles, i.e. in the general case, the multi-particle Schrodinger equation of N particles uses that form of V which has N(N-1) separation vectors (one for each ordered pair of particles) forming its argument. Here we list some of them: \vec{r}_2 - \vec{r}_1, \vec{r}_3 - \vec{r}_1, \vec{r}_4 - \vec{r}_1, \dots, \vec{r}_1 - \vec{r}_2, \vec{r}_3 - \vec{r}_2, \vec{r}_4 - \vec{r}_2, \dots, \vec{r}_1 - \vec{r}_3, \vec{r}_2 - \vec{r}_3, \vec{r}_4 - \vec{r}_3, \dots, \dots. Using the index notation:

V = \sum\limits_{i=1}^{N}\sum\limits_{j\neq i, j=1}^{N} V(\vec{s}_{ij}),

where \vec{s}_{ij} = \vec{r}_j - \vec{r}_i.

Of course, there is a certain redundancy here, because s_{ij} = |\vec{s}_{ij}| = |\vec{s}_{ji}| = s_{ji}. The electrostatic potential energy function depends only on the magnitude s_{ij}, not on the vector \vec{s}_{ij}. The general sum formula can be re-written in a form that avoids double listing of the equivalent pairs of the separation vectors, but that form not only looks a bit more complicated, it also makes it somewhat more difficult to understand the issues involved. So, we will continue using the simple form, the one which generates all possible N(N-1) terms for the separation vectors.
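
The redundancy can be seen concretely in a small Python sketch. It assumes unit charges and a proportionality constant of 1, and it takes each term V(\vec{s}_{ij}) to be the full pair energy 1/s_{ij}; the function names are made up here. Under that assumption the simple double sum comes out at exactly twice the sum over distinct pairs, which is why the usual textbook form either restricts the sum to i < j or carries a factor of 1/2.

```python
import itertools
import math


def pair_energy(ri, rj):
    """Energy of one pair: depends only on the separation magnitude s_ij."""
    return 1.0 / math.dist(ri, rj)


def v_ordered(positions):
    """The simple form: sum over all ordered pairs (i, j) with j != i, N(N-1) terms."""
    return sum(pair_energy(ri, rj)
               for ri, rj in itertools.permutations(positions, 2))


def v_unordered(positions):
    """The non-redundant form: each pair counted once, N(N-1)/2 terms."""
    return sum(pair_energy(ri, rj)
               for ri, rj in itertools.combinations(positions, 2))


# N = 4 particles at distinct positions.
R = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]

n_ordered = len(list(itertools.permutations(R, 2)))      # N(N-1) terms
n_unordered = len(list(itertools.combinations(R, 2)))    # N(N-1)/2 terms

print(n_ordered, n_unordered)
print(v_ordered(R), 2.0 * v_unordered(R))
```

Either way, every term needs two particle positions as input, so the argument about the separation space is unaffected by which form is used.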

If you try to embed this separation space in the physical 3D space, you will find that it cannot be done. You can’t associate a unique separation vector with each position vector in the physical space, because any point-position comes with an infinity of separation vectors, all of which have to be associated with it. For instance, for the position vector \vec{r}_2, there is an infinity of separation vectors \vec{s} = \vec{a} - \vec{r}_2 where \vec{a} is an arbitrary point (standing in for the variable \vec{r}_1). Thus, the mapping from a specific position vector \vec{r}_2 to potential energy values becomes a 1:\infty mapping. Similarly for \vec{r}_1. That’s why V is not a function of the point-positions in the physical space.

Of course, V can still be seen as a proper single-valued mapping (loosely, a “1:1” mapping: exactly one value for each argument), i.e., as a proper function. But it is a function defined on the space formed by all possible separation vectors, not on the physical space.

Homework: Contrast this situation with that of a function of two space variables, e.g., F = F(\vec{x},\vec{y}). Explain why F is a function (i.e. a 1:1 mapping) that is defined on a space of position vectors, but V can be taken to be a function only if it is seen as being defined on a space of separation vectors. In other words, explain why the use of the separation vector space makes V go from a 1:\infty mapping to a 1:1 mapping.


5. Wrapping up the problem statement:

If the above seems a quizzical way of looking at the phenomena, well, that precisely is how the multi-particle Schrodinger equation is formulated. Really. The wavefunction \Psi is defined on an abstract 3ND configuration space. Really. The potential energy function V is defined using the more abstract notion of the separation space(s). Really.

If you specify the position coordinates, then you obtain a single number each for the potential energy and the wavefunction. The mainstream QM essentially views them both as aspatial variables. They do capture something about the quantum system, but only as if they were some kind of quantities that applied at once to the global system. They do not have a physical existence in the 3D space, even if the position coordinates from the physical 3D space do determine them.

In contrast, following our new approach, we take the view that such a characterization of quantum mechanics cannot be accepted, certainly not on grounds as flimsy as: “That’s just how the math of quantum mechanics is! And it works!!” The grounds are flimsy, even if a Nobel laureate or two might have informally uttered such words.

We believe that there is a problem here: In not being able to regard either \Psi or V as referring to some simple ontological entities existing in the physical 3D space.

So, our immediate problem statement becomes this:

To find some suitable quantities defined on the physical 3D space, and to use them in such a way, that our maths would turn out to be exactly the same as given for the mainstream quantum mechanics.


6. A preview of things to come: A bit about the strategy we adopt to solve this problem:

To solve this problem, we begin with what is easiest to us, namely, the simpler, classical-looking, V function. Most of the next post will remain concerned with understanding the V term from the viewpoint of the above-noted problem. Unfortunately, a repercussion would be that our discussion might end up looking a lot like an endless repetition of the issues already seen (and resolved) in the earlier posts from this series.

However, if you ever suspect so, I would advise you to keep the doubt aside and read the next post when it comes. Though the terms and the equations might look exactly like what was noted earlier, the way they are rooted in the 3D reality and combined together is new. New enough that it directly shows a way to regard even the \Psi field as a physical 3D field.

Quantum physicists always warn you that achieving such a thing—a 3D space-based interpretation for the system-\Psi—is impossible. A certain working quantum physicist—an author of a textbook published abroad—had warned me that many people (including he himself) had tried it for years, but had not succeeded. Accordingly, he had drawn two conclusions (if I recall it right from my fallible memory): (i) It would be a very, very difficult problem, if not impossible. (ii) Therefore, he would be very skeptical if anyone makes the claim that he does have a 3D-based interpretation, that the QM \Psi “lives” in the same ordinary 3D space that we engineers routinely use.

Apparently, therefore, what you would be reading here in the subsequent posts would be something like a brand-new physics. (So, keep your doubts, but hang on nevertheless.)

If valid, our new approach would have brought the \Psi field from its 3N-dimensional Platonic “heaven” to the ordinary physical space of 3 dimensions.

“Bhageerath” (भगीरथ) [^] ? … Well, I don’t think in such terms. “Bhageerath” must have been an actual historical figure, but his deeds obviously have got shrouded in the subsequent mysticism and mythology. In any case, we don’t mean to invite any comparisons in terms of the scale of achievements. He could possibly serve as an inspiration—for the scale of efforts. But not as an object of comparison.

All in all, “Bhageerath”’s deeds were his, and they anyway lie in the distant, even hazy, past. Our understanding is our own, and we must expend our own efforts.

But yes, if found valid, our approach will have extended the state of the art concerning how to understand this theory. Reason good enough to hang around? You decide. For me, the motivation simply has been to understand quantum mechanics right; to develop a solid understanding of its basic nature.

Bye for now, take care, and sure join me the next time—which should be soon enough.


A song I like:

[The official music director here is SD. But I do definitely sense a touch of RD here. Just like for many songs from the movie “Aaraadhanaa”, “Guide”, “Prem-Pujari”, etc. Or, for that matter, music for most any one of the movies that the senior Burman composed during the late ’60s or early ’70s. … RD anyway was listed as an assistant for many of SD’s movies from those times.]

(Hindi) “aaj ko junali raat maa”
Music: S. D. Burman
Singer: Lata Mangeshkar, Mohammad Rafi
Lyrics: Majrooh Sultanpuri


History:
— First published 2019.10.13 14:10 IST.
— Corrected typos, deleted erroneous or ill-formed passages, and improved the wording on home-work (in section 4) on the same day, by 18:29 IST.
— Added the personal comment in the songs section on 2019.10.13 (same day) 22:42 IST.