Fluxes, scalars, vectors, tensors…. and, running in circles about them!

0. This post is written for those who know something about Thermal Engineering (i.e., fluid dynamics, heat transfer, and transport phenomena), say up to the UG level at least. [A knowledge of Design Engineering, in particular, of the tensors as they appear in solid mechanics, would be helpful to have but is not necessary. After all, contrary to what many UGC and AICTE-approved (Full) Professors of Mechanical Engineering teaching ME (Mech – Design Engineering) courses in SPPU and other Indian universities believe, tensors appear not only in solid mechanics but also in fluid mechanics, and, in fact, the fluid phenomena make it (ever so slightly) easier to understand this concept. [But all these cartoon characters, even if they don’t know even this plain and simple a fact, can always be fully relied upon (by anyone) to raise objections about my Metallurgy background, when it comes to my own approval, at any time! [Indians!!]]]

In this post, I write a bit about the following question:

Why is the flux \vec{J} of a scalar \phi a vector quantity, and not a mere number (which is aka a “scalar,” in certain contexts)? Why is it not a tensor—whatever the hell the term means, physically?

And, what is the best way to define a flux vector anyway?


1.

One easy answer is that if the flux is a vector, then we can establish a flux-gradient relationship. Such relationships happen to appear as statements of physical laws in all the disciplines wherever the idea of a continuum was found useful. So the scope of the applicability of the flux-gradient relationships is very vast.

The reason to define the flux as a vector, then, becomes: because the gradient of a scalar field is a vector field, that’s why.

But this answer only tells us about one of the end-purposes of the concept, viz., how it can be used. And then the answer provided is: for the formulation of a physical law. But this answer tells us nothing by way of the very meaning of the concept of flux itself.


2.

Another easy answer is that if it is a vector quantity, then it simplifies the maths involved. Instead of remembering having to take the right \theta and then multiplying the relevant scalar quantity by the \cos of this \theta, we can more succinctly write:

q = \vec{J} \cdot \vec{S} (Eq. 1)

where q is the amount of \Phi transported per unit time across a given finite surface, \vec{S}; \phi is the intensive scalar property of the flowing fluid; and \vec{J} is the flux of \Phi, the extensive quantity corresponding to the intensive quantity \phi.

However, apart from being a mere convenience of notation—a useful shorthand—this answer once again touches only on the end-purpose, viz., the fact that the idea of flux can be used to calculate the amount q of the transported property \Phi.

There also is another problem with this, second, answer.

Notice that in Eq. 1, \vec{J} has not been defined independently of the “dotting” operation.

If you have an equation in which the very quantity to be defined itself has an operator acting on it on one side of an equation, and then, if a suitable anti- or inverse-operator is available, then you can apply the inverse operator on both sides of the equation, and thereby “free-up” the quantity to be defined itself. This way, the quantity to be defined becomes available all by itself, and so, its definition in terms of certain hierarchically preceding other quantities also becomes straight-forward.

OK, the description looks more complex than it is, so let me illustrate it with a concrete example.

Suppose you want to define some vector \vec{T}, but the only basic equation available to you is:

\vec{R} = \int \text{d} x \vec{T}, (Eq. 2)

assuming that \vec{T} is a function of position x.

In Eq. 2, first, the integral operator must operate on \vec{T}(x) so as to produce some other quantity, here, \vec{R}. Thus, Eq. 2 can be taken as a definition for \vec{R}, but not for \vec{T}.

However, fortunately, a suitable inverse operator is available here; the inverse of integration is differentiation. So, what we do is to apply this inverse operator on both sides. On the right hand-side, it acts to let \vec{T} be free of any operator, to give you:

\dfrac{\text{d}\vec{R}}{\text{d}x} = \vec{T} (Eq. 3)

It is the Eq. 3 which can now be used as a definition of \vec{T}.
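To see Eqs. 2 and 3 at work with actual numbers, here is a minimal sketch in Python. The particular function T(x) is purely hypothetical (picked only so that there is something to integrate), and so are the step sizes; the trapezoidal rule and the central difference merely stand in for the integral and the derivative.

```python
import math

def T(x):
    """Hypothetical vector-valued function T(x); an assumption for illustration."""
    return (2.0 * x, math.cos(x))

def R(x, n=2000):
    """Eq. 2: R(x) = integral of T from 0 to x (composite trapezoidal rule)."""
    h = x / n
    total = [0.0, 0.0]
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        t = T(k * h)
        total[0] += weight * t[0] * h
        total[1] += weight * t[1] * h
    return tuple(total)

def dR_dx(x, h=1e-4):
    """Eq. 3: apply the inverse operator (differentiation) to R."""
    r_plus, r_minus = R(x + h), R(x - h)
    return tuple((p - m) / (2.0 * h) for p, m in zip(r_plus, r_minus))

x0 = 0.7
recovered = dR_dx(x0)   # what Eq. 3 gives back
expected = T(x0)        # the quantity we wanted to define
print(recovered, expected)
```

Up to the numerical error of the two approximations, differentiating \vec{R} hands back \vec{T} itself, which is the whole point of having an inverse operator available.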

In principle, you don’t have to go to Eq. 3. In principle, you could perhaps venture to use a bit of notation abuse (the way the good folks in the calculus of variations and integral transforms always did), and say that the Eq. 2 itself is fully acceptable as a definition of \vec{T}. IMO, despite the appeal to “principles”, it still is an abuse of notation. However, I can see that the argument does have at least some point about it.

But the real trouble with using Eq. 1 (reproduced below)

q = \vec{J} \cdot \vec{S} (Eq. 1)

as a definition for \vec{J} is that no suitable inverse operator exists when it comes to the dot operator.
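A quick way to convince yourself of this is to note that two different vectors can give exactly the same q when dotted with the same \vec{S}, so q and \vec{S} together cannot pin down \vec{J}. A small Python sketch follows; the numerical values of S, J_a and J_b are made up for illustration.

```python
def dot(a, b):
    """Ordinary dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical area vector: 2 area-units, with its normal along z.
S = (0.0, 0.0, 2.0)

# Two candidate "fluxes" that differ only in their components tangential to S.
J_a = (1.0, 0.0, 3.0)
J_b = (-4.0, 7.5, 3.0)

q_a = dot(J_a, S)
q_b = dot(J_b, S)
print(q_a, q_b)  # both come out the same
```

The dot operation throws away everything tangential to the surface, and no operator can bring back the information that has been thrown away.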


3.

Let’s try another way to attempt defining the flux vector, and see what it leads to. This approach goes via the following equation:

\vec{J} \equiv \dfrac{q}{|\vec{S}|} \hat{n} (Eq. 4)

where \hat{n} is the unit normal to the surface \vec{S}, defined thus:

\hat{n} \equiv \dfrac{\vec{S}}{|\vec{S}|} (Eq. 5)

Then, as the crucial next step, we introduce one more equation for q, one that is independent of \vec{J}. For phenomena involving fluid flows, this extra equation is quite simple to find:

q = \phi \rho \dfrac{\Omega_{\text{traced}}}{\Delta t} (Eq. 6)

where \phi is the mass-density of \Phi (the scalar field whose flux we want to define), \rho is the volume-density of mass itself, and \Omega_{\text{traced}} is the volume that is imaginarily traced by that specific portion of fluid which has imaginarily flowed across the surface \vec{S} in an arbitrary but small interval of time \Delta t. Notice that \Phi is the extensive scalar property being transported via the fluid flow across the given surface, whereas \phi is the corresponding intensive quantity.

Now express \Omega_{\text{traced}} in terms of the imagined maximum normal distance from the plane \vec{S} up to which the forward moving front is found extended after \Delta t. Thus,

\Omega_{\text{traced}} = \xi |\vec{S}| (Eq. 7)

where \xi is the traced distance (measured in a direction normal to \vec{S}). Now, using the geometric property for the area of parallelograms, we have that:

\xi = \delta \cos\theta (Eq. 8)

where \delta is the traced distance in the direction of the flow, and \theta is the angle between the unit normal to the plane \hat{n} and the flow velocity vector \vec{U}. Using vector notation, Eq. 8 can be expressed as:

\xi = \vec{\delta} \cdot \hat{n} (Eq. 9)

Now, by definition of \vec{U}:

\vec{\delta} = \vec{U} \Delta t, (Eq. 10)

Substituting Eq. 10 into Eq. 9, we get:

\xi = \vec{U} \Delta t \cdot \hat{n} (Eq. 11)

Substituting Eq. 11 into Eq. 7, we get:

\Omega_{\text{traced}} = \vec{U} \Delta t \cdot \hat{n} |\vec{S}| (Eq. 12)

Substituting Eq. 12 into Eq. 6, we get:

q = \phi \rho \dfrac{\vec{U} \Delta t \cdot \hat{n} |\vec{S}|}{\Delta t} (Eq. 13)

Cancelling out the \Delta t, Eq. 13 becomes:

q = \phi \rho \vec{U} \cdot \hat{n} |\vec{S}| (Eq. 14)

Having got an expression for q that is independent of \vec{J}, we can now use it in order to define \vec{J}. Thus, substituting Eq. 14 into Eq. 4:

\vec{J} \equiv \dfrac{q}{|\vec{S}|} \hat{n} = \dfrac{\phi \rho \vec{U} \cdot \hat{n} |\vec{S}|}{|\vec{S}|} \hat{n} (Eq. 16)

Cancelling out the two |\vec{S}|s (because it’s a scalar—you can always divide any term by a scalar (or even  by a complex number) but not by a vector), we finally get:

\vec{J} \equiv \phi \rho \vec{U} \cdot \hat{n} \hat{n} (Eq. 17)
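If you would rather check the chain of substitutions with numbers than with symbols, the following Python sketch walks through Eqs. 5 to 10, 6 and 4 in turn, and then compares the result with Eq. 17. All the numerical values (phi, rho, U, S, dt) are assumptions chosen only for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(c, v):
    return tuple(c * x for x in v)

phi = 4.0              # amount of Phi per unit mass (hypothetical)
rho = 1.2              # mass per unit volume (hypothetical)
U = (3.0, 0.0, 4.0)    # flow velocity (hypothetical)
S = (0.0, 0.0, 0.5)    # area vector of the surface (hypothetical)
dt = 0.01              # small time interval

S_mag = math.sqrt(dot(S, S))
n_hat = scale(1.0 / S_mag, S)          # Eq. 5

delta = scale(dt, U)                   # Eq. 10: distance traced along the flow
xi = dot(delta, n_hat)                 # Eq. 9: its part normal to the surface
omega_traced = xi * S_mag              # Eq. 7: traced volume
q = phi * rho * omega_traced / dt      # Eq. 6
J_stepwise = scale(q / S_mag, n_hat)   # Eq. 4

J_direct = scale(phi * rho * dot(U, n_hat), n_hat)   # Eq. 17

print(J_stepwise, J_direct)
```

Both routes give the same vector, and \Delta t and |\vec{S}| drop out of the final answer exactly as the cancellations in Eqs. 13 and 16 say they should.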


4. Comments on Eq. 17

In Eq. 17, there is this curious sequence: \hat{n} \hat{n}.

It’s a sequence of two vectors, but the vectors apparently are not connected by any of the operators that are taught in the Engineering Maths courses on vector algebra and calculus: there is neither the dot (\cdot) operator nor the cross (\times) operator appearing in between the two \hat{n}s.

But, for the time being, let’s not get too much perturbed by the weird-looking sequence. For the time being, you can mentally insert parentheses like these:

\vec{J} \equiv \left[ \left( \phi \rho \vec{U} \right) \cdot \left( \hat{n} \right) \right] \hat{n} (Eq. 18)

and see that each of the two terms within the parentheses is a vector, and that these two vectors are connected by a dot operator so that the terms within the square brackets all evaluate to a scalar. According to Eq. 18, the scalar magnitude of the flux vector is:

|\vec{J}| = \left( \phi \rho \vec{U}\right) \cdot \left( \hat{n} \right) (Eq. 19)

and its direction is given by: \hat{n} (the second one, i.e., the one which appears in Eq. 18 but not in Eq. 19).


5.

We explained away our difficulty about Eq. 17 by inserting parentheses at suitable places. But this procedure of inserting mere parentheses looks, by itself, conceptually very attractive, doesn’t it?

If by not changing any of the quantities or the order in which they appear, and if by just inserting parentheses, an equation somehow begins to make perfect sense (i.e., if it seems to acquire a good physical meaning), then we have to wonder:

Since it is possible to insert parentheses in Eq. 17 in some other way, in some other places—to group the quantities in some other way—what physical meaning would such an alternative grouping have?

That’s a delectable possibility, potentially opening new vistas of physico-mathematical reasonings for us. So, let’s pursue it a bit.

What if the parentheses were to be inserted the following way?:

\vec{J} \equiv \left( \hat{n} \hat{n} \right) \cdot \left( \phi \rho \vec{U} \right) (Eq. 20)

On the right hand-side, the terms in the second set of parentheses evaluate to a vector, as usual. However, the terms in the first set of parentheses are special.

The fact of the matter is, there is an implicit operator connecting the two vectors, and if it is made explicit, Eq. 20 would rather be written as:

\vec{J} \equiv \left( \hat{n} \otimes \hat{n} \right) \cdot \left( \phi \rho \vec{U} \right) (Eq. 21)

The \otimes operator, as it so happens, is a binary operator that operates on two vectors (which in general need not necessarily be one and the same vector as is the case here, and whose order with respect to the operator does matter). It produces a new mathematical object called the tensor.

The general form of Eq. 21 is like the following:

\vec{V} = \vec{\vec{T}} \cdot \vec{U} (Eq. 22)

where we have put two arrows on the top of the tensor, to bring out the idea that it has something to do with two vectors (in a certain order). Eq. 22 may be read as the following: Begin with an input vector \vec{U}. When it is multiplied by the tensor \vec{\vec{T}}, we get another vector, the output vector: \vec{V}. The tensor quantity \vec{\vec{T}} is thus a mapping between an arbitrary input vector and its uniquely corresponding output vector. It also may be thought of as a unary operator which accepts a vector on its right hand-side as an input, and transforms it into the corresponding output vector.
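For those who like to see numbers, here is a small Python sketch of Eq. 22 at work. It assumes the usual component representation in which \hat{n} \otimes \hat{n} is the 3 \times 3 array of products n_i n_j, and in which the tensor acts on a vector by the ordinary matrix-times-column rule; the values of phi, rho, U and n_hat are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    """Tensor (outer) product in components: T[i][j] = a[i] * b[j]."""
    return [[ai * bj for bj in b] for ai in a]

def apply_tensor(T, u):
    """Eq. 22: V = T . U, taken here as the matrix-times-column product."""
    return [dot(row, u) for row in T]

phi, rho = 4.0, 1.2        # hypothetical property values
U = (3.0, 0.0, 4.0)        # hypothetical flow velocity
n_hat = (0.6, 0.0, 0.8)    # a unit normal (0.36 + 0.64 = 1)

phi_rho_U = [phi * rho * c for c in U]

T = outer(n_hat, n_hat)
J_tensor = apply_tensor(T, phi_rho_U)                        # Eq. 21
J_grouped = [dot(phi_rho_U, n_hat) * c for c in n_hat]       # Eq. 18

print(J_tensor)
print(J_grouped)
```

The two groupings of Eq. 17 produce the same output vector; the tensor simply packages "take the normal component, then point it along the normal" into a single object.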


6. “Where am I?…”

Now is the time to take a pause and ponder about a few things. Let me begin doing that, by raising a few questions for you:

Q. 6.1:

What kind of a bargain have we ended up with? We wanted to show how the flux of a scalar field \Phi must be a vector. However, in the process, we seem to have adopted an approach which says that the only way the flux—a vector—can at all be defined is in reference to a tensor—a more advanced concept.

Instead of simplifying things, we seem to have ended up complicating the matters. … Have we? really? …Can we keep the physical essentials of the approach all the same and yet, in our definition of the flux vector, don’t have to make a reference to the tensor concept? exactly how?

(Hint: Look at the above development very carefully once again!)

Q. 6.2:

In Eq. 20, we put the parentheses in this way:

\vec{J} \equiv \left( \hat{n} \hat{n} \right) \cdot \left( \phi \rho \vec{U} \right) (Eq. 20, reproduced)

What would happen if we were to group the same quantities, but alter the order of the operands for the dot operator?  After all, the dot product is commutative, right? So, we could have easily written Eq. 20 rather as:

\vec{J} \equiv \left( \phi \rho \vec{U} \right) \cdot \left( \hat{n} \hat{n} \right) (Eq. 20, with the operands swapped)

What could be the reason why in writing Eq. 20, we might have made the choice we did?

Q. 6.3:

We wanted to define the flux vector for all fluid-mechanical flow phenomena. But in Eq. 21, reproduced below, what we ended up having was the following:

\vec{J} \equiv \left( \hat{n} \otimes \hat{n} \right) \cdot \left( \phi \rho \vec{U} \right) (Eq. 21, reproduced)

Now, from our knowledge of fluid dynamics, we know that Eq. 21 seemingly stands only for one kind of a flux, namely, the convective flux. But what about the diffusive flux? (To know the difference between the two, consult any good book/course-notes on CFD using FVM, e.g. Jayathi Murthy’s notes at Purdue, or Versteeg and Malalasekera’s text.)

Q. 6.4:

Try to pursue this line of thought a bit:

Start with Eq. 1 again:

q = \vec{J} \cdot \vec{S} (Eq. 1, reproduced)

Express \vec{S} as a product of its magnitude and direction:

q = \vec{J} \cdot |\vec{S}| \hat{n} (Eq. 23)

Divide both sides of Eq. 23 by |\vec{S}|:

\dfrac{q}{|\vec{S}|} = \vec{J} \cdot \hat{n} (Eq. 24)

“Multiply” both sides of Eq. 24 by \hat{n}:

\dfrac{q} {|\vec{S}|} \hat{n} = \vec{J} \cdot \hat{n} \hat{n} (Eq. 25)

We seem to have ended up with a tensor once again! (and more rapidly than in the development in sections 3 to 5 above).

Now, looking at what kind of a change the left hand-side of Eq. 24 undergoes when we “multiply” it by a vector (which is: \hat{n}), can you guess something about what the “multiplication” on the right hand-side by \hat{n} might mean? Here is a hint:

To multiply a scalar by a vector is meaningless, really speaking. First, you need to have a vector space, and then, you are allowed to take any arbitrary vector from that space, and scale it up (without changing its direction) by multiplying it with a number that acts as a scalar. The result at least looks the same as “multiplying” a scalar by a vector.

What then might be happening on the right hand side?

Q.6.5:

Recall your knowledge (i) that vectors can be expressed as single-column or single-row matrices, and (ii) how matrices can be algebraically manipulated, esp. the rules for their multiplications.

Try to put the above developments using an explicit matrix notation.

In particular, pay particular attention to the matrix-algebraic notation for the dot product between a row- or column-vector and a square matrix, and the effect it has on your answer to question Q.6.2. above. [Hint: Try to use the transpose operator if you reach what looks like a dead-end.]

Q.6.6.

Suppose I introduce the following definitions: All single-column matrices are “primary” vectors (whatever the hell it may mean), and all single-row matrices are “dual” vectors (once again, whatever the hell it may mean).

Given these definitions, you can see that any primary vector can be turned into its corresponding dual vector simply by applying the transpose operator to it. Taking the logic to full generality, the entirety of a given primary vector-space can then be transformed into a certain corresponding vector space, called the dual space.

Now, using these definitions, and in reference to the definition of the flux vector via a tensor (Eq. 21), but with the equation now re-cast into the language of matrices, try to identify the physical meaning of the concept of “dual” space. [If you fail to, I will surely provide a hint.]

As a part of this exercise, you will also be able to figure out which of the two \hat{n}s forms the “primary” vector space and which \hat{n} forms the dual space, if the tensor product \hat{n}\otimes\hat{n} itself appears (i) before the dot operator or (ii) after the dot operator, in the definition of the flux vector. Knowing the physical meaning for the concept of the dual space of a given vector space, you can then see what the physical meaning of the tensor product of the unit normal vectors (\hat{n}s) is, here.

Over to you. [And also to the UGC/AICTE-Approved Full Professors of Mechanical Engineering in SPPU and in other similar Indian universities. [Indians!!]]

A Song I Like:

[TBD, after I make sure all LaTeX entries have come out right, which may very well be tomorrow or the day after…]


See, how hard I am trying to become an Approved (Full) Professor of Mechanical Engineering in SPPU?—3

I was looking for a certain book on heat transfer which I had (as usual) misplaced somewhere, and while searching for that book at home, I accidentally ran into another book I had—the one on Classical Mechanics by Rana and Joag [^].

After dusting this book a bit, I spent some time in one typical way, viz. by going over some fond memories associated with a suddenly re-found book…. The memories of how enthusiastic I once was when I had bought that book; how I had decided to finish that book right within weeks of buying it several years ago; the number of times I might have picked it up, and soon later on, kept it back aside somewhere, etc.  …

Yes, that’s right. I have not yet managed to finish this book. Why, I have not even managed to begin reading this book the way it should be read—with a paper and pencil at hand to work through the equations and the problems. That was the reason why, I now felt a bit guilty. … It just so happened that it was just the other day (or so) when I was happily mentioning the Poisson brackets on Prof. Scott Aaronson’s blog, at this thread [^]. … To remove (at least some part of) my sense of guilt, I then decided to browse at least through this part (viz., Poisson’s brackets) in this book. … Then, reading a little through this chapter, I decided to browse through the preceding chapters from the Lagrangian mechanics on which it depends, and then, in general, also on the calculus of variations.

It was at this point that I suddenly happened to remember the reason why I had never been able to finish (even the portions relevant to engineering from) this book.

The thing was, the explanation of the \delta—the delta of the variational calculus.

The explanation of what the \delta basically means, I had found right back then (many, many years ago), was not satisfactorily given in this book. The book did talk of all those things like the holonomic constraints vs. the nonholonomic constraints, the functionals, integration by parts, etc. etc. etc. But without ever really telling me, in a forth-right and explicit manner, what the hell this \delta was basically supposed to mean! How this \delta y was different from the finite changes (\Delta y) and the infinitesimal changes (\text{d}y) of the usual calculus, for instance. In terms of its physical meaning, that is. (Hell, this book was supposed to be on physics, wasn’t it?)

Here, I of course fully realize that describing Rana and Joag’s book as “unsatisfactory” is making a rather bold statement, a very courageous one, in fact. This book is extraordinarily well-written. And yet, there I was, many, many years ago, trying to understand the delta, and not getting anywhere, not even with this book in my hand. (OK, a confession. The current copy which I have is not all that old. My old copy is gone by now (i.e., permanently misplaced or so), and so, the current copy is the one which I had bought once again, in 2009. As to my old copy, I think, I had bought it sometime in the mid-1990s.)

It was many years later, I guess some time while teaching FEM to the undergraduates in Mumbai, that the concept had finally become clear enough to me. Most especially, while I was going through P. Seshu’s and J. N. Reddy’s books. [Reflected Glory Alert! Professor P. Seshu was my class-mate for a few courses at IIT Madras!] However, even then, even at that time, I remember, I still had this odd feeling that the physical meaning was still not clear to me, not as clear as it should be. The matter eventually became “fully” clear to me only later on, while musing about the differences between the perspective of Thermodynamics on the one hand and that of Heat Transfer on the other. That was some time last year, while teaching Thermodynamics to the PG students here in Pune.

Thermodynamics deals with systems at equilibria, primarily. Yes, its methods can be extended to handle also the non-equilibrium situations. However, even then, the basis of the approach summarily lies only in the equilibrium states. Heat Transfer, on the other hand, necessarily deals with the non-equilibrium situations. Remove the temperature gradient, and there is no more heat left to speak of. There does remain the thermal energy (as a form of the internal energy), but not heat. (Remember, heat is the thermal energy in transit that appears on a system boundary.) Heat transfer necessarily requires an absence of thermal equilibrium. … Anyway, it was while teaching thermodynamics last year, and only incidentally pondering about its differences from heat transfer, that the idea of the variations (of CoV) had finally become (conceptually) clear to me. (No, CoV does not necessarily deal only with the equilibrium states; it’s just that it was while thinking about the equilibrium vs. the transient that the matter about CoV had suddenly “clicked” for me.)

In this post, let me now note down something on the concept of the variation, i.e., towards understanding the physical meaning of the symbol \delta.

Please note, I have made an inline update on 26th December 2016. It makes the presentation of the calculus of variations a bit less dumbed down. The updated portion is clearly marked as such, in the text.


The Problem Description:

The concept of variations is abstract. We would be better off considering a simple, concrete, physical situation first, and only then try to understand the meaning of this abstract concept.

Accordingly, consider a certain idealized system. See its schematic diagram below:

[Figure: schematic diagram of the idealized 1D system (image: mechanicalengineering_1d_cov)]

There is a long, rigid cylinder made from some transparent material like glass. The left hand-side end of the cylinder is hermetically sealed with a rigid seal. At the other end of the cylinder, there is a friction-less piston which can be driven by some external means.

Further, there also are a couple of thin, circular, piston-like disks (D_1 and D_2) placed inside the cylinder, at some x_1 and x_2 positions along its length. These disks thus divide the cylindrical cavity into three distinct compartments. The disks are assumed to be impermeable, and fitting snugly, they in general permit no movement of gas across their plane. However, they also are assumed to be able to move without any friction.

Initially, all the three compartments are filled with a compressible fluid to the same pressure in each compartment, say 1 atm. Since all the three compartments are at the same pressure, the disks stay stationary.

Then, suppose that the piston on the extreme right end is moved, say from position P_1 to P_2. The final position P_2 may be to the left or to the right of the initial position P_1; it doesn’t matter. For the current description, however, let’s suppose that the position P_2 is to the left of P_1. The effect of the piston movement thus is to increase the pressure inside the system.

The problem is to determine the nature of the resulting displacements that the two disks undergo as measured from their respective initial positions.

There are essentially two entirely different paradigms for conducting an analysis of this problem.


The “Vector Mechanics” Paradigm:

The first paradigm is based on an approach that was put to use so successfully by Newton. Usually, it is called the paradigm of vector analysis.

In this paradigm, we focus on the fact that the forced displacement of the piston with time, x(t), may be described using some function of time that is defined over the interval lying between two instants t_i and t_f.

For example, suppose the function is:
x(t) = x_0 + v t,
where v is a constant. In other words, the motion of the piston is steady, with a constant velocity, between the initial and final instants. Since the velocity is constant, there is no acceleration over the open interval (t_i, t_f).

However, notice that before the instant t_i, the piston velocity was zero. Then, the velocity suddenly became a finite (constant) value. Therefore, if you extend the interval to include the end-instants as well, i.e., if you consider the semi-closed interval [t_i, t_f), then there is an acceleration at the instant t_i. Similarly, since the piston comes to a position of rest at t = t_f, there also is another acceleration, equal in magnitude and opposite in direction, which appears at the instant t_f.
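The two velocity jumps are easy to see numerically. In the Python sketch below, the piston is at rest before t_i, moves with the constant velocity v in between, and is at rest again after t_f. The particular values of x0, v, t_i and t_f are hypothetical, and I have measured time from t_i inside the moving stretch (a small liberty taken with the formula above, so that the position stays continuous).

```python
X0, V = 1.0, -0.2      # hypothetical start position and (leftward) velocity
T_I, T_F = 1.0, 3.0    # hypothetical initial and final instants

def piston_position(t):
    """Piecewise position: rest, steady motion, rest."""
    if t <= T_I:
        return X0
    if t >= T_F:
        return X0 + V * (T_F - T_I)
    return X0 + V * (t - T_I)

def velocity(t, side, h=1e-6):
    """One-sided finite-difference velocity just before or just after t."""
    if side == "before":
        return (piston_position(t) - piston_position(t - h)) / h
    return (piston_position(t + h) - piston_position(t)) / h

jump_at_ti = velocity(T_I, "after") - velocity(T_I, "before")
jump_at_tf = velocity(T_F, "after") - velocity(T_F, "before")
print(jump_at_ti, jump_at_tf)
```

The velocity changes by v at t_i and by -v at t_f, i.e., the two changes are equal in magnitude and opposite in direction, while everywhere strictly inside the interval the velocity does not change at all.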

The existence of these two instantaneous accelerations implies that jerks or pressure waves are sent through the system. We may model them as vector quantities, as impulses. [Side Exercise: Work out what happens if we consider only the open interval (t_i, t_f).]

We can now apply Newton’s 3 laws, based on the idea that shock-waves must have begun at the piston at the instant t = t_i. They must have got transmitted through the gas kept under pressure, and they must have affected the disk D_1 lying closest to the piston, thereby setting this disk into motion. This motion must have passed through the gas in the middle compartment of the system as another pulse in the pressure (generated at the disk D_1), thereby setting also the disk D_2 in a state of motion a little while later. Finally, the pulse must have got bounced off the seal on the left hand side, and in turn, come back to affect the motion of the disk D_2, and then of the disk D_1. Continuing their travels to and fro, the pulses, and hence the disks, would thus be put in a back and forth motion.

After a while, these transients would move forth and back, superpose, and some of their constituent frequencies would get cancelled out, leaving only those frequencies operative such that the three compartments are put under some kind of stationary states.

In case the gas is not ideal, there would be damping anyway, and after a sufficiently long while, the disks would move through such small displacements that we could easily ignore the ever-decreasing displacements in a limiting argument.

Thus, assume that, after an elapse of a sufficiently long time, the disks become stationary. Of course, their new positions are not the same as their original positions.

The problem thus can be modeled as basically a transient one. The new equilibrium state is thus primarily seen as an effect or an end-result of a couple of transient processes which occur in the forward and backward directions. The equilibrium is seen not as a primarily existing state, but as a result of two equal and opposite transient causes.

Notice that throughout this process, Newton’s laws can be applied directly. The nature of the analysis is such that the quantities in question—viz. the displacements of the disks—always are real, i.e., they correspond to what actually is supposed to exist in the reality out there.

The (values of) displacements are real in the sense that the mathematical analysis procedure itself involves only those (values of) displacements which can actually occur in reality. The analysis does not concern itself with some other displacements that might have been possible but don’t actually occur. The analysis begins with the forced displacement condition, translates it into pressure waves, which in turn are used in order to derive the predicted displacements in the gas in the system, at each instant. Thus, at any arbitrary instant of time t > t_i (in fact, the analysis here runs for times t \gg t_f), the analysis remains concerned only with those displacements that are actually taking place at that instant.

The Method of Calculus of Variations:

The second paradigm follows the energetics program. This program was initiated by Newton himself as well as by Leibniz. However, it was pursued vigorously not by Newton but rather by Leibniz, and then by a series of gifted mathematician-physicists: the Bernoulli brothers, Euler, Lagrange, Hamilton, and others. This paradigm is essentially based on the calculus of variations. The idea here is something like the following.

We do not care for a local description at all. Thus, we do not analyze the situation in terms of the local pressure pulses, their momenta/forces, etc. All that we focus on are just two sets of quantities: the initial positions of the disks, and their final positions.

For instance, focus on the disk D_1. It initially is at the position x_{1_i}. It is found, after a long elapse of time (i.e., at the next equilibrium state), to have moved to x_{1_f}. The question is how to relate this change in x_1, on the one hand, to the displacement that the piston itself undergoes from P_{x_i} to P_{x_f}, on the other.

To analyze this question, the energetics program (i.e., the calculus of variations) adopts a seemingly strange methodology.

It begins by saying that there is nothing unique to the specific value of the position x_{1_f} as assumed by the disk D_1. The disk could have come to a halt at any other (nearby) position, e.g., at some other point x_{1_1}, or x_{1_2}, or x_{1_3}, … etc. In fact, since there are an infinity of points lying in a finite segment of line, there could have been an infinity of positions where the disk could have come to a rest, when the new equilibrium was reached.

Of course, in reality, the disk D_1 comes to a halt at none of these other positions; it comes to a halt only at x_{1_f}.

Yet, the theory says, we need to be “all-inclusive,” in a way. We need not, just for the aforementioned reason, deny a place in our analysis to these other positions. The analysis must include all such possible positions—even if they be purely hypothetical, imaginary, or unreal. What we do in the analysis, this paradigm says, is to initially include these merely hypothetical, unrealistic positions too on exactly the same footing as that enjoyed by that one position which is realistic, which is given by x_{1_f}.

Thus, we take a set of all possible positions for each disk. Then, for each such a position, we calculate the “impact” it would make on the energy of the system taken as a whole.

The energy of the system can be additively decomposed into the energies carried by each of its sub-parts. Thus, focusing on disk D_1, for each one of its possible (hypothetical) final positions, we should calculate the energies carried by both its adjacent compartments. Since a change in D_1's position does not affect the compartment 3, we need not include it. However, for the disk D_1, we do need to include the energies carried by both the compartments 1 and 2. Similarly, for each of the possible positions occupied by the disk D_2, we should include the energies of the compartments 2 and 3, but not of 1.

At this point, to bring simplicity (and thereby better clarity) to this entire procedure, let us further assume that the possible positions of each disk form a finite set. For instance, each disk can occupy only one of the positions that is some -5, -4, -3, -2, -1, 0, +1, +2, +3, +4 or +5 distance-units away from its initial position. Thus, a disk is not allowed to come to a rest at, say, 2.3 units; it must do so either at 2 or at 3 units. (We will thus perform the initial analysis in terms of only the integer positions, and only later on extend it to any real-valued positions.) (If you are a mechanical engineering student, suggest a suitable mechanism that can ensure only integer relative displacements.)

The change in energy E of a compartment is given by
\Delta E = P A \Delta x,
where P is the pressure, A is the cross-sectional area of the cylinder, and \Delta x is the change in the length of the compartment.

Now, observe that the energy of the middle compartment depends on the relative distance between the two disks lying on its sides. Yet, for the same reason, the energy of the middle compartment does depend on both these positions. Hence, we must take a Cartesian product of the relative displacements undergone by both the disks, and only then calculate the system energy for each such a permutation (i.e. the ordered pair) of their positions. Let us go over the details of the Cartesian product.

The Cartesian product of the two positions may be stated as a row-by-row listing of ordered pairs of the relative positions of D_1 and D_2, e.g., as follows: the ordered pair (-5, +2) means that the disk D_1 is 5 units to the left of its initial position, and the disk D_2 is +2 units to the right of its initial position. Since each of the two positions forming an ordered pair can range over any of the above-mentioned 11 number of different values, there are, in all, 11 \times 11 = 121 number of such possible ordered pairs in the Cartesian product.
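The bookkeeping of this Cartesian product can be sketched in a few lines of Python. This is only an illustration: the function name and the sign convention (displacements to the right are positive, and the outer ends of compartments 1 and 3 stay fixed while the disks settle) are my assumptions, not part of the analysis above.

```python
from itertools import product

# Allowed relative displacements of each disk, in distance-units (from the text).
POSITIONS = range(-5, 6)  # -5, -4, ..., +5: 11 values


def compartment_length_changes(d1, d2):
    """Change in length of compartments 1, 2 and 3 for the ordered pair (d1, d2).

    Assumption: the outer ends of compartments 1 and 3 stay fixed while the
    disks take up their positions, and displacements to the right are positive.
    """
    return (d1, d2 - d1, -d2)


# The Cartesian product: every ordered pair (d1, d2) of relative displacements.
pairs = list(product(POSITIONS, POSITIONS))

print(len(pairs))                         # 121
print(compartment_length_changes(-5, 2))  # (-5, 7, -2)
```

Note that only the middle compartment's change in length involves both numbers of the ordered pair, which is why the pairs have to be enumerated together rather than disk by disk.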

For each one of these 121 different pairs, we use the above-given formula to determine what the energy of each compartment is like. Then, we add the three energies (of the three compartments) together to get the value of the energy of the system as a whole.

In short, we get a set of 121 possible values for the energy of the system.

You must have noticed that we have admitted every possible permutation into analysis—all the 121 number of them.

Of course, out of all these 121 number of permutations of positions, it should turn out that 120 number of them have to be discarded because they would be merely hypothetical, i.e. unreal. That, in turn, is because, the relative positions of the disks contained in one and only one ordered pair would actually correspond to the final, equilibrium position. After all, if you conduct this experiment in reality, you would always get a very definite pair of the disk-positions, and it is this same pair of relative positions that would be observed every time you conducted the experiment (for the same piston displacement). Real experiments are reproducible, and give rise to the same, unique result. (Even if the system were to be probabilistic, it would have to give rise to an exactly identical probability distribution function.) It can’t be this result today and that result tomorrow, or this result in this lab and that result in some other lab. That simply isn’t science.

Thus, out of all those 121 different ordered-pairs, one and only one ordered-pair would actually correspond to reality; the rest all would be merely hypothetical.

The question now is, which particular pair corresponds to reality, and which ones are unreal. How to tell the real from the unreal. That is the question.

Here, the variational principle says that the pair of relative positions that actually occurs in reality carries a certain definite, distinguishing attribute.

The system-energy calculated for this pair (of relative displacements) happens to carry the lowest magnitude from among all possible 121 number of pairs. In other words, any hypothetical or unreal pair has a higher amount of system energy associated with it. (If two pairs give rise to the same lowest value, both would be equally likely to occur. However, that is not what provably happens in the current example, so let us leave this kind of a “degeneracy” aside for the purposes of this post.)
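As a minimal sketch of this selection rule, here is how one could pick the lowest-energy pair in Python. The energy function below is a made-up stand-in, not the gas model of this post; I have chosen it only so that it has a single lowest value, and I have placed that value at (2, 3) purely for illustration.

```python
from itertools import product

POSITIONS = range(-5, 6)  # allowed relative displacements of each disk


def toy_system_energy(d1, d2):
    """Hypothetical stand-in for the summed energy of the three compartments.

    NOT derived from the gas model in this post; chosen only so that it has a
    single lowest value, at (2, 3).
    """
    return 100 + 2 * (d1 - 2) ** 2 + 3 * (d2 - d1 - 1) ** 2 + 2 * (d2 - 3) ** 2


# One energy value per ordered pair: 121 entries in all.
energy = {pair: toy_system_energy(*pair) for pair in product(POSITIONS, POSITIONS)}

# The rule stated above: the realized pair is the one with the lowest energy.
realized = min(energy, key=energy.get)
print(realized, energy[realized])  # (2, 3) 100
```

With a real energy function in place of the toy one, the rest of the code would stay the same.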

(The update on 26 December 2016 begins here:)

Actually, the description given in the immediately preceding paragraph was a bit too dumbed down. The variational principle is more subtle than that. Explaining it makes this post even longer, but let me give it a shot anyway, at least today.

To follow the actual idea of the variational principle (in a not dumbed-down manner), the procedure you have to follow is this.

First, make a table of all possible relative-position pairs, and their associated energies. The table has the following columns: a relative-position pair, the associated energy E as calculated above, and one more column which for the time being would be empty. The table may look something like what the following (partial) listing shows:

(0,0) -> say, 115 Joules
(-1,0) -> say, 101 Joules
(-2,0) -> say, 110 Joules

(2,2) -> say, 102 Joules
(2,3) -> say, 100 Joules
(2,4) -> say, 101 Joules
(2,5) -> say, 120 Joules

(5,0) -> say, 135 Joules

(5,5) -> say 117 Joules.

Having created this table (of 121 rows), you then pick each row one by one, and for the picked up n-th row, you ask a question: What all other row(s) from this table have their relative distance pairs such that these pairs lie closest to the relative distance pair of this given row. Let me illustrate this question with a concrete example. Consider the row which has the relative-distance pair given as (2,3). Then, the relative distance pairs closest to this one would be obtained by adding or subtracting a distance of 1 to each in the pair. Thus, the relative distance pairs closest to this one would be: (3,3), (1,3), (2,4), and (2,2). So, you have to pick up those rows which have these four entries in the relative-distance pairs column. Each of these four pairs represents a variation \delta on the chosen state, viz. the state (2,3).

In symbolic terms, suppose for the n-th row being considered, the rows closest to it in terms of the differences in their relative distance pairs, are the a-th, b-th, c-th and d-th rows. (Notice that the rows which are closest to a given row in this sense, would not necessarily be found listed just above or below that given row, because the scheme followed while creating the list or the vector that is the table would not necessarily honor the closest-lying criterion (which necessarily involves two numbers)—not at least for all rows in the table.)
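The "closest rows" idea can be sketched as follows. The function names are mine, and so are two assumptions: pairs falling outside the allowed range of -5 to +5 are simply dropped, and the table is listed with the first number of the pair varying slowest.

```python
POSITIONS = range(-5, 6)  # allowed relative displacements of each disk


def closest_pairs(pair):
    """Pairs that differ from `pair` by one unit in exactly one of its two numbers."""
    d1, d2 = pair
    candidates = [(d1 + 1, d2), (d1 - 1, d2), (d1, d2 + 1), (d1, d2 - 1)]
    # Pairs that fall outside the allowed range are simply dropped (an assumption).
    return [(a, b) for (a, b) in candidates if a in POSITIONS and b in POSITIONS]


def row_number(pair):
    """Row index if the table is listed with d1 varying slowest (an assumed ordering)."""
    d1, d2 = pair
    return (d1 + 5) * 11 + (d2 + 5)


print(closest_pairs((2, 3)))  # [(3, 3), (1, 3), (2, 4), (2, 2)]
print(closest_pairs((5, 5)))  # [(4, 5), (5, 4)]

# Row of (2, 3), followed by the rows of its four closest pairs.
print(row_number((2, 3)), [row_number(p) for p in closest_pairs((2, 3))])
# 85 [96, 74, 86, 84]
```

With this particular ordering, two of the four closest pairs sit right next to row 85, while the other two sit 11 rows away. A different listing scheme would scatter them differently.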

OK. Then, in the next step, you find the differences in the energies of the n-th row from each of these closest rows, viz., the a-th, b-th, c-th and d-th rows. That is to say, you find the absolute magnitudes of the energy differences. Let us denote these magnitudes as: \delta E_{na} = |E_n - E_a|, \delta E_{nb} = |E_n - E_b|, \delta E_{nc} = |E_n - E_c|, and \delta E_{nd} = |E_n - E_d|. Suppose the minimum among these values is \delta E_{nc}. So, against the n-th row, in the last column of the table, you write the value \delta E_{nc}.
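Here is a small sketch of how the last-column entry could be computed for one row. The energies are the "say" values from the partial listing above. Since that listing is partial, closest pairs that do not appear in it are skipped; that skipping is my assumption, made only to keep the example runnable, and with the full table of 121 rows nothing would need to be skipped.

```python
# Energies in joules, copied from the partial listing above (the "say" values).
energy = {
    (0, 0): 115, (-1, 0): 101, (-2, 0): 110,
    (2, 2): 102, (2, 3): 100, (2, 4): 101, (2, 5): 120,
    (5, 0): 135, (5, 5): 117,
}


def smallest_energy_difference(pair, energy):
    """Last-column entry for `pair`: min |E_n - E_i| over its closest pairs.

    Closest pairs missing from `energy` are skipped; that is only because the
    listing here is partial. Returns None if no closest pair is listed.
    """
    d1, d2 = pair
    closest = [(d1 + 1, d2), (d1 - 1, d2), (d1, d2 + 1), (d1, d2 - 1)]
    diffs = [abs(energy[pair] - energy[q]) for q in closest if q in energy]
    return min(diffs) if diffs else None


print(smallest_energy_difference((2, 3), energy))  # 1  (from |100 - 101|)
print(smallest_energy_difference((2, 5), energy))  # 19 (from |120 - 101|)
```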

Having done this exercise separately for each row in the table, you then ask: Which row has the smallest entry in the last column (the one for \delta E), and you pick that up. That is the distinguished (or the physically occurring) state.

In other words, the variational principle asks you to select not the row with the lowest absolute value of energy, but that row which shows the smallest difference of energy from one of its closest neighbours—and these closest neighbours are to be selected according to the differences in each number appearing in the relative-distance pair, and not according to the vertical place of rows in the tabular listing. (It so turns out that in this example, the rows selected by the two criteria—lowest energy as well as lowest variation in energy—are identical, though it would not necessarily always be the case. In short, we can’t always get away with the first, too dumbed down, version.)

Thus, the variational principle is about that change in the relative positions for which the corresponding change in the energy vanishes (or has the minimum possible absolute magnitude, in case the positions form a discretely varying, finite set).

(The update on 26th December 2016 gets over here.)

And, it turns out that this approach, too, is indeed able to perfectly predict the final disk-positions—precisely as they actually are observed in reality.

If you allow a continuum of positions (instead of the discrete set of only the 11 number of different final positions for one disk, or 121 number of ordered pairs), then instead of taking a Cartesian product of positions, what you have to do is take into account a tensor product of the position functions. The maths involved is a little more advanced, but the underlying algebraic structure—and the predictive principle which is fundamentally involved in the procedure—remains essentially the same. This principle—the variational principle—says:

Among all possible variations in the system configurations, that system configuration corresponds to reality which has the least variation in energy associated with it.

(This is a very rough statement, but it will do for this post and for a general audience. In particular, we don’t look into the issues of what constitute the kinematically admissible constraints, why the configurations must satisfy the field boundary conditions, the idea of the stationarity vs. of a minimum or a maximum, i.e., the issue of convexity-vs.-concavity, etc. The purpose of this post—and our example here—are both simple enough that we need not get into the whole she-bang of the variational theory as such.)

Notice that in this second paradigm, (i) we did not restrict the analysis to only those quantities that are actually taking place in reality; we also included a host (possibly an infinity) of purely hypothetical combinations of quantities too; (ii) we worked with energy, a scalar quantity, rather than with momentum, a vector quantity; and finally, (iii) in the variational method, we didn’t bother about the local details. We took into account the displacements of the disks, but not any displacement at any other point, say in the gas. We did not look into presence or absence of a pulse at one point in the gas as contrasted from any other point in it. In short, we did not discuss the details local to the system either in space or in time. We did not follow the system evolution, at all—not at least in a detailed, local way. If we were to do that, we would be concerned about what happens in the system at the instants and at spatial points other than the initial and final disk positions. Instead, we looked only at a global property—viz. the energy—whether at the sub-system level of the individual compartments, or at the level of the overall system.


The Two Paradigms Contrasted from Each Other:

If we were to follow Newton’s method, it would be impossible—impossible in principle—to be able to predict the final disk positions unless all their motions over all the intermediate transient dynamics (occurring over each moment of time and at each place of the system) were traced. Newton’s (or vectorial) method would require us to follow all the details of the entire evolution of all parts of the system at each point on its evolution path. In the variational approach, the latter is not of any primary concern.

Yet, in following the energetics program, we are able to predict the final disk positions. We are able to do that without worrying about what all happened before the equilibrium gets established. We remain concerned only with certain global quantities (here, system-energy) at each of the hypothetical positions.

The upside of the energetics program, as just noted, is that we don’t have to look into every detail at every stage of the entire transient dynamics.

Its downside is that we are able to talk only of the differences between certain isolated (hypothetical) configurations or states. The formalism is unable to say anything at all about any of the intermediate states—even if these do actually occur in reality. This is a very, very important point to keep in mind.


The Question:

Now, the question with which we began this post. Namely, what does the delta of the variational calculus mean?

Referring to the above discussion, note that the delta of the variational calculus is, here, nothing but a change in the position-pair, and also the corresponding change in the energy.

Thus, in the above example, the difference of the state (2,3) from the other close states such as (3,3), (1,3), (2,4), and (2,2) represents a variation in the system configuration (or state), and for each such a variation in the system configuration (or state), there is a corresponding variation in the energy \delta E_{ni} of the system. That is what the delta refers to, in this example.

Now, with all this discussion and clarification, would it be possible for you to clearly state what the physical meaning of the delta is? To what precisely does the concept refer? How does the variation in energy \delta E differ from both the finite changes (\Delta E) as well as the infinitesimal changes (\text{d}E) of the usual calculus?


Note, the question is conceptual in nature. And, no, not a single one of the very best books on classical mechanics manages to give a very succinct and accurate answer to it. Not even Rana and Joag (or Goldstein, or Feynman, or…)

I will give my answer in my next post, next year. I will also try to apply it to a couple of more interesting (and somewhat more complicated) physical situations—one from engineering sciences, and another from quantum mechanics!

In the meanwhile, think about it—the delta—the concept itself, its (conceptual) meaning. (If you already know the calculus of variations, note that in my above write-up, I have already supplied the answer, in a way. You just have to think a bit about it, that’s all!)


An Important Note: Do bring this post to the notice of the Officially Approved Full Professors of Mechanical Engineering in SPPU, and the SPPU authorities. I would like to know if the former would be able to state the meaning—at least now that I have already given the necessary context in such great detail.

Ditto, to the Officially Approved Full Professors of Mechanical Engineering at COEP, esp. D. W. Pande, and others like them.

After all, this topic—Lagrangian mechanics—is at the core of Mechanical Engineering, even they would agree. In fact, it comes from a subject that is not taught to the metallurgical engineers, viz., the topic of Theory of Machines. But it is taught to the Mechanical Engineers. That’s why, they should be able to crack it, in no time.

(Let me continue to be honest. I do not expect them to be able to crack it. But I do wish to know if they are able at least to give a try that is good enough!)


Even though I am jobless (and also nearly bank balance-less, and also cashless), what the hell! …

…Season’s greetings and best wishes for a happy new year!


A Song I Like:

[With jobless-ness and all, my mood isn’t likely to stay this upbeat, but anyway, while it lasts, listen to this song… And, yes, this song is like, it’s like, slightly more than 60 years old!]

(Hindi) “yeh raat bhigee bhigee”
Music: Shankar-Jaikishan
Singers: Manna De and Lata Mangeshkar
Lyrics: Shailendra


[E&OE]

My planning for the upcoming summer vacation

0. Yes, I have deleted my previous post. As I took a second look at it, I thought it was a bit too on-the-fly, and perhaps not worth keeping. (It was about these Lok Sabha elections!) Though I have deleted it, if the need be, I will write a better post touching on the same topic, including my further thoughts about the matter.

For the time being, let me get back to engineering.

* * *

As this academic term nears its end, I have already begun planning for things to do this summer vacation. A few things are on the top of my mind. Let me jot down these, so that I could look back a couple of months hence and see how I did on those matters (or, how the matters turned out anyway).

1. Journal papers on my past research: I need to convert at least one or two of my conference papers into journal papers. This is really on the top of the list because I haven’t had a journal publication during my Ph.D. The reason for that, in turn, wasn’t that my research wasn’t worth publishing in journals. In fact, not to immediately publish in journals was a deliberate choice, which was decided after discussion with my guide, the late Prof. S. R. Kajale.

The reason was twofold: (i) Journal papers tend to undergo a more thorough peer-review, and even if not, in any case, are longer. Since I am naturally so talkative (in a way almost carefree), I was afraid that I might end up giving out too many details in a journal paper, and at that time (mid-noughties) as now, IPR was (and is) an important consideration. (ii) I didn’t have very good library (eJournals) access back then. I was jobless, would take trips to IIT Bombay for literature review, and both the money and the time to go through eJournals were very severely limited (a few hours on one or two days at the most, at a time).

The situation has changed since. I now do have a job in hand, and in fact, I now work in Mumbai. So, more frequent trips to the IIT Bombay library, for longer periods of literature review, are an easier possibility.

Anyway, the above two reasons are not independent; they are inter-related. As it turns out, I learnt after publishing my conference papers, that an approach very close to what I had taken, had already been developed to a much greater extent than I was aware of, back then. The method in question is: LBM (the lattice Boltzmann method). LBM, as some of you might know, has since my PhD times been commercialized, with at least two commercial software packages and at least one Open Source + consulting model software having come on the scene. (And, thus, it turns out that the prudence in withholding details was right—there was commercial value to those ideas, even if it turns out that I was not the first to think of them. (Of course, since I honestly can say that I developed my approach fully independently, there happen to be a few (relatively minor) ideas which I had, and which still haven’t been published.))

Another thing. I have derived greater confidence about the new observation that I had made regarding the diffusion equation. This could come about only after a better literature search.

All in all, I think I am ready to write my journal paper on the diffusion equation now.

2. Journal papers on some more recent ideas: Since my PhD (2009), I also had a few extended abstracts accepted at international conferences (some 4 papers in 3 different conferences), but for some reason or the other, I had to withdraw. (Lack of time, or lack of money to complete the experimental part.) I could begin directly writing journal papers on these ideas now.

3. Short-term vacation courses: I am also proposing to conduct a couple of short-term courses on FEM and CFD.

3.1 On FEM: By now, I have taught introductory courses on FEM 4 times: twice to UG, once to PG, and once to practising engineers. I have enjoyed teaching my latest offering this semester. Since the syllabus at the University of Mumbai was different, there was an opportunity for me to look at FEM from a different perspective than what I had taken. I think I could now synthesize my understanding in a (really) improved (if not “new”) short-term course.

So, I am planning to offer a short-term course of about 7–10 days duration. The audience could be any graduate engineer: (i) PG students, (ii) working engineers, (iii) junior faculty from engineering colleges.

3.2 A novel course on CFD: Another course which I have never taught but which I am deeply interested in, is, of course, CFD. So, I am planning to offer a special vacation-time and short-term course on that topic, too.

Ideally, I would like to keep this course more for those who are interested in deeper insights, via self-study. If there are enough people interested in such a course, then I would rather like to keep the number of topics few, and the focus more on the fundamentals.

Of course, fewer topics doesn’t mean less material. Indeed, in many ways, my planned CFD short-term course would have much more material than a traditional one.

I would be ready to cover all three methods side by side: FDM, FVM, and FEM—provided the audience already knows FEM in the context of the usual linear structural (or self-adjoint) kind of problems.

Similarly, in my course, I would like to include at least conceptual introductions to what are considered to be “advanced” topics like moving boundary problems, multiphase (VOF) problems, etc.

Thus, my planned CFD course wouldn’t be tied to (or, actually, be subservient to the needs of) only the aerodynamics problems of the aerospace department. It could easily apply to issues like free-surface flows and cavity-filling issues (if not also droplet formation/interaction—which could perhaps be covered, though I am not sure). (It would have been easier to cover if LBM were to be a part of the course offering, but I guess for an introductory/first course that is also short-term, introducing all the main continuum-based methods of FDM, FVM and FEM is a challenge by itself. No need to complicate it further by also introducing particle-based approaches like LBM/SPH.)

4. More about the above short-term vacation courses:

4.1 My current view is that for a one week course, 4 hours of class-room teaching in the morning and 1–2 hours of hands on sessions in the afternoon for 3–4 days, will be enough.

4.2. The fees will be reasonable, by today’s market standards (though not just a few hundred rupees, if that’s what you understand by the word “reasonable”). Since I do have a professor’s job, I am not looking at these courses as my primary career. The fees mainly have to cover the course organization expenses, most of which are beyond my control. On my part, an honorarium sort of payment also would be OK by me—strictly because, to repeat, I do have a continuing job that does pay me now. That’s why. And, the course-fees do stand to drop if the audience is bigger, though I plan not to take more than 25–30 students per course.

4.3. So there. Drop me a line if you are from Mumbai and are interested in attending one of these courses this summer vacation.

Yet, some final clarifications still are due:

4.4 The courses will not follow the syllabus of any university. Drop me a line or follow this blog if you wish to know the details of the course contents. But, essentially, these are not your usual vacation-time coaching classes.

(There! Right there I kill my entire potential market of student-customers.)

4.5 No software package will at all be covered. If you wish to learn, say, ANSYS, or Fluent, there are numerous vendors out there. For OpenFOAM, there is a group in IIT Bombay, and a company in Pune. Contact them directly. (And no, I don’t even know who are better, or just more reputable, among them. (As far as I am concerned neither ANSYS nor Fluent nor OpenFOAM nor ESI gave me a job even if I was competent, when I was most desperate. Now, I couldn’t care less for them bastards. (And, in a class-room, I usually am far more cultured and civilized than expressions of that sort.))) In my course, I may use some programs written by me in C++ or Python or so. (No, Java continues to be a “no” as far as I am concerned!) But no training on software packages as such.

(There! Right there I kill my entire potential market of working engineers looking for in-house company trainings!)

Alright. More, later. [Of course, as in the recent past, my blogging will continue to remain rather infrequent. But what I mean to say here is that once the ideas of the short-term courses take a more concrete form, I will sure write another blog post to give you those details.]

* * * * *   * * * * *   * * * * *

A Song I Like:
(Hindi) “chandaa ki kiranon se liptee hawaayen”
Singer: Kishore Kumar
Music: Chitragupt
Lyrics: Verma Malik

[E&OE]

 

MWR for the first- and third-order differential equations

I am teaching an introductory course on FEA this semester. Teaching always involves learning—at least on the teacher’s side.

No, there was no typo in there. I did mean what I just said. It’s based on my own personal observation. Teaching actually involves (real) learning on the part of the teacher—and hopefully, if he is effective enough in his teaching (and if the student, too, is attentive and hard-working enough), then, also on the part of the student.

When you teach a course, in thinking about how to simplify the ideas involved, how to present them better, you have to mentally go over the topics again and again; you have to think and re-think about the material; you have to see if rearranging the ideas and the concepts involved or seeing them in a different light might make it any easier to “get” it or even just to retain it, and so on. … The end result is that you often actually end up deriving at least new mnemonics if not establishing new connections about the topics. In any case, you derive better conceptual integrations or strengthen them better. You end up mastering the material better than at the beginning of the course. … Or at least that’s what happens to me. I always end up learning at least a bit more about what I am teaching.

And, sometimes, the teacher even ends up deriving completely new ideas this way. At least, it seems, I just did—about the nature of FEA and computational mechanics in general. The idea is new, at least to me. But anyway, talking about this new idea is for some other day. … I have to first rigorously think about it. The idea, as of today, is just at that nascent phase (it struck me right this evening). I plan to put it to the paper soon, work out its details, refine the idea, and put it in a more rigorous form, etc. That will take time. And then, second, I have to also check whether someone has already published something of that kind or not. … As someone—was it Mark Twain?—said, the best of my ideas were stolen by the ancients… So, that part—checking the literature—too, will take quite some time. My own anticipation is that someone must have written something about it. In any case, it’s not all that big an idea. Just a simple something.

But, anyway, in the meanwhile, for this blog post, let me note down something different. An item, not of my knowledge, and not one of even potentially new knowledge, but of my ignorance, which got highlighted recently, during my lecture preparations.

I realized that if one of my students poses a question about it, I don’t know the reason why MWR (the method of weighted residuals) isn’t effective, or at least isn’t often used, and maybe even cannot be relied on, for the first- and the third-order differential equations. (See, see, see, I don’t even know whether it’s a “cannot” or an “isn’t”!) I don’t know the answer to that question.

Of course, as it so happens, most differential equations of engineering importance are only of the second and the fourth order. Whether linear or non-linear, they simply aren’t of the third-order. I haven’t myself seen a single third-order differential equation in any of the course-work I have ever done so far. Sure, I have seen such equations, but only in a mathematical handbook on the differential equations—never in a text-book or a monograph on engineering sciences as such. And, even if of the first-order, in physics and engineering, they often come as coupled equations, and thus, (almost nonchalantly, right in front of your eyes) jump into the usual class of the second-order differential equations—e.g. the partial differential wave equation.
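For concreteness, here is a minimal sketch of what a Galerkin-type MWR calculation looks like on one first-order test problem. The problem itself (u' + u = 0 on [0, 1] with u(0) = 1), the quadratic trial function, and the choice of x and x^2 as the weights are all my assumptions, picked only because the integrals can be done exactly. One well-behaved example says nothing either way about the general question raised above.

```python
from fractions import Fraction as F
import math


def monomial_integral(k):
    """Exact integral of x**k over [0, 1]."""
    return F(1, k + 1)


# Trial function: u(x) = 1 + a1*x + a2*x**2, which satisfies u(0) = 1.
# Residual of u' + u = 0:
#   R(x) = (1 + a1) + (a1 + 2*a2)*x + a2*x**2
# Galerkin conditions: the integral of R(x)*x**i over [0, 1] is zero, i = 1, 2.
# Each condition has the form  c1*a1 + c2*a2 = rhs.
rows = []
for i in (1, 2):
    c1 = monomial_integral(i) + monomial_integral(i + 1)          # from a1*(1 + x)
    c2 = 2 * monomial_integral(i + 1) + monomial_integral(i + 2)  # from a2*(2x + x**2)
    rhs = -monomial_integral(i)                                   # from the constant 1
    rows.append((c1, c2, rhs))

# Solve the 2x2 system by Cramer's rule.
(p, q, r), (s, t, v) = rows
det = p * t - q * s
a1 = (r * t - q * v) / det
a2 = (p * v - r * s) / det

u_at_1 = 1 + a1 + a2
print(a1, a2, u_at_1)                 # -32/35 2/7 13/35
print(float(u_at_1), math.exp(-1.0))  # roughly 0.371 versus the exact 0.368
```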

Anyway, coming back to this MWR-related issue, I checked up the text-books by Reddy and Finlayson, but didn’t find the reason mentioned. I hope that someone knows the answer—someone would. So, I am going to raise this issue at iMechanica, right today.

That’s about all for this blog post, folks. Once I post my question at iMechanica, maybe I will come back and add a link to it from here, but that’s about it. More, some other time.

[And, yes, I promise to blog about the new idea once I am done working it out and checking about it a bit. It struck me just today, and it still is purely in the conceptual terms. The idea itself is such that it can (very) easily be translated into proper mathematical terms, but the point is: that’s something I haven’t done yet. Let me do that over, say the next few weeks/months, and then, sure, I will come back and blog about it a bit. I mean, I will sure blog about it way, way before sending any paper to any journal or so. That’s a promise. So, bye for now…]

* * * * *  * * * * *  * * * * *
A Song I Like:
(Marathi) “yaa bakuLichyaa zaaDaakhaali…”
Singer: Sushma Shreshtha
Lyrics: Vasant Bapat
Music: Bhanukant Luktuke

[E&OE]