In this post, let me jot down an idea that is suitable for a final year project by the CS/IT undergraduates (or, in a suitably expanded form, also by the CS/IT post-graduates) in engineering schools. Or, students from other similar courses like the MCA, MSc (CS) etc. The idea is (to me) neat and useful, and not at all difficult to implement. Do consider it in place of those usual topics like railway reservation/airline booking system, payroll system, etc.
The idea is to create an online searchable database of the history of science topics, that gives out customized timelines as query outputs or reports over the ‘net.
The idea occurred to me while doing literature search for my book. (BTW, idiots in those humanities departments degrade the word “research,” by using it for such things as plain literature search. The management and marketing folks do worse: they call activities like the Gallup polls, “research.”)
So long as a time-line involves only tens of items, it is easy to manage its contents manually. As an example of a small time-line, see my previous blog post. (In fact, I got the items in that time-line in an almost fully ready-made form from Manjit Kumar’s book: “Quantum.”)
However, while preparing for the pre-requisites part of my planned QM book, I found that no suitable time-line existed in a ready-made form for those classical physics topics.
One reason is that the pre-requisites of QM are so very many and so very diverse. For a neat visualization of physics as it existed some 70 years ago, see the “1939 map of physics”[^].
Another reason is that the sources of the detailed historical accounts are, again, so very numerous, and not all of them are equally accurate, authentic, or detailed.
While accuracy and authenticity are important, as far as the design work for this particular project goes, I think that the issue of the non-uniformity of the available details is of much greater importance. The original sources of information themselves are not equally detailed. For instance, see the Wiki timeline on the luminiferous aether [^], and ask yourself: would any one of the Encyclopedia Britannica’s timelines concerning the aether ever be as detailed as this one?
Of course, the administrators of the database may not rely only on the existing timelines. They may consult books to feed the data to create more detailed timelines. Once again, issues like the unevenness of details and the matter of ascribing proper dates at a detailed level, come up.
For instance, consider the Fourier analysis. Most books mention 1822 as the date of its beginning. It’s wrong.
We don’t have any specific record to answer the question of the precise time when Fourier first thought of this idea of spectral decomposition. However, we do have some evidence which indicates that he began working on the problem of heat propagation in solid bodies as early as 1804, and that he had presented a paper on this topic to the Paris Institute as early as on 21st December 1807. Though Fourier wanted to publish the paper in print, he didn’t (or perhaps couldn’t) do so, because it ran into some criticism. (Off hand, I would suppose, from Poisson.) Fourier had only the sines, not also the cosines, in that first 1807 paper of his. Though important, relatively speaking, it’s a minor omission. What is more important is the fact that he had already had the big idea right, viz. that any arbitrary function could be expanded as (possibly an infinite) series. Now the funny thing is this. While Fourier didn’t know the necessity to include the cosines, neither did his critics. They were rather against the completeness aspect of his idea, viz., that any arbitrary function could be expanded in this way, even the discontinuous functions. On that count, he was on the right track. However, it was hard to convince the established critics. One way or the other, the paper didn’t get published that same year. He sent a revised draft to a competition of the French Academy in 1811, and won it. The critics, however, still were not satisfied. Fourier ultimately wrote and published the further revised account some 15 years later, in 1822.
Most books will tell you that the Fourier analysis began in 1822. However, unless you take into account the 1804–1807 genesis of this branch of analysis, the overall historical progression does not appear as smooth as it should be.
The question then becomes, in a time line, what date would you include for reporting? 1804? 1807? 1811? 1822? All the four?
Some cases are even more intriguing, and possibly of more than trivial interest to a researcher. For instance, consider Schrodinger’s formulation of quntum mechanics. His very first paper (written in January 1926) is concerned about the famous non-relativistic wave equation, now bearing his name. Yet, even before that very first paper was written, in fact barely just a month before, in December 1925, Schrodinger had initially worked on a relativistic wave equation. It doesn’t work fully satisfactorily, and so Schrodinger worked out the non-relativistic version before sending it out for publication. But wouldn’t the fact that Schrodinger had those relativistic ideas right in the last month of 1925, be important while tracing the development of the relativistic QM? After all, physicists in that era were unusually open in their communications and collaborations. For example, Pauli had begun working on his paper merely on the basis of a draft of Heisenberg’s paper. (I mean the manual draft as well as detailed physics notes Heisenberg exchanged with him—not even the printer’s proofs let alone what today we would call the preprint.) Similarly, Pauli’s (1924?) idea regarding a two-valued quantum number was known to Goudsmit and Uhlenbeck when they proposed the electronic spin (October, 1925).
Should such details be entered into the database? Should there be an attribute specifying the granularity of information that is being carried by a database entry? Should the queries allow various levels of granularities to be included in the search results?
And, going back to Fourier’s case, suppose that your database initially includes an entry for only one of the three dates (1804—1807, 1811 and 1822). Someone points out the other dates to you (i.e. to the system administrators populating the database) only later on. How would you allow for the conciliations of data items like these, in a mechanical sort of a way? What software mechanism would you provide?
Would you handle it via some attributes of the individual entries in the database? Would these attributes be finalized beforehand, say a pre-determined set like: “Initial Idea/Presentation/Publication/Revision” etc? Or would it be possible and desirable to keep the set of such attributes extensible—say, on lines similar to how you apply attributes or tags to your blog posts, complete with an intellisense feature that shows the already applied attributes? Can the overall structure thus be kept a bit fluid? Is a relational database the optimal data structure, given such objectives? How about the queries on it? Should they be sensitive only to the attributes, or to the “full text”?
I think these are just some of the questions that the students will themselves have to handle. Here, all that I am now going to do is to give some idea of how the usage scenarios would look like. In my description here, I will use the terms like database etc., but only in a broad sense—it doesn’t necessarily mean a traditional data structure like the relational database.
First, there will be a set of users who will populate the database using appropriate sources. They will have the authority to edit the database, and may be called system administrators. The database may sit on a remote secure Web server. The sys admins should be able to use, say, secure http-based (i.e. https-based) forms to edit the database, i.e. to add/modify/delete the entries.
Then there will be the end-users. The end-users should be able to access the database using plain http. Typically, they will use a user-friendly form to send their queries to the server over plain http, and get answers as reports consisting of Web pages. [More on this, in my next post.]
The database will, at the minimum, have information on the following attributes. (If this were an RDBMS, it will have the following as fields or columns): date, scientist, work, remarks, source reference (for the database entry—not the citation record itself), subject areas to which the entry belongs. Many of these attributes are tricky.
There are multiple meanings of “date” in a context like this. Some historical sources give dates only in years, others also in months, still other right up to the day and the time. For some advances, the dates are known only vaguely. For example: “He began working on it during the late 1820s, and continued up to 1826-end” or “fl. 1810s”, or “circa 3rd quarter of 1925”. The calender systems may have changed. Then, there are those BC dates. (Not all database software handle all the dates very smoothly anyway.) Sometimes, the date a scientist sent his manuscript may be more important than when it was published, but not always. Sometimes, the published accounts may be somewhat misleading: Newton did get the basic ideas of calculus right during that plague-related vacation, around 1666. Yet, it also is a fact that he still was working on many of the important calculus ideas in the few years before the publication in 1687 of the Principia. Newton didn’t have everything ready right in 1666, contrary to many stories you might have heard. (There has been a tendency to ascribe too much of a genius to the years of the early youth; wit a dim-wit like Hammings—dim-wit, when it comes to the age at which scientists created their best original ideas.)
The database must store the precise way the date was stated in the original sources. Yet, when a user sends in a query, regardless of the specific original recording, it should still be possible to sort out all the entries, with a certain smartness built into the system. The student will have to resolve the issue of how to put all the entries in (some sort of an) order—vague entries like “in the decade prior to the I WW” together with the more precise entries like “in the afternoon, after lunch, on October 7, 1900.” (Anyone knows what’s special with that time? I would be delighted to know from you!)
As far as this problem of sorting of entries of varying degrees of precision of time-specification is concerned, I do have an idea how it could be done. However, I would rather let the students try to work on it. [OK, I promise to give some hint in my next post on the topic.]
The “scientist” field may undergo changes. A plain guy like Thomson is the same as (the first Baron/Lord) Kelvin (after whom the temperature scale is named). “Snellius” is the same guy as Snell (of the laws of optics). A real gauntlet to us in the 21st century was thrown by a single Swiss family: There is a Daniel who, being one of a family, carries the same last name as Jacob, who is the same as Jakob is the same as Jacques is the same as James. It doesn’t end there. Johann, is the same as John is the same as Jean, but different from James. And then, there are Johann-II and also Johann-III. Worse: Sometimes they had a feud in the open. The famous brachistochrone problem was posed because one brother tried to show the other brother down. (The other brother succeeded, together with Lebniz and his credits-wise adversary, the “lion’s paws”-bearing Newton.) There has been a son in that family who thought his father had taken credit really due to him. The point? Your user should not get confused, and should be given all the relevant information as to which Bernoulli it was.
Some users may be interested in getting a time-line of all the works of just “Newton.” Others may wish to access a similar information for “Isaac Newton, Jr.” … Yes, the falling apple guy was named after his father, though few people know it, and even fewer (if ever) add the “Jr.” suffix to his name. A more likely search string, especially if originating from a tiny island lying above France and to the west of, say, Finland, might be: “Sir Isaac Newton.”
The “work” field needs to have both succinct information (so that the idea of a time-line or an Excel spreadsheet like report makes sense), but also a place for additional informative details. For instance, less known or less advertised factoids like the fact that Leibniz didn’t originate the idea of the “vis viva,” but was inspired by its descriptions in Huygens’ writings. The additional details should perhaps be supplied only in a more verbose report, but the point is that the database itself should be well-designed that there is a systematic and unambiguous way to tell such things if the need be.
The subjects attributes is important. It identifies all the subject areas in which a given advance of science falls. It will enable the end-user to run subject-specific queries. For example, a query like: “Give me a time-line of the kinetic theory,” which is slightly different from “Give me a time-line of the statistical mechancis,” which is slightly different from “Give me a time-line of the thermodynamics.” Contrary to the current pedagogy followed in the engineering schools, development in statistical mechanics was almost concurrent to thermodynamics. … Incidentally, thermodynamics didn’t exactly begin with Joule. It had sprung into action right after Watt’s engine, with Lazare Carnot, the father of Sadi Carnot, having a lot of thoughts related to the energetics program in physics. In case you have ever wondered how come the Second Law of Thermo came on the scene before the First one did, such a bit might be of interest to you.
Thus, the subject attributes may be both coarse-grained or fine-grained, and they may have multiple abstract hierarchies among them. The system administrators may invite subject experts (e.g. professors) to apply these attributes. This task, though complex and time-consuming, should not too difficult in a way: the attributes to use may come from certain standardized classification schemes such as the PACS classification [^], the AMS classifications [^], etc. What is more important for this project on the CS side is this requirement: the application mechanisms of subject attributes should be sufficiently neat that the application people—the subject experts—should find it very easy to find all the relevant attributes suggested to them in a handy way, when they apply these attributes to the individual entries.
As indicated above, one real tricky point is this: Some discoveries/inventions span across fine-grained descriptions. Another tricky point is the following: Some discoveries span across many fine-grained description only in the context of a later discovery—not otherwise. Therefore, they should not get applied for reports-generation unless the later discovery also is included in the search results. For instance, consider the topic of Fermat’s principle. This is a from classical, geometric, optics. It would not have been a potentially very important candidate for inclusion in mechanics-related searches until Schrodinger made it impossible to avoid it. … Actually, the idea of looking at Fermat’s principle as a mechanical principle goes centuries back; off-hand, I suppose it was Huygens (again), right in the 17th century, who first linked the mechanics of transfer of momentum (he would call it “the motion”) and Fermat’s principle. Then, in the 19th century, there was Hamilton who got inspired by the same conceptual link. But the point here is, before Schrodinger, not identifying the topic as a mechanical one, could, perhaps, have been an excusable omission. After Schrodinger, it is entirely inexcusable.
Each entry should clearly and explicitly identify the source(s) of the information on which it is based. For instance, McTutor’s, Encl. Brit., a certain book, a certain paper, etc. There could be multiple original sources objectively required for the same entry. For instance, when a subject expert rightly ascribes to Euler the impetus provided for development of the calculus of variation (CoV), he would need to take it away from Maupertuis, and for the same reason, he would need original sources authored by both.
BTW, this entry (on the impetus to CoV) would be different from another entry that is concerned with the first correct formalization of the CoV, which would involve a 19-year old Lagrange. And, both these entries would be different form the first-ever problem of CoV correctly posed and applied with correct a correct solution approach, which would be a credit due in (off-hand speaking, a 50+ year-old) Newton’s account (more than a century before Lagrange’s work).
Thus, the individual database entries should carry links to other, related, entries. Such things may come in the Remarks section, or there could be a special “See Also” section.
On the second thoughts, there could an extra attribute (to be carried by a database entry) for specifying the set of pre-requisite entries for it.
Further on the second thoughts, there could also be another, separate attribute that specifies a linkage to the other entries immediately preceding it and constituting the “prior development” for a given entry. (The prior development is not necessarily the same as a pre-requisite. For instance, the caloric fluid theory of heat is a prior development to the kinetic theory, but it’s not a pre-requisite.)
All the report fields may carry hyperlinks to the resources on the ‘net.
It might be desirable to include other information like the biographical details for scientists: dates (!) of birth and death, places thereof, nationalities (e.g. Prussia, West Germany, United Germany, etc!), educational institutions attended and degrees, if any, received, the career-wise affiliating institutes for each database entry. (Consider: would the work as a patent clerk qualify as an affiliating institute for a discovery in physics?) Details like these are not so important, and may be taken up during the version 2.0 of the project.
Finally, I would like to jot down a couple of resources. Informations from multiple sources like these needs to come together in a single, easily accessible database:
The McTutor Chronology [^]
The Wiki List of Timelines [^] (We need only the S&T related timelines. For version 1.0, focus only on physics and mathematics. For version 2.0, add: inventions/patents, engineering and technology. For version 3.0, add: biological sciences. For history of the humanities kind—who killed who to grab which power when and had how many wives—my advice to you would be: don’t bother.)
Now, a few things of business, if you are interested in picking this up as a project.
Don’t get in touch with me simply to ask if you can use this idea. Of course you can, so long as you acknowledge the intellectual credit due to me, and so long as you don’t release this idea into the Open Source/CopyLeft sort of a domain, and so long as you are not going to make any money out of it—I reserve all the rights to this idea in the first place.
However, do drop me a line before you begin any work on this idea. To repeat, I do reserve all the rights to this idea in the first place.
Also, if you are a student, don’t get in touch with me to ask if I could be become a guide to you. I won’t. What I have done here is to give you the basic idea in general (but in enough details). The particular fleshing out and the particular implementation is basically between you and your official college guide. This much should have been obvious, but the politics junkies and/or IB and/or CIA etc. have used a trick in the past, simply to harass me. For instance, when I was applying for a job as a professor to a college in Pune, someone from Gadchiroli or Ichalkaranji, claiming to be an undergraduate student, would call me, ask me if I can go to their place to be a guide—but could not name the name of his college principal.
However, once your official guide himself approaches me—which he must, if you are going to work on this project idea (see the above point again)—at that point, if he himself requests me to be an official co-guide, I wouldn’t necessarily mind—I will think about it, and let him know regarding my decision in this respect.
Finally, if you are a student, don’t get in touch with me to ask me if I could implement this project for some money from you. I won’t. It is true that as of today I am jobless, and that, even otherwise, I am always looking for ways to make money. But, I have not, don’t, would not ever sell finished (or even “half”-finished) projects to students for a charge.
I would rather write blogs and go further and harass the Maharashtra CM (any one occupying that post), possibly to (his or her) death, (correctly) blaming him (or her or they) for my joblessness, than start selling student projects for a charge. The first should come easier to me. That’s why.
Now, see if you wish to pick it up as a project. It will be useful. To a great degree. (And, perhaps, for this same reason, it won’t be suitable as a JPBTI project. Though, I couldn’t care less if IIT students pick it up as their project topic.)
As I said, there are a few more things about this project that I could say. I will write another blog post about it.
* * * * * * * * * * * * * * *
No “A Song I Like” section, once again. I still go jobless. Keep that in mind—if your mind is not so small as not to have the capacity to carry this factoid.
Prithviraj (the BITS Pilani- and Berkeley-trained CM)? Jairam (the IIT Bombay- and MIT-trained JPBTI)? Others like you? Ashamed of the fact that I still don’t have a job? Or not so? Or, perhaps, not at all so? What say? Not even on a Gandhi Jayanti day? Have borrowed that convenient “maun vrat” from a certain someone from Ralegan Siddhi? Really?
[This is revision 1, published on October 2, 2012, 12:35 PM IST. Initial draft posted on October 1, 2012, 4:10 PM, IST. I think I am done with the additions/clarifications for this post, and probably won’t come back to further update this post (unless a grammatical error is too glaring). The promised related post should appear in a few days’ time.]