[Short Updates] Results of the simplest possible extractive text summarizer

Today, I implemented the simplest possible extractive text summarizer, one based on word frequencies, as explained here [^], here [^], and also, way back in 2014, here [^].
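For the record, the summarizer amounts to something like the following minimal sketch. (The stop-word list, the scoring, and the number of sentences returned are arbitrary choices on my part; the write-ups linked above differ in such details.)

```python
# A minimal word-frequency-based extractive summarizer (a sketch, not a polished tool).
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "are", "was", "were", "be", "of", "to",
              "in", "and", "or", "that", "this", "it", "on", "for", "as"}

def summarize(text, n_sentences=5):
    # Naive sentence split: terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)
    if not freq:
        return ""
    max_freq = max(freq.values())
    # Normalize so that the most frequent content word gets a weight of 1.0.
    weights = {w: c / max_freq for w, c in freq.items()}

    def score(sentence):
        return sum(weights.get(w, 0.0)
                   for w in re.findall(r"[a-z']+", sentence.lower()))

    # Keep the top-scoring sentences, but present them in their original order.
    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]), reverse=True)[:n_sentences]
    return "\n".join(sentences[i] for i in sorted(top))

# Usage: print(summarize(open("blog_post.txt").read(), n_sentences=5))
```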

I ran it on a few of my own blog posts. Here are the results. Amusement is all yours.


Original document: “A bit on Panpsychism—part 1: what its basis possibly could be” [^]

Extractive summary:

However, when I began wondering what could possibly be the theoretical bases of this idea (of panpsychism), these two seemed to be the right kind of bases.
[I don’t know if all advocates of panpsychism accept the above two ideas or not.
world that is permanently inaccessible to the rest of us, but one that is, somehow, definitely accessible to philosophers of mysticism such as Plato or Kant.
So, my discussion of panpsychism is going to be limited to what I understand about this theory after listening to only Prof. Chalmers.


Original document: “A bit on Panpsychism—part 2: Why the idea is basically problematic, and what could be a different (and hopefully better) alternative” [^].

Extractive summary:

We can rely on the physics principles so long as we are able to bring the physical actions produced by the consciousness of the cat into our system description.
In terms of our thought experiment, it takes the consciousness of the cat and smears it onto not just the wooden box, but also onto the wooden table.
The table acts exactly the same way whether there is only a box, or a box with a non-responsive cat, or a box with a much meowing cat.
Also, the elementary bits of “life”: can there be a of life too, and if yes, how does differ from ordinary loss of life (i.e.
Consciousness is an attribute of only those beings that actually have life.


Original document: My initial post (not including replies) at the iMechanica thread of discussion: “Stress or strain: which one is more fundamental?” [^]

Extractive summary:

You always need a geometric entity like area or line element (even if it is infinitesimally small) before quantities like stress or flux can at all be defined.
In short, since both are second order symmetric tensors, stress and strain tensors do seem completely similar.
Now, if you split the relative deformation tensor into its symmetric and anti-symmetric parts, and ignore the anti-symmetric part (representing rotations), what you get is the strain tensor.
In between stress and strain, which one is the more fundamental physical quantity?
stress, strain, electric field vector) is a fundamental one.


Original document: The entirety of the above thread

Extractive summary:

Definitions are arbitrary and the only rule one need to obey is the correctness of energy calculation for each pair of stress and strain definition.
In both cases, thinner or thicker, if no external force is applied, we would simply say that the stress in the dielectric is zero.
The ABAQUS theory manual has a section describing different definitions of strain and stress.
Stress and strain are duals in the sense of energy or work.
After all, thinner or thicker is just an observation of strain, and says nothing about stress.


Conclusion: Obviously, the algorithm does not work very well, at least not on the kind of documents considered here.


I put this post in the Short Updates category, because most of the content was produced by the program; I myself wrote only a few sentences.

 

 


Flames not so old…

The same picture, but two American interpretations, both partly misleading (to varying degrees):

NASA releases a photo [^] on Facebook, on 24 August at 14:24, with this note:

The visualization above highlights NASA Earth satellite data showing aerosols on August 23, 2018. On that day, huge plumes of smoke drifted over North America and Africa, three different tropical cyclones churned in the Pacific Ocean, and large clouds of dust blew over deserts in Africa and Asia. The storms are visible within giant swirls of sea salt aerosol (blue), which winds loft into the air as part of sea spray. Black carbon particles (red) are among the particles emitted by fires; vehicle and factory emissions are another common source. Particles the model classified as dust are shown in purple. The visualization includes a layer of night light data collected by the day-night band of the Visible Infrared Imaging Radiometer Suite (VIIRS) on Suomi NPP that shows the locations of towns and cities.

[Emphasis in bold added by me.]

For your convenience, I reproduce the picture here:

Aerosol data by NASA. Red means: black carbon particles. Blue means: sea salt. Purple means: dust particles.

Nicole Sharp blogs [^] about it at her blog FYFD, on Aug 29, 2018 10:00 am, with this description:

Aerosols, micron-sized particles suspended in the atmosphere, impact our weather and air quality. This visualization shows several varieties of aerosol as measured August 23rd, 2018 by satellite. The blue streaks are sea salt suspended in the air; the brightest highlights show three tropical cyclones in the Pacific. Purple marks dust. Strong winds across the Sahara Desert send large plumes of dust wafting eastward. Finally, the red areas show black carbon emissions. Raging wildfires across western North America are releasing large amounts of carbon, but vehicle and factory emissions are also significant sources. (Image credit: NASA; via Katherine G.)

[Again, emphasis in bold is mine.]

As of today, Sharp’s post has collected some 281 notes, and almost all of them have “liked” it.

I liked it too—except for the last half of the last sentence, viz., the idea that vehicle and factory emissions are significant sources (cf. NASA’s characterization):


My comment:

NASA commits an error of omission. Dr. Sharp compounds it with an error of commission. Let’s see how.

NASA does find it important to mention that the man-made sources of carbon are “common.” However, the statement is ambiguous, perhaps deliberately so. It curiously omits to mention that the quantity coming from such “common” sources is so small that there is no choice but to regard it as “not critical.” We may not be in a position to call the “common” part an error of commission. But not explaining that the man-made sources play a negligible (even vanishingly small) role in Global Warming is surely an error of omission on NASA’s part.

Dr. Sharp compounds it with an error of commission. She calls man-made sources “significant.”

If I were to have an SE/TE student, I would assign a simple Python script that computes a histogram and/or the overall density of red pixels, and have the results juxtaposed with maps of high urban population/factory density.
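Something like the following minimal sketch is what I have in mind. (The file name below is just a placeholder for however you save NASA’s image, and the “red” criterion is a crude, arbitrary choice; a real assignment would ask the student to justify a better one.)

```python
# A rough sketch only; "nasa_aerosols.png" is a placeholder file name, and the
# thresholds below are arbitrary.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("nasa_aerosols.png").convert("RGB"), dtype=float)
r, g, b = img[..., 0], img[..., 1], img[..., 2]

# Call a pixel "red" (i.e., black carbon in NASA's color scheme) if the red
# channel clearly dominates both the green and the blue channels.
red_mask = (r > 100) & (r > 1.5 * g) & (r > 1.5 * b)

print("red pixel fraction:", red_mask.mean())

# A column-wise count gives a crude longitude-wise "histogram" of red pixels,
# which can then be eyeballed against maps of urban population / factories.
column_counts = red_mask.sum(axis=0)
print(column_counts[:10])
```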


This post may change in future:

BTW, I am only too well aware of the ugly political wars being waged by a lot of people in this area (of Global Warming). Since I do appreciate Dr. Sharp’s blog, I would be willing to delete all references to her writing from this post.

However, I am going to keep NASA’s description and the photo intact. It serves as a good example of how a good visualization can help in properly apprehending big data.

In case I delete references to Sharp’s blog, I will simply add another passage on my own, bringing out how man-made emissions are not the real cause for concern.

But in any case, I would refuse to be drawn into those ugly political wars surrounding the issue of Global Warming. I have neither the interest nor the bandwidth to get into it, and further, I find (though can’t off-hand quote) that several good modelers/scientists have come to offer very good, detailed, and comprehensive perspectives that justify my position (mentioned in the preceding paragraph). [Off-hand, I very vaguely remember an academic, a lady, perhaps from the state of Georgia in the US?]


The value of pictures:

One final point.

But, regardless of it all (related to Global Warming and its politics), this picture does serve to highlight a very important point: the undeniable strength of a good visualization.

Yes I do find that, in a proper context, a picture is worth a thousand words. The obvious validity of this conclusion is not affected by Aristotle’s erroneous epistemology, in particular, his wrong assertion that man thinks in terms of “images.” No, he does not.

So, sure, a picture is not an argument, as Peikoff argued in the late 90s (without using pictures, I believe). If Peikoff’s statement is taken in its context, you would agree with it, too.

But for a great variety of useful contexts, as the one above, I do think that a picture is worth a thousand words. Without such being the case, a post like this wouldn’t have been possible.


A Song I Like:
(Hindi) “dil sajan jalataa hai…”
Singer: Asha Bhosale
Music: R. D. Burman [actually, Bertha Egnos [^]]
Lyrics: Anand Bakshi


Copying it right:

“itwofs” very helpfully informs us [^] that this song was:

Inspired in the true sense, by the track, ‘Korbosha (Down by the river) from the South African stage musical, Ipi Ntombi (1974).”

However, unfortunately, he does not give the name of the original composer. It is: Bertha Egnos (apparently, a white woman from South Africa [^]).

“itwofs” further opines that:

Its the mere few initial bars that seem to have sparked Pancham create the totally awesome track [snip]. The actual tunes are completely different and as original as Pancham can get.

I disagree.

Listen to Korbosha and to this song, once again. You will surely find that it is far more than a “mere few initial bars.” On the contrary, except for a minor twist here or there (and that too only in some parts of the “antaraa”/stanza), Burman’s song is almost completely lifted from Egnos’s, as far as the tune goes. And the tune is one of the most basic, and crucial, elements of a song, perhaps the most crucial one.

However, what Burman does here is to “customize” the song to “suit the Indian road conditions tastes.” This task, too, can be demanding; doing it right takes a very skillful and sensitive composer, and R. D. certainly shows his talent in this regard here. Further, Asha not only makes it “totally, like, totally” Indian, she also adds her own personal chutzpah. The combination of Egnos, RD and Asha is awesome.

If the Indian reader’s “pride” got hurt: For a reverse situation of “phoreenn” people customizing our songs, go see how well Paul Mauriat does it.

One final word: The video here is not recommended. It looks (and is!) too gaudy. So, even if you do download the YouTube video, I recommend that you search for a good Open Source tool and use it to extract just the audio track from the video. … If you are not well conversant with music software, Audacity would confuse you. However, as far as just converting MP4 to MP3 is concerned, VLC works just great; use the menu: Media \ Convert/Save. This menu command works independently of whatever song is playing in the “main” VLC window.


Bye for now… Some editing could be done later on.

Machine “Learning”—An Entertainment [Industry] Edition

Yes, “Machine ‘Learning’,” too, has been one of my “research” interests for some time now. … Machine learning, esp. ANN (Artificial Neural Networks), esp. Deep Learning. …

Yesterday, I wrote a comment about it at iMechanica. Though it was made in a certain technical context, today I thought that the comment could, perhaps, make sense to many of my general readers, too, if I supply a bit of context to it. So, let me report it here (after a bit of editing). But before coming to my comment, let me first give you the context in which it was made:


Context for my iMechanica comment:

It all began with a fellow iMechanician, one Mingchuan Wang, writing a post of the title “Is machine learning a research priority now in mechanics?” at iMechanica [^]. Biswajit Banerjee responded by pointing out that

“Machine learning includes a large set of techniques that can be summarized as curve fitting in high dimensional spaces. [snip] The usefulness of the new techniques [in machine learning] should not be underestimated.” [Emphasis mine.]

Biswajit then pointed out an arXiv paper [^] in which machine learning was reported to have produced some good DFT-like results for quantum mechanical systems, too.

A word about DFT for those who (still) don’t know about it:

DFT, i.e. Density Functional Theory, is a “formally exact description of a many-body quantum system through the density alone. In practice, approximations are necessary” [^]. DFT thus is a computational technique; it is used for simulating the electronic structure of quantum mechanical systems involving several hundreds of electrons (i.e. hundreds of atoms). Here is the obligatory link to the Wiki [^], though a better introduction perhaps appears here [(.PDF) ^]. Here is a StackExchange thread on its limitations [^].

Trivia: Walter Kohn received a Nobel (the 1998 Chemistry prize, shared with John Pople) for developing DFT. It was a rather rare instance of a Nobel being awarded for an invention, not a discovery. But the Nobel committee, once again, turned out to have put old Nobel’s money in the right place. Even though the work itself was only an invention, it directly led to a lot of discoveries in condensed matter physics! That was because DFT was fast; fast enough that it could bring the physics of larger quantum systems within the scope of (any) study at all!

And now, it seems, Machine Learning has advanced enough to be able to produce results similar to DFT’s, but without using any QM theory at all! The computer does have to “learn” its “art” (i.e. “skill”), but it does so from the results of previous DFT-based simulations, not from the theory at the base of DFT. But once the computer does that “learning” (and the paper shows that it is possible for the computer to do so), it is able to produce very similar-looking simulations much, much faster than even the rather fast technique of DFT itself.
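Just to make the basic idea concrete, here is a toy sketch of my own; it has nothing to do with the actual method, descriptors, or data of the paper. The point is only this much: fit a regressor to the outputs of earlier DFT runs, and then predict “energies” for new configurations without solving any QM at all.

```python
# A toy illustration of "learning from DFT outputs". The descriptors and the
# "DFT energies" below are made up; a real study would use physically motivated
# descriptors and actual DFT results.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend: each row describes an atomic configuration, and each target is the
# total energy which an (expensive) DFT run produced for that configuration.
X = rng.uniform(-1.0, 1.0, size=(500, 8))
y = np.sum(np.sin(3.0 * X), axis=1) + 0.01 * rng.standard_normal(500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.5)
model.fit(X_train, y_train)          # the "learning" step, from old DFT data

# Predicting a new configuration's energy is now just a cheap evaluation;
# no self-consistent QM calculation is involved anywhere at this stage.
rmse = np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2))
print("test RMSE on the made-up data:", rmse)
```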

OK. Context over. Now here in the next section is my yesterday’s comment at iMechanica. (Also note that the previous exchange on this thread at iMechanica had occurred almost a year ago.) Since it has been edited quite a bit, I will not format it using a quotation block.


[An edited version of my comment begins]

A very late comment, but still, just because something struck me only this late… May as well share it….

I think that, as Biswajit points out, it’s a question of matching a technique to an application area where it is likely to be a “good enough” fit.

I mean to say, consider fluid dynamics, and contrast it to QM.

In (C)FD, the nonlinearity present in the advective term is a major headache. As far as I can gather, this nonlinearity has all but been “proved” to be the basic cause behind the phenomenon of turbulence. If so, then by this simple-minded “analysis”, using machine learning in CFD would be a basically hopeless endeavour. The very idea of using a potential presupposes differential linearity. Therefore, machine learning may be thought of as viable in computational Quantum Mechanics (viz. DFT), but not in the more mundane, classical mechanical CFD.

But then, consider the role of the BCs and the ICs in any simulation. It is true that if you don’t handle nonlinearities right, then as the simulation time progresses, errors are soon enough going to multiply (sort of), and lead to a blowup—or at least a dramatic departure from a realistic simulation.
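An aside, just to make this point concrete: here is a toy Python sketch, with everything about it (the scheme, the grid, the perturbation) chosen arbitrarily. It only shows that with the nonlinear advective term (inviscid Burgers), a tiny difference in the initial data gets amplified far more than it does under plain linear advection.

```python
# Toy demo: growth of a tiny initial-condition difference under linear advection
# vs. nonlinear (inviscid Burgers) advection. All numbers are arbitrary.
import numpy as np

N, dt, steps = 400, 0.002, 1500
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
dx = x[1] - x[0]

def step_linear(u, c=1.0):
    # first-order upwind for u_t + c u_x = 0 (periodic domain)
    return u - c * dt / dx * (u - np.roll(u, 1))

def step_burgers(u):
    # first-order upwind for u_t + u u_x = 0 (the advective nonlinearity)
    return u - u * dt / dx * (u - np.roll(u, 1))

u0 = 1.0 + 0.5 * np.sin(x)           # base initial condition (always positive)
eps = 1e-6 * np.sin(3.0 * x)         # a tiny perturbation of the IC

ul, ulp = u0.copy(), u0 + eps        # linear: unperturbed and perturbed runs
ub, ubp = u0.copy(), u0 + eps        # Burgers: unperturbed and perturbed runs
for _ in range(steps):
    ul, ulp = step_linear(ul), step_linear(ulp)
    ub, ubp = step_burgers(ub), step_burgers(ubp)

print("linear advection, max |difference|  :", np.max(np.abs(ul - ulp)))
print("Burgers (nonlinear), max |difference|:", np.max(np.abs(ub - ubp)))
```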

But then, also notice that there still is some small but nonzero interval of time which has to pass before a really bad amplification of the errors actually begins to occur. Now, what if a new “BC-IC” gets imposed right within that time-interval, i.e., while the accuracy still remains “good enough”? In this case, you can expect the simulation to remain “sufficiently” realistic-looking for a long, very long time!

Something like that seems to have been the line of thought implicit in the results reported by this paper: [(.PDF) ^].

Machine learning seems to work even in CFD because, in an interactive session, a new “modified BC-IC” is, every now and then, being manually introduced by none other than the end-user himself! And, the location of the modification is precisely the region from where the flow in the rest of the domain would get most dominantly affected during the subsequent, small, time evolution.

It’s somewhat like an electron rushing through a cloud chamber. By the uncertainty principle, the electron “path” sure begins to get hazy immediately after it is “measured” (i.e. absorbed and re-emitted) by a vapor molecule at a definite point in space. The uncertainty in the position grows quite rapidly. However, what actually happens in a cloud chamber is that, before this cone of haziness becomes too big, along comes another vapor molecule, and it “zaps”, i.e. “measures”, the electron back onto a classical position. … After a rapid succession of such going-hazy-getting-zapped events, the end result turns out to be a very, very classical-looking (line-like) path, as if the electron always were only a particle, never a wave.

Conclusion? Be realistic about how smart the “dumb” “curve-fitting” involved in machine learning can at all get. Yet, at the same time, also remain open to all the application areas where it can be made to work, even including those areas where, “intuitively”, you wouldn’t expect it to have any chance of working!

[An edited version of my comment is over. Original here at iMechanica [^]]


 

“Boy, we seem to have covered a lot of STEM territory here… Mechanics, DFT, QM, CFD, nonlinearity. … But where is either the entertainment or the industry you had promised us in the title?”

You might be saying that….

Well, the CFD paper I cited above was about the entertainment industry. It was, in particular, about the computer games industry. Go check out SoHyeon Jeong’s Web site for more cool videos and graphics [^], all using machine learning.


And, here is another instance connected with entertainment, even though now I am going to make it (mostly) explanation-free.

Check out the following piece of art: a watercolor landscape, in fact a placid sea-side during the monsoons. Let me just say that a certain famous artist produced it; in any case, the style is plain unmistakable. … Can you name the artist simply by looking at it? See the picture below:

A sea beach in the monsoons. Watercolor.

If you are unable to name the artist, then check out this story here [^], and a previous story here [^].


A Song I Like:

And finally, to those who have always loved Beatles’ songs…

Here is one song which, I am sure, most of you had never heard before. In any case, it came to be distributed only recently. When and where was it recorded? For both the song and its recording details, check out this site: [^]. Here is another story about it: [^]. And, if you liked what you read (and heard), here is some more stuff of the same kind [^].


Endgame:

I am of the Opinion that 99% of the “modern” “artists” and “music composers” ought to be replaced by computers/robots/machines. Whaddya think?

[Credits: “Endgame” used to be the way Mukul Sharma would end his weekly Mindsport column in the yesteryears’ Sunday Times of India. (The column perhaps also used to appear in The Illustrated Weekly of India before ToI began running it; at least I have a vague recollection of something of that sort, though can’t be quite sure. … I would be a school-boy back then, when the Weekly perhaps ran it.)]