Friday, April 24, 2009

A Proposal For All Practicing Paleontologists

This blog on softwarephysics was originally intended to help IT professionals with the daily mayhem of life in IT, but in this posting, I would like to make a suggestion that might be of help to a totally different group of professionals who are also very dear to my heart.

Back in 1977, when I was an exploration geophysicist exploring for oil off the coast of Cameroon in West Africa with Shell, we had a bunch of big shots from our mother company, Royal Dutch Shell, pay a friendly visit to our Houston office for a review of our Cameroon concession. They came all the way from the Royal Dutch Shell corporate headquarters in The Hague, so the whole Houston exploration office was naturally a little nervous. Our local Shell Houston management team arranged for a high-powered presentation for our Dutch visitors, and all were in attendance when the Cameroon exploration team made its presentation. The petroleum engineers went up first and made a presentation on the past, current, and projected production volumes we were seeing. The geologists and geochemists went up next and gave an overview of the lithology of the sands and shales that we had been drilling through, with special attention to the porosities and permeabilities of the sandstone reservoir rock that held the oil, and the carbon content of the shale source rock from which the oil came. The petrophysicists naturally tagged along for this segment of the presentation with sample well logs. The petrophysicists earned a living by pulling up well logging tools from the bottom of exploration wells on a cable, after the drill string pipe that turned the drill bit had been removed, and measuring the resistivity, gamma ray flux, sound velocity, and neutron porosity of the rock along the side of the borehole, as the well logging tools passed by. This yielded graphs of wiggly lines that told us all sorts of things about the rock strata that we had drilled through. We geophysicists followed next, with our very impressive seismic sections and structure maps of the production fields, and of lots of prospective exploration targets too.
The seismic sections were obtained by shooting very low-frequency sound waves, in the range of 10 to 100 Hz, from an air gun down into the rock strata. The recording vessel trailed a long cable strung out with hydrophones to record the reflected echoes that came back up as wiggly lines. When you lined these wiggly lines up, one after the other, as the recording vessel steamed by off the coast of Cameroon, you ended up with a seismic section which looked very much like a cross section of the underlying rock layers, sort of a sonogram of the underlying rock.

By the way, all the digital information that you use on a daily basis, including your CDs, DVDs, iPods, digital cameras, JPEG images on websites, digitized telephone traffic, and now digital TV broadcasts, stems from research done back in the 1950s by oil companies. As you can see, exploration teams deal with lots of wiggly lines. In the 1950s and early 1960s, we recorded these wiggly lines on analog magnetic tape as wiggly variations of magnetization, like analog audio tape, or like the wiggly bumps and valleys on a vinyl record groove. But in order to manipulate these analog wiggles, we had to pass them through real amplifiers or filters, like people used to do when they turned the bass and treble knobs on their old analog amplifiers. As anybody who has ever played with the bass and treble knobs on an analog amplifier, while listening to an old vinyl record, can attest, there is only so much you can do with analog technology. So in the 1950s, oil companies began to do a great deal of research into converting from analog recording to digital recording. In the late 1960s, the whole oil industry went digital and started processing the wiggly lines on seismic sections and well logs with computers instead of physical amplifiers and filters. Again, this demonstrates the value of scientific research. I have always been amazed at the paltry sums that human civilization has routinely allocated to scientific research over the past 400 years, even in the face of all the benefits that it has generated for mankind.

So after all this high-tech computer-generated data had been presented to our Dutch guests, our lowly paleontologist followed up with the final presentation of the day. Our sole paleontologist was a one-man army on an exploration team consisting of about 50 geologists, geophysicists, geochemists, petrophysicists and petroleum engineers. His job was to look for little fossils called “forams”, also known in the industry by the pejorative term of “bugs”, in the cuttings that came up in the drilling mud from the drill bit at the bottom of exploration holes, as we drilled down through the rock strata. Based upon these “bugs”, he could date the rock we were drilling through, and also determine the depositional settings of the sediments, as they were deposited over time. This sounded pretty boring, even for a geologist, so that is why our paleontologist went last, in case the meeting ran long and we had to cut his talk. So when our sole paleontologist began his presentation, we all expected to see a lot of boring slides of countless “bugs”, like the compulsory slideshow of your neighbor’s summer vacation at Yellowstone. To our surprise, our lone paleontologist got up and proceeded to blow us all away! By far, he gave the best presentation of the day, as he described in great detail the whole evolutionary history of how the Cameroon basin developed over time, complete with hand-drawn panoramas that showed how the place looked millions of years ago! It turns out that our paleontologist was quite an artist too, so he vividly brought to life the whole business, and I learned a great deal about the geological history of the basin that day and so did the rest of our exploration team. I think this demonstrates the value of taking an interdisciplinary approach to gaining knowledge, and the importance of not discounting the efforts of any discipline engaged in the pursuit of knowledge.

So I would like to suggest a possible interdisciplinary dissertation topic for one of your graduate students. This might involve teaming up with the Computer Science Department at your university or other universities. I think there would be a great benefit in doing a paleontological study of the evolution of software architecture over the past 70 years from a biological standpoint. I do not think this has ever been attempted before, and with a more or less complete and intact fossil record of paleosoftware still at hand, along with the fact that many of the original contributors or their protégés are still alive today, I think it would be a very useful and interesting study that would greatly benefit computer science and paleontology as well. My suspicion is that there would be a great deal of controversy in such a study regarding the importance and priority of many of the events in the evolution of software architecture, even with all the data freely at hand and with most of the events having occurred within living memory, so no wonder paleontologists in the field have such a hard go of it! This is something that computer science cannot do on its own. It needs the skills of a good paleontologist to put this all together before it is too late and much of the historical data is lost.

This might sound like a strange proposal for a paleontologist, but for those of you who are familiar with Life’s Solution (2003) by Simon Conway Morris, there really is life on Thega IX, as he supposed. Life on Thega IX arose about 2.15 billion seconds ago but is a little different from what Professor Conway Morris imagined. Life on Thega IX is silicon-based and not carbon-based, and it does not rely upon the chemical characteristics of silicon either, but rather its electrical properties instead. Yet despite all these differences, the first forms of silicon-based life on Thega IX were prokaryotic bacterial forms very similar in structure to the prokaryotic bacteria of Earth. These early prokaryotic life forms had very little internal structure, but they were quite hardy and can still be found in huge quantities on Thega IX even today. Similarly, the first eukaryotic forms of life appeared about 1.17 billion seconds ago, following the long dominance of the Thega IX biosphere by the simple prokaryotes. These first eukaryotic cells divided their internal functions up amongst a large number of internally encapsulated organelles called functions(). Over time, the close association of large numbers of eukaryotic cells in parasitic/symbiotic communities led to the first emergence of simple worm-like multicellular organisms about 536 million seconds ago. But multicellular organization is a hard nut to crack and not much happened for several hundred million seconds until Thega IX experienced a Cambrian Explosion of its own about 158 million seconds ago, which suddenly generated several dozen Phyla on the planet called Design Patterns. Currently, Thega IX is still in the midst of its Cambrian Explosion and is generating very complicated multicellular organisms consisting of millions of objects (cells) that make CORBA calls on the services of millions of other objects located in a set of dispersed organs within organisms.
There are even early indications that some of the more advanced organisms on Thega IX are on the brink of consciousness, and might even start communicating with us. You see, Thega IX is much closer to us than Professor Conway Morris imagined and is also known by some as the Earth.

It is truly amazing how software architecture has converged upon the very same solutions as living things did on Earth many millions of years ago and followed exactly the same evolutionary path through the huge combinatorial universe of program hyperspace. You see, just as the vast combinatorial universe of protein hyperspace is mostly barren, so too most potential programs are stillborn and do not work at all. There are only a few isolated islands of functional software architecture in the immense combinatorial universe of program hyperspace, and the IT community has slowly navigated through this program hyperspace over the past 70 years through a series of island hops.

DNA and software are both forms of self-replicating information that must deal with the second law of thermodynamics in a nonlinear Universe, and consequently, have evolved very similar survival strategies to deal with these challenges through a process of convergence.

Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.

Currently, DNA uses enzymes to replicate, while software uses programmers. Some of the postings in this blog on softwarephysics might provide a good starting point. Specifically, I would recommend the following postings:

Self-Replicating Information
Software Symbiogenesis
The Fundamental Problem of Software

The beauty of doing a paleontological study of the evolution of software architecture is that software is evolving about 30 million times faster than carbon-based life forms on Earth, so that 1 software second ≈ 1 year of geological time. I have been doing IT for 30 years, so I have personally seen about half of this software evolution unfold in my own career.
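The time-scale arithmetic can be checked with a short sketch. It assumes a year of 365.25 days, which works out to roughly 31.6 million seconds and is presumably what the "about 30 million" figure rounds. The class name SoftwareTimeScale and the method secondsToYears() are hypothetical names made up for this illustration. The "seconds ago" figures are the ones quoted above for Thega IX.

```java
// Hypothetical helper: converts the "seconds ago" figures quoted in this
// posting into years before the posting date.
class SoftwareTimeScale {
    // Assumption: one year = 365.25 days, i.e. 31,557,600 seconds.
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    // If 1 software second stands for 1 geological year, then the speed-up
    // factor is simply the number of seconds in a year.
    static double secondsToYears(long seconds) {
        return seconds / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        // Origin of life, first eukaryotes, first multicellular organisms,
        // and the Cambrian Explosion on Thega IX, in seconds before the post.
        long[] secondsAgo = {2_150_000_000L, 1_170_000_000L, 536_000_000L, 158_000_000L};
        for (long s : secondsAgo) {
            System.out.printf("%,d seconds ago is about %.1f years ago%n",
                    s, secondsToYears(s));
        }
    }
}
```

Under that assumption, the four events fall roughly 68, 37, 17, and 5 years before this posting.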

So here is how you can do some very interesting fieldwork on Thega IX:

1. Make contact with some of your colleagues in the Computer Science department of your university. Try to find a colleague who lists “Biologically Inspired Computing” (BIC) or “Natural Computing” as a topic of interest on their web profile. They can help you with the IT jargon and provide you with a very high-level discussion of software architecture. Some of the older faculty members can also walk you through the evolution of software architecture over the past 70 years.

2. Approach some of the major corporations in your area. Try to find corporations that have large high-volume websites running on J2EE Appservers like WebSphere. Then ask to spend some time in the IT Operations Command Center for the corporation. This will give you a high-level view of their IT infrastructure under processing load. This will be very much like being reduced to an observer at the molecular level within a multicellular organism. Watch for the interplay and information flow between the huge number of IT components in action and all the problems that happen on a daily basis too.

3. Then spend some time with the corporation’s developers (programmers) and have them explain to you how their Applications work. You will be amazed.

The most fascinating thing about this convergence of software architecture is that it all occurred in complete intellectual isolation. I have been trying to get IT professionals to think in biological terms for more than 30 years to no avail, so this convergence of software architecture over the past 70 years is truly a bona fide example of convergence and not of intellectual inheritance.

Here are a few important concepts, and their biological equivalents, that you will hear about when working with IT professionals:

Class – Think of a class as a cell type. For example, the class Customer defines the cell type of Customer and describes how to store and manipulate the data for a Customer, like firstName, lastName, address, and accountBalance. In a program, a developer might instantiate a Customer called “steveJohnston”.

Object – Think of an object as a cell. A particular object will be an instance of a class. For example, the object steveJohnston might be an instance of the class Customer and will contain all the information about my particular account with a corporation. At any given time, there could be many thousands of Customer objects bouncing around in the IT infrastructure of a major corporation’s website.

Instance – An instance is a particular object of a class. For example, the steveJohnston object would be an instance of the class Customer. Many times programmers will say things like “This instantiates the Customer class”, meaning it creates objects (cells) of the Customer class (cell type).

Method – Think of a method() as a biochemical pathway. It is a series of programming steps or “lines of code” that produce a macroscopic change in the state of an object (cell). The Class for each type of object defines the data for the Class, like firstName, lastName, address, and accountBalance, but it also defines the methods() that operate upon these data elements. Some methods() are public, while others are private. A public method() is like a receptor on the cell membrane of an object (cell). Other objects (cells) can send a message to the public methods() of an object (cell) to cause it to execute a biochemical pathway within the object (cell). For example, steveJohnston.setFirstName(“Steve”) would send a message to the steveJohnston object instance (cell) of the Customer class (cell type) to have it execute the setFirstName() method to change the firstName of the object to “Steve”. The steveJohnston.getAccountBalance() method would return my current account balance with the corporation. Objects also have many internal private methods(), biochemical pathways that are not exposed to the outside world. For example, the calculateAccountBalance() method could be an internal method that adds up all of my debits and credits and updates the accountBalance data element within the steveJohnston object, but this method cannot be called by objects (cells) outside of the steveJohnston object (cell). External objects (cells) have to call steveJohnston.getAccountBalance() in order to find out my accountBalance.
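For readers who have never seen one, here is a minimal Java sketch of the Customer class (cell type) described above. It is only an illustration, not code from any real system. The data elements, setFirstName(), calculateAccountBalance(), and the balance accessor (spelled getAccountBalance() here, in Java's usual camelCase) come from the text. The constructor, getFirstName(), postTransaction(), the transactions list, and the use of a double for money are assumptions added so that the example runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Customer class (cell type), sketched from the description above.
class Customer {
    // Data elements named in the text; private, so hidden inside the cell.
    private String firstName;
    private String lastName;
    private String address;
    private double accountBalance; // simplification: real systems avoid double for money

    // Assumption: debits and credits are stored as signed amounts in one list.
    private final List<Double> transactions = new ArrayList<>();

    Customer(String firstName, String lastName, String address) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.address = address;
    }

    // Public methods: receptors on the cell membrane that other objects may call.
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getFirstName() {
        return firstName;
    }

    // Hypothetical public method for recording a credit (+) or debit (-).
    public void postTransaction(double amount) {
        transactions.add(amount);
    }

    public double getAccountBalance() {
        calculateAccountBalance(); // run the internal pathway first
        return accountBalance;
    }

    // Private method: an internal pathway that outside objects cannot call.
    private void calculateAccountBalance() {
        double total = 0.0;
        for (double amount : transactions) {
            total += amount;
        }
        accountBalance = total;
    }

    public static void main(String[] args) {
        // Instantiate the class: create one object (cell) of the cell type.
        Customer steveJohnston = new Customer("Steven", "Johnston", "123 Main Street");
        steveJohnston.setFirstName("Steve");
        steveJohnston.postTransaction(100.00);
        steveJohnston.postTransaction(-25.50);
        System.out.println(steveJohnston.getFirstName()
                + " has a balance of " + steveJohnston.getAccountBalance());
    }
}
```

Code in another class that tried to call steveJohnston.calculateAccountBalance() would be rejected by the compiler, which is the software version of a pathway with no receptor on the membrane.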

Line of Code – This is a single statement in a method() like:

discountedTotalCost = (totalHours * ratePerHour) - costOfNormalOffset;

Remember that methods() are the equivalent of biochemical pathways and are composed of many lines of code, so each line of code is like a single step in a biochemical pathway. Similarly, each character in a line of code can be thought of as an atom, and each variable as an organic molecule. Each character can be in one of 256 quantum states defined by the 8 quantized bits of an extended ASCII byte, with each bit in one of two quantum states “1” or “0”, which can also be characterized as ↑ or ↓ and can be thought of as 8 electrons in 8 electron shells, with each electron in a spin up ↑ or spin down ↓ state:

C = 01000011 = ↓ ↑ ↓ ↓ ↓ ↓ ↑ ↑
H = 01001000 = ↓ ↑ ↓ ↓ ↑ ↓ ↓ ↓
N = 01001110 = ↓ ↑ ↓ ↓ ↑ ↑ ↑ ↓
O = 01001111 = ↓ ↑ ↓ ↓ ↑ ↑ ↑ ↑
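The table above can be regenerated with a few lines of Java. This is only a sketch: the class name SpinStates and the methods toBits() and toSpins() are hypothetical names chosen for the illustration, and the arrows are written as Unicode escapes so that the source file stays plain ASCII.

```java
// Hypothetical helper: prints each character (atom) as 8 bits and as 8 "spins".
class SpinStates {
    // Returns the low 8 bits of the character as a string of 0s and 1s.
    static String toBits(char c) {
        String bits = Integer.toBinaryString(c & 0xFF);
        // Left-pad with zeros to a full 8 bits.
        return String.format("%8s", bits).replace(' ', '0');
    }

    // Maps each 1 bit to an up arrow and each 0 bit to a down arrow.
    static String toSpins(char c) {
        StringBuilder spins = new StringBuilder();
        for (char bit : toBits(c).toCharArray()) {
            if (spins.length() > 0) {
                spins.append(' ');
            }
            spins.append(bit == '1' ? '\u2191' : '\u2193'); // up arrow : down arrow
        }
        return spins.toString();
    }

    public static void main(String[] args) {
        for (char atom : "CHNO".toCharArray()) {
            System.out.println(atom + " = " + toBits(atom) + " = " + toSpins(atom));
        }
    }
}
```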

Developers (programmers) have to assemble characters (atoms) into organic molecules (variables) to form the lines of code that define a method() (biochemical pathway). As in carbon-based biology, the slightest error in a method() can cause drastic and usually fatal consequences. Because there is a nearly infinite number of ways of writing code incorrectly and only a very few ways of writing code correctly, there is an equivalent of the second law of thermodynamics at work. This simulated second law of thermodynamics and the very nonlinear macroscopic effects that arise from small coding errors are why software architecture has converged upon Life’s Solution.

Comments are welcome at

To see all posts on softwarephysics in reverse order go to:

Steve Johnston