Thursday, July 24, 2008

Software Symbiogenesis

Last time we applied softwarephysics to the “real world” of human affairs. This time we will apply softwarephysics to the “real world” of IT by analyzing a case study from the 1980s that highlights the role of symbiogenesis in software evolution. In 1985, I began work on the BSDE - Bionic Systems Development Environment while in the IT department of Amoco. BSDE was used to “grow” many applications and several million lines of production code from genes and embryos and was also a very good example of the symbiogenesis of software, but first let us review the theory of symbiogenesis in biology.

As discussed in Self-Replicating Information, Lynn Margulis is famous for her theory of symbiogenesis, which proposes that mitochondria, chloroplasts, and the other organelles of eukaryotic cells came from formerly free-floating bacteria that invaded bacterial hosts about 1500 million years ago - an idea that is now widely accepted by nearly all biologists, primarily because both mitochondria and chloroplasts still contain residual genes stored on DNA within their membranes. Mitochondria also replicate themselves as distinct individual components within a cell prior to the division of a cell, with half going into each of the two daughter cells. This goes back to the mitochondria that were originally in your mother’s egg prior to fertilization by your father. The mitochondria in sperm cells are generally not passed on to the fertilized egg, so all of the mitochondria in the 100 trillion cells of your body originally came from your mother and from your mother’s mother and from her mother’s mother, and so on. So essentially all the mitochondria in the cells of your body are an unbroken chain of bacterial invaders going back 1500 million years! Over time, the parasitic mitochondria and chloroplasts took on a symbiotic relationship within their host cells and finally became fully incorporated with their hosts to the point that both the parasites and the hosts became totally dependent upon each other for survival. This explains the highly dramatic architectural quantum leap that life took when it went from the very simple prokaryotic architecture of bacteria, which had persisted for the previous 2500 million years, to the very complex architecture of the eukaryotic cells found in our bodies and the bodies of all the higher forms of life.

This dramatic architectural change has always been hard to explain in terms of the gradual incremental changes favored by traditional Darwinian thought. In fact, the theory of symbiogenesis goes much further, as outlined in Margulis’s book Acquiring Genomes: A Theory of the Origins of Species (2002). Symbiogenesis contends that nearly all the modern features found in the higher forms of multicellular eukaryotic life had their roots in the parasitic/symbiotic acquisition of primitive prokaryotic bacterial genomes of DNA. Once acquired by multicellular forms of life, these bacterial genomes were further spread throughout the biosphere through the hybridization of species, like the mating of a horse with a donkey that produces a mule, but unlike most mules, these hybrids were also fertile and thus able to pass on the hybridized genomes of bacterial DNA. For Margulis, most innovation and natural selection occur down at the bacterial level because bacteria are great reservoirs of innovation that rapidly reproduce in large numbers and can easily exchange DNA with one another across various bacterial strains. In fact, Margulis contends that because so much DNA is transferred between different strains of bacteria, the concept of a species at the bacterial level is nonsensical. So for Margulis, the concept of a species as a group of organisms that can mate only with each other to produce fertile offspring really only applies to large-scale multicellular organisms. Margulis goes on to explain that all of the major biochemical reactions found in the higher forms of life are also found down at the bacterial level because that is where the DNA for these reactions first came from.

Although Margulis has some skeptical reservations about Richard Dawkins’ “selfish genes”, I would suggest that the acquisition of bacterial genomes and the subsequent hybridizations between species would simply be an additional mechanism for “selfish genes” to build even better DNA survival machines for their own benefit, and to do so more quickly than through the slow mutations of DNA one base-pair at a time. After all, it’s really just about self-replicating information finding better ways to survive, and dramatic innovations brought about by parasitic/symbiotic relationships with bacteria and the subsequent hybridization of new species would be an even better way to accomplish this task than the slow incremental changes of DNA one base-pair at a time favored by most biologists. Probably both mechanisms are of substantial importance and have had a major impact on the evolution of living things throughout time.

The acquisition of bacterial DNA by multicellular organisms and the subsequent dispersal of this DNA through hybridization between multicellular species could also be a mechanism to account for the kind of evolutionary change described by the theory of punctuated equilibrium developed by Niles Eldredge and Stephen Jay Gould (1972). It has long been noted, as far back as Darwin, that the fossil record seems to display long periods of little evolutionary change punctuated by brief periods of accelerated change. The theory of punctuated equilibrium provides an explanation for why the rate of evolutionary change is not constant, but speeds up and slows down with time, as is evident in the fossil record. Usually, living things exist in a state of equilibrium with their environment. Rabbits evolve to run with a certain top speed of escape, and foxes evolve to a corresponding top speed of pursuit. As Richard Dawkins has pointed out, neither rabbits nor foxes evolved to the point where they broke the speed of sound because the high costs in doing so would not have paid off with sufficient benefits. After all, the fox is merely running for his dinner, while the rabbit is running for his life. But every so often, something happens to disturb this equilibrium and then the pace of evolutionary change can accelerate greatly. For example, the removal of foxes from a region by urban sprawl might lead to slower moving rabbits that can easily outrun the overweight cats of the neighborhood. These slower moving rabbits could then invest the energy and materials needed to build strong leg muscles into something else that made the making of even more rabbits possible. The theory of symbiogenesis could be another mechanism to account for these dramatic changes found in the fossil record because the acquisition of bacterial genomes by higher forms of life is like building new software from already existing software parts, rather than slowly evolving software one line of code at a time. 
As IT professionals, we all routinely do this on a daily basis by combining already existing software into “new” software with “new” capabilities.

Another interesting observation from the world of IT that also supports the theory of symbiogenesis is that parasitic/symbiotic relationships between various forms of software seem to have been more important for the evolution of software over time than the outright competition between various forms of software, as seen in a “Nature, red in tooth and claw”. True, Microsoft Word did prey upon WordPerfect to the point of extinction, just as Microsoft Excel feasted upon Lotus 1-2-3, which had previously hunted VisiCalc down until none was left. But those were the exceptions. Most times operating systems and various other forms of systems and application software enter into parasitic/symbiotic relationships that benefit all, and sometimes they even merge into a new form of software. For example, I vividly remember the early 1990s, when it was predicted that LANs composed of “high-speed” Intel 386 PCs running at a whopping 33 MHz would make IBM’s mainframes obsolete, and indeed, IBM nearly did go bankrupt in those days. However, today we find that IBM mainframes running z/OS, Unix servers, and client PCs running Windows or Mac have formed a hybridized parasitic/symbiotic relationship. True, the IBM OS/360 (1965) did evolve into the OS/370 of the 1970s, which then evolved into OS/390 (z/OS) in the 1990s, through small incremental advances, with good old CICS (1968) always remaining on top as the king of TP monitors throughout the whole business. But today we do see a true hybridization of these operating systems and their associated system and application software supporting high-volume websites, and that hybridization occurred over a matter of a few brief years during the current decade, and not over the long decades of slow evolutionary change that we saw with the IBM mainframe operating systems.
The other interesting thing that seems to recapitulate the symbiogenetic theory is that most of the current mainstream concepts found in IT seem to stem from the early prokaryotic bacterial Unstructured Period (1941 – 1972) of software. There is an old truism in IT that no matter what cutting-edge technology you may be working on today, somebody else was doing it back in the 1950s or 1960s, but nobody paid any attention to it at the time. Unfortunately, the history of IT is rife with the frustration of Cassandra, who was granted the gift of prophecy by Apollo, but also the curse that nobody would believe her.

Like Cassandra, Lynn Margulis’s full-blown theory of symbiogenesis is still a bit on the fringe of current thought in biology, even though we clearly see evidence of it in the evolutionary history of software. This brings up an important point. Because genes, memes, and software are all forms of self-replicating information struggling with both the second law of thermodynamics and nonlinearity, people who spend their lives studying genes, memes, or software should be able to obtain valuable insights by observing how genes, memes, and software have all stumbled upon similar architectural and tactical solutions to these problems through the Darwinian mechanisms of innovation and natural selection. In evolutionary biology, this is called convergence, where different evolutionary lines of organisms evolve similar solutions to the same problems. An example of convergent evolution is the striking similarity of the wings of insects, birds, bats, and pterosaurs. All are used for the same purpose and have similar structures, but each evolved independently from different ancestral lines. Similarly, the concept of the “eye” has independently evolved more than 40 times over the past 600 million years. An excellent treatment of the significance that convergence has played in the evolutionary history of life on Earth, and possibly beyond, can be found in Life’s Solution (2003) by Simon Conway Morris.

The evolution of meme-complexes also seems to support the theory of symbiogenesis. Nearly all meme-complexes are really just combinations of ancient memes recast in modern terms. For example, nearly all the memes of modern science can be traced back to ancient Greece – the atomic theory to Leucippus and Democritus (440 BC), the heliocentric theory of the solar system to Aristarchus (270 BC), and string theory to Pythagoras (580 BC). Meme-complexes also display convergent evolution as they struggle for survival with the second law of thermodynamics and nonlinearity. In cultural evolution, the convergent evolution of meme-complexes is most noticeable when totally isolated cultures develop similar meme-complexes to solve similar problems. The fall of the Aztec civilization can be partially explained by the fact that a very small number of Spanish invaders under Cortes, about 600 men, were quite at home with the social structure of the Aztec civilization and could predict its weaknesses based upon similar weaknesses in the European civilization from which they came, just as the smallpox virus that accompanied them was also quite at home in the new DNA survival machines they found on the new continent.

The important thing to keep in mind is that the genes, memes, and software are all forms of self-replicating information trying to deal with the second law of thermodynamics and nonlinearity. That is the driving force for the convergent evolution amongst all three. So just as IT professionals can learn from the biological evolution of genes, I would like to suggest that biologists and cultural anthropologists could learn a few things from the evolution of software. For example, in biology, there has been a long-standing debate as to what exactly constitutes life. Many biologists do not consider viruses to be living things. Recall that a virus is a stretch of DNA or RNA wrapped in a protein coat that cannot replicate itself. In order for a virus to replicate, it must invade a living cell that contains all the necessary machinery in the form of enzymes to replicate the virus. In that regard, a virus is a purely parasitic form of nucleic acid, similar to the parasitic RNA which may have arisen in an early form of metabolic cellular life, and which got life as we know it off to a start. But based upon our definition of self-replicating information, viruses are surely just as valid a form of self-replicating information as any fully functional cell. If biologists could simply think of cellular life and viruses as both forms of self-replicating information, then the difficulty with the classification of viruses instantly disappears. So there is much to be gained by “real” scientists taking advantage of the “simulated” Software Universe that the IT community has so graciously provided.

I have long been amazed that fieldwork in the Software Universe has been so exceedingly sparse. I have been traipsing through it myself for nearly 30 years and have yet to come across a fellow research party - not even a lonely lost graduate student! And I am quite sure there must be some out there; graduate students have a tremendous knack for getting lost before their major professors set them straight upon the right course, narrowly averting a major scientific breakthrough. But I just have not stumbled across any of them, which is quite sad. Here we have this perfectly wonderful simulated universe, ripe for investigation, that is seemingly unknown to “real” observational scientists in the physical Universe. As Peter Ward and Donald Brownlee pointed out in Rare Earth (2000), it is hard to do statistics as an observational scientist when your sample population is N = 1. So I would encourage all observational scientists working in the “real” physical Universe to put together a proposal for a grant to do some fieldwork in the simulated Software Universe and raise N to 2.

As I have already explained, softwarephysics is a simulated and largely observational science because I still have not figured out how to do simulated experiments in a simulated universe. But that is OK. When I first transitioned from physics into geophysics, I had to take a lot of geology courses because, with a B.S. in physics, I was accepted into graduate school with many geological deficiencies, having not taken a single course in geology as an undergraduate. Geophysics is truly an oil and vinegar affair, which combines the most mathematical science with probably the least, and physics is nearly 100% experimental, while geology is nearly 100% observational. Thankfully, physicists tend to be blessed with extraordinarily large egos, which allow them to look down from on high upon the rest of us with both modesty and forbearance. Surprisingly, as a newly minted physics graduate, I was quite impressed when I first had a chance to interact with my newfound geological colleagues and see what they had accomplished over the past 200 years as observational scientists. They would take me out to a roadside cut and show me a layered section of rock. With the sole support of their trusty hand-lenses, they would then proceed to enlighten me with a 30-minute lecture on what the rocks had to say. For me, it was just a layered pile of rocks, but to them, it was a crime scene and they were the wily crime scene investigators, revealing the secrets hiding within the dead rocks. So I would encourage all observational scientists to take advantage of the Software Universe, by contacting the IT department of any local major corporation. However, I must warn you in advance that, like 12th-century Europeans, the members of the IT department will have no knowledge of the existence of the Software Universe or their place within it. 
You see, I have not been a very successful softwarephysicist when it comes to convincing IT to adopt softwarephysics, but the good news is that you will be able to conduct unencumbered fieldwork in the Software Universe to your heart’s content.

BSDE – Bionic Systems Development Environment
In the last half of my original posting on SoftwarePhysics, I described how BSDE came to be and a bit about how it was used to grow applications from embryos by turning on and off a set of genes. If you have not had a chance to read SoftwarePhysics yet, I would advise doing so now and also taking a look at a document on BSDE originally written in 1989, which describes how BSDE was used to grow applications. In the 1980s, BSDE was used to create mainframe applications on both the VM/CMS and MVS (OS/370) operating systems. BSDE generated applications using ISPF Dialog Manager to display and navigate screens, REXX, PL/1, or COBOL code for logic, and DB2 (SQL/DS on IBM VM/CMS) for the storage of data in databases. BSDE performed a maternal role for the developing application embryo until it was fully developed and delivered into production. Once in production, applications were subsequently maintained by BSDE as well. Because the computer charge rate for VM/CMS was substantially less than that for MVS/TSO at Amoco, it was much cheaper to grow an embryo within BSDE VM/CMS and then port the nearly fully grown larval stage embryo to MVS/TSO for the final delivery into production. BSDE MVS/TSO could then be used to maintain the fully-grown adult MVS/TSO application. So BSDE-generated MVS/TSO applications took on a two-stage life cycle. The bulk of their development took place within BSDE VM/CMS as a larval stage embryo in an environment where computer charges were cheap and the living was easy. The adult stage application fluttered about on MVS/TSO fulfilling its purpose in life. Because BSDE was written with the same kinds of software that it grew, it was also used to grow the next release of BSDE. Over a period of seven years, from 1985 to 1992, more than 1,000 generations of BSDE were generated, and BSDE slowly evolved into a very sophisticated tool through small incremental changes.
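
For readers who have not seen the 1989 document, the idea of growing an application from an embryo by turning genes on and off can be loosely pictured with a toy sketch. This is only an illustration of the general idea and not how BSDE actually worked: it is written in Python rather than REXX, and the gene names, the code fragments, and the Embryo class are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical gene table: each "gene" maps to a fragment of generated code.
# The names and fragments are invented for illustration, not taken from BSDE.
GENES = {
    "display_screen": "call display_screen",
    "read_table": "call read_table",
    "update_table": "call update_table",
}


@dataclass
class Embryo:
    """A toy application embryo whose generated code depends on active genes."""

    name: str
    switched_on: set = field(default_factory=set)

    def turn_on(self, gene: str) -> None:
        # Only genes that exist in the gene table can be expressed.
        if gene not in GENES:
            raise KeyError(f"unknown gene: {gene}")
        self.switched_on.add(gene)

    def turn_off(self, gene: str) -> None:
        # Turning off a gene that is already off is harmless.
        self.switched_on.discard(gene)

    def grow(self) -> list:
        """Return the generated source lines, in gene-table order."""
        body = [GENES[g] for g in GENES if g in self.switched_on]
        return [f"/* {self.name} */"] + body


if __name__ == "__main__":
    embryo = Embryo("demo")
    embryo.turn_on("update_table")
    embryo.turn_on("display_screen")
    print("\n".join(embryo.grow()))
```

The same embryo yields a different application each time a gene is switched on or off and grow() is called again, which is the only point the sketch is meant to make.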

I believe BSDE is a very typical example of how software evolves through both the small incremental changes of traditional Darwinian thought and also through the dramatic changes of symbiogenesis. For example, ISPF was originally developed by IBM in 1974 as a screen- and menu-driven interface to TSO, the interactive command line interpreter of the OS/370 operating system. Even today, ISPF is still the interface that all mainframe IT professionals use to interact with the IBM z/OS operating system. ISPF contained a very good full-screen editor whose power could be greatly expanded through the use of REXX ISPF edit macros, so I found it to be the perfect starting point for BSDE. REXX first came out in 1981 as an interpretive language for interacting with IBM mainframe operating systems that was similar in syntax to PL/1, but with a functionality similar to that of Unix shell scripts or DOS .bat files. REXX could also interact with DB2 (SQL/DS on IBM VM/CMS), so it was very good for quickly writing BSDE edit macros or light application code for application embryos. PL/1 came out in 1965, originally with the debut of the OS/360 operating system as a replacement for COBOL (1959), but it never quite succeeded. BSDE generated both PL/1 and COBOL code for embryos to handle heavy logic and to provide for faster execution than the slower interpretive REXX could provide. DB2 (SQL/DS on IBM VM/CMS) came out in 1983, as IBM’s relational database replacement for its hierarchical IMS (1966) database. IMS was originally developed to store the huge bill of materials for NASA’s Saturn V rocket used to carry men to the Moon in 1969. So you see, BSDE simply acquired the genomes of all these ancient software components in a symbiotic/parasitic manner. I could not possibly have written BSDE if I had to code up the functionality of all these components that IBM had spent many decades developing with huge budgets and vast amounts of manpower. 
But by simply assembling these components through symbiogenesis into a new form of software, I got BSDE off to a good start. Then I simply allowed the Darwinian mechanisms of innovation and natural selection through small incremental changes, one line of code at a time, to take over and allow BSDE to evolve into a very powerful development tool of its own.

Next time we will sum things up with the lessons learned from softwarephysics for IT professionals, along with a list of tips on how you can improve your performance and make your IT job easier by using softwarephysics on a daily basis. Then we will finish up softwarephysics with a posting on CyberCosmology that describes the origins of cyberspacetime and software and where they may be heading in the future.

Comments are welcome at

To see all posts on softwarephysics in reverse order go to:

Steve Johnston
