Sunday, May 26, 2013

A Proposal for an Odd Collaboration to Explore the Origin of Life with IT Professionals

Currently, there are a number of diverse collaborations throughout the world exploring the origin of life on Earth and elsewhere. These teams are normally composed of members from the hard sciences, such as astronomers, astrophysicists, biologists, biochemists, chemists, geologists, geochemists, geophysicists, and physicists. I would like to propose that these diverse teams also take on a few additional participants from the Computer Science departments of their home universities, along with a number of IT professionals from the IT departments of several major corporations throughout the world. The purpose of this strange collaboration would be to use the origin and evolution of commercial software over the past 70 years as a model for the origin and evolution of life on Earth and elsewhere. My hope would be that amongst such a diverse team something would click, and that somebody from outside IT might see something in what the IT community has painstakingly built over the past 70 years that rings a bell in their own domain of experience. I think this effort could be conducted at very little cost using remote WebEx conferences, instant messaging, and email over the Internet. I would be very much interested in participating in such a collaboration as a junior member, with the intention of recruiting additional IT professionals from among the computer science graduates of the participating universities who have some interest in bioinformatics or biologically inspired computing, and who have since moved into the world of corporate IT. Let me explain.

Currently, I am in the Middleware Operations group of the IT department of a major US corporation, where I support all of its externally facing websites as well as all of the internal applications used to run the business. I graduated from the University of Illinois in 1973 with a B.S. in Physics and from the University of Wisconsin in 1975 with an M.S. in Geophysics, and from 1975 to 1979 I was an exploration geophysicist exploring for oil, first with Shell and then with Amoco. I started programming in 1972, and in 1979 I decided to make a career change and become an IT professional in Amoco’s IT department. When I first transitioned into IT from geophysics, I figured that if you could apply physics to geology, why not apply physics to software? So, like the exploration team at Amoco that I had just left, consisting of geologists, geophysicists, paleontologists, geochemists, and petrophysicists, I decided to take all the physics, chemistry, biology, and geology that I could muster and throw it at the problem of software. The basic idea was that many concepts in physics, chemistry, biology, and geology suggested to me that the IT community had accidentally created a pretty decent computer simulation of the physical Universe on a grand scale, a Software Universe so to speak, and that I could use this fantastic simulation in reverse to better understand the behavior of commercial software by comparing software to how things behaved in the physical Universe. Softwarephysics depicts software as a virtual substance, and relies upon our understanding of the current theories in physics, chemistry, biology, and geology to help us model the nature of software behavior. So in physics we use software to simulate the behavior of the Universe, while in softwarephysics we use the Universe to simulate the behavior of software.

I will soon be turning 62 years old and heading into the homestretch, so a few years back I started this blog on softwarephysics to share what I had discovered over the years with the rest of the IT community. My initial intention for this blog was to help the IT community better cope with the daily mayhem of life in IT. However, in laying down the postings for this blog, an unintended consequence arose in my mind as I became profoundly aware of the enormity of this vast computer simulation of the physical Universe that the IT community had so graciously provided to the scientific community free of charge, and also of its very significant potential scientific value. One of the nagging problems for many of the observational and experimental sciences is that there is often only one example readily at hand to study or experiment with, and it is very difficult to do meaningful statistics with a population of N=1. But the computer simulation of the physical Universe that the Software Universe presents provides another realm for comparison. For example, both biology and astrobiology have only one biosphere on Earth to study, and even physics itself has only one Universe with which to engage. Imagine the possibilities if scientists had another Universe readily at hand in which to work! This is exactly what the Software Universe provides.

Currently, there are many researchers working on the origin of life on Earth and elsewhere, but the problem is that on Earth we have very few rocks left from the first billion years of the Earth’s history, when life first arose. Workers in the field are therefore left to draw historical inferences in deep time based upon the modern metabolic pathways, RNA, and DNA we still have at hand today, and upon laboratory biochemical simulations that are based upon those inferences. And even if we do find life on other planets, we will most likely face the same challenge of not being able to figure out how it all happened.

My suggestion would be that everybody is looking just a couple of levels too low in the hierarchy of self-replicating information. Living things are just one form of self-replicating information, and all forms of self-replicating information have many characteristics in common as they battle the second law of thermodynamics in a nonlinear Universe. Currently, there are three forms of self-replicating information on the Earth: the genes, memes, and software, with software rapidly becoming the dominant form of self-replicating information on the planet. However, of the three, software is the only one with a well-documented history, going all the way back to May of 1941, when Konrad Zuse cranked up his Z3 computer for the very first time. So the best model for the origin of life might be obtained by studying the hodge-podge of precursors, false starts, and failed attempts that led to the origin and early evolution of software, with particular attention paid to the parasitic/symbiotic relationships that allowed software to bootstrap itself into existence.

Yes, there are many other examples of universal Darwinism at work in the Universe, such as the evolution of languages or political movements, but I think that the origin and evolution of software provides a unique example because programmers and living things face nearly identical problems. A programmer must assemble a huge number of characters into complex patterns of source code to instruct a computer to perform useful operations. Similarly, living things must assemble an even larger number of atoms into complex molecules in order to perform the functions of life. And because the Universe is largely nonlinear in nature, small changes to initial conditions will most likely result in dramatic and often lethal outcomes for both software and living things. As a result, the evolutionary histories of living things on Earth and of software have converged upon very similar solutions to overcome the effects of the second law of thermodynamics in a nonlinear Universe. For example, both living things and software went through a very lengthy prokaryotic architectural period, with little internal structure, followed by a eukaryotic architectural period with a great deal of internal structure, which later laid the foundations for forms with a complex multicellular architecture. Both also experienced a dramatic Cambrian explosion in which large multicellular systems arose, consisting of huge numbers of somatic cells that relied upon the services of large numbers of cells found within a number of discrete organs.
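To make the nonlinearity point concrete, here is a toy Python sketch, not taken from any softwarephysics posting, that applies every possible single-character "point mutation" to a one-line program and checks whether each mutant still runs and leaves the same value behind. The program text, the mutation alphabet, and the function names are all illustrative assumptions; only `sum()` and `range()` are exposed to the executed code, to keep the experiment simple and contained.

```python
import string

# Hypothetical "organism": a one-line program whose phenotype is the value
# it leaves in the variable `result`.
SOURCE = "result = sum(i * i for i in range(10))"

# Characters a point mutation may substitute (an arbitrary, assumed choice).
ALPHABET = string.ascii_lowercase + string.digits + " ()*+="


def phenotype(src):
    """Run src with only sum() and range() available and return `result`,
    or None if the program fails to compile or dies at run time."""
    namespace = {}
    try:
        exec(src, {"__builtins__": {"sum": sum, "range": range}}, namespace)
    except Exception:
        return None
    return namespace.get("result")


def point_mutants(src):
    """Yield every string that differs from src in exactly one character."""
    for pos, original in enumerate(src):
        for ch in ALPHABET:
            if ch != original:
                yield src[:pos] + ch + src[pos + 1:]


def count_neutral_mutants(src):
    """Return (neutral, total): how many point mutants keep the wild-type
    phenotype, out of all point mutants tried."""
    wild_type = phenotype(src)
    neutral = total = 0
    for mutant in point_mutants(src):
        total += 1
        if phenotype(mutant) == wild_type:
            neutral += 1
    return neutral, total


if __name__ == "__main__":
    neutral, total = count_neutral_mutants(SOURCE)
    print(f"wild-type phenotype: {phenotype(SOURCE)}")
    print(f"{neutral} of {total} point mutants are neutral")
```

Of the 1,558 mutants this generates, nearly all either fail to compile, crash, or compute a different value; only a tiny fraction (such as turning `= sum` into `=+sum`) leave the behavior unchanged.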

Software also presents a much clearer distinction between genotype and phenotype than other complex systems that undergo evolutionary processes, such as languages or other technologies. The genotype of software is determined by the source code files of programs, while the phenotype of software is expressed by the compiled executable files that run upon a computer. The executables are generated from the source code files by a process analogous to the way genes are transcribed into RNA and then translated into proteins. Like a DNA or RNA sequence, source code provides a very tangible form of self-replicating information that can be studied over historical time without ambiguity. Nor is source code unique: many different programs, and even programs written in different languages, can produce executable files with identical phenotypes or behaviors.
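The many-genotypes-to-one-phenotype point can be demonstrated in a few lines. Python compiles source to bytecode rather than to a native executable, but its built-in `compile()` function is a reasonable stand-in for the step that turns source text into something that runs. In this sketch, two hypothetical "genotypes" for a function named `triangle` share almost no source text, yet express identical behavior on every tested input; the names and test range are illustrative assumptions.

```python
# Two hypothetical "genotypes": different source texts for the same behavior.
GENOTYPE_A = """
def triangle(n):
    total = 0
    for k in range(1, n + 1):
        total += k
    return total
"""

GENOTYPE_B = """
def triangle(n):
    return n * (n + 1) // 2
"""


def express(source):
    """Compile the source text to a code object, execute it, and return
    the function it defines (the expressed phenotype)."""
    code = compile(source, "<genotype>", "exec")
    namespace = {}
    exec(code, namespace)
    return namespace["triangle"]


def same_phenotype(source_a, source_b, inputs=range(200)):
    """True if both genotypes behave identically on every test input."""
    f_a, f_b = express(source_a), express(source_b)
    return all(f_a(n) == f_b(n) for n in inputs)


if __name__ == "__main__":
    print(GENOTYPE_A == GENOTYPE_B)                # False: the source texts differ
    print(same_phenotype(GENOTYPE_A, GENOTYPE_B))  # True: the behavior matches
```

Testing on a finite range of inputs only shows that the phenotypes agree there; for these two functions the agreement also holds in general, since the loop simply adds up 1 through n.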

Currently, many researchers working on the origin of life and astrobiology are trying to produce computer simulations to help investigate how life could have originated and evolved at its earliest stages. But trying to incorporate all of the relevant elements into a computer simulation is proving to be a very daunting task indeed. Why not simply take advantage of the naturally occurring $10 trillion computer simulation that the IT community has already patiently evolved over the past 70 years and has already run for 2.2 billion seconds? It has been hiding there in plain sight the whole time for anybody with a little bit of daring and flair to explore.
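For anyone checking the figure, 70 years of continuous operation does come to about 2.2 billion seconds, assuming a 365.25-day year:

```latex
$$
70\ \text{yr} \times 365.25\ \tfrac{\text{d}}{\text{yr}} \times 86{,}400\ \tfrac{\text{s}}{\text{d}}
= 70 \times 31{,}557{,}600\ \text{s}
\approx 2.21 \times 10^{9}\ \text{s}
$$
```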

Some might argue that this is an absurd proposal because software currently is a product of the human mind, while biological life is not a product of intelligent design. Granted, biological life is not a product of intelligent design, but neither is the human mind. The human mind and biological life are both the result of natural processes at work over very long periods of time. This objection simply stems from the fact that we are all still, for the most part, self-deluded Cartesian dualists at heart, with seemingly a little “me” running around within our heads that just happens to have the ability to write software and to do other challenging things. But since the human mind is a product of natural processes in action, so is the software that it produces. See:

The Ghost in the Machine the Grand Illusion of Consciousness

Still, I realize that there might be some hesitation to participate in this collaboration because it might be construed by some as an advocacy of intelligent design, but that is hardly the case. The evolution of software over the past 70 years has essentially been a matter of Darwinian inheritance, innovation, and natural selection converging upon solutions similar to those of biological life. For example, it took the IT community about 60 years of trial and error to finally stumble upon an architecture similar to that of complex multicellular life that we call SOA, or Service Oriented Architecture. The IT community could easily have discovered SOA back in the 1960s if it had adopted a biological approach to software and intelligently designed its software architecture to match that of the biosphere. Instead, nobody really sat back and designed the very complex worldwide software architecture we see today; it just sort of evolved on its own through small incremental changes made by many millions of independently acting programmers through a process of trial and error. When programmers write code, they always take some old existing code first and then modify it slightly by making a few changes. Then they add a few additional new lines of code and test the modified code to see how far they have come. Usually, the code does not work on the first attempt because of the second law of thermodynamics, so they then try to fix the code and try again. This happens over and over, until the programmer finally has a good snippet of new code. Thus, new code comes into existence through the Darwinian mechanisms of inheritance coupled with innovation and natural selection. Some might object that this coding process is actually a form of intelligent design, but that is not the case. It is important to differentiate between intelligent selection and intelligent design.
In softwarephysics, we extend the concept of natural selection to include all selection processes that are not supernatural in nature, so for me, intelligent selection is just another form of natural selection. This is really nothing new. Predators and prey constantly make “intelligent” decisions about what to pursue and what to evade, even if those “intelligent” decisions are made with the benefit of only a few interconnected neurons or molecules. So in this view, the selection decisions that a programmer makes after each iteration of working on some new code really are a form of natural selection. After all, programmers are just DNA survival machines with minds infected with memes for writing software, and the selection processes that the human mind undergoes while writing software are just as natural as the Sun drying out worms on a sidewalk or a cheetah deciding upon which gazelle in a herd to pursue.

For example, when IT professionals slowly evolved our current $10 trillion worldwide IT architecture over the past 2.2 billion seconds, they certainly did not do so with the teleological intent of creating a simulation of the evolution of the biosphere. Instead, like most organisms in the biosphere, these IT professionals were simply trying to survive just one more day in the frantic world of corporate IT. It is hard to convey the daily mayhem and turmoil of corporate IT to outsiders. By 1979, I had been working continuously on geophysical models and simulations in Fortran and Basic, for my thesis and for oil companies, ever since taking CS 101 at the University of Illinois back in 1972. But when I made a career change in 1979 from being an exploration geophysicist at Amoco to become a systems analyst in Amoco’s IT department, I was in complete shock. When I first hit the floor of Amoco’s IT department on one very scary Monday morning, I suddenly found myself surrounded by countless teams of IT professionals, all running around like the Mad Hatter in Alice in Wonderland. After a couple of terrifying weeks on this new job, it seemed to me like I was trapped in a frantic computer simulation, like the ones that I had been programming for the past seven years, hopelessly buried in punch card decks and fan-fold listings. But I quickly realized that all IT jobs essentially boiled down to simply pushing buttons. All you had to do was to push the right buttons, in the right sequence, at the right time, and with zero errors. How hard could that be? Well, it turned out to be very difficult indeed, and in response I began to subconsciously work on softwarephysics to try to figure out why this job was so hard, and how I could dig myself out of the mess that I had gotten myself into. After a while, it dawned on me that the fundamental problem was the second law of thermodynamics operating in a nonlinear simulated universe.
The second law made it very difficult to push the right buttons in the right sequence and at the right time because there were so many erroneous combinations of button pushes. Writing and maintaining software was like looking for a needle in a huge utility phase space; there were simply a nearly infinite number of ways of pushing the buttons “wrong”. The other problem was that we were working in a very nonlinear utility phase space, meaning that pushing just one button incorrectly usually brought everything crashing down. Next, I slowly began to think of pushing the correct buttons in the correct sequence as stringing together the correct atoms into the correct sequence to make molecules in chemical reactions that could do things. I also knew that living things were really great at doing that. Living things apparently got around the second law of thermodynamics by dumping entropy into their surroundings as heat while they built low-entropy complex molecules from high-entropy simple molecules and atoms. I then began to think of each line of code that I wrote as a step in a biochemical pathway. The variables were like organic molecules composed of characters or “atoms”, and the operators were like chemical reactions between the molecules in the line of code. The logic in several lines of code was the same thing as the logic found in several steps of a biochemical pathway, and a complete function was the equivalent of a full-fledged biochemical pathway in itself. But one nagging question remained: how could I take advantage of these similarities to save myself? That’s a long story, but in 1985 I started working on BSDE, the Bionic Systems Development Environment, which was used at Amoco to “grow” software biologically from an “embryo” by having programmers turn on and off a set of “genes”. The second half of my original softwarephysics posting provides more information on BSDE:


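The "needle in a huge utility phase space" can be put in rough numbers. Assuming, purely for illustration, that each keystroke is drawn from the 95 printable ASCII characters, the number of distinct character sequences of a given length grows as 95 raised to that length. The Python sketch below works in base-10 logarithms so that the numbers stay printable; the function names are hypothetical.

```python
import math

PRINTABLE_ASCII = 95  # characters from space (0x20) through tilde (0x7E)


def count_sequences(length, alphabet_size=PRINTABLE_ASCII):
    """Exact number of distinct sequences (practical only for small lengths)."""
    return alphabet_size ** length


def log10_sequences(length, alphabet_size=PRINTABLE_ASCII):
    """Base-10 logarithm of the number of distinct sequences of a given length."""
    return length * math.log10(alphabet_size)


if __name__ == "__main__":
    print(count_sequences(3))  # 857375 three-character sequences
    for n in (10, 100, 1000):
        print(f"{n:>5} characters: about 10^{log10_sequences(n):.1f} sequences")
```

Even a modest 1,000-character program sits among roughly 10^1977.7 possible character sequences, and only a tiny fraction of those would even compile, let alone behave correctly.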
To have some fun with softwarephysics, and to see how it might help with exploring the origin of life, please take a look at the postings down below. To get to any other posting, just use the Blog Archive links in the upper right hand corner of each posting.

A Brief History of Self-Replicating Information

The Driving Forces of Software Evolution

A Proposal For All Practicing Paleontologists

How Software Evolves

Self-Replicating Information

Skip down to the section on SoftwarePaleontology

The Origin of Software the Origin of Life

Programming Clay

Using the Evolution of Software as a Model for Astrobiologists

An IT Perspective of the Cambrian Explosion

Using the Origin of Software as a Model for the Origin of Life

An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer

Software Embryogenesis

Introduction to Softwarephysics

Some Pertinent Observations Already Gleaned From the History of IT
1. It’s all about self-replicating information.
Living things are just one form of self-replicating information that happens to carry its own hardware along with it. The concept of living things is really just an artificial human classification, and that is why it is nearly impossible for us to define. Perhaps, in some sense, the vitalists had it right all along; the mysterious vital force they sought was simply the driving force of self-replicating information seeking to survive. So the definition of life might really just hinge upon an arbitrary level of information density. A quartz crystal forming in a melt, or an ice crystal forming from many mobile water molecules plugging into a lattice, is also a primitive form of self-replicating information, just with a lower information density and a higher entropy than we are accustomed to finding in living things, but it really is just a matter of degree. So do not think in terms of investigating the origin of life; rather, think in terms of investigating the early history of self-replicating information on the Earth. After all, in the end, everything is just made of dead atoms. See:

The Demon of Software

2. Self-replicating information is very opportunistic.
Self-replicating information is very opportunistic and will exapt whatever hardware happens to be available at the time, and will certainly jump ship if something better comes along. For example, software started out on electrical relays and then proceeded to vacuum tubes, discrete transistors, integrated circuits with thousands of transistors, integrated circuits with millions of transistors, and integrated circuits with billions of transistors, and will probably jump ship again to optical chips in the next decade or so. Now 4 billion years ago, the only hardware available on the newly formed Earth consisted of organic monomers, brought to the Earth by comets and asteroids or generated here by abiotic processes, and rock-forming minerals. The vast majority of the rocks in the Earth’s crust are formed from rock-forming silicate minerals that are composed of polymerized silica tetrahedrons. The silica tetrahedrons are made from one atom of silicon and four atoms of oxygen, and are very similar in structure to methane. The silica tetrahedrons have a net charge of -4, so they polymerize into single chains, double chains, sheets, or 3-dimensional frameworks with positive cations of Ca++, Na+, K+, Fe++, or Mg++ interspersed in the crystalline lattice to neutralize the negative charge. So in a sense they are the silicon equivalent of organic molecules. The rock-forming minerals are essentially the structural proteins that hold rocks together, and their 3-dimensional structures are key to rock properties, just as the structures of alpha helices and beta sheets are key to the properties of proteins. The minerals in rocks are usually formed at very high temperatures and usually under very high pressures too, so they are very much out of thermodynamic equilibrium at the Earth’s surface or at the bottom of the sea. Granted, there may be hundreds of kinds of rocks, but no matter the rock, they all eventually chemically weather down into mud and sand.
The mud is formed from sheet-like polymers of silica tetrahedrons called clay minerals, and the sand comes from quartz grains which are made from very tough frameworks of pure silica tetrahedrons. It’s the H+ ions, mainly from carbonic and other acids, that break down the minerals by working themselves into the crystalline lattices to replace the positive cations. Since the silicates are very much like organic molecules, the odds are that our very distant ancestors were some kind of hybrid of the two. Also, life probably first arose several hundred meters below the Earth’s surface in the pore fluids circulating through rocks near the first spreading centers when plate tectonics was first initiated. These environments would have been safe from the late heavy bombardment 3.8 – 4.1 billion years ago. So make friends with a good geochemist. See:

Programming Clay
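The charge bookkeeping behind the silica tetrahedron described in item 2 can be written out with the usual formal charges of +4 for silicon and −2 for oxygen. Forsterite (Mg2SiO4, an olivine built from isolated tetrahedra) is an added illustration not discussed in the text; quartz is the pure silica framework mentioned above.

```latex
$$
\begin{aligned}
q(\mathrm{SiO_4}) &= (+4) + 4(-2) = -4 \\
q(\mathrm{Mg_2SiO_4}) &= 2(+2) + (-4) = 0 && \text{(forsterite: cations neutralize isolated tetrahedra)} \\
q(\mathrm{SiO_2}) &= (+4) + 4 \cdot \tfrac{1}{2}(-2) = 0 && \text{(quartz: each O shared by two tetrahedra)}
\end{aligned}
$$
```

Sharing an oxygen between two tetrahedra halves that oxygen's contribution to each one, which is how polymerization reduces the charge that cations must balance, all the way down to zero in the quartz framework.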

3. Self-replicating information easily forms parasitic/symbiotic relationships.
Lynn Margulis’s endosymbiotic theory seems to apply universally to all forms of self-replicating information. The genes, memes, and software on Earth are currently deeply intertwined in very complex parasitic/symbiotic relationships. The Earth could certainly not support a population of 7 billion people without them all competing and also working together. Similarly, I vividly remember the early 1990s, when it was predicted that LANs composed of “high-speed” Intel 386 PCs running at a whopping 33 MHz would make IBM’s mainframes obsolete, and indeed, IBM nearly did go bankrupt in those days. However, today we find that IBM mainframes running z/OS, Unix servers, client PCs running Windows or Mac OS, and smartphones have all formed a heavily interdependent, hybridized parasitic/symbiotic relationship, and so has the software running upon them.

4. Convergence plays a major role in the evolution of self-replicating information.
As Daniel Dennett put it, there are only a limited number of “Good Tricks” for living things to discover, and the same goes for all forms of self-replicating information. Over the past 70 years, software architecture has very closely recapitulated the same path through Design Space that living things followed billions of years ago on Earth. Software went through a lengthy period of prokaryotic organization, which was followed by a period of eukaryotic organization, which finally led to the foundations of multicellular organization. And over the past decade, software has seen a Cambrian explosion, in which large numbers of somatic objects use the services of large numbers of service objects in service organs. See the SoftwarePaleontology section of:


Similarly, in

Crocheting Software

we see that crochet and knitting patterns are software-like precursors that evolved along a branch parallel to computer software, but were never really in its line of descent. I imagine that on the early Earth there were similar branches of self-replicating information that were never our distant ancestors, but might have been unrelated competitors at the time. They might even still exist today in some isolated environments.
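As a rough illustration of the "somatic objects using service organs" idea in item 4, here is a minimal in-process Python sketch. The class names `ServiceOrgan` and `SomaticObject`, the "tax" and "shipping" services, and their rates are all hypothetical; real SOA services are separate programs reached over a network through published interfaces, which this toy does not attempt to model.

```python
from typing import Callable, Dict


class ServiceOrgan:
    """A hypothetical 'organ': a registry of named services shared by many clients."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[float], float]] = {}

    def register(self, name: str, handler: Callable[[float], float]) -> None:
        """Add or replace the handler that provides a named service."""
        self._services[name] = handler

    def call(self, name: str, value: float) -> float:
        """Invoke a named service, failing loudly if nobody provides it."""
        if name not in self._services:
            raise LookupError(f"no service named {name!r}")
        return self._services[name](value)


class SomaticObject:
    """A client that does its job by requesting services rather than
    carrying its own copy of that logic."""

    def __init__(self, organ: ServiceOrgan) -> None:
        self.organ = organ

    def order_total(self, subtotal: float) -> float:
        tax = self.organ.call("tax", subtotal)
        shipping = self.organ.call("shipping", subtotal)
        return round(subtotal + tax + shipping, 2)


if __name__ == "__main__":
    organ = ServiceOrgan()
    organ.register("tax", lambda amount: amount * 0.05)  # assumed 5% rate
    organ.register("shipping", lambda amount: 0.0 if amount >= 50 else 7.5)
    cells = [SomaticObject(organ) for _ in range(3)]  # many clients, one organ
    print([cell.order_total(s) for cell, s in zip(cells, (20.0, 60.0, 100.0))])
```

Because every client calls the same registered services, replacing one handler changes the behavior of all clients at once, loosely analogous to many somatic cells depending on the same organ.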

5. Beware of the memes lurking within your mind.
Meme-complexes are very conservative in nature and are very reluctant to adopt new memes from outside that might threaten the very existence of the entire meme-complex. This is especially true of scientific meme-complexes, and rightly so. Scientific meme-complexes must always be on guard to prevent the latest crackpot idea from taking hold. But the history of science shows the downside to all this: many of the great scientific breakthroughs were delayed by 10 to 50 years, patiently awaiting acceptance by a scientific meme-complex. Thomas Kuhn described this same reluctance of scientific meme-complexes to adopt paradigm shifts. Today, the rigidity of scientific meme-complexes holds back scientific progress because it discourages people from different disciplines from working together on difficult problems, like the origin of life.

Some very good books on the subject are:

The Meme Machine (1999) by Susan Blackmore

Virus of the Mind: The New Science of the Meme (1996) by Richard Brodie

Also see:

How to Use Softwarephysics to Revive Memetics in Academia

6. Adopt a positivistic approach using effective theories.
Softwarephysics adopts a very positivistic view of software in that we do not care about what software “really” is; we only care about how software is observed to behave, and we only attempt to model this behavior with a set of effective theories. Positivism is an enhanced form of empiricism, in which we do not care about how things “really” are; we are only interested in how things are observed to behave. With positivism, physicists only seek out models of reality, not reality itself. Effective theories are an extension of positivism. An effective theory is an approximation of reality that only holds true over a certain restricted range of conditions and only provides for a certain depth of understanding of the problem at hand. For example, Newtonian mechanics works very well for objects moving in weak gravitational fields at less than 10% of the speed of light and larger than a very small mote of dust. For things moving at high velocities or in strong gravitational fields we must use relativity theory, and for very small things like atoms we must use quantum mechanics. All of the current theories of physics, such as Newtonian mechanics, classical electrodynamics, thermodynamics, statistical mechanics, the special and general theories of relativity, quantum mechanics, and quantum field theories like QED (quantum electrodynamics), are just effective theories that are based upon models of reality, and all these models are approximations. All these models are fundamentally "wrong", but at the same time, these effective theories make exceedingly good predictions of the behavior of physical systems over the limited ranges in which they apply, and that is all positivism hopes to achieve. Remember, all of chemistry is just an approximation of QED, and QED is also just an approximate effective theory that explains nearly all of the behaviors of electrons, but cannot explain the gravitational attraction between electrons.
When all is said and done, biology and chemistry are all just about electrons in various quantum states, and electrons are very light particles, about 1/1836 the mass of a proton, made of who knows what. See:

Model-Dependent Realism - A Positivistic Approach to Realism
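One way to see why about 10% of the speed of light is a reasonable boundary for Newtonian mechanics is to compute the Lorentz factor γ = 1/√(1 − v²/c²) and compare the relativistic kinetic energy, (γ − 1)mc², with the Newtonian ½mv². This short Python sketch does that; the function names are illustrative.

```python
import math


def lorentz_factor(beta):
    """gamma = 1 / sqrt(1 - beta**2), where beta = v / c and 0 <= beta < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def kinetic_energy_ratio(beta):
    """Relativistic kinetic energy (gamma - 1) m c^2 divided by the Newtonian
    value (1/2) m v^2; the mass and c cancel, leaving a function of beta only."""
    return (lorentz_factor(beta) - 1.0) / (0.5 * beta * beta)


if __name__ == "__main__":
    for beta in (0.01, 0.1, 0.5, 0.9):
        print(f"v = {beta:.2f} c: gamma = {lorentz_factor(beta):.5f}, "
              f"KE_rel / KE_newton = {kinetic_energy_ratio(beta):.4f}")
```

At 10% of the speed of light, γ is about 1.005 and the Newtonian kinetic energy is low by only about 0.76%, while at 90% of c the true kinetic energy is more than three times the Newtonian estimate, which is where the effective theory clearly breaks down.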

Comments are welcome at

To see all posts on softwarephysics in reverse order go to:

Steve Johnston
