Saturday, July 15, 2017

Introduction to Softwarephysics

Softwarephysics is a simulated science for the simulated Software Universe that we are all immersed in. It is an approach to software development, maintenance and support based upon concepts from physics, chemistry, biology, and geology that I have been using on a daily basis for over 35 years as an IT professional. For those of you not in the business, IT is short for Information Technology, commercial computer science. The original purpose of softwarephysics was to explain why IT was so difficult, to suggest possible remedies, and to provide a direction for thought. Since then softwarephysics has taken on a larger scope, as it became apparent that softwarephysics could also assist the physical sciences with some of the Big Problems that they are currently having difficulties with. So if you are an IT professional, general computer user, or simply an individual interested in computer science, physics, chemistry, biology, or geology then softwarephysics might be of interest to you, if not in an entirely serious manner, perhaps at least in an entertaining one.

The Origin of Softwarephysics
From 1975 – 1979, I was an exploration geophysicist exploring for oil, first with Shell, and then with Amoco. In 1979, I made a career change into IT, and spent about 20 years in development. For the past 17 years, I have been in IT operations, supporting middleware on WebSphere, JBoss, Tomcat, and ColdFusion. When I first transitioned into IT from geophysics, I figured that if you could apply physics to geology, why not apply physics to software? So like the exploration team at Amoco that I had just left, consisting of geologists, geophysicists, paleontologists, geochemists, and petrophysicists, I decided to take all the physics, chemistry, biology, and geology that I could muster and throw it at the problem of software. The basic idea was that many concepts in physics, chemistry, biology, and geology suggested to me that the IT community had accidentally created a pretty decent computer simulation of the physical Universe on a grand scale, a Software Universe so to speak, and that I could use this fantastic simulation in reverse, to better understand the behavior of commercial software, by comparing software to how things behaved in the physical Universe. Softwarephysics depicts software as a virtual substance, and relies upon our understanding of the current theories in physics, chemistry, biology, and geology to help us model the nature of software behavior. So in physics we use software to simulate the behavior of the Universe, while in softwarephysics we use the Universe to simulate the behavior of software. Along these lines, we use the Equivalence Conjecture of Softwarephysics as an aid; it allows us to shift back and forth between the Software Universe and the physical Universe, and hopefully to learn something about one by examining the other:

The Equivalence Conjecture of Softwarephysics
Over the past 75 years, through the uncoordinated efforts of over 50 million independently acting programmers to provide the world with a global supply of software, the IT community has accidentally spent more than $10 trillion creating a computer simulation of the physical Universe on a grand scale – the Software Universe.

For more on the origin of softwarephysics please see Some Thoughts on the Origin of Softwarephysics and Its Application Beyond IT.

Logical Positivism and Effective Theories
Many IT professionals have a difficult time with softwarephysics because they think of physics as being limited to the study of real things like electrons and photons, and since software is not “real”, how can you possibly apply concepts from physics and the other sciences to software? To address this issue, softwarephysics draws heavily upon two concepts from physics that have served physics quite well over the past century – the concept of logical positivism and the concept of effective theories. This was not always the case. In the 17th, 18th, and 19th centuries, physicists mistakenly thought that they were actually discovering the fundamental laws of the Universe, which they thought were based upon real tangible things like particles, waves, and fields. Classical Newtonian mechanics (1687), thermodynamics (1850), and classical electrodynamics (1864) did a wonderful job of describing the everyday world at the close of the 19th century, but early in the 20th century it became apparent that the models upon which these very successful theories were based did not work very well for small things like atoms or for objects moving at high velocities or in strong gravitational fields. This provoked a rather profound philosophical crisis within physics at the turn of the century, as physicists worried that perhaps 300 years of work was about to go down the drain. The problem was that classical physicists confused their models of reality with reality itself, and when their classical models began to falter, their confidence in physics began to falter as well. This philosophical crisis was resolved with the adoption of the concepts of logical positivism and some new effective theories in physics. Quantum mechanics (1926) was developed for small things like atoms, the special theory of relativity (1905) was developed for objects moving at high velocities and the general theory of relativity (1915) was developed for objects moving in strong gravitational fields.

Logical positivism, usually abbreviated simply to positivism, is an enhanced form of empiricism, in which we do not care about how things “really” are; we are only interested in how things are observed to behave. With positivism, physicists only seek out models of reality - not reality itself. When we study quantum mechanics, we will find that the concept of reality gets rather murky in physics anyway, so this is not as great a loss as it might at first seem. By concentrating on how things are observed to behave, rather than on what things “really” are, we avoid the conundrum faced by the classical physicists. In retrospect, this idea really goes all the way back to the very foundations of physics. In Newton’s Principia (1687) he outlined Newtonian mechanics and his theory of gravitation, which held that the gravitational force between two objects was proportional to the product of their masses divided by the square of the distance between them. Newton knew that he was going to take some philosophical flak for proposing a mysterious force between objects that could reach out across the vast depths of space with no apparent mechanism, so he took a very positivistic position on the matter with the famous words:

I have not as yet been able to discover the reason for these properties of gravity from phenomena, and I do not feign hypotheses. For whatever is not deduced from the phenomena must be called a hypothesis; and hypotheses, whether metaphysical or physical, or based on occult qualities, or mechanical, have no place in experimental philosophy. In this philosophy particular propositions are inferred from the phenomena, and afterwards rendered general by induction.

Instead, Newton focused on how things were observed to move under the influence of his law of gravitational attraction, without worrying about what gravity “really” was.

The second concept, that of effective theories, is an extension of positivism. An effective theory is an approximation of reality that only holds true over a certain restricted range of conditions and only provides for a certain depth of understanding of the problem at hand. For example, Newtonian mechanics is an effective theory that makes very good predictions for the behavior of objects that are moving at less than 10% of the speed of light and that are bigger than a very small grain of dust. These limits define the effective range over which Newtonian mechanics can be applied to solve problems. For very small things we must use quantum mechanics, and for very fast things or things in strong gravitational fields, we must use relativity theory. So all of the current theories of physics, such as Newtonian mechanics, Newtonian gravity, classical electrodynamics, thermodynamics, statistical mechanics, the special and general theories of relativity, quantum mechanics, and the quantum field theories of QED and QCD are effective theories that are based upon models of reality, and all these models are approximations - all these models are fundamentally "wrong", but at the same time, these effective theories make exceedingly good predictions of the behavior of physical systems over the limited ranges in which they apply. That is the goal of softwarephysics – to provide for an effective theory of software behavior that makes useful predictions of software behavior that are applicable to the day-to-day activities of IT professionals. So in softwarephysics, we adopt a very positivistic viewpoint of software; we do not care what software “really is”, we only care about how software is observed to behave and try to model those behaviors with an effective theory of software behavior that only holds true over a certain restricted range of conditions and only provides for a certain depth of understanding of the problem at hand.
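To get a feel for why about 10% of the speed of light is a reasonable upper limit for the effective range of Newtonian mechanics, here is a minimal Python sketch. It compares the Newtonian momentum m*v with the relativistic momentum gamma*m*v at several speeds. The function names are hypothetical and the choice of momentum as the quantity to compare is just an assumption made for illustration.

```python
import math

def lorentz_factor(beta):
    """Return gamma = 1 / sqrt(1 - beta^2), where beta = v / c."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)

def newtonian_momentum_error(beta):
    """Fraction by which Newtonian momentum (m*v) falls short of the
    relativistic momentum (gamma*m*v) at speed beta = v / c."""
    return 1.0 - 1.0 / lorentz_factor(beta)

if __name__ == "__main__":
    # Illustrative speeds, expressed as fractions of the speed of light.
    for beta in (0.001, 0.01, 0.1, 0.5, 0.9):
        error_percent = 100.0 * newtonian_momentum_error(beta)
        print(f"v = {beta:5.3f} c   Newtonian shortfall = {error_percent:.4f} %")
```

At 10% of the speed of light the Newtonian answer is off by only about half a percent, while at 90% of the speed of light it is off by more than half, which is one way to see where the effective range of the theory ends.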

GPS satellites provide a very good example of positivism and effective theories at work. There are currently 31 GPS satellites orbiting at an altitude of 12,600 miles above the Earth, and each contains a very accurate atomic clock. The signals from the GPS satellites travel to your GPS unit at the speed of light, so by knowing the travel time of the signals from at least 4 of the GPS satellites, it is possible to determine your position on Earth very accurately. In order to do that, it is very important to have very accurate timing measurements. Newtonian mechanics is used to launch the GPS satellites to an altitude of 12,600 miles and to keep them properly positioned in orbit. Classical electrodynamics is then used to beam the GPS signals back down to Earth to the GPS unit in your car. Quantum mechanics is used to build the transistors on the chips on board the GPS satellites and to understand the quantum tunneling of electrons in the flash memory chips used to store GPS data on the satellites. The special theory of relativity predicts that the onboard atomic clocks on the GPS satellites will run slower and lose about 7.2 microseconds per day due to their high velocities relative to an observer on the Earth. But at the same time, the general theory of relativity also predicts that because the GPS satellites are farther from the center of the Earth and in a weaker gravitational field, where spacetime is less deformed than on the surface of the Earth, their atomic clocks also run faster and gain 45.9 microseconds per day due to the weaker gravitational field out there. The net effect is a gain of 38.7 microseconds per day, so the GPS satellite atomic clocks have to be purposefully built to run slow by 38.7 microseconds per day before they are launched, so that they will keep in sync with clocks on the surface of the Earth. If this correction were not made, an error in your position of roughly 7 miles per day would accrue, because light travels about 7 miles in 38.7 microseconds.
The end result of the combination of all these fundamentally flawed effective theories is that it is possible to pinpoint your location on Earth to an accuracy of 16 feet or better for as little as $100. But physics has done even better than that with its fundamentally flawed effective theories. By combining the effective theories of special relativity (1905) with quantum mechanics (1926), physicists were able to produce a new effective theory for the behavior of electrons and photons called quantum electrodynamics, or QED (1948), which was able to predict the gyromagnetic ratio of the electron, a measure of its intrinsic magnetic field, to an accuracy of 11 decimal places. As Richard Feynman has pointed out, this was like predicting the exact distance between New York and Los Angeles accurate to the width of a human hair!
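The relativistic clock offsets quoted above for the GPS satellites can be roughly reproduced from first principles. The Python sketch below is only an illustration under simplifying assumptions that are not part of the original discussion: a circular orbit, a non-rotating Earth, first-order approximations for both effects, and standard textbook values for the physical constants. The function name gps_clock_offsets is hypothetical.

```python
C = 299_792_458.0            # speed of light in m/s (exact by definition)
GM_EARTH = 3.986004418e14    # Earth's gravitational parameter in m^3/s^2
R_EARTH = 6.371e6            # mean radius of the Earth in m (assumed value)
METERS_PER_MILE = 1609.344
SECONDS_PER_DAY = 86_400.0

def gps_clock_offsets(altitude_miles=12_600.0):
    """Return (sr_loss, gr_gain, net_gain) in microseconds per day for a
    clock in a circular orbit at the given altitude, relative to a clock
    on the surface of a non-rotating Earth."""
    r = R_EARTH + altitude_miles * METERS_PER_MILE
    # Special relativity: a moving clock runs slow by about v^2 / (2 c^2).
    v_squared = GM_EARTH / r          # orbital speed squared for a circular orbit
    sr_loss = v_squared / (2.0 * C ** 2) * SECONDS_PER_DAY
    # General relativity: a clock higher in the gravitational potential
    # runs fast by about (GM / c^2) * (1/R - 1/r).
    gr_gain = GM_EARTH / C ** 2 * (1.0 / R_EARTH - 1.0 / r) * SECONDS_PER_DAY
    to_microseconds = 1.0e6
    return (sr_loss * to_microseconds,
            gr_gain * to_microseconds,
            (gr_gain - sr_loss) * to_microseconds)

if __name__ == "__main__":
    sr, gr, net = gps_clock_offsets()
    print(f"special relativity loss: {sr:.1f} microseconds/day")
    print(f"general relativity gain: {gr:.1f} microseconds/day")
    print(f"net gain:                {net:.1f} microseconds/day")
```

With these assumptions the sketch lands within a few tenths of a microsecond of the 7.2, 45.9, and 38.7 microseconds per day given in the text, which is about as close as such a simplified model can be expected to get.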

So Newtonian mechanics makes great predictions for the macroscopic behavior of GPS satellites, but it does not work very well for small things like the behavior of individual electrons within transistors, where quantum mechanics is required, or for things moving at high speeds or in strong gravitational fields where relativity theory must be applied. And all three of these effective theories are based upon completely contradictory models. General relativity maintains that spacetime is curved by matter and energy, but that matter and energy are continuous, while quantum mechanics maintains that spacetime is flat, but that matter and energy are quantized into chunks. Newtonian mechanics simply states that space and time are mutually independent dimensions and universal for all, with matter and energy being continuous. The important point is that all effective theories and scientific models are approximations – they are all fundamentally "wrong". But knowing that you are "wrong" gives you a great advantage over people who know that they are "right", because knowing that you are "wrong" allows you to seek improved models of reality. So please consider softwarephysics to simply be an effective theory of software behavior that is based upon models that are fundamentally “wrong”, but at the same time, fundamentally useful for IT professionals. So as you embark upon your study of softwarephysics, please always keep in mind that the models of softwarephysics are just approximations of software behavior; they are not what software “really is”. It is very important not to confuse models of software behavior with software itself, if one wishes to avoid the plight of the 19th-century classical physicists.

If you are an IT professional and many of the above concepts are new to you, do not be concerned. This blog on softwarephysics is aimed at a diverse audience, but with IT professionals in mind. All of the above ideas will be covered at great length in the postings in this blog on softwarephysics and in a manner accessible to all IT professionals. Now it turns out that most IT professionals have had some introduction to physics in high school or in introductory college courses, but that presents an additional problem. The problem is that such courses generally only cover classical physics, and leave the student with a very good picture of physics as it stood in 1864! It turns out that the classical physics of Newtonian mechanics, thermodynamics, and classical electromagnetic theory were simply too good to discard and are still quite useful, so they are taught first to beginners and then we run out of time to cover the really interesting physics of the 20th century. Now imagine the problems that the modern world would face if we only taught similarly antiquated courses in astronomy, metallurgy, electrical and mechanical engineering, medicine, economics, biology, or geology that happily left students back in 1864! Since many of the best models for software behavior stem from 20th century physics, we will be covering a great deal of 20th century material in these postings – the special and general theories of relativity, quantum mechanics, quantum field theories, and chaos theory, but I hope that you will find that these additional effective theories are quite interesting on their own, and might even change your worldview of the physical Universe at the same time.

Unintended Consequences for the Scientific Community
As I mentioned at the close of my original posting on SoftwarePhysics, my initial intention for this blog on softwarephysics was to fulfill a promise I made to myself about 25 years ago to approach the IT community with the concept of softwarephysics a second time, following my less than successful attempt to do so in the 1980s, with the hope of helping the IT community to better cope with the daily mayhem of life in IT. However, in laying down the postings for this blog an unintended consequence arose in my mind as I became profoundly aware of the enormity of this vast computer simulation of the physical Universe that the IT community has so graciously provided to the scientific community free of charge and also of the very significant potential scientific value that it provides. One of the nagging problems for many of the observational and experimental sciences is that many times there is only one example readily at hand to study or experiment with, and it is very difficult to do meaningful statistics with a population of N=1.

But the computer simulation of the physical Universe that the Software Universe presents provides another realm for comparison. For example, both biology and astrobiology only have one biosphere on Earth to study and even physics itself has only one Universe with which to engage. Imagine the possibilities if scientists had another Universe readily at hand in which to work! This is exactly what the Software Universe provides. For example, in SoftwareBiology and A Proposal For All Practicing Paleontologists we see that the evolution of software over the past 75 years, or 2.4 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941, has closely followed the same path as life on Earth over the past 4.0 billion years in keeping with Simon Conway Morris’s contention that convergence has played the dominant role in the evolution of life on Earth. In When Toasters Fly, we also see that software has evolved in fits and starts as portrayed by the punctuated equilibrium of Stephen Jay Gould and Niles Eldredge, and in The Adaptationist View of Software Evolution we explore the overwhelming power of natural selection in the evolution of software. In keeping with Peter Ward’s emphasis on mass extinctions dominating the course of evolution throughout geological time, we also see in SoftwareBiology that there have been several dramatic mass extinctions of various forms of software over the past 75 years as well, that have greatly affected the evolutionary history of software, and that between these mass extinctions, software has also tended to evolve through the gradual changes of Hutton’s and Lyell’s uniformitarianism. In Software Symbiogenesis and Self-Replicating Information, we also see the very significant role that parasitic/symbiotic relationships have played in the evolution of software, in keeping with the work of Lynn Margulis and also of Freeman Dyson’s two-stage theory of the origin of life on Earth. 
In The Origin of Software the Origin of Life, we explore Stuart Kauffman’s ideas on how Boolean nets of autocatalytic chemical reactions might have kick-started the whole thing as an emergent behavior of an early chaotic pre-biotic environment on Earth, and that if Seth Shostak is right, we will never end up talking to carbon-based extraterrestrial aliens, but to alien software instead. In Is the Universe Fine-Tuned for Self-Replicating Information? we explore the thermodynamics of Brandon Carter’s Weak Anthropic Principle (1973), as it relates to the generation of universes in the multiverse that are capable of sustaining intelligent life. Finally, in Programming Clay we revisit Alexander Graham Cairns-Smith’s theory (1966) that Gene 1.0 did not run on nucleic acids, but on clay microcrystal precursors instead.

Similarly for the physical sciences, in Is the Universe a Quantum Computer? we find a correspondence between TCP/IP and John Cramer’s Transactional Interpretation of quantum mechanics. In SoftwarePhysics and Cyberspacetime, we also see that the froth of CPU processes running with a clock speed of 10^9 Hz on the 10 trillion currently active microprocessors that comprise the Software Universe can be viewed as a slowed down simulation of the spin-foam froth of interacting processes of loop quantum gravity running with a clock speed of 10^43 Hz that may comprise the physical Universe. And in Software Chaos, we examine the nonlinear behavior of software and some of its emergent behaviors and follow up in CyberCosmology with the possibility that vast quantities of software running on large nonlinear networks might eventually break out into consciousness in accordance with the work of George Dyson and Daniel Dennett. In Model-Dependent Realism - A Positivistic Approach to Realism we compare Steven Weinberg’s realism with the model-dependent realism of Stephen Hawking and Leonard Mlodinow and consider how the two worldviews affect the search for a Final Theory. Finally, in The Software Universe as an Implementation of the Mathematical Universe Hypothesis and An Alternative Model of the Software Universe we at long last explore what software might really be, and discover that the Software Universe might actually be more closely related to the physical Universe than you might think.

The chief advantage of doing fieldwork in the Software Universe is that, unlike most computer simulations of the physical Universe, it is an unintended and accidental simulation, without any of the built-in biases that most computer simulations of the physical Universe suffer. So you will truly be able to do fieldwork in a pristine and naturally occurring simulation, just as IT professionals can do fieldwork in the wild and naturally occurring simulation of software that the living things of the biosphere provide. Secondly, the Software Universe is a huge simulation that is far beyond the budgetary means of any institution or consortium by many orders of magnitude. So if you are an evolutionary biologist, astrobiologist, or paleontologist working on the origin and evolution of life in the Universe, or a physicist or economist working on the emergent behaviors of nonlinear systems and complexity theory, or a neurobiologist working on the emergence of consciousness in neural networks, or even a frustrated string theorist struggling with quantum gravity, it would be well worth your while to pay a friendly call upon the local IT department of a major corporation in your area. Start with a visit to the Command Center for their IT Operations department to get a global view of their IT infrastructure and to see how it might be of assistance to the work in your area of interest. From there you can branch out to the applicable area of IT that will provide the most benefit.

The Impact of Self-Replicating Information Upon the Planet
One of the key findings of softwarephysics is concerned with the magnitude of the impact upon the planet of self-replicating information.

Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.
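For IT professionals, perhaps the smallest concrete example of this definition is a program that prints its own source code, usually called a quine. The Python sketch below is only an illustration and is not taken from any of the postings. The names TEMPLATE, PROGRAM and run_and_capture are hypothetical. Note that the little program cannot copy itself unaided; it has to enlist the support of the Python interpreter to get the copy made.

```python
import contextlib
import io

# Template for a two-line program whose output is its own source text.
TEMPLATE = 's = %r\nprint(s %% s)'

# Filling the template with itself yields the complete self-copying program.
PROGRAM = TEMPLATE % TEMPLATE

def run_and_capture(source):
    """Execute the given source in a fresh namespace and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue().rstrip("\n")

if __name__ == "__main__":
    child = run_and_capture(PROGRAM)       # the program makes a copy of itself
    grandchild = run_and_capture(child)    # and the copy can make a copy too
    print(child == PROGRAM, grandchild == PROGRAM)
```

Each generation is an exact copy of its parent, so the information in the program persists for as long as something is willing to run it.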

Basically, we have seen several waves of self-replicating information dominate the Earth:
1. Self-replicating autocatalytic metabolic pathways of organic molecules
2. RNA
3. DNA
4. Memes
5. Software

Note that because the self-replicating autocatalytic metabolic pathways of organic molecules, RNA and DNA have become so heavily intertwined over time, I now simply refer to them as the “genes”. Over the past 4.0 billion years, the surface of the Earth has been totally reworked by three forms of self-replicating information – the genes, memes and software, with software rapidly becoming the dominant form of self-replicating information on the planet. For more on this see:

A Brief History of Self-Replicating Information
Self-Replicating Information
Is Self-Replicating Information Inherently Self-Destructive?
Enablement - the Definitive Characteristic of Living Things
Is the Universe Fine-Tuned for Self-Replicating Information?
How to Use an Understanding of Self-Replicating Information to Avoid War
The Great War That Will Not End
How to Use Softwarephysics to Revive Memetics in Academia

Softwarephysics and the Real World of Human Affairs
Having another universe readily at hand to explore, even a simulated universe like the Software Universe, necessarily has an impact upon one's personal philosophy of life, and allows one to draw certain conclusions about the human condition and what’s it all about, so as you read through the postings in this blog you will stumble across a bit of my own personal philosophy - definitely a working hypothesis still in the works. Along these lines you might be interested in a few postings where I try to apply softwarephysics to the real world of human affairs:

MoneyPhysics – my impression of the 2008 world financial meltdown.

The Fundamental Problem of Everything – if you Google "the fundamental problem of everything", this will be the only hit you get on the entire Internet, which is indicative of the fundamental problem of everything!

What’s It All About? and Genes, Memes and Software – my current working hypothesis on what’s it all about.

How to Use an Understanding of Self-Replicating Information to Avoid War – my current working hypothesis for how the United States can avoid getting bogged down again in continued war in the Middle East.

Hierarchiology and the Phenomenon of Self-Organizing Organizational Collapse - a modern extension of the classic Peter Principle that applies to all hierarchical organizations and introduces the Time Invariant Peter Principle.

The Economics of the Coming Software Singularity, The Enduring Effects of the Obvious Hiding in Plain Sight and The Dawn of Galactic ASI - Artificial Superintelligence - my take on some of the issues that will arise for mankind as software becomes the dominant form of self-replicating information upon the planet over the coming decades.

The Continuing Adventures of Mr. Tompkins in the Software Universe, The Danger of Tyranny in the Age of Software, Cyber Civil Defense, and Oligarchiology and the Rise of Software to Predominance in the 21st Century - my worries that the world might abandon democracy in the 21st century, as software comes to predominance as the dominant form of self-replicating information on the planet.

Making Sense of the Absurdity of the Real World of Human Affairs - how software has aided the expansion of our less desirable tendencies in recent years.

Some Specifics About These Postings
The postings in this blog are supplemental reading for my course on softwarephysics for IT professionals entitled SoftwarePhysics 101 – The Physics of Cyberspacetime, which was originally designed to be taught as a series of seminars at companies where I was employed. Since softwarephysics essentially covers the simulated physics, chemistry, biology, and geology of an entire simulated universe, the slides necessarily just provide a cursory skeleton upon which to expound. The postings in this blog go into much greater depth. Because each posting builds upon its predecessors, the postings in this blog should be read in reverse order from the oldest to the most recent, beginning with my original posting on SoftwarePhysics. In addition, several universities also now offer courses on Biologically Inspired Computing which cover some of the biological aspects of softwarephysics, and the online content for some of these courses can be found by Googling for "Biologically Inspired Computing" or "Natural Computing". At this point we will finish up with my original plan for this blog on softwarephysics with a purely speculative posting on CyberCosmology that describes the origins of the Software Universe, cyberspacetime, software and where they all may be heading. Since CyberCosmology will be purely speculative in nature, it will not be of much help to you in your IT professional capacities, but I hope that it might be a bit entertaining. Again, if you are new to softwarephysics, you really need to read the previous posts before taking on CyberCosmology. I will probably continue on with some additional brief observations about softwarephysics in the future, but once you have completed CyberCosmology, you can truly consider yourself to be a bona fide softwarephysicist.

For those of you following this blog, the posting dates on the posts may seem to behave in a rather bizarre manner. That is because in order to get the Introduction to Softwarephysics listed as the first post in the context root of this blog, I have to perform a few IT tricks. When publishing a new posting, I simply copy the contents of the Introduction to Softwarephysics to a new posting called the New Introduction to Softwarephysics. Then I update the original Introduction to Softwarephysics entry with the title and content of the new posting to be published. I then go back and take “New” out of the title of the New Introduction to Softwarephysics. This way the Introduction to Softwarephysics always appears as the first posting in the context root of this blog. The side effect of all this is that the real posting date of posts is the date that appears on the post that you get when clicking on the Newer Post link at the bottom left of the posting webpage.

SoftwarePhysics 101 – The Physics of Cyberspacetime is now available on Microsoft OneDrive.

SoftwarePhysics 101 – The Physics of Cyberspacetime - Original PowerPoint document

Entropy – A spreadsheet referenced in the document

BSDE – A 1989 document describing how to use BSDE - the Bionic Systems Development Environment - to grow applications from genes and embryos within the maternal BSDE software.

Comments are welcome at

To see all posts on softwarephysics in reverse order go to:

Steve Johnston

Tuesday, June 13, 2017

The Rise of Complexity in Living Things and Software

I just finished reading two really great books by Nick Lane, a biochemist at University College London - Life Ascending: The Ten Great Inventions of Evolution (2010) and The Vital Question: Energy, Evolution, and the Origins of Complex Life (2015). I am certainly not alone in this. Bill Gates has also recommended that we all read The Vital Question. Now in this posting I would like to focus on Nick Lane's vital question of how living things stumbled upon complexity on this planet and relate his findings to how software converged upon the very same solution through a similar process. Recall that one of the fundamental findings of softwarephysics is that living things and software are both forms of self-replicating information, and that both have converged upon similar solutions to combat the second law of thermodynamics in a highly nonlinear Universe. For biologists, the value of softwarephysics is that software has been evolving about 100 million times faster than living things over the past 76 years, or 2.4 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941, and the evolution of software over that period of time is the only history of a form of self-replicating information that has actually been recorded by human history. In fact, the evolutionary history of software has all occurred within a single human lifetime, and many of those humans are still alive today to testify as to what actually happened, something that those working on the origin of life on the Earth and its early evolution can only try to imagine. Again, in softwarephysics we define self-replicating information as:

Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.

The Characteristics of Self-Replicating Information
All forms of self-replicating information have some common characteristics:

1. All self-replicating information evolves over time through the Darwinian processes of innovation and natural selection, which endows self-replicating information with one telling characteristic – the ability to survive in a Universe dominated by the second law of thermodynamics and nonlinearity.

2. All self-replicating information begins spontaneously as a parasitic mutation that obtains energy, information and sometimes matter from a host.

3. With time, the parasitic self-replicating information takes on a symbiotic relationship with its host.

4. Eventually, the self-replicating information becomes one with its host through the symbiotic integration of the host and the self-replicating information.

5. Ultimately, the self-replicating information replaces its host as the dominant form of self-replicating information.

6. Most hosts are also forms of self-replicating information.

7. All self-replicating information has to be a little bit nasty in order to survive.

8. The defining characteristic of self-replicating information is the ability of self-replicating information to change the boundary conditions of its utility phase space in new and unpredictable ways by means of exapting current functions into new uses that change the size and shape of its particular utility phase space. See Enablement - the Definitive Characteristic of Living Things for more on this last characteristic.

So far we have seen 5 waves of self-replicating information sweep across the Earth, with each wave greatly reworking the surface and near subsurface of the planet as it came to predominance:

1. Self-replicating autocatalytic metabolic pathways of organic molecules
2. RNA
3. DNA
4. Memes
5. Software

Software is now rapidly becoming the dominant form of self-replicating information on the planet, and is having a major impact on mankind as it comes to predominance. For more on that see: A Brief History of Self-Replicating Information.

The Strong Connection Between Information Flows and Energy Flows
In biology today there seems to be a disturbing ongoing battle between bioinformatics, the study of information flows in biology, and bioenergetics, the study of energy flows in biology. Because of the great triumphs that bioinformatics has made in the 20th and 21st centuries, and the huge amount of attention it has garnered in the public domain, bioenergetics has, unfortunately, been undeservedly relegated somewhat to a backburner in biology, relative to the esteemed position it once held earlier in the 20th century when people were actively working on things like the Krebs cycle. This is indeed unfortunate because information flows and energy flows in our Universe are intimately connected. I think that much of the confusion arises from the rather ambiguous definition of the term "information" that is in common use today. In The Demon of Software I explained that in softwarephysics we exclusively use Leon Brillouin’s concept of information as a form of negative entropy. In that view, in order to create some useful information, like a chunk of functional software, one must first degrade some high-grade form of energy, like electrical energy, into low-grade heat energy. Thanks to the first law of thermodynamics, no energy is lost in the process of creating useful information, but thanks to the second law of thermodynamics, we must always convert some high-grade form of energy into an equal amount of low-grade heat energy in order to make it all happen. Similarly, using Leon Brillouin’s concept of information, the bioenergetics of a very efficient biochemical metabolic pathway also becomes a form of useful information, like a chunk of functional software. In contrast, electrical engineers and those in bioinformatics, usually think of information in terms of Claude Shannon's view that information is the amount of "surprise" in a message. 
That is because they are primarily concerned with the DNA sequences of a species and with comparing the DNA sequences that code for proteins between species. For more on that please see: Some More Information About Information. Now Nick Lane is definitely a member of the bioenergetics camp, so he might not be so pleased with softwarephysics, and its seeming obsession with the characteristics of self-replicating information, but that might be because he is most familiar with the people in bioinformatics who primarily use Claude Shannon's view that information is the amount of "surprise" in a message. However, I think that Nick Lane would be quite comfortable with softwarephysics' use of Leon Brillouin’s concept of information as a form of negative entropy that can only be created by a heat engine that degrades high-grade energy into low-grade heat energy, because in that view, an efficient metabolic pathway that has been honed by natural selection is a form of negative entropy, and is therefore also a useful form of information - something that both bioinformatics and bioenergetics can easily share. In that view, bioinformatics and bioenergetics go hand-in-hand and become essentially one.

In IT we also deal with both formulations for the concept of information. We deal with large amounts of data, like customer and product data, that is then processed by software algorithms. The customer and product data are very much like DNA sequences, where Claude Shannon's view of information as the amount of surprise in a message makes sense, but the processing software algorithms are more like bioenergetic metabolic pathways, where Leon Brillouin’s concept of information as a form of negative entropy is more appropriate.
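
To make Claude Shannon's idea of "surprise" a little more concrete, here is a minimal Python sketch that measures the average surprise, in bits per symbol, of a string of DNA bases. The function name shannon_entropy_bits and the two sample sequences are made up purely for illustration, and the sketch only looks at how often each base shows up, which is a big simplification of what people in bioinformatics actually do.

```python
import math
from collections import Counter

def shannon_entropy_bits(message):
    """Average surprise per symbol, in bits, based on observed symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # Each symbol seen n times out of total contributes p * log2(1/p), with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

boring = "AAAAAAAAAAAAAAAA"   # hypothetical sequence: the same base over and over
varied = "ACGTACGTACGTACGT"   # hypothetical sequence: all four bases equally common

print(shannon_entropy_bits(boring))  # 0.0 bits: no surprise at all
print(shannon_entropy_bits(varied))  # 2.0 bits: the most a four-letter alphabet allows
```

Notice that this measure cares only about how unpredictable the symbols are, not about whether the sequence does anything useful. That is exactly where Leon Brillouin’s view of information as negative entropy comes in for the processing algorithms.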

The Great Mystery of Complexity in Biology
In The Vital Question, Nick Lane addresses the greatest mystery of them all in biology - why there is complex life on the planet and why all of that complex life is based upon exactly the same cellular architecture. Now in discussing complex life on the planet, most people are usually talking about things like lions, tigers and bears - "oh my!". But Nick Lane is really talking about the very complex architecture of eukaryotic cells, as opposed to the much simpler architecture of prokaryotic cells, because lions, tigers and bears, and all of the other "higher" forms of life that we are familiar with are simply made of aggregations of eukaryotic cells. Even the simple yeasts that make our breads, and get us drunk, are very complex eukaryotic cells. The troubling thing is that only an expert could tell the difference between a yeast eukaryotic cell and a human eukaryotic cell because they are so similar, while any school child could easily tell the difference between the microscopic images of a prokaryotic bacterial cell and a eukaryotic yeast cell - see Figure 1.

Figure 1 – The prokaryotic cell architecture of the bacteria and archaea is very simple and designed for rapid replication. Prokaryotic cells do not have a nucleus enclosing their DNA. Eukaryotic cells, on the other hand, store their DNA on chromosomes that are isolated in a cellular nucleus. Eukaryotic cells also have a very complex internal structure with a large number of organelles, or subroutine functions, that compartmentalize the functions of life within the eukaryotic cells.

Prokaryotic cells essentially consist of a tough outer cell wall enclosing an inner cell membrane and contain a minimum of internal structure. The cell membrane is composed of phospholipids and proteins. The DNA within prokaryotic cells generally floats freely as a large loop of DNA, and their ribosomes, used to help translate mRNA into proteins, float freely within the entire cell as well. The ribosomes in prokaryotic cells are not attached to membranes, like they are in eukaryotic cells which have membranes called the rough endoplasmic reticulum for that purpose. The chief advantage of prokaryotic cells is their simple design and the ability to thrive and rapidly reproduce even in very challenging environments, like little AK-47s that still manage to work in environments where modern tanks will fail. Eukaryotic cells, on the other hand, are found in the bodies of all complex organisms, from single-celled yeasts to you and me, and they divide up cell functions amongst a collection of organelles (functional subroutines), such as mitochondria, chloroplasts, Golgi bodies, and the endoplasmic reticulum.

Recall that we now know that there actually are three forms of life on this planet, as first described by Carl Woese in 1977 at my old alma mater, the University of Illinois - the Bacteria, the Archaea and the Eucarya. The Bacteria and the Archaea both use the simple prokaryotic cell architecture, while the Eucarya use the much more complicated eukaryotic cell architecture. Figure 2 depicts Carl Woese's rewrite of Darwin's famous tree of life, and shows that complex forms of life, like you and me, that are based upon cells using the eukaryotic cell architecture, actually spun off from the archaea and not the bacteria. Now archaea and bacteria look identical under a microscope, and that is the reason why we thought they were all just bacteria for hundreds of years. But in the 1970s Carl Woese discovered that the ribosomes used to translate mRNA into proteins were different between certain microorganisms that had all been previously lumped together as "bacteria". Carl Woese determined that the lumped together "bacteria" really consisted of two entirely different forms of life - the bacteria and the archaea - see Figure 2. The bacteria and archaea both have cell walls, but use slightly different organic molecules to build them. Some archaea, known as the extremophiles that live in harsh conditions, also wrap their DNA around stabilizing histone proteins. Eukaryotes also wrap their DNA around histone proteins to form chromatin and chromosomes - for more on that see: An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer. For that reason, and other biochemical reactions that the archaea and eukaryotes both share, we now think that the eukaryotes split off from the archaea and not the bacteria.

Figure 2 – In 1977 Carl Woese developed a new tree of life consisting of the Bacteria, the Archaea and the Eucarya. The Bacteria and Archaea use a simple prokaryotic cell architecture, while the Eucarya use the much more complicated eukaryotic cell structure.

The other thing about eukaryotic cells, as opposed to prokaryotic cells, is that eukaryotic cells are HUGE! They are like 15,000 times larger by volume than prokaryotic cells! See Figure 3 for a true-scale comparison of the two. The usual depiction of the differences between prokaryotic cells and eukaryotic cells, like that of Figure 1, reminds me very much of the very distorted picture of the Solar System that I grew up with. Unfortunately, schoolchildren are frequently presented with a highly distorted depiction of our Solar System, in which the planets are much larger than they should be, and the orbits of the planets are much smaller than they should be, relative to the size of the Sun, and this glaring distortion of the facts unknowingly haunts them for the rest of their lives! Such a distorted depiction is shown in Figure 4.

Figure 3 – Not only are eukaryotic cells much more complicated than prokaryotic cells, they are also HUGE!
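
Because volume goes as the cube of linear size, a 15,000-fold difference in volume shows up as a much more modest difference in width. Here is a quick Python check of that arithmetic. It assumes that both kinds of cells have roughly the same shape, which is an assumption made only for this sketch.

```python
volume_ratio = 15_000                    # the figure quoted in the text above
linear_ratio = volume_ratio ** (1 / 3)   # cube root, assuming similar shapes

print(f"about {linear_ratio:.1f} times wider")  # prints: about 24.7 times wider
```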

Figure 4 – Schoolchildren are frequently presented with a highly distorted depiction of our Solar System, in which the planets are much larger than they should be, and the orbits of the planets are much smaller than they should be too.

Figure 5 – Above is an accurate depiction of the relative sizes of the planets compared to the Sun. If the Sun were the size of a standard basketball (9.39 inches or 0.239 meters in diameter), then the Earth would be a little smaller than a peppercorn (0.086 inches in diameter) at a distance of 84.13 feet (25.69 meters) and Jupiter would be at a distance of 437.87 feet (133.73 meters) from the Sun.
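
The basketball model in Figure 5 is just a matter of proportions, and it can be roughly reproduced with a few lines of Python. The real-world diameters and orbital distances below are approximate values that I am assuming for this sketch, they are not taken from the figure, so the results should only be expected to land close to the numbers in the caption.

```python
# Approximate real-world values (assumptions for this sketch), in kilometers.
SUN_DIAMETER_KM = 1_391_400
EARTH_DIAMETER_KM = 12_742
EARTH_ORBIT_KM = 149_600_000
JUPITER_ORBIT_KM = 778_500_000

BASKETBALL_M = 0.239          # model Sun diameter, from the Figure 5 caption
METERS_PER_INCH = 0.0254

def scaled_m(real_km):
    """Shrink a real length so that the Sun becomes the size of the basketball."""
    return real_km * BASKETBALL_M / SUN_DIAMETER_KM

earth_diameter_in = scaled_m(EARTH_DIAMETER_KM) / METERS_PER_INCH
earth_orbit_m = scaled_m(EARTH_ORBIT_KM)
jupiter_orbit_m = scaled_m(JUPITER_ORBIT_KM)

print(f"Earth diameter: {earth_diameter_in:.3f} inches")
print(f"Earth distance: {earth_orbit_m:.1f} meters")
print(f"Jupiter distance: {jupiter_orbit_m:.1f} meters")
```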

The Great Mystery of biology is that there are no examples of intermediate forms between the simple prokaryotic cell architecture and the very large and very complex eukaryotic cell architecture. Darwinian thought is based upon the evolution of life by means of innovation honed by natural selection, and therefore, should lead to a continuous path through Design Space of intermediate forms between the simple prokaryotic cell architecture and the much larger and more complex eukaryotic cell architecture, but none is to be found in nature. Frequently, these intermediate forms are hard to find in the fossil record because many times the evolution from one form to another is very rapid in geological terms, and therefore, very few intermediate forms are left behind in the fossil record. But in recent decades we have indeed found many fossils of intermediate forms in the fossil record. Fortunately, we are not limited solely to the fossil record because many times nature leaves behind a trail of species that are stuck in intermediate forms as they converge upon a similar solution to a given problem.

The classic example of this is the evolution of the camera-like eye. Many of those in opposition to Darwinian thought frequently bring up the complex human camera-like eye as an example of something that could not possibly evolve by means of small incremental changes arising from innovation honed by natural selection. They argue that 1% of a complex camera-like eye is of no value without the other 99% of the complex components. But that is not true. A 1% eye in the form of a light-patch of neurons that can be excited by incoming photons is certainly much better than being totally blind because it allows organisms to sense the shadow of a potential predator and then to get out of the way. Visible photons from the Sun have an energy of a few electron volts, and thus, could easily trigger chemical reactions that also only require a few electron volts of energy within some already existing neurons that initially evolved for an entirely different purpose. Thus, such neurons could easily become exapted into serving as the photoreceptors of a light-patch of an organism. Once a light-patch forms, a camera-like eye will soon follow. For example, a slight depression of the light-patch offers some degree of ability to detect the direction from which the incoming photons are coming, providing some directional information on how to avoid predators or to locate prey. As the depression of the light-patch deepens through small incremental changes, this ability to sense the direction of incoming photons incrementally improves too. As the depression starts to become a nearly spherical pit with a small opening, it begins to behave like a pinhole camera, forming a very dim blurry image back on the photoreceptors. Similarly, if the original skin over the light-patch begins to thicken into a lens-like shape, it will help to protect the photoreceptors on the back wall of the incipient eye, and it will also help to focus and sharpen the image formed by the incoming photons.
Figure 8 shows that we can actually catch various species of organisms in the intermediate stages of a light-patch becoming a camera-like eye.

Figure 6 - The eye of a human and the eye of an octopus are nearly identical in structure, but evolved totally independently of each other. As Daniel Dennett pointed out, there are only a certain number of Good Tricks in Design Space and natural selection will drive different lines of descent towards them.

Figure 7 – Computer simulations reveal how a camera-like eye can easily evolve from a simple light sensitive spot on the skin.

Figure 8 – We can actually see this evolutionary history unfold in the evolution of the camera-like eye by examining modern-day mollusks such as the octopus.

The biggest mystery in biology is that the very large and very complex architecture of eukaryotic cells seems to have suddenly popped into existence all by itself about 2 billion years ago, with no intermediates left behind between the simple prokaryotic cell architecture and the complex eukaryotic cell architecture. We simply do not now see any prokaryotic cells caught in the act of being at an intermediate stage on the way to becoming full-fledged eukaryotic cells. There are some creatures called archezoa that were once thought to be the "missing link" between the prokaryotes and the eukaryotes, but it now turns out that they are really just dumbed-down eukaryotic cells that lost some of the features of eukaryotic cells. But the archezoa do help to demonstrate that some intermediate forms could be viable, so why do we not see any?

Nick Lane maintains that there must be some extremely inhibitive limiting factor that has generally prevented prokaryotic cells from advancing on to become large complicated eukaryotic cells over the last 4.0 billion years, and also that some very rare event must have taken place about 2 billion years ago to breach that inhibitive wall, but never again. Nick Lane thinks that the limiting factor was a lack of free energy. Nick Lane goes on to explain that eukaryotic cells have access to about 5,000 times as much free energy as do prokaryotic cells, thanks to the work of several hundred enslaved mitochondria in each eukaryotic cell. Compared to the simple, but extremely energy-poor prokaryotic cells, eukaryotic cells are the super-rich in biology, with a tremendous abundance of free energy to build outlandish palatial estates, and that is what allowed prokaryotic archaeon cells to evolve into the much more complex eukaryotic cells of today.

Figure 9 – The juxtaposition of extreme poverty next to extreme wealth in Kenya, is a metaphor for the extreme energy poverty of prokaryotic cells, compared to the extreme energy abundance of the eukaryotic cells that have 5,000 times as much free energy to work with. With large amounts of free energy, it is possible to build the palatial estates of the eukaryotic cell architecture. Remember, in thermodynamics energy is the ability to do work, while in economics, money is the ability not to do work.

Recall that in the Endosymbiosis theory of Lynn Margulis, the mitochondria were originally parasitic bacteria that once invaded archaeon prokaryotic cells and took up residence. Certain of those ancient archaeon prokaryotic cells, with their internal bacterial mitochondrial parasites, were then able to survive the parasitic bacterial onslaught, and later, went on to form a strong parasitic/symbiotic relationship with them, like all forms of self-replicating information tend to do. The reason researchers think this is what happened is that mitochondria have their own DNA and that DNA is stored as a loose loop, like bacteria store their DNA. Also, when a eukaryotic cell divides, hundreds of mitochondria first self-replicate, like a bacterial infection is wont to do, just before the eukaryotic cell divides and half of the mitochondria are then passed on to each daughter cell, like a bacterial infection does to self-replicate.

Figure 10 shows what the mitochondria do to power eukaryotic cells. Essentially they are friendly little bacterial parasites residing in eukaryotic cells, and the mitochondria contain a great deal of internal membrane surface area. Along these internal membranes is an electron transport chain that acts much like an energy delivery staircase. Willing electrons from the eukaryotic host cell are to be found on the organic molecules generated by the Krebs cycle within each mitochondrion, and those electrons then bounce down the electron transport chain, like a ball bouncing down a staircase. Each time the ball or electron bounces, it pumps an H+ proton to the outside of an internal membrane. Now an H+ proton is simply a hydrogen atom stripped of its electron, and the hydrogen atoms come from the NADH molecules produced by the Krebs cycle in the mitochondria. This causes a net positive charge to accumulate on the outside of the internal membrane and a net negative charge to form along the inside of the internal membrane, like a biological capacitor with the electric field strength of a lightning stroke, and this vast potential difference can be used to do work. For example, a number of H+ protons can drop back through a gate to synthesize ATP from ADP, and that ATP can be later used to fuel a biochemical process. Now it turns out that all living things use this same trick of pumping H+ protons across a membrane to later be used to do some useful biochemical work. It's a great way to store energy for later use because you can pump up small amounts of H+ protons and then later spend a large number of them on an expensive item, like saving up a few dollars from each paycheck so that you can later make a down payment on a new car. In fact, Nick Lane begins The Vital Question by explaining how alkaline hydrothermal vents are now our best bet for finding the original cradle of life on this planet. 
This is done in an excellent, and highly accessible manner, that would make reading The Vital Question well worth it for that alone. Alkaline hydrothermal vents provide a porous matrix that is infused with alkaline pore fluids that are at a nice toasty temperature suitable for the manufacture of organic molecules, and which are produced by the serpentinization of the mineral olivine into serpentinite. These warm alkaline pore fluids also contain a great deal of dissolved hydrogen H2 gas. The alkaline hydrothermal vent structures are sitting in acidic seawater containing a great deal of dissolved carbon dioxide CO2. The combination of all of this could allow H+ proton gradients to form across inorganic iron sulfide FeS "membranes". That stored energy could then be used to make organic molecules out of H2 and CO2 - for more on that see: An IT Perspective on the Transition From Geochemistry to Biochemistry and Beyond.

Figure 10 – Mitochondria are little parasitic bacteria that at one time invaded some prokaryotic archaeon cells about 2 billion years ago, and went on to form a strong parasitic/symbiotic relationship with their archaeon hosts. Mitochondria have their own genes stored on bacterial DNA in a large loop, just like all other bacteria. Each eukaryotic cell contains several hundred mitochondria, which self-replicate before the eukaryotic cell divides. Half of the mitochondria go into each daughter cell after a division of the eukaryotic cell. The eukaryotic host cell provides the mitochondria with a source of food, and the mitochondria metabolize that food using the Krebs cycle and an electron transport chain to pump H+ protons uphill to the outside of their internal membranes. As the H+ protons fall back down they release stored energy to turn ADP into ATP for later use as a fuel.
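
The comparison made earlier between the charged inner membrane of a mitochondrion and a lightning stroke can be sanity-checked with the simple relation that electric field strength is voltage divided by distance. The numbers below are rough, order-of-magnitude assumptions on my part, not values taken from the text above: a membrane potential of about 150 millivolts, a membrane thickness of about 5 nanometers, and about 3 million volts per meter as the field at which air breaks down and sparks.

```python
# Rough order-of-magnitude assumptions for this sketch.
membrane_potential_v = 0.150       # assumed: about 150 millivolts across the membrane
membrane_thickness_m = 5e-9        # assumed: about 5 nanometers thick
air_breakdown_v_per_m = 3e6        # assumed: approximate field needed to spark through air

# Field strength is simply the potential difference divided by the distance.
field_v_per_m = membrane_potential_v / membrane_thickness_m

print(f"membrane field: {field_v_per_m:.1e} volts per meter")
print(f"ratio to air breakdown: {field_v_per_m / air_breakdown_v_per_m:.0f}x")
```

Under those assumptions the field across the membrane comes out at tens of millions of volts per meter, so the lightning comparison is not as far-fetched as it first sounds.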

Living Together Is Never Easy
Nick Lane then goes on to explain that living together is never easy. Sure, these early prokaryotic archaeon cells may have initially learned to live with their new energy-rich parasitic bacterial symbiotes, like an old man taking on a new trophy-wife, but there are always problems in cohabitating, like a number of new and unwelcome freeloading brothers-in-law taking up residence too. For example, bacteria carry with them parasitic segments of DNA called "mobile group II self-splicing introns". These are segments of parasitic DNA that are just trying to self-replicate, like all other forms of self-replicating information. These bacterial introns snip themselves out of the mRNA that is copied from the bacterial DNA, and then form an active complex of reverse transcriptase that reinserts the intron DNA back into the bacterial DNA loop between genes to complete the self-replication process. These bacterial introns are rather mild parasites that just slow down the replication time for the main bacterial DNA loop, and also waste a bit of energy and material as they self-replicate. A typical bacterial cell with 4,000 genes might have about 30 of these introns, so over time, the bacteria just managed to learn to live with these pesky introns somehow. The bacteria that could not keep their introns under control simply died out, and so the bacteria infected with tons of parasitic introns were simply weeded out because they could not compete with the more healthy bacteria.

Now for the eukaryotes, the situation is completely different. Eukaryotes have tens of thousands of introns buried in their 20,000+ genes, and these introns are located right in the middle of the eukaryotic genes!

Figure 11 – Eukaryotic genes consist of a number of stretches of DNA, called exons, that code for protein synthesis, and a number of stretches of DNA, called introns, that do not code for protein synthesis. Unlike the bacteria, eukaryotic introns are not between the genes, they are right in the middle of the genes, so they must be spliced out of the transcribed mRNA within the eukaryotic nuclear membrane by molecular machines called spliceosomes before the mRNA exits a nuclear pore and is translated by ribosomes into a protein.

These introns must be spliced out of the transcribed mRNA by cellular machines called spliceosomes before the mRNA, transcribed from the genetic DNA, can be translated into a sequence of amino acids to form a protein. If the eukaryotic introns were not spliced out of the mRNA prior to translation by ribosomes, incorrect polypeptide chains of amino acids would form, creating proteins that simply would not work. This is a messy, but necessary process. Now the strange thing is that we share hundreds of genes with other forms of life, like trees, because we both evolved from some common ancestor, and we both still need some proteins that essentially perform the same biochemical functions. The weird thing is that these shared genes all have their introns in exactly the same location within these shared genes. This means that the parasitic introns must have been introduced into the DNA of the eukaryotic cell architecture very early, and that is why trees and humans have the introns located in the same spots on shared genes. The theory goes that when the parasitic mitochondria first took up residence in prokaryotic archaeon cells, like all parasites, they tried to take as much advantage of their hosts as possible, without killing the hosts outright. Now copying the several million base pairs in the 5,000 genes of a bacterial cell is the limiting time factor in self-replicating bacteria and requires at least 20 minutes to do so. Consequently, the mitochondrial invaders began to move most of their genes to the DNA loop of their hosts, and let their energy-rich archaeon hosts produce the proteins associated with those transplanted genes for them. However, the mitochondrial invaders wisely kept all of the genes necessary for metabolism locally on their own mitochondrial DNA loop because that allowed them to quickly get their hands on the metabolic proteins they needed without putting in a back order back on the main host DNA loop. 
After all, their energy-rich archaeon hosts now had several hundred mitochondria constantly pumping out the necessary ATP to make proteins, so why not let their hosts provide most of the proteins that were not essential to metabolism. This was the beginning of the compartmentalization of function within the hosts, and was the beginning of a division of labor that produced a symbiotic advantage for both the archaeon hosts and their mitochondrial invaders.

But there was one problem with transplanting the mitochondrial genes to their hosts' DNA loop. The parasitic bacterial introns from the mitochondria tagged along as well, and those transplanted parasitic bacterial introns could now run wild because their energy-rich hosts could now afford the luxury of supporting tons of parasitic DNA, like a number of freeloading brothers-in-law that came along with your latest trophy wife - at least up to a point. Indeed, probably most archaeon hosts died from the onslaught of tons of parasitic mitochondrial introns clogging up their critical genes and making them produce nonfunctional proteins. Like I said, it is always hard to live with somebody. But some archaeon hosts must have managed to come to some kind of living arrangement with their new mitochondrial roommates that allowed the both of them to live together in a love-hate relationship that worked.

Modern eukaryotes have a distinctive nuclear membrane surrounding their DNA, and the spliceosomes work inside of this nuclear membrane to splice the nasty introns out of the mRNA. So initially, when a gene composed of DNA with embedded introns is transcribed to mRNA, everything is transcribed, including the introns, producing a strand of mRNA with "good" exon segments and "bad" intron segments. This happens all within the protection of the nuclear membrane that keeps the sexually turned on ribosomes, that are just dying to translate mRNA into proteins, at bay, like an elderly prom chaperone keeping things from getting out of hand. The spliceosomes then go to work within the nuclear membrane to splice out the parasitic introns, forming a new strand of mRNA that just contains the "good" exons. This highly edited strand of mRNA then passes through a pore in the nuclear membrane out to the ribosomes patiently waiting outside - see Figure 12.
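
For IT people, the editing job done by the spliceosomes looks a lot like a string operation. Here is a minimal Python sketch of the idea. The function name splice, the toy transcript and the intron positions are all hypothetical and made up for illustration, and real spliceosomes find introns by recognizing signals in the mRNA itself rather than by being handed a list of coordinates.

```python
def splice(pre_mrna, introns):
    """Return the edited mRNA with each intron (start, end) span cut out.

    Spans are 0-based, half-open, and assumed not to overlap.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)

# Hypothetical transcript: uppercase marks "good" exons, lowercase marks "bad" introns.
pre_mrna = "AUGGCC" + "guaaguuuag" + "UUUGGA" + "guccccag" + "UAA"
introns = [(6, 16), (22, 30)]

mature = splice(pre_mrna, introns)
print(mature)  # AUGGCCUUUGGAUAA
```

If the ribosomes got hold of pre_mrna instead of mature, every lowercase stretch would be translated as well, which is the nonfunctional protein problem that the nuclear membrane prevents.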

The nuclear membrane of modern eukaryotes provides the clue as to what happened. When the initial bacterial mitochondrial parasites would die inside of an archaeon host, they would simply dump all of their DNA into the interior of the archaeon host, and this allowed the parasitic bacterial introns in the dumped DNA to easily splice themselves at random points into the archaeon host DNA loop, and they frequently did so right in the middle of an archaeon host gene. Most times, that simply killed the archaeon host and all of its parasitic mitochondrial bacteria. Now the easiest way to prevent that from happening would be to simply put a membrane barrier around the archaeon host DNA loop to protect it from all of the dumped mitochondrial introns, and here is an explanation of how that could happen. It turns out that archaea and bacteria use different lipids to form their membranes, and although we know that the eukaryotes split off from the archaea, the eukaryotes strangely use bacterial lipids in their membranes instead of archaeon lipids. So the eukaryotes had to have transitioned from archaeon lipids to bacterial lipids at some point in time. The theory is that the genes for building bacterial lipids were originally on the bacterial mitochondrial invaders, but were later transplanted to the host archaeon DNA loops at some point. Once on the host archaeon DNA loop those genes would then start to create bacterial lipids with no place to go. Instead, the generated lipids would simply form lipid "bags" near the host archaeon DNA loop. Those bacterial lipid "bags" would then tend to flatten, like empty plastic grocery bags, and then surround the host archaeon DNA loop. These flattened "bags" of bacterial lipids then evolved to produce the distinctive double-membrane structure of the nuclear membrane.

Figure 12 – The eukaryotic nuclear membrane is a double-membrane consisting of an inner and outer membrane separated by a perinuclear space. The nuclear membrane contains nuclear pores that allow edited mRNA to pass out, but prevents ribosomes from entering and translating unedited mRNA containing introns into nonfunctional proteins. This double membrane is just the remnant of flattened "bags" of bacterial lipids that shielded the central archaeon DNA loop from onslaught by parasitic mitochondrial DNA introns.

Nick Lane then goes on to explain how the host archaeon prokaryotic cells, trying to live with their parasitic mitochondrial bacterial roommates, and their unwelcome bacterial introns, also led to the exclusive development of two sexes for eukaryotic-based life, and the use of sexual reproduction between those two sexes as well. It seems that the overwhelming benefit of having mitochondrial parasites, generating huge amounts of energy for the host archaeon prokaryotic cells, was just too much to resist, and the host archaeon prokaryotic cells went on to extremes to accommodate them, like also taking the host archaeon DNA loop that was initially wrapped around histone proteins to protect the DNA from the harsh environments that the extremophiles loved, and transforming that DNA into chromatin and chromosomes - for more on that see An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer.

Nick Lane's key point is that the overwhelming amount of free energy that the parasitic mitochondrial bacteria brought to their original prokaryotic archaeon hosts was well worth the effort of learning to cope with a chronic infection that has lasted for 2 billion years. Yes, in some sense the mitochondrial parasites were indeed a collection of pain-in-the-butt pests, but putting up with them was well worth the effort because they provided their hosts with 5,000 times as much free energy as they originally had. That much energy allowed their hosts to put up with a lot of grief, and in the process of learning to cope with these parasites, the archaeon hosts went on to build all of the complex structures of the eukaryotic cell. This is a compelling argument, but it is very difficult to reconstruct a series of events that happened almost 2 billion years ago. Perhaps some softwarephysics and the evolution of software over the past 76 years, or 2.4 billion seconds, could be of assistance. During that long period of software evolution, did software also have to overcome a similar extremely inhibitive limiting factor that at first prevented the rise of complex software? Well, yes it did.

Using the Evolution of Software as a Guide
On the Earth we have seen life go through three major architectural advances:

1. The origin of life about 4 billion years ago, probably in the alkaline hydrothermal vents of the early Earth, producing the prokaryotic cell architecture.
2. The rise of the complex eukaryotic cell architecture about 2 billion years ago.
3. The rise of multicellular organisms consisting of millions, or billions, of eukaryotic cells all working together in the Ediacaran about 635 million years ago.

As we have seen, the most difficult thing to explain in the long history of life on this planet is not so much the origin of life itself, but the origin of the very complex architecture of the eukaryotic cell. This is where the evolutionary history of software on this planet can be of some help.

The evolutionary history of software on the Earth has converged upon a very similar historical path through Design Space because software also had to battle with the second law of thermodynamics in a highly nonlinear Universe - see The Fundamental Problem of Software for more on that. Software progressed through these similar architectures:

1. The origin of simple unstructured prokaryotic software on Konrad Zuse's Z3 computer in May of 1941 - 2.4 billion seconds ago.
2. The rise of structured eukaryotic software in 1972 - 1.4 billion seconds ago.
3. The rise of object-oriented software (software using multicellular organization) in 1995 - 694 million seconds ago.
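
The "seconds ago" figures in the list above can be checked with a little arithmetic. The sketch below assumes a reference year of 2017 (the date of this post) and a 365.25-day year. The class and method names are illustrative only.

```java
// Sketch: converts spans of calendar years into seconds.
// Assumptions: reference year 2017 and a 365.25-day year.
class SoftwareTimescale {
    // 365.25 days * 24 hours * 3,600 seconds
    static final long SECONDS_PER_YEAR = 31_557_600L;

    static long yearsToSeconds(int years) {
        return years * SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        int referenceYear = 2017;
        int[] originYears = {1941, 1972, 1995};
        for (int origin : originYears) {
            long seconds = yearsToSeconds(referenceYear - origin);
            System.out.println(origin + ": " + seconds + " seconds ago");
        }
    }
}
```

Under those assumptions the three spans come to 2,398,377,600, 1,420,092,000 and 694,267,200 seconds, which round to the 2.4 billion, 1.4 billion and 694 million seconds quoted in the list.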

For more details on the above evolutionary history of software see the SoftwarePaleontology section of SoftwareBiology. From the above series of events, we can easily see that there was indeed a very long period of time, spanning at least one billion seconds, between 1941 and 1972 when only simple unstructured prokaryotic software was to be found on the Earth. Then early in the 1970s, highly structured eukaryotic software appeared and became the dominant form of software. Even today, the highly structured eukaryotic architecture of the early 1970s can still be seen in the modern object-oriented architecture of software.

Now suppose some alien AI software landed on this planet today, and found a planet nearly totally dominated by software, with some carbon-based life forms still hanging around that had not yet gotten the word, but with no written history of how the software came to be. How would they piece that evolutionary history together? The first thing they would notice is that nearly all of the software used multicellular organization based upon object-oriented programming languages. Some digging through old documents would reveal that object-oriented programming had actually been around since 1962, but that it did not at first catch on. In the late 1980s, the use of the very first significant object-oriented programming language, known as C++, started to appear in corporate IT, but object-oriented programming really did not become significant in IT until 1995 when both Java and the Internet Revolution arrived at the same time. The key idea in object-oriented programming is naturally the concept of an object. An object is simply a cell. Object-oriented languages use the concept of a Class, which is a set of instructions for building an object (cell) of a particular cell type in the memory of a computer. Depending upon whom you cite, there are several hundred different cell types in the human body, but in IT we generally use many thousands of cell types or Classes in commercial software. For a brief overview of these concepts go to the webpage below and follow the links by clicking on them.

Lesson: Object-Oriented Programming Concepts

A Class defines the data that an object stores in memory and also the methods that operate upon the object data. Remember, an object is simply a cell. Methods are like biochemical pathways that consist of many steps or lines of code. A public method is a biochemical pathway that can be invoked by sending a message to a particular object, like using a ligand molecule secreted from one object to bind to the membrane receptors on another object. This binding of a ligand to a public method of an object can then trigger a cascade of private internal methods within an object or cell.

Figure 13 – A Class contains the instructions for building an object in the memory of a computer and basically defines the cell type of an object. The Class defines the data that an object stores in memory and also the methods that can operate upon the object data.

Figure 14 – Above is an example of a Bicycle object. The Bicycle object has three data elements - speed in mph, cadence in rpm, and a gear number. These data elements define the state of a Bicycle object. The Bicycle object also has three methods – changeGears, applyBrakes, and changeCadence that can be used to change the values of the Bicycle object’s internal data elements. Notice that the code in the object methods is highly structured and uses code indentation to clarify the logic.

Figure 15 – Above is some very simple Java code for a Bicycle Class. Real Class files have many data elements and methods and are usually hundreds of lines of code in length.
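
For readers who want to see roughly what such a Class looks like in text form, here is a minimal sketch of a Bicycle Class. It uses the three data elements and three methods named in Figure 14. The constructor, the getter methods and the method bodies are assumptions added for illustration and are not taken from the figure.

```java
// Minimal sketch of a Bicycle Class (cell type).
// Assumption: the constructor, getters and method bodies are illustrative only.
class Bicycle {
    // The data elements that define the state of a Bicycle object
    private int speed;    // mph
    private int cadence;  // rpm
    private int gear;     // gear number

    Bicycle(int speed, int cadence, int gear) {
        this.speed = speed;
        this.cadence = cadence;
        this.gear = gear;
    }

    // The methods that change the object's internal data elements
    void changeGears(int newGear) {
        gear = newGear;
    }

    void changeCadence(int newCadence) {
        cadence = newCadence;
    }

    void applyBrakes(int decrement) {
        // Speed is never allowed to drop below zero
        speed = Math.max(0, speed - decrement);
    }

    int getSpeed() { return speed; }
    int getCadence() { return cadence; }
    int getGear() { return gear; }

    public static void main(String[] args) {
        Bicycle bike = new Bicycle(15, 80, 3);
        bike.applyBrakes(5);
        bike.changeGears(2);
        System.out.println("speed=" + bike.getSpeed()
                + " cadence=" + bike.getCadence()
                + " gear=" + bike.getGear());
    }
}
```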

Figure 16 – Many different objects can be created from a single Class just as many cells can be created from a single cell type. The above List objects are created by instantiating the List Class three times and each List object contains a unique list of numbers. The individual List objects have public methods to insert or remove numbers from the objects and also an internal sort method that could be called whenever the public insert or remove methods are called. The internal sort method automatically sorts the numbers in the List object whenever a number is added or removed from the object.
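
The idea in Figure 16 can be sketched in a few lines of Java. The class is named NumberList here, a hypothetical name chosen to avoid confusion with Java's own List interface. It exposes public insert and remove methods, and each of them triggers a private internal sort method, much like a ligand binding to a receptor and setting off an internal pathway.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the List Class idea from Figure 16.
// Assumption: the name NumberList and the method bodies are illustrative only.
class NumberList {
    private final List<Integer> numbers = new ArrayList<>();

    // Public method: callable from other objects
    public void insert(int value) {
        numbers.add(value);
        sort();
    }

    // Public method: removes one occurrence of value, if present
    public boolean remove(int value) {
        boolean removed = numbers.remove(Integer.valueOf(value));
        sort();
        return removed;
    }

    // Read-only view of the object's internal data
    public List<Integer> contents() {
        return Collections.unmodifiableList(numbers);
    }

    // Private internal method: only reachable through the public methods
    private void sort() {
        Collections.sort(numbers);
    }

    public static void main(String[] args) {
        // Three objects (cells) instantiated from one Class (cell type)
        NumberList first = new NumberList();
        NumberList second = new NumberList();
        NumberList third = new NumberList();
        first.insert(3);
        first.insert(1);
        second.insert(42);
        third.insert(7);
        third.insert(5);
        third.remove(7);
        System.out.println(first.contents() + " " + second.contents() + " " + third.contents());
    }
}
```

Each NumberList object keeps its own private list of numbers, so changing one object leaves the other two untouched.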

Figure 17 – Objects communicate with each other by sending messages. Really one object calls the exposed public methods of another object and passes some data to the object it calls, like one cell secreting a ligand molecule that then plugs into a membrane receptor on another cell.

Figure 18 – In a growing embryo the cells communicate with each other by sending out ligand molecules called morphogens, or paracrine factors, that bind to the membrane receptors on other cells.

Figure 19 – Calling a public method of an object can initiate the execution of a cascade of private internal methods within the object. Similarly, when a paracrine factor molecule plugs into a receptor on the surface of a cell, it can initiate a cascade of internal biochemical pathways. In the above figure an Ag protein plugs into a BCR receptor and initiates a cascade of biochemical pathways or methods within a cell.

When a high-volume corporate website, consisting of many millions of lines of code running on hundreds of servers, starts up and begins taking traffic, billions of objects (cells) begin to be instantiated in the memory of the servers in a matter of minutes and then begin to exchange messages with each other in order to perform the functions of the website. Essentially, when the website boots up, it quickly grows to a mature adult through a period of very rapid embryonic growth and differentiation, as billions of objects are created and differentiated to form the tissues of the website organism. These objects then begin exchanging messages with each other by calling public methods on other objects to invoke cascades of private internal methods which are then executed within the called objects - for more on that see Software Embryogenesis.

In addition to finding lots of object-oriented software using millions or billions of intercommunicating objects, the aliens would also find some very old living fossils of huge single-celled Cobol programs still running on mainframe computers. The first Cobol program ran on December 6, 1960 on an RCA 501 computer, the first computer to use all transistors and no vacuum tubes. The RCA 501 came with 16K (131,072 bits) - 260K (2,129,920 bits) of magnetic core memory. The United States Air Force purchased an RCA 501 in 1959 for $121,698 - $1,022,614 in 2017 dollars. I just checked, and you can now buy a laptop online with 4 GB of memory for $170. That 4 GB comes to 34.36 billion bits of computer memory. The Cobol programs written in the 1960s during the Unstructured Period used simple unstructured prokaryotic code because of the severe computer memory limitations of the computers of the day, but later in the 1970s and 1980s Cobol programs grew to become a single HUGE object, or cell, like a single-celled Paramecium, that allocated tons of memory up front and then processed the data read into the allocated memory with methods that are called subroutines.
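
The bit counts quoted above follow from simple unit conversions. The sketch below assumes that "K" means 1,024 bytes, that "GB" means 1,024 x 1,024 x 1,024 bytes, and that a byte holds 8 bits. The class and method names are illustrative only.

```java
// Sketch: converts memory sizes into bits.
// Assumptions: 1 K = 1,024 bytes, 1 GB = 1,024^3 bytes, 1 byte = 8 bits.
class MemoryBits {
    static final long BITS_PER_BYTE = 8L;
    static final long BYTES_PER_K = 1024L;
    static final long BYTES_PER_GB = 1024L * 1024L * 1024L;

    static long kilobytesToBits(long kilobytes) {
        return kilobytes * BYTES_PER_K * BITS_PER_BYTE;
    }

    static long gigabytesToBits(long gigabytes) {
        return gigabytes * BYTES_PER_GB * BITS_PER_BYTE;
    }

    public static void main(String[] args) {
        System.out.println("RCA 501 minimum: " + kilobytesToBits(16) + " bits");
        System.out.println("RCA 501 maximum: " + kilobytesToBits(260) + " bits");
        System.out.println("4 GB laptop:     " + gigabytesToBits(4) + " bits");
    }
}
```

Under those assumptions, 16K is 131,072 bits, 260K is 2,129,920 bits, and 4 GB is 34,359,738,368 bits, or about 34.36 billion bits.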

Figure 20 – Cobol programs originally consisted of a few hundred lines of unstructured prokaryotic code, due to the severe memory limitations of the computers of the day, but in the 1970s and 1980s they grew to HUGE sizes of tens of thousands of lines of code using structured eukaryotic code, like a HUGE single-celled Paramecium, because cheap semiconductor memory chips dramatically increased the amount of memory computers could access.

Below is a code snippet from a fossil Cobol program listed in a book published in 1975. Notice the structured programming use of indented code and calls to subroutines with PERFORM statements.

      CLOSE FILE-1, FILE-2.
            IF ACCT-NO OF REC-1 > ACCT-NO OF REC-2
                  DISPLAY REC-2, 'NO MATCHING ACCT-NO'
                  PERFORM READ-FILE-1-RTN

Figure 21 – A fossil Cobol program listed in a book published in 1975. Notice the structured programming use of indented code and calls to subroutines with PERFORM statements.

The aliens would also find a small amount of unstructured software, primarily in Unix shell scripts, but even that software would be much more structured than the code from before 1972. With a little more digging, the aliens would also find some truly ancient fossilized code. For example, below is a code snippet from a fossil Fortran program, listed in a book published in 1969, showing what ancient unstructured prokaryotic software really looked like. It has no internal structure and notice the use of GOTO statements to skip around in the code. Later this would become known as the infamous “spaghetti code” of the Unstructured Period that was such a joy to support.

30 DO 50 I=1,NPTS
31 IF (MODE) 32, 37, 39
32 IF (Y(I)) 35, 37, 33
33 WEIGHT(I) = 1. / Y(I)
      GO TO 41
35 WEIGHT(I) = 1. / (-1*Y(I))
37 WEIGHT(I) = 1.
      GO TO 41
39 WEIGHT(I) = 1. / SIGMA(I)**2
      YMEAN = WEIGHT(I) * FCTN(X, I, J, M)
      DO 44 J = 1, NTERMS

Figure 22 – A fossil Fortran program, listed in a book published in 1969, showing what ancient unstructured prokaryotic software looked like. It has no internal structure and notice the use of GOTO statements to skip around in the code. Later this would become known as the infamous “spaghetti code” of the Unstructured Period that was such a joy to support.
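
For contrast, here is a rough sketch of how the weighting logic in the fossil Fortran fragment might look as structured code, written in Java. It rests on two assumptions. The first is that the three-way arithmetic IF statements branch on negative, zero and positive values, which is how Fortran's arithmetic IF works. The second is that the branch at label 35 was meant to jump ahead like the other branches rather than fall through into label 37. The class and method names are hypothetical.

```java
// Structured sketch of the weighting logic in the fossil Fortran fragment.
// Assumption: the branch at label 35 is meant to skip ahead like the other
// branches, rather than fall through into label 37.
class StructuredWeight {

    // mode < 0: weight by 1/|y|, mode == 0: no weighting, mode > 0: weight by 1/sigma^2
    static double weight(int mode, double y, double sigma) {
        if (mode < 0) {
            return inverseMagnitudeWeight(y);
        } else if (mode == 0) {
            return 1.0;
        } else {
            return 1.0 / (sigma * sigma);
        }
    }

    // Replaces labels 32, 33 and 35: 1/y for positive y, 1/(-y) for negative y,
    // and a weight of 1 when y is exactly zero.
    private static double inverseMagnitudeWeight(double y) {
        if (y == 0.0) {
            return 1.0;
        }
        return 1.0 / Math.abs(y);
    }

    public static void main(String[] args) {
        double[] y = {4.0, -4.0, 0.0};
        double sigma = 2.0;
        for (int mode = -1; mode <= 1; mode++) {
            for (double value : y) {
                System.out.println("mode=" + mode + " y=" + value
                        + " weight=" + weight(mode, value, sigma));
            }
        }
    }
}
```

The nested if statements and the small helper method carry the same logic as the GOTO jumps, but the flow of control can be read from top to bottom.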

The Extremely Inhibitive Limiting Factor of Software Complexity
Nick Lane argues that the extremely inhibitive limiting factor of biological complexity, that kept living things stuck at the level of simple prokaryotic cells for 2 billion years, was the amount of available free energy. Once the mitochondrial invaders came along and boosted the amount of free energy available to a cell by a factor of 5,000, the very large and very complicated eukaryotic cell architecture became possible. But what was the extremely inhibitive limiting factor of software complexity that limited software to a very simple unstructured prokaryotic architecture for a span of more than one billion seconds from 1941 to 1972? Well, in IT we all know that the two limiting factors for software are always processing speed and computer memory, and I would argue that the extremely inhibitive limiting factor of software complexity was simply the amount of available free computer memory.

Software is currently still being exclusively written by human beings, but that will all change sometime in the next 50 years when AI software will also begin to write software, and software will finally be able to self-replicate all on its own. In the meantime, let’s get back to the original rise of complex software. All computers have a CPU that can execute a fundamental set of primitive operations that are called its instruction set. The computer’s instruction set is formed by stringing together a large number of logic gates that are now composed of transistor switches. For example, all computers have a dozen or so registers that are like little storage bins for temporarily storing data that is being operated upon. A typical primitive operation might be taking the binary number stored in one register, adding it to the binary number in another register, and putting the final result into a third register. Since computers can only perform operations within their instruction set, computer programs written in high-level languages like C, C++, Fortran, Cobol or Visual Basic that can be read by a human programmer, must first be compiled, or translated, into a file that consists of the “1s” and “0s” that define the operations to be performed by the computer in terms of its instruction set. This compilation, or translation process, is accomplished by feeding another compiled program, called a compiler, with the source code of the program to be translated. The output of the compiler program is called the compiled version of the program, and is an executable file on disk that can be directly loaded into the memory of a computer and run. Computers also now have memory chips that can store these compiled programs and the data that the compiled programs process. 
For example, when you run a compiled program, by double-clicking on its icon on your desktop, it is read from disk into the memory of your computer, and it then begins executing the primitive operations of the computer’s instruction set as defined by the compiled program.

Figure 23 – A ribosome behaves very much like a compiler. It reads the mRNA source code and translates it into an executable polypeptide chain of amino acids that then go on to fold up into a 3-D protein molecule that can directly execute a biochemical function, much like a compiled executable file that can be directly loaded into the memory of a computer to do things.

Below is the source code for a simple structured program, with a single Average function, that computes the average of several numbers that are entered via the command line of a computer. Please note that modern applications now consist of many thousands to many millions of lines of structured code. The simple example below is just for the benefit of our non-IT readers to give them a sense of what is being discussed when I describe the compilation of source code into executable files that can be loaded into the memory of a computer and run.

Figure 24 – Source code for a C program that calculates an average of several numbers entered at the keyboard.
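
To give non-IT readers something to look at in text form, here is a comparable sketch written in Java, the language used for the Bicycle Class in Figure 15, rather than in C. It has a single average method and reads its numbers from the command line. The class name, the method name and the handling of an empty command line are assumptions made for illustration.

```java
// Sketch of a small structured program with a single average function.
// Assumption: written in Java rather than C; all names are illustrative only.
class Average {

    // Computes the arithmetic mean of the supplied values
    static double average(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one number is required");
        }
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("Usage: java Average.java <number> [<number> ...]");
            return;
        }
        // Convert each command-line argument from text into a number
        double[] values = new double[args.length];
        for (int i = 0; i < args.length; i++) {
            values[i] = Double.parseDouble(args[i]);
        }
        System.out.println("Average = " + average(values));
    }
}
```

Saved as Average.java, the program could be run with a command such as java Average.java 1 2 3 4 on a recent Java installation.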

The problem is that the amount of free computer memory determines the maximum size of a program because the executable file for the program has to fit into the available free memory of the computer at run time. Generally, the more lines of source code you have in your program, the larger will be its compiled executable file, so the amount of available free memory determines the maximum size of your program. Now in the 1970s people did come up with computer operating systems that could use virtual memory to remove this memory constraint, but that happened after structured eukaryotic software had appeared, so we can overlook that bit of IT history. Now in the 1950s and 1960s during the prokaryotic Unstructured Period, computer memory was very expensive and very limited. Prior to 1955 computers, like the UNIVAC I that first appeared in 1951, were using mercury delay lines that consisted of a tube of mercury that was about 3 inches long. Each mercury delay line could store about 18 bits of computer memory as sound waves that were continuously refreshed by quartz piezoelectric transducers on each end of the tube. Mercury delay lines were huge and very expensive per bit so computers like the UNIVAC I only had a memory of 12 K (98,304 bits).

Figure 25 – Prior to 1955, huge mercury delay lines built from tubes of mercury that were about 3 inches long were used to store bits of computer memory. A single mercury delay line could store about 18 bits of computer memory as a series of sound waves that were continuously refreshed by quartz piezoelectric transducers at each end of the tube.

In 1955 magnetic core memory came along, and used tiny magnetic rings called "cores" to store bits. Four little wires had to be threaded by hand through each little core in order to store a single bit, so although magnetic core memory was a lot cheaper and smaller than mercury delay lines, it was still very expensive and took up lots of space.

Figure 26 – Magnetic core memory arrived in 1955 and used a little ring of magnetic material, known as a core, to store a bit. Each little core had to be threaded by hand with 4 wires to store a single bit.

Figure 27 – Magnetic core memory was a big improvement over mercury delay lines, but it was still hugely expensive and took up a great deal of space within a computer.

Because of the limited amount of free computer memory during the 1950s and 1960s, computers simply did not have enough free computer memory to allow people to write very large programs, so programs were usually just a few hundred lines of code each. Now you really cannot do much logic in a few hundred lines of code, so IT people would string together several small programs into a batch run. Input-Output tapes were used between each small program in the batch run. The first small program would run and write its results to one or more output tapes. The next program in the batch run would then read those tapes and do some more processing, and write its results to one or more output tapes too. This continued on until the very last program in the batch run wrote out its final output tapes. For more on that see: An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer. Now when you are writing very small programs, because you are severely limited by the available amount of free memory, there really is no need to write structured code because you are dealing with a very small amount of processing logic. So during the Unstructured Period of the 1950s and 1960s, IT professionals simply did not bother with breaking up software logic into functions or subroutines. So each little program in a batch stream was like a small single prokaryotic cell, with no internal structure.

But in the early 1970s inexpensive semiconductor memory chips came along that made computer memory small and cheap. These memory chips were the equivalent of software mitochondria because they removed the extremely inhibitive limiting factor of software complexity. Suddenly, IT now had large amounts of computer memory that allowed IT people to write huge programs. But that presented a problem. It was found that the processing logic became too contorted and impossible to maintain when simple prokaryotic unstructured programs were scaled up to programs with 50,000 lines of unstructured code. Even the programmer who wrote the original code could not make sense of it a few months later, and this was much worse for new programmers who came along later to maintain the code. Since commercial software can easily live for 10 years or more, that was a real problem. To alleviate this problem, Dahl, Dijkstra, and Hoare published Structured Programming in 1972, in which they suggested that computer programs should have complex internal structure with no GOTO statements, lots of subroutines, indented code, and many comment statements. During the Structured Period that soon followed, these structured programming techniques were adopted by the IT community, and the GOTO statements were replaced by subroutines, also known as functions(), and indented code with lots of internal structure, like the eukaryotic structure of modern cells that appeared about 2 billion years ago.

Figure 28 – Finally in the early 1970s inexpensive semiconductor memory chips came along that made computer memory small and cheap. These memory chips were the equivalent of software mitochondria because they removed the extremely inhibitive limiting factor of software complexity.

Today, we almost exclusively use object-oriented programming techniques that allow for the multicellular organization of software via objects, but the code for the methods that operate on those objects still uses the structured programming techniques that first appeared in the early 1970s.

Now many tend to dismiss the biological findings of softwarephysics because software currently is a product of the human mind, while biological life is not a product of intelligent design. Granted, biological life is not a product of intelligent design, but neither is the human mind. The human mind and biological life are both the result of natural processes at work over very long periods of time. This objection simply stems from the fact that we are all still, for the most part, self-deluded Cartesian dualists at heart, with seemingly a little “Me” running around within our heads that just happens to have the ability to write software and to do other challenging things. Thus, most human beings do not think of themselves as part of the natural world. Instead, they think of themselves, and others, as immaterial spirits temporarily haunting a body, and when that body dies the immaterial spirit lives on. In this view, human beings are not part of the natural world. Instead, they are part of the supernatural. But in softwarephysics we maintain that the human mind is a product of natural processes in action, and so is the software that it produces. For more on that see The Ghost in the Machine the Grand Illusion of Consciousness.

Still, I realize that there might be some hesitation to pursue this line of thought because it might be construed by some as an advocacy of intelligent design, but that is hardly the case. The evolution of software over the past 76 years has essentially been a matter of Darwinian inheritance, innovation and natural selection converging upon similar solutions to that of biological life. For example, it took the IT community about 60 years of trial and error to finally stumble upon an architecture similar to that of complex multicellular life that we call SOA – Service Oriented Architecture. The IT community could have easily discovered SOA back in the 1960s if it had adopted a biological approach to software and intelligently designed software architecture to match that of the biosphere. Instead, the worldwide IT architecture we see today essentially evolved on its own because nobody really sat back and designed this very complex worldwide software architecture; it just sort of evolved on its own through small incremental changes brought on by many millions of independently acting programmers through a process of trial and error. When programmers write code, they always take some old existing code first and then modify it slightly by making a few changes. Then they add a few additional new lines of code, and test the modified code to see how far they have come. Usually, the code does not work on the first attempt because of the second law of thermodynamics, so they then try to fix the code and try again. This happens over and over, until the programmer finally has a good snippet of new code. Thus, new code comes into existence through the Darwinian mechanisms of inheritance coupled with innovation and natural selection - for more on that see How Software Evolves. Some might object that this coding process of software is actually a form of intelligent design, but that is not the case. 
It is important to differentiate between intelligent selection and intelligent design. In softwarephysics we extend the concept of natural selection to include all selection processes that are not supernatural in nature, so for me, intelligent selection is just another form of natural selection. This is really nothing new. Predators and prey constantly make “intelligent” decisions about what to pursue and what to evade, even if those “intelligent” decisions are only made with the benefit of a few interconnected neurons or molecules. So in this view, the selection decisions that a programmer makes after each iteration of working on some new code really are a form of natural selection. After all, programmers are just DNA survival machines with minds infected with memes for writing software, and the selection processes that the human mind undergoes while writing software are just as natural as the Sun drying out worms on a sidewalk or a cheetah deciding upon which gazelle in a herd to pursue.

For example, when IT professionals slowly evolved our current $10 trillion worldwide IT architecture over the past 2.4 billion seconds, they certainly did not do so with the teleological intent of creating a simulation of the evolution of the biosphere. Instead, like most organisms in the biosphere, these IT professionals were simply trying to survive just one more day in the frantic world of corporate IT. It is hard to convey the daily mayhem and turmoil of corporate IT to outsiders. When I first hit the floor of Amoco’s IT department back in 1979, I was in total shock, but I quickly realized that all IT jobs essentially boiled down to simply pushing buttons. All you had to do was to push the right buttons, in the right sequence, at the right time, and with zero errors. How hard could that be? Well, it turned out to be very difficult indeed, and in response I began to subconsciously work on softwarephysics to try to figure out why this job was so hard, and how I could dig myself out of the mess that I had gotten myself into. After a while, it dawned on me that the fundamental problem was the second law of thermodynamics operating in a nonlinear simulated universe. The second law made it very difficult to push the right buttons in the right sequence and at the right time because there were so many erroneous combinations of button pushes. Writing and maintaining software was like looking for a needle in a huge utility phase space. There just were nearly an infinite number of ways of pushing the buttons “wrong”. The other problem was that we were working in a very nonlinear utility phase space, meaning that pushing just one button incorrectly usually brought everything crashing down. Next, I slowly began to think of pushing the correct buttons in the correct sequence as stringing together the correct atoms into the correct sequence to make molecules in chemical reactions that could do things. I also knew that living things were really great at doing that. 
Living things apparently overcame the second law of thermodynamics by dumping entropy into heat as they built low entropy complex molecules from high entropy simple molecules and atoms. I then began to think of each line of code that I wrote as a step in a biochemical pathway. The variables were like organic molecules composed of characters or “atoms” and the operators were like chemical reactions between the molecules in the line of code. The logic in several lines of code was the same thing as the logic found in several steps of a biochemical pathway, and a complete function was the equivalent of a full-fledged biochemical pathway in itself. For more on that see Some Thoughts on the Origin of Softwarephysics and Its Application Beyond IT and SoftwareChemistry.

So based upon the above analysis, I have a high level of confidence that Nick Lane has truly solved the greatest mystery in biology, and that is indeed quite an accomplishment!

Comments are welcome at

To see all posts on softwarephysics in reverse order go to:

Steve Johnston