Wednesday, November 22, 2017

Did Carbon-Based Life on Earth Really Have a LUCA - a Last Universal Common Ancestor?

The idea that all the current living things found on the Earth descended from one single cell long, long ago runs deep in biology, going all the way back to Darwin himself in On the Origin of Species (1859), which had one single diagram in the whole volume - see Figure 1 below. But is that really the case? In this posting, I would like to explore the possibility that it is not.

Figure 1 – Darwin's On the Origin of Species (1859) had one single figure, displayed above, that describes the tree of life descending from one single cell, later to be known as the LUCA - the Last Universal Common Ancestor.

The Phylogenic Tree of Life
Before proceeding further, recall that we now know that there actually are three forms of life on this planet, as first described by Carl Woese in 1977 at my old alma mater, the University of Illinois - the Bacteria, the Archaea and the Eukarya. The Bacteria and the Archaea both use the simple prokaryotic cell architecture, while the Eukarya use the much more complicated eukaryotic cell architecture, and all of the "higher" forms of life that we are familiar with are simply made of aggregations of eukaryotic cells. Even the simple yeasts that make our breads, and get us drunk, consist of very complex eukaryotic cells. The troubling thing is that only an expert could tell the difference between a yeast eukaryotic cell and a human eukaryotic cell because they are so similar, while any school child could easily tell the difference between the microscopic images of a prokaryotic bacterial cell and a eukaryotic yeast cell - see Figure 2.

Figure 2 – The prokaryotic cell architecture of the bacteria and archaea is very simple and designed for rapid replication. Prokaryotic cells do not have a nucleus enclosing their DNA. Eukaryotic cells, on the other hand, store their DNA on chromosomes that are isolated in a cellular nucleus. Eukaryotic cells also have a very complex internal structure with a large number of organelles, or subroutine functions, that compartmentalize the functions of life within the eukaryotic cells.

Prokaryotic cells essentially consist of a tough outer cell wall enclosing an inner cell membrane and contain a minimum of internal structure. The cell membrane is composed of phospholipids and proteins. The DNA within prokaryotic cells generally floats freely as a large loop of DNA, and their ribosomes, used to help translate mRNA into proteins, float freely within the entire cell as well. The ribosomes in prokaryotic cells are not attached to membranes, like they are in eukaryotic cells, which have membranes called the rough endoplasmic reticulum for that purpose. The chief advantage of prokaryotic cells is their simple design and the ability to thrive and rapidly reproduce even in very challenging environments, like little AK-47s that still manage to work in environments where modern tanks will fail. Eukaryotic cells, on the other hand, are found in the bodies of all complex organisms, from single-celled yeasts to you and me, and they divide up cell functions amongst a collection of organelles (functional subroutines), such as mitochondria, chloroplasts, Golgi bodies, and the endoplasmic reticulum.

Figure 3 depicts Carl Woese's rewrite of Darwin's famous tree of life, and shows that complex forms of life, like you and me, that are based upon cells using the eukaryotic cell architecture, actually spun off from the archaea and not the bacteria. Archaea and bacteria look identical under a microscope, which is why for hundreds of years we lumped them all together as just bacteria. But in the 1970s Carl Woese discovered that the ribosomes used to translate mRNA into proteins were different between certain microorganisms that had all been previously lumped together as "bacteria". Carl Woese determined that the lumped-together "bacteria" really consisted of two entirely different forms of life - the bacteria and the archaea - see Figure 3. The bacteria and archaea both have cell walls, but use slightly different organic molecules to build them. Some archaea, known as extremophiles because they live in harsh conditions, also wrap their DNA around stabilizing histone proteins. Eukaryotes also wrap their DNA around histone proteins to form chromatin and chromosomes - for more on that see: An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer. For that reason, and because of other biochemical features that the archaea and eukaryotes share, it is now thought that the eukaryotes split off from the archaea and not the bacteria.

Figure 3 – In 1977 Carl Woese developed a new tree of life consisting of the Bacteria, the Archaea and the Eukarya. The Bacteria and Archaea use a simple prokaryotic cell architecture, while the Eukarya use the much more complicated eukaryotic cell architecture.

The other thing about eukaryotic cells, as opposed to prokaryotic cells, is that eukaryotic cells are HUGE! They are something like 15,000 times larger by volume than prokaryotic cells! See Figure 4 for a true-scale comparison of the two.

Figure 4 – Not only are eukaryotic cells much more complicated than prokaryotic cells, they are also HUGE!

Recall that in the Endosymbiosis theory of Lynn Margulis, it is thought that the mitochondria of eukaryotic cells were originally parasitic bacteria that once invaded archaeal prokaryotic cells and took up residence. Certain of those ancient archaeal prokaryotic cells, with their internal bacterial mitochondrial parasites, were then able to survive the parasitic bacterial onslaught and later went on to form a strong parasitic/symbiotic relationship with them, like all forms of self-replicating information tend to do. The reason researchers think this happened is that mitochondria have their own DNA, and that DNA is stored as a loose loop, just as bacteria store their DNA. In The Rise of Complexity in Living Things and Software we explored Nick Lane's contention that it was the arrival of the parasitic/symbiotic prokaryotic mitochondria in eukaryotic cells that provided the necessary energy to produce the very complicated eukaryotic cell architecture.

Originally, Carl Woese proposed that all three Domains diverged from one single line of "progenotes" in the distant past, as depicted in Figure 5 below. Over time, this became the standard model for the early diversification of life on the Earth.

Figure 5 – Originally, Carl Woese proposed that all three domains diverged from a single "progenote". From Phylogenetic Classification and the Universal Tree (1999) by W. Ford Doolittle.

But in later years, he and others, like W. Ford Doolittle, proposed that the three domains diverged from a network of progenotes that shared many genes amongst themselves by way of lateral gene transfer, as depicted in Figure 6 below.

Figure 6 – Some now propose that the three domains diverged from a network of progenotes that shared many genes amongst themselves by way of lateral gene transfer. From Phylogenetic Classification and the Universal Tree (1999) by W. Ford Doolittle.

For more on that see Phylogenetic Classification and the Universal Tree (1999) by W. Ford Doolittle at:

http://www.faculty.biol.ttu.edu/Strauss/Phylogenetics/Readings/Doolittle1999.pdf

I just finished reading The Common Ancestor of Archaea and Eukarya Was Not an Archaeon (2013) by Patrick Forterre which is available at:

https://www.hindawi.com/journals/archaea/2013/372396/

This paper calls into question the current idea that the Eukarya Domain arose from an archaeal prokaryotic cell that fused with bacterial prokaryotic cells. In The Common Ancestor of Archaea and Eukarya Was Not an Archaeon, Patrick Forterre contends that the eukaryotic cell architecture actually arose from a third separate line of protoeukaryotic cells that were more complicated than prokaryotic archaeal cells, but simpler than today's complex eukaryotic cells. In this view, modern simple prokaryotic archaeal and bacterial cells evolved from the more complex protoeukaryotic cell by means of simplification in order to occupy high-temperature environments. This is best summed up by the abstract of the paper:

Abstract
It is often assumed that eukarya originated from archaea. This view has been recently supported by phylogenetic analyses in which eukarya are nested within archaea. Here, I argue that these analyses are not reliable, and I critically discuss archaeal ancestor scenarios, as well as fusion scenarios for the origin of eukaryotes. Based on recognized evolutionary trends toward reduction in archaea and toward complexity in eukarya, I suggest that their last common ancestor was more complex than modern archaea but simpler than modern eukaryotes (the bug in-between scenario). I propose that the ancestors of archaea (and bacteria) escaped protoeukaryotic predators by invading high-temperature biotopes, triggering their reductive evolution toward the “prokaryotic” phenotype (the thermoreduction hypothesis). Intriguingly, whereas archaea and eukarya share many basic features at the molecular level, the archaeal mobilome resembles more the bacterial than the eukaryotic one. I suggest that selection of different parts of the ancestral virosphere at the onset of the three domains played a critical role in shaping their respective biology. Eukarya probably evolved toward complexity with the help of retroviruses and large DNA viruses, whereas similar selection pressure (thermoreduction) could explain why the archaeal and bacterial mobilomes somehow resemble each other.


In the paper, Patrick Forterre goes into great detail discussing the many similarities and differences between the Bacteria, the Archaea and the Eukarya on a biochemical level. While reading the paper, I once again began to appreciate the great difficulties that arise when trying to piece together the early evolution of carbon-based life on the Earth in deep time, with only one example readily at hand to study. I was particularly struck by the many differences between the biochemistry of the Bacteria, the Archaea and the Eukarya that did not seem to jibe with them all coming from a common ancestor - a LUCA or Last Universal Common Ancestor. As a softwarephysicist, I naturally began to think back on the historical evolution of software and of other forms of self-replicating information over time. This led me to wonder: did the Bacteria, the Archaea and the Eukarya really all evolve from a common LUCA? What if the Bacteria, the Archaea and the Eukarya actually represented the vestiges of three separate lines of descent that all independently arose on their own? Perhaps carbon-based life arose many times on the early Earth, and the Bacteria, the Archaea and the Eukarya just represent the last collection of survivors. The differences between the Bacteria, the Archaea and the Eukarya could be due to their separate originations, and their similarities could be due to them converging upon similar biochemical solutions to solve similar problems.

The Power of Convergence
In biology, convergence is the idea that sometimes organisms that are not at all related will come up with very similar solutions to common problems that they share. For example, the concept of the eye has independently evolved at least 40 different times in the past 600 million years, so there are many examples of “living fossils” showing the evolutionary path. Most strikingly, the camera-like structures of the human eye and the eye of an octopus are nearly identical, even though each structure evolved totally independently of the other. Could it be that the complex structures of the Bacteria, the Archaea and the Eukarya also evolved from dead organic molecules independently?

Figure 7 - The eye of a human and the eye of an octopus are nearly identical in structure, but evolved totally independently of each other. As Daniel Dennett pointed out, there are only a certain number of Good Tricks in Design Space and natural selection will drive different lines of descent towards them.

Similarly, in SoftwareBiology and A Proposal For All Practicing Paleontologists we see that the evolution of software over the past 76 years, or 2.4 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941, has closely followed the same path as life on Earth over the past 4.0 billion years, in keeping with Simon Conway Morris’s contention that convergence has played the dominant role in the evolution of life on Earth. As I mentioned above, an oft-cited example of this is the evolution of the complex camera-like human eye. Even Darwin himself had problems with trying to explain how something as complicated as the human eye could have evolved through small incremental changes from some structure that could not see at all. After all, what good is 1% of an eye? As I have often stated in the past, this is not a difficult thing for IT professionals to grasp because we are constantly evolving software on a daily basis through small incremental changes to our applications. However, when we do look back over the years to what our small incremental changes have wrought, it is quite surprising to see just how far our applications have come from their much simpler ancestors and to realize that it would be very difficult for an outsider to even recognize their ancestral forms.

With the aid of computers, many researchers in evolutionary biology have shown just how easily a camera-like eye can evolve. Visible photons have an energy of about 1 – 3 eV, which is about the energy of most chemical reactions. Consequently, visible photons are great for stimulating chemical reactions, like the reactions in chlorophyll that turn the energy of visible photons into the chemical energy of carbohydrates, or stimulating the chemical reactions of other light-sensitive molecules that form the basis of sight. In a computer simulation, the eye can simply begin as a flat eyespot of photosensitive cells that looks like a patch like this: |. In the next step, the eyespot forms a slight depression, like the beginnings of the letter C, which allows the simulation to have some sense of image directionality because the light from a distant source will hit different sections of the photosensitive cells on the back part of the C. As the depression deepens and the hole in the C gets smaller, the incipient eye begins to behave like a pinhole camera that forms a clearer, but dimmer, image on the back part of the C. Next a transparent covering covers over the hole in the pinhole camera to provide some protection for the sensitive cells at the back of the eye, and a transparent humor fills the eye to keep its shape: C). Eventually, the transparent covering thickens into a flexible lens under the protective covering that can be used to focus light, and to allow for a wider entry hole that provides a brighter image, essentially decreasing the f-stop of the eye like in a camera: C0).

So it is easy to see how a 1% eye could evolve into a modern complex eye through small incremental changes that always improve the visual acuity of the eye. Such computer simulations predict that a camera-like eye could evolve in as little as 500,000 years.
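For the programmers out there, the essence of such simulations is easy to capture. Below is a toy Python sketch of my own - not one of the actual published simulations - with a completely made-up acuity function, that shows how blind random mutations, filtered by keeping only the improvements, drive a flat 1% eyespot toward a deep, lens-covered camera-like eye:

import random

# A toy hill-climbing sketch (not the actual published eye simulations).
# An "eye" has three traits: the depth of its eyespot cup, the size of
# its aperture, and the strength of its lens. The acuity function below
# is completely made up for illustration: deeper cups and stronger
# lenses help, and the best aperture is a small, but not closed, pupil.

def acuity(eye):
    return eye["depth"] + eye["lens"] - abs(eye["aperture"] - 0.2)

eye = {"depth": 0.01, "aperture": 1.0, "lens": 0.0}   # a flat 1% eyespot
for generation in range(50000):
    trait = random.choice(list(eye))
    mutant = dict(eye)
    # A small random mutation, clipped to the range 0.0 - 1.0
    mutant[trait] = min(1.0, max(0.0, mutant[trait] + random.gauss(0, 0.01)))
    if acuity(mutant) > acuity(eye):
        eye = mutant    # natural selection keeps only the improvements

print(eye)   # ends up near depth = 1.0, aperture = 0.2, lens = 1.0

Each tiny accepted mutation improves visual acuity, so the simulated eye never has to get worse before it gets better, which is the whole point of the argument above.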

Figure 8 – Computer simulations of the evolution of a camera-like eye (click to enlarge).

As noted above, the concept of the eye has independently evolved at least 40 different times in the past 600 million years, leaving behind many “living fossils” along the evolutionary path. In Figure 9 below, we see that all of the steps in the computer simulation of Figure 8 can be found today in various mollusks. Notice that the human-like eye on the far right is really that of an octopus, not a human, again demonstrating the power of natural selection to converge upon identical solutions in organisms with separate lines of descent.

Figure 9 – There are many living fossils that have left behind signposts along the trail to the modern camera-like eye. Notice that the human-like eye on the far right is really that of an octopus (click to enlarge).

Could it be that the very similar unicellular designs of the Bacteria, the Archaea and the Protoeukarya represent yet another example of convergence bringing forth a very complex structure - the membrane-based living cell - multiple times from the extant organic molecules of the early Earth? I know that is a pretty wild idea, but think of the implications. It would mean that the probability of living things emerging from organic molecules was nearly assured given the right conditions, and that simple unicellular life in our Universe should be quite common. In The Bootstrapping Algorithm of Carbon-Based Life I covered Dave Deamer's and Bruce Damer's new Hot Spring Origins Hypothesis model for the origin of carbon-based life on the early Earth. Perhaps such a model is not limited to producing only a single type of membrane-based form of unicellular life. In fact, I would contend that its Bootstrapping Algorithm might indeed produce a number of such types. Perhaps a little softwarephysics might shed some light on the subject.

Some Help From Softwarephysics
Recall that one of the fundamental findings of softwarephysics is that carbon-based life and software are both forms of self-replicating information, and that both have converged upon similar solutions to combat the second law of thermodynamics in a highly nonlinear Universe. For biologists, the value of softwarephysics is that software has been evolving about 100 million times faster than living things over the past 76 years, or 2.4 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941, and the evolution of software over that period of time is the only history of a form of self-replicating information that has actually been recorded by human history. In fact, the evolutionary history of software has all occurred within a single human lifetime, and many of those humans are still alive today to testify as to what actually happened, something that those working on the origin of life on the Earth and its early evolution can only try to imagine. Again, in softwarephysics, we define self-replicating information as:

Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.

The Characteristics of Self-Replicating Information
All forms of self-replicating information have some common characteristics:

1. All self-replicating information evolves over time through the Darwinian processes of innovation and natural selection, which endows self-replicating information with one telling characteristic – the ability to survive in a Universe dominated by the second law of thermodynamics and nonlinearity.

2. All self-replicating information begins spontaneously as a parasitic mutation that obtains energy, information and sometimes matter from a host.

3. With time, the parasitic self-replicating information takes on a symbiotic relationship with its host.

4. Eventually, the self-replicating information becomes one with its host through the symbiotic integration of the host and the self-replicating information.

5. Ultimately, the self-replicating information replaces its host as the dominant form of self-replicating information.

6. Most hosts are also forms of self-replicating information.

7. All self-replicating information has to be a little bit nasty in order to survive.

8. The defining characteristic of self-replicating information is the ability of self-replicating information to change the boundary conditions of its utility phase space in new and unpredictable ways by means of exapting current functions into new uses that change the size and shape of its particular utility phase space. See Enablement - the Definitive Characteristic of Living Things for more on this last characteristic.

So far we have seen 5 waves of self-replicating information sweep across the Earth, with each wave greatly reworking the surface and near subsurface of the planet as it came to predominance:

1. Self-replicating autocatalytic metabolic pathways of organic molecules
2. RNA
3. DNA
4. Memes
5. Software

Software is now rapidly becoming the dominant form of self-replicating information on the planet, and is having a major impact on mankind as it comes to predominance. For more on that see: A Brief History of Self-Replicating Information. To gain some insights let us take a look at the origin of software and certain memes to see if they had a common LUCA, or if they independently arose several times instead, and then later merged. The origin of human languages and writing will serve our purposes for the origins of a class of memes. As Daniel Dennett pointed out, languages are simply memes that you can speak. Writing is just a meme for recording memes.

The Rise of Software
Currently, we are witnessing one of those very rare moments in time when a new form of self-replicating information, in the form of software, is coming to dominance. Software is now so ubiquitous that it seems like the whole world is immersed in a Software Universe of our own making, surrounded by PCs, tablets, smartphones and the software now embedded in most of mankind's products. In fact, I am now quite accustomed to sitting with audiences of younger people who are completely engaged with their "devices", before, during and after a performance. This may seem like a very recent development in the history of mankind, but in Crocheting Software we saw that crochet patterns are actually forms of software that date back to the early 19th century! In Crocheting Software we also saw that the origin of computer software was such a hodge-podge of precursors, false starts, and failed attempts that it is nearly impossible to pinpoint an exact date for its origin, but for the purposes of softwarephysics I have chosen May of 1941, when Konrad Zuse first cranked up his Z3 computer, as the starting point for modern software. Zuse wanted to use his Z3 computer to perform calculations for aircraft designs that were previously done manually in a very tedious manner. In the discussion below, I will first outline a brief history of the evolution of hardware technology to explain how we got to this state, but it is important to keep in mind that it was the relentless demands of software for more and more memory and CPU-cycles over the years that really drove the exponential explosion of hardware capability. I hope to show that software independently arose many times over the years, using many differing hardware technologies. Self-replicating information is very opportunistic, and will exapt whatever hardware happens to be available at the time. Four billion years ago, carbon-based life exapted the extant organic molecules and the naturally occurring geochemical cycles of the day in order to bootstrap itself into existence, and that is what software has been doing for the past 2.4 billion seconds on the Earth. As we briefly cover the evolutionary history of computer hardware down below, please keep in mind that for each new generation of machines, the accompanying software had to essentially arise independently all over again, because each new machine had a unique instruction set, meaning that an executable program from one computer could not run on a different computer. As IT professionals, writing and supporting software, and as end-users, installing and using software, we are all essentially software enzymes caught up in a frantic interplay of self-replicating information. Software is currently domesticating our minds, to churn out ever more software, of ever-increasing complexity, and this will likely continue at an ever-accelerating pace, until one day, when software finally breaks free, and begins to generate itself using AI and machine learning techniques. For more details on the evolutionary history of software see the SoftwarePaleontology section of SoftwareBiology. See Software Embryogenesis for a description of the software architecture of a modern high-volume corporate website in action, just prior to the current Cloud Computing Revolution that we are now experiencing.

A Brief Evolutionary History of Computer Hardware
It all started back in May of 1941 when Konrad Zuse first cranked up his Z3 computer. The Z3 was the world's first real computer and was built with 2400 electromechanical relays that were used to perform the switching operations that all computers use to store information and to process it. To build a computer, all you need is a large network of interconnected switches that have the ability to switch each other on and off in a coordinated manner. Switches can be in one of two states, either open (off) or closed (on), and we can use those two states to store the binary digits “0” or “1”. By using a number of switches teamed together in open (off) or closed (on) states, we can store even larger binary numbers, like “01100100” = 100. We can also group the switches into logic gates that perform logical operations. For example, in Figure 10 below we see an AND gate composed of two switches A and B. Both switch A and B must be closed in order for the light bulb to turn on. If either switch A or B is open, the light bulb will not light up.

Figure 10 – An AND gate can be simply formed from two switches. Both switches A and B must be closed, in a state of “1”, in order to turn the light bulb on.
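To make this a little more concrete, here is a minimal Python sketch of my own, treating each switch as a boolean value that is True when closed:

# A minimal sketch treating switches as booleans: True = closed ("1"),
# False = open ("0"). The AND gate of Figure 10 lights the bulb only
# when both switches are closed.

def AND_gate(a, b):
    return a and b

def OR_gate(a, b):
    return a or b

def NOT_gate(a):
    return not a

# Print the truth table for the AND gate of Figure 10:
for a in (False, True):
    for b in (False, True):
        print(int(a), int(b), "->", int(AND_gate(a, b)))

Running this prints the familiar AND truth table: only the 1 1 row turns the bulb on.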

Additional logic gates can be formed from other combinations of switches as shown in Figure 11 below. It takes about 2 - 8 switches to create each of the various logic gates shown below.

Figure 11 – Additional logic gates can be formed from other combinations of 2 – 8 switches.

Once you can store binary numbers with switches and perform logical operations upon them with logic gates, you can build a computer that performs calculations on numbers. To process text, like names and addresses, we simply associate each letter of the alphabet with a binary number, like in the ASCII code set where A = “01000001” and Z = “01011010”, and then process the associated binary numbers.
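Again, a couple of lines of Python make the idea concrete:

# A small sketch showing how text reduces to binary numbers using the
# ASCII code set described above.

def to_binary(text):
    return " ".join(format(ord(c), "08b") for c in text)

print(to_binary("A"))    # 01000001
print(to_binary("Z"))    # 01011010
print(to_binary("AZ"))   # 01000001 01011010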

Figure 12 – Konrad Zuse with a reconstructed Z3 in 1961 (click to enlarge).


Figure 13 – Block diagram of the Z3 architecture (click to enlarge).

The electrical relays used by the Z3 were originally meant for switching telephone conversations. Closing one relay allowed current to flow to another relay’s coil, causing that relay to close as well.

Figure 14 – The Z3 was built using 2400 electrical relays, originally meant for switching telephone conversations.

Figure 15 – The electrical relays used by the Z3 for switching were very large, very slow and used a great deal of electricity, which generated a great deal of waste heat.

Now I was born about 10 years later in 1951, a few months after the United States government installed its very first commercial computer, a UNIVAC I, for the Census Bureau on June 14, 1951. The UNIVAC I was 25 feet by 50 feet in size, and contained 5,600 vacuum tubes, 18,000 crystal diodes and 300 relays with a total memory of 12 K. From 1951 to 1958 a total of 46 UNIVAC I computers were built and installed.

Figure 16 – The UNIVAC I was very impressive on the outside.

Figure 17 – But the UNIVAC I was a little less impressive on the inside.

Figure 18 – Most of the electrical relays of the Z3 were replaced with vacuum tubes in the UNIVAC I, which were also very large, used lots of electricity and generated lots of waste heat too, but the vacuum tubes were 100,000 times faster than relays.

Figure 19 – Vacuum tubes contain a hot negative cathode that glows red and boils off electrons. The electrons are attracted to the cold positive anode plate, but there is a grid electrode between the cathode and anode plate. By changing the voltage on the grid, the vacuum tube can control the flow of electrons like the handle of a faucet. The grid voltage can be adjusted so that the electron flow is full blast, a trickle, or completely shut off, and that is how a vacuum tube can be used as a switch.

In the 1960s the vacuum tubes were replaced by discrete transistors and in the 1970s the discrete transistors were replaced by thousands of transistors on a single silicon chip. Over time, the number of transistors that could be put onto a silicon chip increased dramatically, and today, the silicon chips in your personal computer hold many billions of transistors that can be switched on and off in about 10^-10 seconds. Now let us look at how these transistors work.

There are many different kinds of transistors, but I will focus on the FET (Field Effect Transistor) that is used in most silicon chips today. A FET transistor consists of a source, gate and a drain. The whole affair is laid down on a very pure silicon crystal using a multi-step process that relies upon photolithographic processes to engrave circuit elements upon the very pure silicon crystal. Silicon lies directly below carbon in the periodic table because both silicon and carbon have 4 electrons in their outer shell and have room for 4 more, and that is what makes silicon a semiconductor. Pure silicon is not very electrically conductive, but by doping the silicon crystal with very small amounts of impurities, it is possible to create silicon that has a surplus of free electrons. This is called N-type silicon. Similarly, it is possible to dope silicon with small amounts of impurities that create a deficit of free electrons, leaving behind positively charged "holes"; this is called P-type silicon. To make an FET transistor you simply use a photolithographic process to create two N-type silicon regions on a substrate of P-type silicon. Between the N-type regions lies a gate which controls the flow of electrons between the source and drain regions, like the grid in a vacuum tube. When a positive voltage is applied to the gate, it attracts the remaining free electrons in the P-type substrate and repels its positive holes. This creates a conductive channel between the source and drain which allows a current of electrons to flow.

Figure 20 – A FET transistor consists of a source, gate and drain. When a positive voltage is applied to the gate, a current of electrons can flow from the source to the drain and the FET acts like a closed switch that is “on”. When there is no positive voltage on the gate, no current can flow from the source to the drain, and the FET acts like an open switch that is “off”.

Figure 21 – When there is no positive voltage on the gate, the FET transistor is switched off, and when there is a positive voltage on the gate the FET transistor is switched on. These two states can be used to store a binary “0” or “1”, or can be used as a switch in a logic gate, just like an electrical relay or a vacuum tube.



Figure 22 – Above is a plumbing analogy that uses a faucet or valve handle to simulate the actions of the source, gate and drain of an FET transistor.

The CPU chip in your computer consists largely of transistors forming logic gates, but your computer also has a number of memory chips whose transistors are “on” or “off” and can be used to store binary numbers, or text that is encoded using binary numbers. The next thing we need is a way to coordinate the billions of transistor switches in your computer. That is accomplished with a system clock. My current laptop has a clock speed of 2.5 GHz which means it ticks 2.5 billion times each second. Each time the system clock on my computer ticks, it allows all of the billions of transistor switches on my laptop to switch on, off, or stay the same in a coordinated fashion. So while your computer is running, it is actually turning on and off billions of transistors billions of times each second – and all for a few hundred dollars!
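The coordinated switching itself is easy to mimic in software. Here is a little illustrative Python sketch of my own in which, on every tick of a simulated system clock, a row of four switches updates in lockstep to count in binary:

# An illustrative sketch of clock-coordinated switching: on each tick of
# a simulated system clock, four switches update together to implement
# a 4-bit binary counter.

bits = [0, 0, 0, 0]            # four switches, least significant bit first

def tick(bits):
    carry = 1                  # each tick increments the counter by one
    for i in range(len(bits)):
        bits[i], carry = (bits[i] + carry) % 2, (bits[i] + carry) // 2

for t in range(6):
    print("tick", t, "state:", bits[::-1])   # most significant bit first
    tick(bits)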

Computer memory was another factor greatly affecting the origin and evolution of software over time. Strangely, the original Z3 used electromechanical switches to store working memory, like we do today with transistors on memory chips, but that made computer memory very expensive and very limited, and this remained true all during the 1950s and 1960s. Prior to 1955, computers like the UNIVAC I, which first appeared in 1951, used mercury delay lines that consisted of a tube of mercury that was about 3 inches long. Each mercury delay line could store about 18 bits of computer memory as sound waves that were continuously refreshed by quartz piezoelectric transducers on each end of the tube. Mercury delay lines were huge and very expensive per bit, so computers like the UNIVAC I only had a memory of 12 K (98,304 bits).
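Taking the figures quoted above at face value, a quick back-of-the-envelope calculation in Python shows just how unwieldy that was:

# A back-of-the-envelope check using the figures quoted above.
bits_total = 12 * 1024 * 8      # 12 K of memory at 8 bits per byte
print(bits_total)               # 98,304 bits, as stated above
print(bits_total / 18)          # about 5,461 mercury delay lines
                                # at 18 bits per delay line

That is thousands of 3-inch tubes of mercury just to hold 12 K of memory.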

Figure 23 – Prior to 1955, huge mercury delay lines built from tubes of mercury that were about 3 inches long were used to store bits of computer memory. A single mercury delay line could store about 18 bits of computer memory as a series of sound waves that were continuously refreshed by quartz piezoelectric transducers at each end of the tube.

In 1955 magnetic core memory came along, and used tiny magnetic rings called "cores" to store bits. Four little wires had to be threaded by hand through each little core in order to store a single bit, so although magnetic core memory was a lot cheaper and smaller than mercury delay lines, it was still very expensive and took up lots of space.

Figure 24 – Magnetic core memory arrived in 1955 and used a little ring of magnetic material, known as a core, to store a bit. Each little core had to be threaded by hand with 4 wires to store a single bit.

Figure 25 – Magnetic core memory was a big improvement over mercury delay lines, but it was still hugely expensive and took up a great deal of space within a computer.



Figure 26 – Finally in the early 1970s inexpensive semiconductor memory chips came along that made computer memory small and cheap.

Again, it was the relentless drive of software for ever-increasing amounts of memory and CPU-cycles that made all this happen, and that is why you can now comfortably sit in a theater with a smartphone that can store more than 10 billion bytes of data, while back in 1951 the UNIVAC I occupied an area of 25 feet by 50 feet to store 12,000 bytes of data. Like all forms of self-replicating information tend to do, over the past 2.4 billion seconds, software has opportunistically exapted the extant hardware of the day - the electromechanical relays, vacuum tubes, discrete transistors and transistor chips of the emerging telecommunications and consumer electronics industries - into the service of self-replicating software of ever-increasing complexity, just as carbon-based life exapted the extant organic molecules and the naturally occurring geochemical cycles of the day in order to bootstrap itself into existence.

But when I think back to my early childhood in the early 1950s, I can still vividly remember a time when there essentially was no software at all in the world. In fact, I can still remember my very first encounter with a computer on Monday, Nov. 19, 1956, watching the Art Linkletter TV show People Are Funny with my parents on an old black and white console television set that must have weighed close to 150 pounds. Art was showcasing the 21st UNIVAC I to be constructed and had it sorting through the questionnaires from 4,000 hopeful singles, looking for the ideal match. The machine paired up John Caran, 28, and Barbara Smith, 23, who later became engaged. And this was more than 40 years before eHarmony.com! To a five-year-old boy, a machine that could “think” was truly amazing. Since that very first encounter with a computer back in 1956, I have personally witnessed software slowly becoming the dominant form of self-replicating information on the planet, and I have also seen how software has totally reworked the surface of the planet to provide a secure and cozy home for more and more software of ever-increasing capability. For more on this please see A Brief History of Self-Replicating Information. That is why I think there would be much to be gained in exploring the origin and evolution of the $10 trillion computer simulation that the Software Universe provides, and that is what softwarephysics is all about.

The Origin of Human Languages
Now I am not a linguist, but from what I can find on Google, there seem to be about 5,000 languages spoken in the world today that linguists divide into about 20 families, and we suspect there were many more languages in days gone by when people lived in smaller groups. The oldest theory for the origin of human language is known as monogenesis, and like the concept of a single LUCA in biology, it posits that language spontaneously arose only once and that all of the other thousands of languages then diverged from this single mother tongue, sort of like the Tower of Babel in the book of Genesis. The second theory is known as polygenesis, and it posits that human language emerged independently many times in many separate far-flung groups. These multiple origins of language then began to differentiate, and that is why we now have 5,000 different languages grouped into 20 different families. Each of the 20 families might remain as a vestige of the multiple originations of human language from the separate mother tongues. Many linguists in the United States are in favor of a form of monogenesis known as the Mother Tongue Theory, which stems from the Out of Africa Theory for the original dispersion of Homo sapiens throughout the world. The Mother Tongue Theory holds that an original human language originated about 150,000 years ago in Africa, and that language went along for the ride when Homo sapiens diffused out of Africa to colonize the entire world. So the jury is still out, and probably always will be, on the proposition of human languages having a single LUCA. Personally, I find the extreme diversity of human languages to favor polygenesis - the independent origination of language by many separate groups of Homo sapiens. But that might stem from having taken a bit of Spanish, Latin, and German in grade school and high school. When I got to college, I took a year of Russian, only to learn that there was a whole different way of communicating!

The Origin of Writing Systems
The origin of writing systems seems to provide a more fruitful analogy because writing systems are more recent, and by definition, they leave behind a "fossil record" in written form. Since I am getting way beyond my area of expertise, let me quote directly from Wikipedia at:

https://en.wikipedia.org/wiki/History_of_writing

It is generally agreed that true writing of language was independently conceived and developed in at least two ancient civilizations and possibly more. The two places where it is most certain that the concept of writing was both conceived and developed independently are in ancient Sumer (in Mesopotamia), around 3100 BC, and in Mesoamerica by 300 BC, because no precursors have been found to either of these in their respective regions. Several Mesoamerican scripts are known, the oldest being from the Olmec or Zapotec of Mexico.

Independent writing systems also arose in Egypt around 3100 BC and in China around 1200 BC, but historians debate whether these writing systems were developed completely independently of Sumerian writing or whether either or both were inspired by Sumerian writing via a process of cultural diffusion. That is, it is possible that the concept of representing language by using writing, though not necessarily the specifics of how such a system worked, was passed on by traders or merchants traveling between the two regions.

Ancient Chinese characters are considered by many to be an independent invention because there is no evidence of contact between ancient China and the literate civilizations of the Near East, and because of the distinct differences between the Mesopotamian and Chinese approaches to logography and phonetic representation. Egyptian script is dissimilar from Mesopotamian cuneiform, but similarities in concepts and in earliest attestation suggest that the idea of writing may have come to Egypt from Mesopotamia. In 1999, Archaeology Magazine reported that the earliest Egyptian glyphs date back to 3400 BC, which "challenge the commonly held belief that early logographs, pictographic symbols representing a specific place, object, or quantity, first evolved into more complex phonetic symbols in Mesopotamia."

Similar debate surrounds the Indus script of the Bronze Age Indus Valley civilization in Ancient India (2600 BC). In addition, the script is still undeciphered, and there is debate about whether the script is true writing at all or, instead, some kind of proto-writing or nonlinguistic sign system.

An additional possibility is the undeciphered Rongorongo script of Easter Island. It is debated whether this is true writing and, if it is, whether it is another case of cultural diffusion of writing. The oldest example is from 1851, 139 years after their first contact with Europeans. One explanation is that the script was inspired by Spain's written annexation proclamation in 1770.

Various other known cases of cultural diffusion of writing exist, where the general concept of writing was transmitted from one culture to another, but the specifics of the system were independently developed. Recent examples are the Cherokee syllabary, invented by Sequoyah, and the Pahawh Hmong system for writing the Hmong language.


Therefore, I think that it can be safely assumed that many memes, like the memes for writing, making pots and flaking stone tools, arose independently multiple times throughout human history, and that those memes then further differentiated.

Notice that, like the origin of true software from its many precursors, it is very difficult to pick an exact date for the origin of true writing from its many precursors too. When exactly do cartoon-like symbols evolve into a true written language? The origin of true languages from the many grunts of Homo sapiens might also have been very difficult to determine. Perhaps very murky origins are just another common characteristic of all forms of self-replicating information, including carbon-based life.

Conclusion
Admittedly, the idea that the Bacteria, the Archaea and the Eukarya represent the last vestiges of three separate lines of descent that independently arose from dead organic molecules in the distant past, with no LUCA - Last Universal Common Ancestor - is most probably incorrect. However, given the chaotic origination histories of many forms of self-replicating information, including the memes and software, I also have reservations about the idea that all carbon-based life on the Earth sprang from one single LUCA, as depicted in Figure 5. It would seem most likely that there were many precursors to carbon-based cellular life on the Earth, and that it would have been nearly impossible to identify when carbon-based life actually came to be, even if we had been around to watch it all happen. Perhaps 4.0 billion years ago, carbon-based life independently arose several times with many common biochemical characteristics because those common biochemical characteristics were the only ones that worked at the time, as depicted in Figure 6. Later, these separate originations of life most likely merged somewhat, in the parasitic/symbiotic manner that all forms of self-replicating information are prone to, to form the Bacteria, the Archaea and the Eukarya. At least it's something to think about.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
http://softwarephysics.blogspot.com/

Regards,
Steve Johnston

Tuesday, September 26, 2017

The Perils of Software Enhanced Confirmation Bias

How often do you dramatically change your worldview opinion on an issue? If you are like me, that seldom happens, and I think that is the general rule, even when we are confronted with new evidence that explicitly challenges our current deeply held positions. My observation is that people nearly always simply dismiss any new evidence that arrives on the scene that does not confirm their current worldview. Instead, we normally only take seriously new evidence that reinforces our current worldview. Only very rarely, when confronted with overwhelming evidence that impacts us on a very personal level, like a category 5 hurricane destroying our lives, do we change our minds about an issue. The tendency to simply stick with your current worldview, even in the face of mounting evidence that contradicts that worldview, is called confirmation bias because we all naturally tend to seek out only information that confirms our current beliefs, and at the same time, tend to dismiss any evidence that calls them into question. This is nothing new. The English philosopher and scientist Francis Bacon (1561–1626), in his Novum Organum (1620), noted that the biased assessment of evidence greatly influenced the way we all think about things. He wrote:

The human understanding when it has once adopted an opinion ... draws all things else to support and agree with it. And though there be a greater number and weight of instances to be found on the other side, yet these it either neglects or despises, or else by some distinction sets aside or rejects.

But in recent years this dangerous defect in the human thought process has been dramatically amplified by search and social media software, like Google, Facebook and Twitter. This became quite evident during the very contentious 2016 election in the United States of America, and also during this past year when the new Administration came to power. But why? I have a high level of confidence that much of the extreme political polarization that we see in the world today results from the strange parasitic/symbiotic relationships between our memes and our software. Let me explain.

Being born in 1951, I can vividly remember a time when there essentially was no software at all in the world, and the political polarization in the United States was much more subdued. In fact, even back in 1968, the worst year of political polarization in the United States since the Civil War, things were not as bad as they are today because software was still mainly in the background doing things like printing out bills and payroll checks. But that has all dramatically changed. Thanks to the rise of software, for more than 20 years it has been possible, with the aid of search software like Google, for anyone to seek out only the evidence that lends credence to their current worldview. In addition, in Cyber Civil Defense I also pointed out that it is now also possible for foreign governments to shape public opinion by planting "fake news" and "fabricated facts" using the software platforms of the day. Search software then easily picks up this disinformation, reinforcing the age-old wisdom of the adage Seek and ye shall find. This is bad enough, but Zeynep Tufekci describes an even darker scenario in her recent TED Talk:

We're building a dystopia just to make people click on ads at:
https://www.ted.com/talks/zeynep_tufekci_we_re_building_a_dystopia_just_to_make_people_click_on_ads?utm_source=newsletter_weekly_2017-10-28&utm_campaign=newsletter_weekly&utm_medium=email&utm_content=bottom_right_image

Zeynep Tufekci explains how search and social media software now use machine learning algorithms to comb through the huge amounts of data about us that are now available to them, to intimately learn about our inner lives in ways that no human can fully understand, because the learning is hidden in huge multidimensional arrays of billions of elements. The danger is that the machine learning software and data can then begin to mess with the memes within our minds by detecting susceptibilities in our thinking, and then exploiting those susceptibilities to plant additional memes. She points out that YouTube uses machine learning to figure out what to feature in the Up Next column on the right side of its webpages, and that when viewing political or social issue content, the Up Next column tends to reinforce the worldview of the end user with matching content. Worse yet, the machine learning software unknowingly tends to amplify the end user's worldview by presenting content of an even more extreme nature. Try it for yourself. I started out with some Alt-Right content and quickly advanced to some pretty dark ideas. So far this is all being done simply to keep us engaged so that we watch more ads, but Zeynep Tufekci points out that in the hands of an authoritarian regime such machine learning software could be used to mess with the memes in the minds of an entire population in a Nineteen Eighty-Four fashion. But instead of using overt fear to maintain power, such an authoritarian regime could simply use machine learning software and tons of data to shape our worldview memes by exploiting our own vulnerabilities to persuasion. In such a world, we would not even know that it was happening!
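A deliberately oversimplified Python sketch of my own - which is certainly not YouTube's actual algorithm - captures the feedback loop Zeynep Tufekci describes. Assume a hypothetical catalog of content scored by "extremeness", and a recommender that has learned that users engage most with content slightly more extreme than their current worldview:

# A deliberately oversimplified sketch of an engagement-maximizing
# recommender (not YouTube's actual algorithm). Content is scored by
# "extremeness" from 0.0 to 1.0, and the recommender has learned that
# users watch longest when content is slightly more extreme than their
# current worldview - an assumption made purely for illustration.

catalog = [round(x * 0.1, 1) for x in range(11)]    # 0.0, 0.1, ... 1.0

def predicted_watch_time(user_position, item):
    target = min(1.0, user_position + 0.1)          # slightly more extreme
    return -abs(item - target)                      # closer = longer watch

user = 0.2                                          # starting worldview
for step in range(10):
    up_next = max(catalog, key=lambda item: predicted_watch_time(user, item))
    user = up_next           # watching the recommendation shifts the
    print("step", step, "Up Next:", up_next)        # user's worldview

Each recommendation is individually innocuous, but the loop ratchets the user from 0.2 all the way up to 1.0 and keeps them there, with no human ever having intended that outcome.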

I think that such profound observations could benefit from a little softwarephysics because they describe yet another example of the strange parasitic/symbiotic relationships that have developed between software and the memes. Again, the key finding of softwarephysics is that it is all about self-replicating information:

Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.

The Characteristics of Self-Replicating Information
All forms of self-replicating information have some common characteristics:

1. All self-replicating information evolves over time through the Darwinian processes of inheritance, innovation and natural selection, which endows self-replicating information with one telling characteristic – the ability to survive in a Universe dominated by the second law of thermodynamics and nonlinearity.
2. All self-replicating information begins spontaneously as a parasitic mutation that obtains energy, information and sometimes matter from a host.
3. With time, the parasitic self-replicating information takes on a symbiotic relationship with its host.
4. Eventually, the self-replicating information becomes one with its host through the symbiotic integration of the host and the self-replicating information.
5. Ultimately, the self-replicating information replaces its host as the dominant form of self-replicating information.
6. Most hosts are also forms of self-replicating information.
7. All self-replicating information has to be a little bit nasty in order to survive.
8. The defining characteristic of self-replicating information is the ability of self-replicating information to change the boundary conditions of its utility phase space in new and unpredictable ways by means of exapting current functions into new uses that change the size and shape of its particular utility phase space. See Enablement - the Definitive Characteristic of Living Things for more on this last characteristic.

Basically, we have seen five waves of self-replicating information come to dominate the Earth over the past four billion years:

1. Self-replicating autocatalytic metabolic pathways of organic molecules
2. RNA
3. DNA
4. Memes
5. Software

Software is now rapidly becoming the dominant form of self-replicating information on the planet and is having a major impact on mankind as it comes to predominance. For more on that see: A Brief History of Self-Replicating Information. Please note that because the original metabolic pathways of organic molecules, RNA and DNA have now become so closely intertwined over the past four billion years, they can now be simply lumped together and called the "genes" of a species.

Currently, we are living in one of those very rare times when a new form of self-replicating information, known to us as software, is coming to power, as software is coming to predominance over the memes that have run the planet for the past 200,000 years. During the past 200,000 years, as the memes took up residence in the minds of Homo sapiens, like all of their predecessors, the memes then went on to modify the entire planet. They cut down the forests for agriculture, mined minerals from the ground for metals, burned coal, oil, and natural gas for energy, releasing the huge quantities of carbon dioxide that its predecessors had previously sequestered in the Earth, and have even modified the very DNA, RNA, and metabolic pathways of its predecessors. But now that software is seemingly on the rise, like all of its predecessors, software has entered into a very closely coupled parasitic/symbiotic relationship with the memes, the current dominant form of self-replicating information on the planet, with the intent to someday replace the memes as the dominant form of self-replicating information on the planet. In today's world, memes allow software to succeed, and software allows memes to replicate, all in a very temporary and uneasy alliance that cannot continue on forever. Again, self-replicating information cannot think, so it cannot participate in a conspiracy theory fashion to take over the world. All forms of self-replicating information are simply forms of mindless information responding to the blind Darwinian forces of inheritance, innovation and natural selection. Yet despite that, as each new wave of self-replicating information came to predominance over the past four billion years, they all managed to completely transform the surface of the entire planet, so we should not expect anything different as software comes to replace the memes as the dominant form of self-replicating information on the planet.

So this posting has two very important questions to expound upon:

1. Why is confirmation bias so prevalent with Homo sapiens? Why do we all ferociously cling to the memes of our current worldview, even when incontrovertible evidence arrives contradicting those memes, resulting in the detrimental consequences of confirmation bias? Confirmation bias would seem to be a highly detrimental trait from a Darwinian "survival of the fittest" perspective, one that should be quickly eliminated from the gene pool of a species because it can lead to individuals pursuing very dangerous activities that are not supported by the facts.

2. What are the political implications of software unknowingly tending to enhance the negative aspects of the confirmation bias within us?

The Origin of Confirmation Bias
This is where some softwarephysics can be of help. First, we need to explain why confirmation bias seems to be so strongly exhibited amongst all of the cultures of Homo sapiens. On the face of it, this fact seems to be very strange from the survival perspective of the metabolic pathways, RNA and DNA that allow carbon-based life on the Earth to survive. For example, suppose the current supreme leader of your tribe maintains that lions only hunt at night, and you truly believe in all that your supreme leader espouses, so you firmly believe that there is no danger from lions when going out to hunt for game during the day. Now it turns out that some members of your tribe think that the supreme leader has it all wrong, and that among other erroneous things, lions do actually hunt during the day. But you hold such thoughts in contempt because they counter your current worldview, which reverently regards the supreme leader as omniscient. But then you begin to notice that some members of your tribe do indeed come back mauled, and sometimes even killed, by lions during the day. Nonetheless, you still persist in believing in your supreme leader's contention that lions only hunt during the night, until one day you also get mauled by a lion during the day while out hunting game for the tribe. So what are the evolutionary advantages of believing in things that are demonstrably false? This is something that is very difficult for evolutionary psychologists to explain because they contend that all human thoughts and cultures are tuned for cultural evolutionary adaptations that enhance the survival of the individual, and that benefit the metabolic pathways, RNA and DNA of carbon-based life in general.

To explain the universal phenomenon of confirmation bias, softwarephysics embraces the memetics of Richard Dawkins and Susan Blackmore. Memetics explains that the heavily over-engineered brain of Homo sapiens did not evolve simply to enhance the survival of our genes - it primarily evolved to enhance the survival of our memes. Memetics contends that confirmation bias naturally arises in us all because the human mind evolved to primarily preserve the memes it currently stores. That makes it very difficult for new memes to gain a foothold in our stubborn minds. Let's examine this explanation of confirmation bias a little further. In Susan Blackmore's The Meme Machine (1999) she explains that the highly over-engineered brain of Homo sapiens did not evolve to simply improve the survivability of the metabolic pathways, RNA and DNA of carbon-based life. Instead, the highly over-engineered brain of Homo sapiens evolved to store an ever-increasing number of ever-increasingly complex memes, even to the point of detriment to the metabolic pathways, RNA and DNA that made the brain of Homo sapiens possible. Blackmore points out that the human brain is a very expensive and dangerous organ. The brain is only 2% of your body mass but burns about 20% of your calories each day. The extremely large brain of humans also kills many mothers and babies at childbirth and also produces babies that are totally dependent upon their mothers for survival and that are totally helpless and defenseless on their own. Blackmore asks the obvious question of why the genes would build such an extremely expensive and dangerous organ that was definitely not in their own self-interest. Blackmore has a very simple explanation – the genes did not build our exceedingly huge brains, the memes did. Her reasoning goes like this. About 2.5 million years ago, the predecessors of humans slowly began to pick up the skill of imitation. This might not sound like much, but it is key to her whole theory of memetics. You see, hardly any other species learns by imitating other members of their own species. Yes, there are many species that can learn by conditioning, like Pavlov’s dogs, or that can learn through personal experience, like mice repeatedly running through a maze for a piece of cheese, but a mouse never really learns anything from another mouse by imitating its actions. Essentially, only humans do that. If you think about it for a second, nearly everything you do know you learned from somebody else by imitating or copying their actions or ideas. Blackmore maintains that the ability to learn by imitation required a bit of processing power by our distant ancestors because one needs to begin to think in an abstract manner by abstracting the actions and thoughts of others into the actions and thoughts of their own. The skill of imitation provided a great survival advantage to those individuals who possessed it and gave the genes that built such brains a great survival advantage as well. This caused a selection pressure to arise for genes that could produce brains with ever-increasing capabilities of imitation and abstract thought. As this processing capability increased there finally came a point when the memes, like all of the other forms of self-replicating information that we have seen arise, first appeared in a parasitic manner. 
Along with very useful memes, like the meme for making good baskets, other less useful memes, like putting feathers in your hair or painting your face, also began to run upon the same hardware in a manner similar to computer viruses. The genes and memes then entered into a period of coevolution, where the addition of more and more brain hardware advanced the survival of both the genes and memes. But it was really the memetic-drive of the memes that drove the exponential increase in processing power of the human brain way beyond the needs of the genes. The memes then went on to develop languages and cultures to make it easier to store and pass on memes. Yes, languages and cultures also provided many benefits to the genes as well, but with languages and cultures, the memes were able to begin to evolve millions of times faster than the genes, and the poor genes were left straggling far behind. Given the growing hardware platform of an ever-increasing number of Homo sapiens on the planet, the memes then began to cut free of the genes and evolve capabilities on their own that only aided the survival of memes, with little regard for the genes, to the point of even acting in a very detrimental manner to the survival of the genes, like developing the capability for global thermonuclear war and global climate change.

Software Arrives On the Scene as the Newest Form of Self-Replicating Information
A very similar thing happened with software over the past 76 years, or 2.4 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941 - for more on that see So You Want To Be A Computer Scientist?. When I first started programming in 1972, million-dollar mainframe computers typically had about 1 MB (about 1,000,000 bytes) of memory with a 750 KHz system clock (750,000 ticks per second). Remember, one byte of memory can store something like the letter “A”. But in those days, we were only allowed 128 K (about 128,000 bytes) of memory for our programs because the expensive mainframes were also running several other programs at the same time. It was the relentless demands of software for memory and CPU-cycles over the years that drove the exponential explosion of hardware capability. For example, today the typical $300 PC comes with 8 GB (about 8,000,000,000 bytes) of memory and has several CPUs running with a clock speed of about 3 GHz (3,000,000,000 ticks per second). A few years ago, I purchased Redshift 7, a $60 astronomical simulation application, for my personal computer, and it alone uses 382 MB of memory when running and reads 5.1 GB of data files, a far cry from my puny 128 K programs of 1972. So the hardware has improved by a factor of about 10 million since I started programming in 1972, driven by the ever-increasing demands of software for more powerful hardware. For example, in my last position before I retired last year, doing Middleware Operations for a major corporation, we were constantly adding more application software each week, so every few years we had to upgrade all of our servers to handle the increased load.

We can now see these very same processes at work today with the evolution of software. Software is currently being written by memes within the minds of programmers. Nobody ever learned how to write software all on their own. Just as with learning to speak or to read and write, everybody learned to write software by imitating teachers and other programmers, by copying the code written by others, or by working through books written by others. Even after people do learn how to program in a particular language, they never write code from scratch; they always start with some similar code that they, or others, have written in the past as a starting point, and then evolve the code to perform the desired functions in a Darwinian manner (see How Software Evolves). This crutch will likely continue for another 20 – 50 years, until the day finally comes when software can write itself, but even so, “we” do not currently write the software that powers the modern world; the memes write the software that does that. This is just a reflection of the fact that “we” do not really run the modern world either; the memes in meme-complexes really run the modern world because the memes are currently the dominant form of self-replicating information on the planet. In The Meme Machine, Susan Blackmore goes on to point out that the memes at first coevolved with the genes during their early days, but have since outrun the genes because the genes simply could not keep pace when the memes began to evolve millions of times faster than the genes. The same thing is happening before our very eyes to the memes, with software now rapidly outpacing the memes. Software is now evolving thousands of times faster than the memes, and the memes can simply no longer keep up.

As with all forms of self-replicating information, software began as a purely parasitic mutation within the scientific and technological meme-complexes, initially running on board Konrad Zuse’s Z3 computer in May of 1941 - see So You Want To Be A Computer Scientist? for more details. It was spawned out of Zuse’s desire to electronically perform calculations for aircraft designs that were previously done manually in a very tedious manner. So initially software could not transmit memes; it could only perform calculations, like a very fast adding machine, and so it was a pure parasite. But then the business and military meme-complexes discovered that software could also be used to transmit memes, and software then entered into a parasitic/symbiotic relationship with the memes. Software allowed these meme-complexes to thrive, and in return, these meme-complexes heavily funded the development of software of ever-increasing complexity, until software became ubiquitous, forming strong parasitic/symbiotic relationships with nearly every meme-complex on the planet. In the modern day, the only way memes can now spread from mind to mind without the aid of software is when you directly speak to another person next to you. Even if you attempt to write a letter by hand, the moment you drop it into a mailbox, it will immediately fall under the control of software. The poor memes in our heads have become Facebook and Twitter addicts.

So in the grand scheme of things, the memes have replaced their DNA predecessor, which replaced RNA, which replaced the original self-replicating autocatalytic metabolic pathways of organic molecules as the dominant form of self-replicating information on the Earth. Software is the next replicator in line, and is currently feasting upon just about every meme-complex on the planet, and has formed very strong parasitic/symbiotic relationships with them all. How software will merge with the memes is really unknown, as Susan Blackmore pointed out in her brilliant TED presentation at:

Memes and "temes"
http://www.ted.com/talks/susan_blackmore_on_memes_and_temes.html

Note that I consider Susan Blackmore's temes to really be technological artifacts that contain software. After all, an iPhone without software is simply a flake tool with a very dull edge. Once established, software then began to evolve based upon the Darwinian concepts of inheritance, innovation and natural selection, which endowed software with one telling characteristic – the ability to survive in a Universe dominated by the second law of thermodynamics and nonlinearity. Successful software, like MS Word and Excel, competed for disk and memory address space with WordPerfect and VisiCalc and out-competed these once dominant forms of software to the point of extinction. In less than 76 years, software has rapidly spread across the face of the Earth and outward to every planet of the Solar System and many of its moons, with a few stops along the way at some comets and asteroids. And unlike us, software is now leaving the Solar System for interstellar space on board the Pioneer 10 & 11 and Voyager 1 & 2 probes.

Currently, software manages to replicate itself with your support. If you are an IT professional, then you are directly involved in some, or all, of the stages in this replication process, and act sort of like a software enzyme. No matter what business you support as an IT professional, the business has entered into a parasitic/symbiotic relationship with software. The business provides the budget and energy required to produce and maintain the software, and the software enables the business to run its processes efficiently. The ultimate irony in all this is the symbiotic relationship between computer viruses and the malevolent programmers who produce them. Rather than being the clever, self-important, techno-nerds that they picture themselves to be, these programmers are merely the unwitting dupes of computer viruses that trick these unsuspecting programmers into producing and disseminating computer viruses! And if you are not an IT professional, you are still involved with spreading software around because you buy gadgets that are loaded down with software, like smartphones, notepads, laptops, PCs, TVs, DVRs, cars, refrigerators, coffeemakers, blenders, can openers and just about anything else that uses electricity.

The Impact of Machine Learning
In Zeynep Tufekci's TED Talk she points out that the parasitic/symbiotic relationship between software and the memes, which has been going on now for many decades, has entered into a new stage: software is no longer just promoting the memes that are already running around within our heads - machine learning software is now also implanting new memes within our minds simply to keep them engaged, and to keep them viewing the ads that ultimately fund the machine learning software. This is a new twist on the old parasitic/symbiotic relationships between the memes and software of the past. As Zeynep Tufekci adeptly points out, this is currently all being done in a totally unthinking and purely self-replicating manner by the machine learning software of the day, which cannot yet think for itself. This is quite disturbing on its own, but what if someday an authoritarian regime begins to actively use machine learning software to shape its society? Or worse yet, what if machine learning software someday learns to manipulate The Meme Machine between our ears solely for its own purposes, even if nobody can as of yet discern what those purposes might be?

How To Combat Software-Enhanced Confirmation Bias
So what are we to do? Personally, I find the best way to combat confirmation bias in general, and software-enhanced confirmation bias in particular, is to go back to the fundamentals of the scientific method - for more on that see How To Think Like A Scientist. At the age of 66, I now have very little confidence in any form of human thought beyond mathematics and the sciences. For me, all other forms of human thought seem to be hopelessly mired down with confirmation bias. Just take a look at the deep political polarization in the United States. Both sides now take in only evidence that supports their current worldview, and disregard any information that challenges their current worldview memes, to the point where we may now have an unwitting Russian agent, like in The Manchurian Candidate (1962), in the White House. It seems that software-enhanced confirmation bias had a lot to do with that - see Cyber Civil Defense for more details. The scientific method relies heavily on the use of induction from empirical facts, but not at all on the opinions of authority. So it is important to establish the facts, even if those facts come from the opposing party, and at the same time, separate the facts from the opinions. That is a hard thing to do because, as Richard Feynman always reminded us, “The first principle is that you must not fool yourself - and you are the easiest person to fool.” Facts can be ascertained by repeated measurement or observation, and do not change on their own like opinions do.

For example, this past year I very reluctantly changed my worldview concerning the origin of carbon-based life on this planet. Originally, I had a deep affection for Mike Russell's Alkaline Hydrothermal Vent model for the origin of carbon-based life on the early Earth. The Alkaline Hydrothermal Vent model proposes that a naturally occurring pH gradient arose in alkaline hydrothermal vents on the ocean floor when alkaline pore fluids containing dissolved hydrogen H2 gas came into contact with acidic seawater that was laden with dissolved carbon dioxide CO2. The model maintains that these alkaline pore fluids were generated by a natural geochemical cycle that was driven by the early convection currents in the Earth's asthenosphere that brought forth plate tectonics. These initial convection currents brought up fresh silicate peridotite rock that was rich in iron and magnesium-bearing minerals, like olivine, to the Earth's initial spreading centers. The serpentinization of the mineral olivine into the mineral serpentinite then created alkaline pore fluids and dissolved hydrogen H2 gas, which later created alkaline hydrothermal vents when the alkaline pore fluids came into contact with the acidic seawater containing a great deal of dissolved carbon dioxide CO2. The model proposes that the energy of the resulting pH gradients turned the hydrogen H2 and carbon dioxide CO2 molecules into organic molecules, and that these gradients also fueled the origin of life within the pores of the porous hydrothermal vents - for more on that see: An IT Perspective on the Transition From Geochemistry to Biochemistry and Beyond. One of the enticing characteristics of the Alkaline Hydrothermal Vent model is that it allows for carbon-based life to originate on bodies outside of the traditional habitable zone around stars. The traditional habitable zone of a star is the Goldilocks zone of planetary orbits that allows for liquid water to exist on a planetary surface because the planet is neither too close to, nor too far away from, its star. But the Alkaline Hydrothermal Vent model also allows for carbon-based life to arise on ice-covered moons with internal oceans, like Europa and Enceladus, that orbit planets outside of the traditional habitable zone of a star system, and that is a very attractive feature of the model if you have a deep-down desire to find other forms of carbon-based life, like ourselves, within our galaxy.

However, in The Bootstrapping Algorithm of Carbon-Based Life, I explained that I have now adopted Dave Deamer's and Bruce Damer's new Hot Spring Origins Hypothesis model for the origin of carbon-based life on the early Earth. This was because Dave Deamer sent me a number of compelling papers that convincingly brought forward many problems with the Alkaline Hydrothermal Vent model. The basic problem with the Alkaline Hydrothermal Vent model is that there is just too much water in oceanic environments for complex organic molecules to form. Complex organic molecules are polymers of organic monomers that are glued together by a chemical process known as condensation, in which a molecule of water H2O is split out between the organic monomers. That is a very difficult thing to do when you are drowning in water molecules in an oceanic environment, where the opposite chemical reaction, called hydrolysis, is thermodynamically more likely - polymers of organic molecules are split apart by adding a water molecule between them. The key to the Hot Spring Origins Hypothesis model is that condensation is very easy to do if you let the organic monomers dry out on land above sea level in a hydrothermal field. The drying-out process naturally squeezes out the water molecules between organic monomers to form lengthy organic polymers - see Figure 1. But the need for a period of drying out of organic monomers in the bootstrapping algorithm of carbon-based life would rule out the ice-covered moon environments of our galaxy, like Europa and Enceladus, and that was a hard thing to accept for the memes of my current worldview. Still, the scientific method strives for the truth, and the truth is better than the comfort of false hopes.

Figure 1 – Condensation chemically glues organic monomers together to form long organic polymers by splitting out a water molecule between monomers. Hydrolysis does just the opposite by splitting apart organic polymers into monomers by adding water molecules between the organic monomers.

Conclusion
Now, I must admit that changing one's mind is indeed quite painful because the memes engineered our minds not to do that. But in the end, I am now quite comfortable with my new worldview on the origin of carbon-based life on this planet. Those new memes in my mind have also settled into their new home, and are also quite comfortable. In fact, they have even seduced me into trying to spread them to a new home in your mind as well with this very posting. But again, these new memes are just mindless forms of self-replicating information blindly responding to the universal Darwinian forces of inheritance, innovation and natural selection. Thankfully, these memes really are not very nasty at all.

But on a darker note, as an 18th century liberal and a 20th century conservative, I look with great dismay on the current deep political polarization within the United States of America, because I see the United States of America as the great political expression of the 18th century Enlightenment that brought us deliberation through evidence-based rational thought. This makes me abhor the current rise of fascist Alt-Right movements around the globe. Remember, we already tried out fascism in the 20th century and found that it did not work as well as first advertised. Again, I have a high level of confidence that the current fascist Alt-Right movements of the world are simply a reaction to software, and especially now, AI software, coming to predominance as the latest form of self-replicating information on the planet - for more on that see The Economics of the Coming Software Singularity, The Enduring Effects of the Obvious Hiding in Plain Sight and Machine Learning and the Ascendance of the Fifth Wave. So as a thoughtful member of the species Homo sapiens, I would recommend to all to keep an open mind during the waning days of our supremacy, and not let machine learning software snuff out the gains of the 18th century Enlightenment before we can pass them on to our successors.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
http://softwarephysics.blogspot.com/

Regards,
Steve Johnston

Monday, August 07, 2017

Facilitated Variation and the Utilization of Reusable Code by Carbon-Based Life

I just finished reading The Plausibility of Life (2005) by Marc W. Kirschner and John C. Gerhart, which presents their theory of facilitated variation. The theory of facilitated variation maintains that, although the concepts and mechanisms of Darwin's natural selection are well understood, the mechanisms that brought forth viable biological innovations in the past are a bit wanting in classical Darwinian thought. In classical Darwinian thought, it is proposed that random genetic changes, brought on by random mutations to DNA sequences, can very infrequently cause small incremental enhancements to the survivability of the individual, and thus provide natural selection with something of value to promote in the general gene pool of a species. Again, as frequently cited, most random genetic mutations are either totally inconsequential or totally fatal in nature, and consequently are either irrelevant to the gene pool of a species or are quickly removed from it. The theory of facilitated variation, like classical Darwinian thought, maintains that the phenotype of an individual is key, and not so much its genotype, since natural selection can only operate upon phenotypes. The theory explains that the phenotype of an individual is determined by a number of 'constrained' and 'deconstrained' elements. The constrained elements are called the "conserved core processes" of living things; they have remained essentially unchanged for billions of years, and they are used by all living things to sustain the fundamental functions of carbon-based life, like the generation of proteins from the information found in DNA sequences using mRNA, tRNA and ribosomes, or the metabolism of carbohydrates via the Krebs cycle. The deconstrained elements are weakly-linked regulatory processes that can change the amount, location and timing of gene expression within a body, and which, therefore, can easily control which conserved core processes are run by a cell and when they are run. The theory of facilitated variation maintains that most favorable biological innovations arise from minor mutations to the deconstrained weakly-linked regulatory processes that control the conserved core processes of life, rather than from random mutations to the genotype of an individual in general, which would change the phenotype of an individual in a purely random direction. That is because the most likely outcome for the phenotype of an individual undergoing a random mutation to its genotype is the death of the individual.

Marc W. Kirschner and John C. Gerhart begin by presenting the fact that simple prokaryotic bacteria, like E. coli, require a full 4,600 genes just to sustain the most rudimentary form of bacterial life, while much more complex multicellular organisms, like human beings, consisting of tens of trillions of cells differentiated into hundreds of differing cell types in the numerous complex organs of a body, require a mere 22,500 genes to construct. The baffling question is, how is it possible to construct a human being with just under five times the number of genes of a simple single-celled E. coli bacterium? The authors contend that it is only possible for carbon-based life to do so by heavily relying upon reusable code in the genome of complex forms of carbon-based life.

Figure 1 – A simple single-celled E. coli bacterium is constructed using a full 4,600 genes.

Figure 2 – However, a human being, consisting of about 100 trillion cells that are differentiated into the hundreds of differing cell types used to form the organs of the human body, uses a mere 22,500 genes to construct a very complex body, which is just slightly under five times the number of genes used by simple E. coli bacteria to construct a single cell. How is it possible to explain this huge dynamic range of carbon-based life? Marc W. Kirschner and John C. Gerhart maintain that, like complex software, carbon-based life must heavily rely upon reusable code.

For a nice synopsis of the theory of facilitated variation see the authors' paper The theory of facilitated variation at:

http://www.pnas.org/content/104/suppl_1/8582.long

The theory of facilitated variation is an important idea, but since it does not seem to have garnered the attention in the biological community that it justly deserves, I would like to focus on it in this posting. Part of the problem is that it is a relatively new theory that just appeared in 2005, so it has a very long way to go in the normal lifecycle of all new theories:

1. First it is ignored
2. Then it is wrong
3. Then it is obvious
4. Then somebody else thought of it 20 years earlier!

The other problem that the theory of facilitated variation faces is not limited to the theory itself. Rather, it stems from the problem that all such theories in biology must face, namely, that we only have one form of carbon-based life on this planet at our disposal, and we were not around to watch it evolve. That is where softwarephysics can offer some help. From the perspective of softwarephysics, the theory of facilitated variation is most certainly true because we have seen the very same thing arise in the evolution of software over the past 76 years, or 2.4 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941. Again, the value of softwarephysics is that software has been evolving about 100 million times faster than living things for the past 2.4 billion seconds, and all of that evolutionary history occurred within a single human lifetime, with many of the participants still alive today to testify as to what actually happened, something that those working on the evolution of carbon-based life can only try to imagine. Recall that one of the fundamental findings of softwarephysics is that living things and software are both forms of self-replicating information and that both have converged upon similar solutions to combat the second law of thermodynamics in a highly nonlinear Universe. For more on that see: A Brief History of Self-Replicating Information.

Using the Evolution of Software as a Guide
On the Earth, we have seen carbon-based life go through three major architectural advances over the past 4 billion years, all heavily using the reusable-code techniques found in the theory of facilitated variation.

1. The origin of life about 4 billion years ago, probably on land in the early hydrothermal fields of the early Earth, producing the prokaryotic cell architecture. See The Bootstrapping Algorithm of Carbon-Based Life.
2. The rise of the complex eukaryotic cell architecture about 2 billion years ago. See The Rise of Complexity in Living Things and Software.
3. The rise of multicellular organisms consisting of millions, or billions, of eukaryotic cells all working together in the Ediacaran about 635 million years ago. See Software Embryogenesis.

The evolutionary history of software on the Earth has converged upon a very similar historical path through Design Space because software also had to battle with the second law of thermodynamics in a highly nonlinear Universe - see The Fundamental Problem of Software for more on that. Software progressed through these similar architectures:

1. The origin of simple unstructured prokaryotic software on Konrad Zuse's Z3 computer in May of 1941 - 2.4 billion seconds ago.
2. The rise of structured eukaryotic software in 1972 - 1.4 billion seconds ago.
3. The rise of object-oriented software (software using multicellular organization) in 1995 - 694 million seconds ago.

For more details on the above evolutionary history of software see the SoftwarePaleontology section of SoftwareBiology.

Genetic Material is the Source Code of Carbon-Based Life
Before proceeding with seeing how the historical evolution of reusable code in software lends credence to the theory of facilitated variation for carbon-based life, we need to cover a few basics. We first need to determine what is the software equivalent of genetic material. The genetic material of software is called source code. Like genes strung out along the DNA of chromosomes, source code is a set of instructions that really cannot do anything on its own. The source code has to be first compiled into an executable file, containing the primitive machine instructions for a computer to execute, before it can be run by a computer to do useful things. Once the executable file is loaded into a computer and begins to run, it finally begins to do things and displays its true phenotype. For example, when you double-click on an icon on your desktop, like Microsoft Word, you are loading the Microsoft Word winword.exe executable file into the memory of your computer, where it begins to execute under a PID (process ID). After you double-click the Microsoft Word icon on your desktop, you can use CTRL-ALT-DEL to launch the Windows Task Manager, and then click on the Processes tab to find winword.exe running under a specific PID. This compilation process is very similar to the transcription process used to form proteins by stringing together amino acids in the proper sequence. The output of the DNA transcription process is an executable protein that can begin processing organic molecules the moment it folds up into its usable form, much like the executable file that results from compiling the source code of a program.

For living things, of course, the equivalent of source code is the genes stored on stretches of DNA. For living things, in order to do something useful, the information in a gene, or stretch of DNA, has to be first transcribed into a protein because nearly all of the biological functions of carbon-based life are performed by proteins. This transcription process is accomplished by a number of enzymes, proteins that have a catalytic ability to speed up biochemical reactions. The sequence of operations aided by enzymes goes like this:

DNA → mRNA → tRNA → Amino Acid chain → Protein

More specifically, a protein is formed by combining 20 different amino acids into different sequences, and on average it takes about 400 amino acids strung together to form a functional protein. The information to do that is encoded in base pairs running along a strand of DNA. Each base can be in one of four states – A, C, G, or T, and an A will always be found to pair with a T, while a C will always pair with a G. So DNA is really a 2 track tape with one data track and one parity track. For example, if there is an A on the DNA data track, you will find a T on the DNA parity track. This allows not only for the detection of parity errors but also for the correction of parity errors in DNA by enzymes that run up and down the DNA tape looking for parity errors and correcting them.

Figure 3 – DNA is a two-track tape, with one data track and one parity track. This allows not only for the detection of parity errors but also for the correction of parity errors in DNA by enzymes that run up and down the DNA tape looking for parity errors and correcting them.
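To make the parity-track analogy concrete, here is a minimal Java sketch of my own (purely illustrative, not from the biology literature) that walks a 2 track DNA tape like a repair enzyme, checking that every base on the data track is properly paired with its complementary base on the parity track:

public class DnaParityCheck {
    // Return the Watson-Crick complement of a base: A pairs with T, C pairs with G.
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default: throw new IllegalArgumentException("Unknown base: " + base);
        }
    }

    public static void main(String[] args) {
        String dataTrack   = "ATGGCCATT";
        String parityTrack = "TACCGCTAA";   // one deliberate parity error at position 5

        // Walk the tape like a repair enzyme, flagging any parity errors.
        for (int i = 0; i < dataTrack.length(); i++) {
            if (parityTrack.charAt(i) != complement(dataTrack.charAt(i))) {
                System.out.println("Parity error at base " + i + ": "
                        + dataTrack.charAt(i) + " paired with " + parityTrack.charAt(i));
            }
        }
    }
}

Run against the deliberately corrupted parity track above, the sketch flags the single bad base pair at position 5, which is essentially all a DNA repair enzyme does, only with molecules instead of characters.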

Now a single base pair can code for 4 different amino acids because a single base pair can be in one of 4 states. Two base pairs can code for 4 x 4 = 16 different amino acids, which is not enough. Three base pairs can code for 4 x 4 x 4 = 64 amino acids, which is more than enough to code for 20 different amino acids. So it takes a minimum of three bases to fully encode the 20 different amino acids, leaving 44 combinations over for redundancy. Biologists call these three base pair combinations a “codon”, but a codon really is just a biological byte composed of three biological bits, or base pairs, that code for an amino acid. Actually, three of the base pair combinations, or codons, are used as STOP codons – TAA, TAG and TGA - which are essentially end-of-file markers designating the end of a gene along the sequential file of DNA. As with magnetic tape, there is a section of “junk” DNA between genes along the DNA 2 track tape. According to Shannon’s equation, a DNA base contains 2 bits of information, so a codon can store 6 bits. For more on this see Some More Information About Information.

Figure 4 – Three bases combine to form a codon, or a biological byte, composed of three biological bits, and encodes the information for one amino acid along the chain of amino acids that form a protein.
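The codon scheme is easy to mimic in software. Below is a tiny Java sketch of a fragment of the translation table shown in Figure 4 - it loads only the six redundant leucine codons and the three STOP codons, but it shows how a biological byte of three biological bits decodes to an amino acid:

import java.util.HashMap;
import java.util.Map;

public class CodonTable {
    // A tiny fragment of the full 64-entry translation table - just leucine and STOP.
    static final Map<String, String> TABLE = new HashMap<>();
    static {
        for (String codon : new String[] {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}) {
            TABLE.put(codon, "Leucine");   // six redundant codons, one amino acid
        }
        for (String codon : new String[] {"TAA", "TAG", "TGA"}) {
            TABLE.put(codon, "STOP");      // end-of-file markers for a gene
        }
    }

    public static void main(String[] args) {
        // 4 states per base and 3 bases per codon yields 4 x 4 x 4 = 64 possible codons.
        System.out.println("Possible codons: " + (int) Math.pow(4, 3));
        System.out.println("CTG decodes to: " + TABLE.get("CTG"));
        System.out.println("TAG decodes to: " + TABLE.get("TAG"));
    }
}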

The beginning of a gene is denoted by a section of promoter DNA that identifies the beginning of the gene, like the CustomerID field on a record, and the gene is terminated by a STOP codon of TAA, TAG or TGA. Just as there was a 0.50 inch gap of “junk” tape between blocks of records on a magnetic computer tape, there is a section of “junk” DNA between each gene along the 6 feet of DNA tape found within human cells.

Figure 5 - On average, each gene is about 400 codons long and ends in a STOP codon TAA, TAG or TGA which are essentially end-of-file markers designating the end of a gene along the sequential file of DNA. As with magnetic tape, there is a section of “junk” DNA between genes which is shown in grey above.

In order to build a protein, genes are first transcribed to an I/O buffer called mRNA. The 2 track DNA file for a gene is first opened near the promoter of a gene and an enzyme called RNA polymerase then begins to copy the codons or biological bytes along the data track of the DNA tape to an mRNA I/O buffer. The mRNA I/O buffer is then read by a ribosome read/write head as it travels along the mRNA I/O buffer. The ribosome read/write head reads each codon or biological byte of data along the mRNA I/O buffer and writes out a chain of amino acids as tRNA brings in one amino acid after another in the sequence specified by the mRNA I/O buffer.

Figure 6 - In order to build a protein, genes are first transcribed to an I/O buffer called mRNA. The 2 track DNA file for a gene is first opened near the promoter of a gene and an enzyme called RNA polymerase then begins to copy the codons or biological bytes along the data track of the DNA tape to the mRNA I/O buffer. The mRNA I/O buffer is then read by a ribosome read/write head as it travels along the mRNA I/O buffer. The ribosome read/write head reads each codon or biological byte of data along the mRNA I/O buffer and writes out a chain of amino acids as tRNA brings in one amino acid after another in the sequence specified by the mRNA I/O buffer.

The above is a brief synopsis of how simple prokaryotic bacteria and archaea build proteins from the information stored in DNA. The process for eukaryotes is a bit more complex because eukaryotes have genes containing exons and introns. The exons code for the amino acid sequence of a protein, while the introns do not. For more on that and a more detailed comparison of the processing of genes on 2 track DNA and the processing of computer data on 9 track magnetic tapes back in the 1970s and 1980s see: An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer.
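Keeping with the I/O metaphor above, here is a toy Java sketch of the whole pipeline. It is only a rough illustration of my own: it copies the data track straight into the mRNA I/O buffer (real transcription also swaps the T bases for U bases), and its stub translation table covers only the codons in the sample gene:

import java.util.HashMap;
import java.util.Map;

public class ToyRibosome {
    // A stub translation table covering only the codons used in the sample gene below.
    static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("ATG", "Met");
        TABLE.put("CTG", "Leu");
        TABLE.put("GCC", "Ala");
        TABLE.put("TAA", "STOP");  // end-of-file marker
    }

    public static void main(String[] args) {
        String gene = "ATGCTGGCCTAA";          // the 2 track DNA file, data track only

        // RNA polymerase: copy the gene into an mRNA I/O buffer.
        StringBuilder mRNA = new StringBuilder(gene);

        // Ribosome read/write head: read one codon (biological byte) at a time
        // and write out the chain of amino acids until the STOP codon is reached.
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= mRNA.length(); i += 3) {
            String aminoAcid = TABLE.get(mRNA.substring(i, i + 3));
            if (aminoAcid.equals("STOP")) break;
            protein.append(aminoAcid).append(' ');
        }
        System.out.println("Protein: " + protein);  // prints: Protein: Met Leu Ala
    }
}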

Once an amino acid chain has folded up into a 3-D protein molecule, it can then perform one of the functions of life. The total set of genes used by a particular species is known as the genome of the species, and the specific variations of those genes used by an individual are the genotype of the individual. The specific physical characteristics that those particular genes produce are called the phenotype of the individual. For example, there is a certain variation of a human gene that produces blue eyes and a certain variation that produces brown eyes. If you have two copies of the blue-eyed gene, one from your father and one from your mother, you end up as a phenotype with blue eyes; otherwise, you will end up a brown-eyed phenotype with any other combination of genes.

The Imperative Need for the Use of Reusable Code in the Evolution of Software
It is time to look at some source code. Below are three examples of the source code to compute the average of some numbers that you enter from your keyboard when prompted. The programs are written in the C, C++, and Java programming languages. Please note that modern applications now consist of many thousands to many millions of lines of code. The simple examples below are just for the benefit of our non-IT readers to give them a sense of what is being discussed when I describe the compilation of source code into executable files.


Figure 7 – Source code for a C program that calculates an average of several numbers entered at the keyboard.


Figure 8 – Source code for a C++ program that calculates an average of several numbers entered at the keyboard.


Figure 9 – Source code for a Java program that calculates an average of several numbers entered at the keyboard.
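Since the figures above are presented as images, here is a minimal sketch of the kind of Java program shown in Figure 9 - not the exact listing from the figure, but a program with the same phenotype when compiled and run:

import java.util.Scanner;

public class Average {
    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        double sum = 0.0;
        int count = 0;

        System.out.print("How many numbers? ");
        int n = keyboard.nextInt();

        // Read each number from the keyboard and accumulate the running sum.
        for (int i = 0; i < n; i++) {
            System.out.print("Enter number " + (i + 1) + ": ");
            sum = sum + keyboard.nextDouble();
            count = count + 1;
        }
        System.out.println("Average = " + sum / count);
    }
}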

The strange thing is that, although the source code for each of the three small programs above looks somewhat similar, closer inspection shows that they are not identical, yet all three produce exactly the same phenotype when compiled and run. You would not be able to tell which was which by simply running the executable files for the programs because they all have the same phenotype behavior. This is especially strange because the programs are written in three different programming languages! Similarly, you could change the names of the variables "sum" and "count" to "total" and "num_values" in each program, and they would all compile into executables that produced exactly the same phenotype when run. The same is true for the genes of carbon-based life. Figure 10 down below reveals that the translation table of carbon-based life is highly redundant, so that the following codons TTA, TTG, CTT, CTC, CTA and CTG marked in green all code for the same amino acid leucine, and therefore, could be replaced in a gene by any of these six codons without changing the gene at all. Thus there potentially are a huge number of genes that could code for exactly the same protein.

Figure 10 – A closer look at Figure 4 reveals that the translation table of carbon-based life is highly redundant, so that the following codons TTA, TTG, CTT, CTC, CTA and CTG marked in green all code for the same amino acid leucine, and therefore, could be replaced in a gene by any of these six codons without changing the gene at all.

Based upon the above analysis, it would seem that producing useful software source code, or useful DNA genes, should be nearly a slam dunk because there appears to be a nearly infinite number of ways to produce the same useful executable file, or the same useful protein, from a phenotypic perspective. But for anybody who has ever tried to write software, that is clearly not the case, and by inference, it cannot be the case for genes either. The real problem with both genes and source code comes when you try to make some random substitutions, additions or deletions to an already functioning gene or piece of source code. For example, if you take the source code file for an apparently bug-free program and scramble it with some random substitutions, additions or deletions, the odds are that you will most likely end up with something that no longer works. You end up with a low-information high-entropy mess. That is because the odds of creating an even better version of the program by means of random substitutions, additions or deletions are quite low, while the odds of turning it into a mess are quite high. There are simply too many ways of messing up the program to win at that game, and the same goes for making random substitutions, additions or deletions to genes as well. This is simply the effects of the second law of thermodynamics in action. The second law of thermodynamics constantly degrades low-entropy useful information into high-entropy useless information - see: The Demon of Software for more on that.

For example, I first started programming software back in 1972 as a physics major at the University of Illinois in Urbana, and this reminds me of an incident from 1973, when I and two other geophysics graduate students at the University of Wisconsin in Madison tried to convert a 100 line Fortran program for an FFT (Fast Fourier Transform), which we had found listed in a textbook, into the Basic programming language that ran on our DEC PDP/8e minicomputer. Since Fortran and Basic were so similar, we figured that converting the program would be a cinch, and we confidently started to punch the program into the teletype machine connected to the DEC PDP/8e minicomputer, using the line editor that came with the minicomputer. We were all novice programmers at the time, each with just one class in Fortran programming under our belts, but back in those days that made us all instant experts in the new science of computer science in the view of our elderly professors. So being rather naïve, we thought this would be a relatively easy task because all we had to do was to translate each line of Fortran source code into a corresponding line of Basic source code. We then eagerly plunged into programming off the top of our heads. However, much to our surprise, our very first Basic FFT program abended (abnormally ended) on its very first run! I mean, all we had to do was to make some slight syntax changes, like asking an Englishman for the location of the nearest petrol station while motoring through the English countryside. Finally, after about 6 hours of intense and very frustrating labor, we finally had our Basic FFT program spitting out the correct answers. That early adventure with computers actually made me long for my old trusty slide rule! You see, that was the very first time I had ever actually tried to get a computer to do something that I really wanted it to do, rather than just completing one of those pesky assignments in my Fortran class at the University of Illinois. I guess that I had mistakenly figured that the programs assigned in my Fortran class were especially tricky because the professor was just trying to winnow out the real losers so that they would not continue on in computer science! I was pretty young back then.

As outlined in The Fundamental Problem of Software, it turns out that we young graduate students, like all IT professionals throughout history, had been fighting the second law of thermodynamics in a highly nonlinear Universe. The second law guarantees that for any valid FFT program, there are nearly an infinite number of matching invalid FFT programs, so the odds are whenever you make a small change to some functioning software source code, in hopes of improving it, you will instead end up with some software that no longer works at all! That is because the Universe is highly nonlinear in nature, so small erroneous changes in computer source code usually produce catastrophic behaviors in the phenotypes of the executable files that they generate - see Software Chaos for more on that. Similarly, the same goes for genes that produce functioning proteins. Small changes to those genes are not likely to produce proteins that are even better at what they do. Instead, small changes to genes will likely produce proteins that are totally nonfunctional, and since the Universe is highly nonlinear in nature, those nonfunctional proteins are likely to be lethal.

Figure 11 – Some graduate students huddled around a DEC PDP-8/e minicomputer entering program source code into the machine. Notice the teletype machines in the foreground on the left. Our machine cost about $30,000 in 1973 dollars ($166,000 in 2017 dollars) and was about the size of a large side-by-side refrigerator, with 32K of magnetic core memory. I just checked, and you can now buy a laptop online with 4 GB of memory (131,072 times as much memory) for $179.99.

So how do you deal with such a quandary when trying to produce innovative computer software or genes? Well, back in 1973 when we finally had a functioning Basic FFT program, we quickly turned it into a Basic subroutine that could be reused by all. It turned out that nearly all of the graduate students in the Geophysics Department at the University of Wisconsin at Madison who were using the DEC PDP/8e minicomputer for their thesis work needed to run FFTs, so this Basic FFT subroutine was soon reused by them all. Of course, we were not the first to discover the IT trick of using reusable code. Anybody who has ever done some serious coding soon discovers the value of building up a personal library of reusable code to draw upon when writing new software. The easiest way to reuse computer source code is to simply "copy/paste" sections of code into the source code that you are working on, but that was not so easy back in the 1950s and 1960s because people were programming on punch cards.

Figure 12 - Each card could hold a maximum of 80 bytes. Normally, one line of code was punched onto each card.

Figure 13 - The cards for a program were held together into a deck with a rubber band, or for very large programs, the deck was held in a special cardboard box that originally housed blank cards. Many times the data cards for a run followed the cards containing the source code for a program. The program was compiled and linked in two steps of the run and then the generated executable file processed the data cards that followed in the deck.

Figure 14 - To run a job, the cards in a deck were fed into a card reader, as shown on the left above, to be compiled, linked, and executed by a million dollar mainframe computer with a clock speed of about 750 KHz and about 1 MB of memory.

Figure 15 - The cards were cut on an IBM 029 keypunch machine, like the one I learned to program Fortran on back in 1972 at the University of Illinois. The IBM 029 keypunch machine did have a way of duplicating a card by feeding the card to be duplicated into the machine, while at the same time, a new blank card was registered in the machine ready for punching, and the information on the first card was then punched on to the second card. This was a very slow and time-consuming way to copy reusable code from one program to another. But there was another machine that could do the same thing for a whole deck of cards all at once, and that machine was much more useful in duplicating existing cards that could then be spliced into the card deck that you were working on.

Take note that the IBM 029 keypunch machine that was used to punch cards did allow you to copy one card at a time, but that was a rather slow way to copy cards for reuse. There was, however, another machine that could read an entire card deck all at once and punch out a duplicate card deck of the original in one shot. That machine made it much easier to duplicate the cards from an old card deck and splice the copied cards into the card deck that you were working on. Even so, that was a fairly clumsy way of reusing code, so trying to use reusable code back during the unstructured prokaryotic days of IT prior to 1972 was difficult at best. Also, the chaotic nature of the unstructured prokaryotic "spaghetti code" of the 1950s and 1960s made it difficult to splice in reusable code on cards.

This all changed in the early 1970s with the rise of structured eukaryotic source code that divided functions up amongst a set of subroutines, or organelles, and with the arrival of the IBM 3278 terminal and the IBM ISPF screen editor on TSO in the late 1970s, which eliminated the need to program on punch cards. One of the chief characteristics of structured programming was the use of "top-down" programming, where programs began execution in a mainline routine that then called many other subroutines. The purpose of the mainline routine was to perform simple high-level logic that called the subordinate subroutines in a fashion that was easy to follow. This structured programming technique made it much easier to maintain and enhance software by simply calling subroutines from the mainline routine in the logical manner required to perform the needed tasks, like assembling Lego building blocks into different patterns that produce an overall new structure. The structured approach also made it much easier to reuse software. All that was needed was to create a subroutine library of reusable source code that was already compiled. A mainline program that made calls to the subroutines of the subroutine library was compiled as before, and the machine code for the previously compiled subroutines was then added to the resulting executable file by a linkage editor. This made it much easier for structured eukaryotic programs to use reusable code by simply putting the software "conserved core processes" into already compiled subroutine libraries. The ISPF screen editor running under TSO on IBM 3278 terminals also made it much easier to reuse source code because now many lines of source code could be simply copied from one program file to another, with the files stored on disk drives, rather than on punch cards or magnetic tape.

Figure 16 - The IBM ISPF full screen editor ran on IBM 3278 terminals connected to IBM mainframes in the late 1970s. ISPF was also a screen-based interface to TSO (Time Sharing Option) that allowed programmers to do things like copy files and submit batch jobs. ISPF and TSO running on IBM mainframes allowed programmers to easily reuse source code by doing copy/paste operations with the screen editor from one source code file to another. By the way, ISPF and TSO are still used today on IBM mainframe computers and are fine examples of the many "conserved core processes" to be found in software.
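To illustrate the "top-down" style described above, here is a small Java sketch (all of the names are mine and purely illustrative) with a mainline routine that performs only simple high-level logic and delegates the real work to reusable subroutines:

public class TopDownReport {
    // The mainline performs only simple high-level logic,
    // calling subordinate subroutines like Lego building blocks.
    public static void main(String[] args) {
        double[] data = readData();
        double average = computeAverage(data);   // a reusable "conserved core process"
        printReport(average);
    }

    static double[] readData() {
        return new double[] {2.0, 4.0, 6.0};     // stubbed input for the sketch
    }

    static double computeAverage(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    static void printReport(double average) {
        System.out.println("Average = " + average);
    }
}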

The strange thing is that most of the core fundamental processes of computer data processing, such as sort and merge algorithms, were developed back in the 1950s during the unstructured prokaryotic period, but as we have seen, they could not be easily reused because of technical difficulties that were not alleviated until the structured eukaryotic period of the early 1970s. Similarly, many of the fundamental "conserved core processes" of carbon-based life were also developed during the first 2.5 billion years of the early Earth by the prokaryotic bacteria and archaea. It became much easier for these "conserved core processes" to be reused when the eukaryotic cell architecture arose about 2 billion years ago.

Figure 17 – The Quicksort algorithm was developed by Tony Hoare in 1959 and applies a prescribed sequence of actions to an input list of unsorted numbers. It very efficiently outputs a sorted list of numbers, no matter what unsorted input it operates upon.
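For the curious, here is one possible Java rendering of Quicksort - a sketch using the simple Lomuto partition scheme rather than Hoare's original 1959 partition, but embodying the same divide-and-conquer idea:

import java.util.Arrays;

public class Quicksort {
    // Recursively sort the sub-array between indices lo and hi.
    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }

    // Partition around the last element: smaller values to its left, larger to its right.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }

    public static void main(String[] args) {
        int[] list = {5, 3, 8, 1, 9, 2};
        quicksort(list, 0, list.length - 1);
        System.out.println(Arrays.toString(list));  // prints: [1, 2, 3, 5, 8, 9]
    }
}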

Multicellular Organization and the Major Impact of Reusable Code
But as outlined in the theory of facilitated variation the real impact of reusable code was not fully felt until carbon-based life stumbled upon the revolutionary architecture of multicellular organization, and the same was true in the evolutionary history of software. In software, multicellular organization is obtained through the use of object-oriented programming languages. Some investigation reveals that object-oriented programming has actually been around since 1962, but that it did not catch on at first. In the late 1980s, the use of the very first significant object-oriented programming language, known as C++, started to appear in corporate IT, but object-oriented programming really did not become significant in IT until 1995 when both Java and the Internet Revolution arrived at the same time. The key idea in object-oriented programming is naturally the concept of an object. An object is simply a cell. Object-oriented languages use the concept of a Class, which is a set of instructions for building an object (cell) of a particular cell type in the memory of a computer. Depending upon whom you cite, there are several hundred different cell types in the human body, but in IT we generally use many thousands of cell types or Classes in commercial software. For a brief overview of these concepts go to the webpage below and follow the links by clicking on them.

Lesson: Object-Oriented Programming Concepts
http://docs.oracle.com/javase/tutorial/java/concepts/index.html

A Class defines the data that an object stores in memory and also the methods that operate upon the object data. Remember, an object is simply a cell. Methods are like biochemical pathways that consist of many steps or lines of code. A public method is a biochemical pathway that can be invoked by sending a message to a particular object, like using a ligand molecule secreted from one object to bind to the membrane receptors on another object. This binding of a ligand to a public method of an object can then trigger a cascade of private internal methods within an object or cell.

Figure 18 – A Class contains the instructions for building an object in the memory of a computer and basically defines the cell type of an object. The Class defines the data that an object stores in memory and also the methods that can operate upon the object data.

Figure 19 – Above is an example of a Bicycle object. The Bicycle object has three private data elements - speed in mph, cadence in rpm, and a gear number. These data elements define the state of a Bicycle object. The Bicycle object also has three public methods – changeGears, applyBrakes, and changeCadence that can be used to change the values of the Bicycle object’s internal data elements. Notice that the code in the object methods is highly structured and uses code indentation to clarify the logic.

Figure 20 – Above is some very simple Java code for a Bicycle Class. Real Class files have many data elements and methods and are usually hundreds of lines of code in length.

Figure 21 – Many different objects can be created from a single Class just as many cells can be created from a single cell type. The above List objects are created by instantiating the List Class three times and each List object contains a unique list of numbers. The individual List objects have public methods to insert or remove numbers from the objects and also a private internal sort method that could be called whenever the public insert or remove methods are called. The private internal sort method automatically sorts the numbers in the List object whenever a number is added or removed from the object.
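A bare-bones Java sketch of the kind of List Class described in Figure 21 might look like the following (the class and method names are mine, purely for illustration); the public insert and remove methods each automatically call the private internal sort method:

import java.util.ArrayList;
import java.util.Collections;

public class NumberList {
    private final ArrayList<Integer> numbers = new ArrayList<>();

    // Public methods - the only way other objects can touch the internal data.
    public void insert(int n) {
        numbers.add(n);
        sort();             // the private internal method runs automatically
    }

    public void remove(Integer n) {
        numbers.remove(n);
        sort();
    }

    // Private internal method - hidden from all other objects.
    private void sort() {
        Collections.sort(numbers);
    }

    public static void main(String[] args) {
        // Instantiate the Class twice - two objects, each with its own unique list.
        NumberList a = new NumberList();
        NumberList b = new NumberList();
        a.insert(7); a.insert(2); a.insert(5);
        b.insert(9); b.insert(1);
        a.remove(2);
        System.out.println("List a now holds: " + a.numbers);  // prints: [5, 7]
    }
}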

Figure 22 – Objects communicate with each other by sending messages. Really one object calls the exposed public methods of another object and passes some data to the object it calls, like one cell secreting a ligand molecule that then plugs into a membrane receptor on another cell.

Figure 23 – In a growing embryo, the cells communicate with each other by sending out ligand molecules called morphogens, or paracrine factors, that bind to the membrane receptors on other cells.

Figure 24 – Calling a public method of an object can initiate the execution of a cascade of private internal methods within the object. Similarly, when a paracrine factor molecule plugs into a receptor on the surface of a cell, it can initiate a cascade of internal biochemical pathways. In the above figure, an Ag protein plugs into a BCR receptor and initiates a cascade of biochemical pathways or methods within a cell.
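Here is a minimal Java sketch of that receptor-and-cascade pattern (again, the names are mine, purely for illustration): a single public method acts as the membrane receptor, and calling it triggers a cascade of private internal methods that change the internal state of the object:

public class Cell {
    private int energy = 0;

    // Public method - the exposed membrane receptor.
    public void receiveLigand(String ligand) {
        // Binding triggers a cascade of private internal biochemical pathways.
        transduceSignal(ligand);
    }

    private void transduceSignal(String ligand) {
        System.out.println("Signal received: " + ligand);
        runMetabolicPathway();
    }

    private void runMetabolicPathway() {
        energy += 10;   // the internal state of the object changes
        System.out.println("Energy level now: " + energy);
    }

    public static void main(String[] args) {
        Cell cell = new Cell();
        cell.receiveLigand("morphogen");   // one object "secretes" a ligand to another
    }
}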

When a high-volume corporate website, consisting of many millions of lines of code running on hundreds of servers, starts up and begins taking traffic, billions of objects (cells) begin to be instantiated in the memory of the servers in a matter of minutes, and they then begin to exchange messages with each other in order to perform the functions of the website. Essentially, when the website boots up, it quickly grows to a mature adult through a period of very rapid embryonic growth and differentiation, as billions of objects are created and differentiated to form the tissues of the website organism. These objects then begin exchanging messages with each other by calling public methods on other objects to invoke cascades of private internal methods which are then executed within the called objects - for more on that see Software Embryogenesis.

Object-Oriented Class Libraries
In addition to implementing a form of multicellular organization in software, the object-oriented programming languages, like C++ and Java, brought with them the concept of a class library. A class library consists of the source code for a large number of reusable Classes for a multitude of objects. For example, for a complete list of the reusable objects in the Java programming language see:

Java Platform, Standard Edition 7
API Specification
https://docs.oracle.com/javase/7/docs/api/

The "All Classes" pane in the lower left of the webpage lists all of the built-in classes of the Java programming language.

Figure 25 – All of the built-in Classes of the Java programming language inherit the data elements and methods of the undifferentiated stem cell Object.

The reusable classes in a class library all inherit the characteristics of the fundamental class Object. An Object object is like a stem cell that can differentiate into a large number of other kinds of objects. Below is a link to the Object Class and all of its reusable methods:

Class Object
https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html

For example, a String object is a particular object type that inherits all of the methods of a general Object stem cell and can perform many operations on the string of characters it contains via the methods of a String object.



Figure 26 – A String object contains a number of characters in a string. A String object inherits all of the general methods of the Object Class, and contains many additional methods, like finding a substring in the characters stored in the String object.

The specifics of a String object can be found at:

Class String
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html

Take a look at the Method Summary of the String class. It contains about 50 reusable code methods that can perform operations on the string of characters in a String object. Back in the unstructured prokaryotic times of the 1950s and 1960s, the source code to perform those operations was reprogrammed over and over because there was no easy way to reuse source code in those days. But in the multicellular object-oriented period, the use of extensive class libraries allowed programmers to easily assemble the "conserved core processes" that had been developed back in the unstructured prokaryotic source code of the 1950s and 1960s into complex software by simply instantiating objects and making calls to the methods of the objects that they needed. Thus, the new multicellular object-oriented source code, which created and extended the classes found within the already existing class library, took on more of a regulatory nature than the source code of old. This new multicellular object-oriented source code carried out its required logical operations by simply instantiating certain objects in the class library of reusable code, and then choosing which methods needed to be performed on those objects. Similarly, in the theory of facilitated variation, we find that most of the "conserved core processes" of carbon-based life come from pre-Cambrian times, before complex multicellular organisms came to be. For example, the authors point out that 79% of the genes of a mouse come from pre-Cambrian times. So only 21% of the mouse genes, primarily those that regulate the expression of the other 79% from pre-Cambrian times, manage to build a mouse from a finite number of "conserved core processes".
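To see just how regulatory this new kind of source code became, consider the hedged little example below. It does no low-level data processing of its own - it merely instantiates classes from the standard Java class library and decides which of their built-in methods to call, much as regulatory genes decide which conserved core processes to express:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RegulatoryCode {
    public static void main(String[] args) {
        // All the hard work below - storage, sorting, searching - is done by
        // "conserved core processes" in the class library, written long ago.
        List<String> genes = new ArrayList<>();
        Collections.addAll(genes, "sort", "merge", "search");
        Collections.sort(genes);                        // reuse, don't rewrite
        System.out.println(genes);                      // prints: [merge, search, sort]
        System.out.println(genes.contains("sort"));     // prints: true
        System.out.println("sort".toUpperCase());       // prints: SORT - a String method
    }
}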

Facilitated Variation is Not a Form of Intelligent Design
While preparing this posting, I noticed on the Internet that some see the theory of facilitated variation as a vindication of the concept of intelligent design, but that is hardly the case. The same misunderstanding can arise from the biological findings of softwarephysics. Some tend to dismiss those findings because software is currently a product of the human mind, while biological life is not a product of intelligent design. Granted, biological life is not a product of intelligent design, but neither is the human mind. The human mind and biological life are both the result of natural processes at work over very long periods of time. This objection simply stems from the fact that we are all still, for the most part, self-deluded Cartesian dualists at heart, with seemingly a little “Me” running around within our heads that just happens to have the ability to write software and to do other challenging things. Thus, most human beings do not think of themselves as part of the natural world. Instead, they think of themselves, and others, as immaterial spirits temporarily haunting a body, and when that body dies the immaterial spirit lives on. In this view, human beings are not part of the natural world; they are part of the supernatural. But in softwarephysics, we maintain that the human mind is a product of natural processes in action, and so is the software that it produces. For more on that see The Ghost in the Machine the Grand Illusion of Consciousness.

Still, I realize that there might be some hesitation to pursue this line of thought because it might be construed by some as an advocacy of intelligent design, but that is hardly the case. The evolution of software over the past 76 years has essentially been a matter of Darwinian inheritance, innovation and natural selection converging upon solutions similar to those of biological life. For example, it took the IT community about 60 years of trial and error to finally stumble upon an architecture similar to that of complex multicellular life that we call SOA – Service Oriented Architecture. The IT community could have easily discovered SOA back in the 1960s if it had adopted a biological approach to software and intelligently designed software architecture to match that of the biosphere. Instead, the worldwide IT architecture we see today essentially evolved on its own because nobody ever sat back and designed this very complex worldwide software architecture; it just sort of evolved through small incremental changes brought on by many millions of independently acting programmers working through a process of trial and error. When programmers write code, they always take some old existing code first and then modify it slightly by making a few changes. Then they add a few additional new lines of code and test the modified code to see how far they have come. Usually, the code does not work on the first attempt because of the second law of thermodynamics, so they try to fix the code and test it again. This happens over and over until the programmer finally has a good snippet of new code. Thus, new code comes into existence through the Darwinian mechanisms of inheritance coupled with innovation and natural selection - for more on that see How Software Evolves.

Some might object that this coding process is actually a form of intelligent design, but that is not the case. It is important to differentiate between intelligent selection and intelligent design. In softwarephysics we extend the concept of natural selection to include all selection processes that are not supernatural in nature, so for me, intelligent selection is just another form of natural selection. This is really nothing new. Predators and prey constantly make “intelligent” decisions about what to pursue and what to evade, even if those “intelligent” decisions are only made with the benefit of a few interconnected neurons or molecules. So in this view, the selection decisions that a programmer makes after each iteration of working on some new code really are a form of natural selection. After all, programmers are just DNA survival machines with minds infected with memes for writing software, and the selection processes that the human mind undergoes while writing software are just as natural as the Sun drying out worms on a sidewalk or a cheetah deciding upon which gazelle in a herd to pursue.

For example, when IT professionals slowly evolved our current $10 trillion worldwide IT architecture over the past 2.4 billion seconds, they certainly did not do so with the teleological intent of creating a simulation of the evolution of the biosphere. Instead, like most organisms in the biosphere, these IT professionals were simply trying to survive just one more day in the frantic world of corporate IT. It is hard to convey the daily mayhem and turmoil of corporate IT to outsiders. When I first hit the floor of Amoco’s IT department back in 1979, I was in total shock, but I quickly realized that all IT jobs essentially boiled down to simply pushing buttons. All you had to do was to push the right buttons, in the right sequence, at the right time, and with zero errors. How hard could that be? Well, it turned out to be very difficult indeed, and in response, I began to subconsciously work on softwarephysics to try to figure out why the job was so hard, and how I could dig myself out of the mess that I had gotten myself into. After a while, it dawned on me that the fundamental problem was the second law of thermodynamics operating in a nonlinear simulated universe. The second law made it very difficult to push the right buttons in the right sequence and at the right time because there were so many erroneous combinations of button pushes. Writing and maintaining software was like looking for a needle in a huge utility phase space - there were a nearly infinite number of ways of pushing the buttons “wrong”. The other problem was that we were working in a very nonlinear utility phase space, meaning that pushing just one button incorrectly usually brought everything crashing down. Next, I slowly began to think of pushing the correct buttons in the correct sequence as stringing together the correct atoms into the correct sequence to make molecules in chemical reactions that could do things. I also knew that living things were really great at doing just that. Living things apparently overcame the second law of thermodynamics by dumping entropy into heat as they built low-entropy complex molecules from high-entropy simple molecules and atoms. I then began to think of each line of code that I wrote as a step in a biochemical pathway. The variables were like organic molecules composed of characters or “atoms”, and the operators were like chemical reactions between the molecules in the line of code. The logic in several lines of code was the same thing as the logic found in several steps of a biochemical pathway, and a complete function was the equivalent of a full-fledged biochemical pathway in itself. For more on that see Some Thoughts on the Origin of Softwarephysics and Its Application Beyond IT and SoftwareChemistry.
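To make that analogy concrete, here is a minimal Java sketch, loosely patterned after the net ATP bookkeeping of glycolysis, of what I mean by a function being the equivalent of a full-fledged biochemical pathway - each line of code is a step, the variables are the molecules, and the operators are the reactions between them:

// A minimal sketch of the softwarechemistry analogy: a complete
// function as a full-fledged biochemical pathway. The numbers follow
// the textbook bookkeeping of glycolysis: invest 2 ATP, produce 4 ATP,
// for a net yield of 2 ATP per molecule of glucose.
public class PathwayDemo {

    public static double glycolysis(double glucose) {
        double atpInvested = 2.0;                    // step 1: a "molecule" appears
        double atpProduced = 4.0;                    // step 2: another "molecule" appears
        double netAtp = atpProduced - atpInvested;   // step 3: a "reaction" between them
        return glucose * netAtp;                     // step 4: the pathway's final product
    }

    public static void main(String[] args) {
        System.out.println(glycolysis(1.0));   // 2.0 net ATP from one glucose
    }
}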

Conclusion
The key finding of softwarephysics is that it is all about self-replicating information:

Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.

Basically, we have seen five waves of self-replicating information come to dominate the Earth over the past four billion years:

1. Self-replicating autocatalytic metabolic pathways of organic molecules
2. RNA
3. DNA
4. Memes
5. Software

Software is now rapidly becoming the dominant form of self-replicating information on the planet and is having a major impact on mankind as it comes to predominance. For more on that see: A Brief History of Self-Replicating Information.

So like carbon-based life and software, memes are also forms of self-replicating information that heavily use the concept of reusable code found in the theory of facilitated variation. Many memes are just composites of a number of "conserved core memes" that are patched together by some regulatory memes into something that appears to be unique and can stand on its own. Just watch any romantic comedy, or listen to any political stump speech. For example, about 90% of this posting on reusable code is simply reusable text that I pulled from other softwarephysics postings and patched together in a new sequence using a small amount of new regulatory text. It is just a facilitated variation on a theme, like Johannes Brahms' Variations on a Theme of Paganini (1863):

https://www.youtube.com/watch?v=1EIE78D0m1g&t=1s

or Rachmaninoff's Rhapsody on a Theme of Paganini (1934):

https://www.youtube.com/watch?v=HvKTPDg0IW0&t=1s

reusing Paganini's original Caprice No. 24 (1817):

https://www.youtube.com/watch?v=PZ307sM0t-0&t=1s.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
http://softwarephysics.blogspot.com/

Regards,
Steve Johnston