The RNA-World Hypothesis holds that self-replicating RNA molecules arose on the early Earth before DNA or proteins appeared. In this post, I would like to cover a few recent papers in support of the RNA-World Hypothesis, use their findings to comment on the current parasitic/symbiotic relationships between social media software and the cultures of the world, and incorporate this work into my current working hypothesis for the origin of carbon-based life on the Earth.
The key finding of softwarephysics is that it is all about the powers of self-replicating information to survive in a Universe dominated by the second law of thermodynamics and nonlinearity. Softwarephysics suggests that those interested in such diverse topics as the very significant impact that social media software is now making on the cultures of the world, the origin of carbon-based life on the Earth, why writing and maintaining software is so difficult, and what Advanced AI software may be leading us to should first study the general characteristics that all forms of self-replicating information seem to share. This would be of great value in trying to understand the very complex parasitic/symbiotic relationships between the cultural memes of the world and social media. To expand on that, let me once again repeat the fundamental characteristics of self-replicating information for those of you new to softwarephysics.
Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.
Over the past 4.56 billion years we have seen five waves of self-replicating information sweep across the surface of the Earth and totally rework the planet, as each new wave came to dominate the Earth:
1. Self-replicating autocatalytic metabolic pathways of organic molecules
2. RNA
3. DNA
4. Memes
5. Software
Software is the most recent wave of self-replicating information to arrive upon the scene and is rapidly becoming the dominant form of self-replicating information on the planet. For more on the above see A Brief History of Self-Replicating Information. Recently, the memes and software have formed a very powerful new parasitic/symbiotic relationship with the rise of social media software. In that parasitic/symbiotic relationship, the memes are now mainly being spread by means of social media software, and social media software is being spread and financed by means of the memes. But again, this is nothing new. All five waves of self-replicating information are coevolving by means of eternal parasitic/symbiotic relationships. For more on that see The Current Global Coevolution of COVID-19 RNA, Human DNA, Memes and Software.
Again, self-replicating information cannot think, so it cannot participate in a conspiracy-theory-like fashion to take over the world. All forms of self-replicating information are simply forms of mindless information responding to the blind Darwinian forces of inheritance, innovation and natural selection. Yet despite that, as each new wave of self-replicating information came to predominance over the past four billion years, each one managed to completely transform the surface of the entire planet, so we should not expect anything less from software as it comes to replace the memes as the dominant form of self-replicating information on the planet. But this time might be different. What might happen if software does eventually develop a Mind of its own? After all, that does seem to be the ultimate goal of all the current AI software research that is going on.
The Characteristics of Self-Replicating Information
All forms of self-replicating information have some common characteristics:
1. All self-replicating information evolves over time through the Darwinian processes of inheritance, innovation and natural selection, which endows self-replicating information with one telling characteristic – the ability to survive in a Universe dominated by the second law of thermodynamics and nonlinearity.
2. All self-replicating information begins spontaneously as a parasitic mutation that obtains energy, information and sometimes matter from a host.
3. With time, the parasitic self-replicating information takes on a symbiotic relationship with its host.
4. Eventually, the self-replicating information becomes one with its host through the symbiotic integration of the host and the self-replicating information.
5. Ultimately, the self-replicating information replaces its host as the dominant form of self-replicating information.
6. Most hosts are also forms of self-replicating information.
7. All self-replicating information has to be a little bit nasty in order to survive.
8. The defining characteristic of self-replicating information is the ability of self-replicating information to change the boundary conditions of its utility phase space in new and unpredictable ways by means of exapting current functions into new uses that change the size and shape of its particular utility phase space. See Enablement - the Definitive Characteristic of Living Things for more on this last characteristic. That posting discusses Stuart Kauffman's theory of Enablement in which living things are seen to exapt existing functions into new and unpredictable functions by discovering the "Adjacent Possible" of spring-loaded preadaptations.
Note that because the self-replicating autocatalytic metabolic pathways of organic molecules, RNA and DNA have become so heavily intertwined over time, I now sometimes simply refer to them as the "genes".
Softwarephysics maintains that all apparent Design in the Universe arises from Universal Darwinism as the Darwinian processes of inheritance, innovation and natural selection operate on various forms of self-replicating information that are also all coevolving with each other on this planet. Currently, we are all living in one of those very rare times when a new form of self-replicating information, in the form of software, is rapidly coming to predominance as it overtakes the memes as the dominant form of self-replicating information on the planet. On a daily basis, we are all witnessing the very complex parasitic/symbiotic interactions between the memes and software as the two battle for predominance. This is especially true for the parasitic/symbiotic relationships between social media software and the cultural memes of the times. In today's world, nearly all cultural memes are replicated by means of social media software, and social media software is, in turn, replicated by means of all the cultural memes of the world desperately trying to self-replicate. After all, all cultural memes are just one generation away from extinction, and those cultural memes that did not have the built-in characteristics to deal with that fact are now long gone. Most dangerously for the future of mankind, this is especially true of the parasitic/symbiotic interactions between the political-cultural memes and the social media platforms of the world that are currently tearing apart, with abandon, the very social fabric of the United States of America and many other nations around the world.
So in this post, I would like to cover the very first ancient parasitic/symbiotic interactions between the original self-replicating autocatalytic metabolic pathways of organic molecules on the early Earth and RNA that brought forth an RNA-World on the Earth about 4.0 billion years ago. Looking back at this very first transition of predominance from one form of self-replicating information to another might be of help with our current transition, as software rises to predominance as the dominant form of self-replicating information on the planet. But since, in my view, the transition from self-replicating autocatalytic metabolic pathways of organic molecules to dominance by RNA was not the very first step in the rise of carbon-based life on the Earth, let me first recap my current working hypothesis for the origin of carbon-based life on the planet.
My Current Working Hypothesis for the Origin of Carbon-Based Life on the Earth
In my personal working hypothesis for the rise of carbon-based life on the Earth about 4.0 billion years ago, I currently favor the Hot Spring Origins Hypothesis of Dave Deamer and Bruce Damer of the University of California, Santa Cruz, which suggests that a rocky planet like the Earth is a necessary condition to bring forth carbon-based life. Such a planet also requires the presence of liquid water on its surface, but not too much water. In the Hot Spring Origins Hypothesis, a rocky planet needs some water but also some dry land, so that the organic molecules in volcanic hydrothermal pools can periodically dry out and condense organic monomers into long polymer chains of organic molecules. For more on that see The Bootstrapping Algorithm of Carbon-Based Life. Thus, the Hot Spring Origins Hypothesis rules out water-worlds that are completely covered by a deep worldwide ocean as a home for carbon-based life, even if the water-world resides in the habitable zone of a planetary system, because there is no dry land for volcanic hydrothermal pools to form and dry out. It also rules out the origin of carbon-based life at the deep-sea hydrothermal vents of water-worlds because the continuous presence of water tends to dissolve and break apart the organic polymers of life.
Figure 1 – Above is Bumpass Hell, a hydrothermal field on the volcanic Mount Lassen in California that Dave Deamer and Bruce Damer cite as a present-day example of the type of environment that could have brought forth carbon-based life about four billion years ago.
Dave Deamer is best known for his work on the Membrane-First Hypothesis for the origin of carbon-based life on the Earth. The Membrane-First Hypothesis maintains that in order for carbon-based life to arise from complex organic molecules we first need something with a definable "inside" and "outside" that lets the stuff on the "inside" interact with the stuff on the "outside" in a controlled manner.
Figure 2 – A cell membrane consists of a phospholipid bilayer with embedded molecules that allow for a controlled input-output to the cell. Once we have a membrane, we can fill the "inside" with organic molecules that are capable of doing things that then interact with organic molecules on the "outside".
Figure 3 – Water molecules are polar molecules that have a positive end and a negative end because oxygen atoms attract the bonding electrons more strongly than do the hydrogen atoms. The positive ends of water molecules attract the negative ends of other water molecules to form a loosely coupled network of water molecules with a minimum of free energy.
Figure 4 – How soap and water work. The lipids in a bar of soap have water-loving polar heads and water-hating nonpolar tails. When in water, the soap lipids can form a spherical micelle that has all of the water-hating nonpolar tails facing inwards. Then the spherical micelles can surround the greasy nonpolar molecules of body oils and allow them to be flushed away by a stream of polar water molecules. The lipids in a bar of soap can also form a cell-like liposome with a bilayer of lipid molecules that can surround the monomers and polymers of life.
Similarly, in The Role of Membranes in the Evolution of Software, I explained how the isolation of processing functions within membranes progressed as the architecture of software slowly evolved over the past 81 years, or 2.55 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941. As I outlined in SoftwareChemistry, as a programmer, your job is to assemble characters (atoms) into variables (molecules) that interact in lines of code to perform the desired functions of the software under development. During the Unstructured Period (1955 - 1975), we wrote very tiny prokaryotic programs with very little internal structure that ran in less than 128 KB of memory. These very tiny programs communicated with each other in a batch job stream via sequential files on input/output tapes that passed from one small program to another. Then, during the Structured Period (1975 - 1995), programs exploded to many megabytes in size, and structured programming came about, in which the mainline() of a program called many subroutines() or functions() that were isolated from the mainline() by functional membranes. When the Object-Oriented Period came along in 1995, software architecture evolved to using membrane-enclosed objects() that contained a number of membrane-enclosed methods() to process information. Later such objects() were distributed across a number of physical servers, and, most recently, they have been moved to the Cloud as cloud-based microservices.
All the Way with RNA
Now once we have a naturally-forming membrane to isolate the "inside" from the "outside", we next need to fill the "inside" with something that does things that can then interact with things on the "outside". Previously, in The Origin of Software the Origin of Life we examined Stuart Kauffman's ideas about how naturally-forming Boolean nets of autocatalytic chemical reactions might have kick-started the whole thing as an emergent behavior of an early chaotic pre-biotic environment on Earth, solely with the aid of the extant organic molecules of the day. Now, if networks of autocatalytic chemical reactions found themselves surrounded and protected by the membranes of Dave Deamer's Membrane-First work on the origin of carbon-based life on the Earth, they might be ripe for attack by RNA parasites, as outlined in Freeman Dyson's two-step hypothesis for the origin of carbon-based life on the Earth, which he proposed in Origins of Life (1999) and which I described in Self-Replicating Information. In Dyson's two-step theory, metabolic protocells first arise and are later parasitized, first by RNA, and then by DNA parasitizing the RNA even further. In this view, Membrane-First protocells first provide a refuge for networks of autocatalytic chemical reactions to exist within a protective membrane with an "inside" that protects them from a dangerous "outside". The protocell membrane also allows for the selective entry of valuable extant organic molecules to become the feedstock for the networks of autocatalytic chemical reactions. As a professional IT developer, you also do the same thing. First, you define an empty programming Function(){......} or an Object Method(){......} with calling parameters, and then you fill the Function() or Object Method() with lines of code that do things.
The calling parameters in the Function() or Method() define a set of selected entry values for the Function(){......} or Object Method(){......} and the lines of code within process the selected entry values.
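For readers who like to see the analogy in code, here is a minimal sketch (all of the names and numbers below are hypothetical, invented purely for illustration): the parameter list of the function plays the role of the membrane's selective channels, admitting only certain "molecules" into the body, where the internal "reactions" then run.

```python
# A toy illustration of the membrane analogy: the parameter list acts as
# the selective membrane, admitting only the feedstock "molecules" the
# protocell needs, and the function body plays the role of the internal
# autocatalytic reactions. All names and numbers here are hypothetical.

def protocell(sugars: int, amino_acids: int) -> dict:
    """Admit selected feedstock through the 'membrane' and run internal 'reactions'."""
    energy = sugars * 2                  # pretend each sugar yields 2 units of energy
    polymers = min(amino_acids, energy)  # polymerization limited by available energy
    return {"energy": energy - polymers, "polymers": polymers}

# The "outside" environment may contain many kinds of molecules, but only
# the ones named in the parameter list ever make it "inside".
environment = {"sugars": 10, "amino_acids": 7, "toxins": 99}
inside = protocell(environment["sugars"], environment["amino_acids"])
print(inside)  # {'energy': 13, 'polymers': 7}
```

The "toxins" in the environment never enter the protocell because the calling parameters, like membrane channels, only admit what they were shaped to admit.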
But like all forms of near-life, such protocells were subject to attack by parasitic molecules such as RNA. The question then becomes, where did these parasitic molecules of RNA come from? Did they first arise from within the protocells containing self-replicating autocatalytic networks of organic molecules surrounded by protective membranes of phospholipids, or did the RNA parasites arise from the "outside" and then invade the already-existing protocells? Before trying to answer that question we need to first look at the structure and function of RNA as it stands today.
Figure 5 – The structure of RNA.
Figure 6 – A close-up of one RNA nucleotide or RNA "bit".
Figure 7 – In the current world, RNA is used as an I/O buffer to construct protein molecules from amino acids. Ribosomes are complex read/write heads composed of RNA and some additional protein molecules. When constructing a protein molecule from amino acids, a ribosome read/write head behaves much like the read/write head of a Turing Machine. The ribosome reads an mRNA (messenger RNA) tape that was previously transcribed from a section of DNA tape that encoded the information for a gene. The ribosome read/write head then reads the A, C, G, and U nucleobase "RNA bits" that code for amino acids three at a time. As each 3-bit RNA "byte" is read from the mRNA tape, the ribosome writes out an amino acid to a growing polypeptide chain, as tRNA units bring in one amino acid at a time. The polypeptide chain then goes on to fold up into a 3-D protein molecule.
RNA is composed of three molecular LEGO blocks stuck together - a phosphate group, a ribose sugar and a base. There are four different bases: adenine (A), cytosine (C), guanine (G) and uracil (U). Thus, each RNA nucleotide can be considered to be an RNA "bit" that can be in one of four states: A, C, G or U. Now, all of these molecular LEGO blocks have been found in interstellar molecular clouds and in meteorites that have fallen to the Earth, but the real problem is how to put them all together to produce a form of parasitic RNA that could successfully replicate itself within a protocell surrounded by a phospholipid membrane. The long-standing "which came first, the chicken or the egg" riddle in biology has always been that, currently, all forms of carbon-based life produce RNA with the aid of some protein molecules that do things in the replication process. As shown in Figure 7, we currently need some RNA and some proteins in a ribosome to make proteins. So how could a parasitic form of self-replicating RNA ever come to be? Essentially, we need to find a way for some biological software to bootstrap itself into existence, all by itself, by writing itself into existence on its own. Now, all IT professionals should know from experience that creating, debugging, implementing and maintaining such a piece of biological software on a fragile strand of RNA would not be an easy thing to do, so let's explore that problem next.
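The ribosome read/write head of Figure 7 can also be sketched in a few lines of code. The snippet below is only a toy illustration: it includes just five entries of the real 64-codon genetic code (the codon assignments shown are genuine, but a complete table is much larger), and it reads the mRNA tape three RNA "bits" at a time, writing out one amino acid per codon until it hits a stop codon.

```python
# A minimal sketch of a ribosome "read/write head": read an mRNA tape three
# RNA "bits" (one codon) at a time and write out the corresponding amino acid.
# Only a few entries of the real 64-codon genetic code are included here.
CODON_TABLE = {
    "AUG": "Met",  # also the start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "UGG": "Trp",
    "UAA": None,   # stop codon
}

def translate(mrna: str) -> list[str]:
    """Read the tape codon by codon until a stop codon or the tape ends."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:   # stop codon: release the polypeptide chain
            break
        peptide.append(amino_acid)
    return peptide

print(translate("AUGUUUGGCUAA"))  # ['Met', 'Phe', 'Gly']
```

Each three-"bit" codon is one RNA "byte", and the growing list of amino acids plays the role of the polypeptide chain that will later fold into a 3-D protein.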
But Why is Creating and Maintaining Software so Difficult?
As we saw in The Fundamental Problem of Software, the second law of thermodynamics operating in a nonlinear Universe makes it very difficult to write and maintain software. This is largely due to the second law of thermodynamics introducing small bugs into software whenever software is written or changed, and also to the nonlinear nature of software that allows small software bugs to frequently produce catastrophic effects. And the same goes for trying to construct a successful form of parasitic RNA that could replicate itself within a protocell surrounded by a phospholipid membrane.
Anyway, back in 1979 when I first transitioned from being an exploration geophysicist to becoming an IT professional, I soon came to realize that to perform any IT job, all you have to do is push the right buttons, in the right sequence, at the right time and with zero errors. How hard could that be? When I was in Middleware Operations for the Discover credit card company, you also had to be able to push the buttons rather quickly in tense situations during website outages, with impatient upper-level managers listening in on the conference call through the whole thing! As you can imagine, this could make an IT professional rather tense.
Figure 8 - As depicted back in 1962, George Jetson was a computer engineer in the year 2062, who had a full-time job working 3 hours a day, 3 days a week, pushing the same buttons that I pushed for 40 years as an IT professional.
But it was not supposed to be this way. As a teenager growing up in the 1960s, I was led to believe that in the 21st century, I would be leading the life of George Jetson, who first appeared on ABC-TV Sunday nights from September 23, 1962, to March 3, 1963, in 24 episodes that were later replayed for many decades. George Jetson was a computer engineer in the year 2062, who had a full-time job working 3 hours a day, 3 days a week, pushing buttons. This was three years before the IBM OS/360 was introduced in 1965, so who knew things would turn out quite differently in the 21st century? The Jetsons did get some things right, but certainly not the 9-hour IT workweek! Anyway, little did I realize back in 1962 that, like George Jetson, I would be spending most of my adult life just pushing buttons for a living!
Now, why is pushing buttons for a living so hard? Suppose you are paged into a website outage conference call. In order to resolve the problem, I am going to let you push a maximum of 1,000 buttons. This includes pressing the buttons to find the necessary support documents, logging into a large number of servers, looking at many log files, running diagnostics and health checks, and finally punching in the necessary Unix commands to resolve the problem. Now 1,000 buttons might seem like a lot, but it really is not. This posting itself comes to pushing 104,647 buttons in the right sequence. There are 90 buttons on my laptop, so I can push 1,000 buttons in 90^1000 different ways. That comes to:
1.75 x 10^1954 = 175 with 1952 zeroes behind it!
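Python's arbitrary-precision integers make it easy to check this arithmetic directly: 90 possible buttons pressed 1,000 times in sequence gives 90 multiplied by itself 1,000 times, a number with 1,955 digits.

```python
import math

# Total number of distinct sequences of 1,000 presses on a 90-key keyboard.
sequences = 90 ** 1000

# The exponent: log10(90^1000) = 1000 * log10(90), which is about 1954.24,
# so 90^1000 is roughly 1.75 x 10^1954 -- a number with 1,955 digits.
print(round(math.log10(90) * 1000, 2))  # 1954.24
print(len(str(sequences)))              # 1955
print(str(sequences)[:3])               # the leading digits, 174...
```

So the number of possible 1,000-button sequences dwarfs, by an absurd margin, the number of atoms in the observable Universe, which is only on the order of 10^80.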
Remember, as an IT professional, all you have to do is to quickly push the right buttons, in the right sequence, at the right time and with zero errors. But as I pointed out in Entropy - the Bane of Programmers and The Demon of Software, it is very difficult indeed, and the problem lies with the second law of thermodynamics. There are only a very few sequences of button pushes in the vast number of possible button pushes that will actually get the job done properly, and the odds of you doing that are quite small indeed. Worse yet, in Software Chaos I showed that pushing just one button in the sequence incorrectly can lead to catastrophic results. Now the right sequence of button pushes already exists; all you have to do is find it and execute it perfectly! As we saw in The Demon of Software, in order to accomplish this, we have to turn some unknown information, called entropy, into known information that we can use, and the only way we can do that is to turn an ordered form of energy into the disordered form of energy we call heat. Some of this heat generation will take place in your brain as you think through the problem. A programmer on a 2400-calorie diet (2400 kcal/day) produces about 100 watts of heat sitting at her desk, and about 20 watts of that heat comes from her brain. The remainder of this heat generation will come from pushing the buttons themselves. This puts us in a bit of a bind. The second law of thermodynamics states that it is impossible to turn unknown information, entropy, into usable known information without an accompanying increase in entropy someplace else in the Universe, because the total amount of entropy in the Universe must always increase whenever a change is made. So we are forced to extract the correct sequence of button pushes from the vast number of possible button pushes by dumping some entropy into heat as we degrade the high-grade chemical energy in our brains and fingers into heat energy.
However, there is another possibility open to us.
All forms of self-replicating information evolve over time using the Darwinian processes of inheritance, innovation and natural selection to blindly find solutions to problems, and the same must have been true for the early parasitic RNA molecules that invaded the early protocells. The blind Darwinian processes of inheritance, innovation and natural selection make it possible to push the right sequence of buttons necessary to create a self-replicating strand of RNA that could then invade the already-existing protocells containing autocatalytic self-replicating pathways of organic molecules within their protective phospholipid membranes. For the remainder of this post, let's explore how that might have happened.
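How powerful is this blind search? A classic toy illustration is a variant of Richard Dawkins' "weasel" program, sketched below with RNA-style letters (the target string, population size and mutation rate are all arbitrary choices of mine for illustration, not a model of real chemistry). Each generation, a parent string is copied with occasional errors (inheritance and innovation), and the copy that best matches the target becomes the next parent (natural selection). No step looks ahead or plans, yet the target is typically found within a few hundred generations, while pure random guessing would need on the order of 4^16 tries.

```python
import random

ALPHABET = "ACGU"
TARGET = "AUGGCUACGUACGGUU"  # an arbitrary 16-"bit" RNA target, for illustration only

def fitness(s: str) -> int:
    """Count the positions that match the target."""
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s: str, rate: float, rng: random.Random) -> str:
    """Inheritance with innovation: copy the parent with occasional copying errors."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in s)

def evolve(generations: int, pop_size: int = 100, rate: float = 0.05,
           seed: int = 0) -> str:
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)  # start from random noise
    for _ in range(generations):
        # Natural selection: the best copy becomes the next parent; keeping
        # the parent itself in the pool guarantees fitness never decreases.
        offspring = [parent] + [mutate(parent, rate, rng) for _ in range(pop_size)]
        parent = max(offspring, key=fitness)
        if parent == TARGET:
            break
    return parent

best = evolve(1000)
print(best, "fitness:", fitness(best), "of", len(TARGET))
```

The trick is cumulative selection: each small improvement is inherited by the next generation instead of being thrown away, which is exactly what random guessing cannot do.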
Some New Work on the Rise of an RNA-World 4.0 Billion Years Ago
As I mentioned in many previous posts, one of my favorite scientific YouTubers is Anton Petrov. Anton reads tons of scientific papers and somehow manages to produce a new YouTube video every day covering some of his latest reads. His videos come out so frequently that it is hard to keep up, but they are always excellent. A few weeks back, I watched his YouTube video:
We Just Got Closer To Solving How Life Started on Earth
https://www.youtube.com/watch?v=hdiBC_z9F0M
in which he covered two papers that are available for download. The first paper is at:
Evolutionary transition from a single RNA replicator to a multiple replicator network
https://www.nature.com/articles/s41467-022-29113-x
Abstract
In prebiotic evolution, self-replicating molecules are believed to have evolved into complex living systems by expanding their information and functions open-endedly. Theoretically, such evolutionary complexification could occur through the successive appearance of novel replicators that interact with one another to form replication networks. Here we perform long-term evolution experiments of RNA that replicates using a self-encoded RNA replicase. The RNA diversifies into multiple coexisting host and parasite lineages, whose frequencies in the population initially fluctuate and gradually stabilize. The final population, comprising five RNA lineages, forms a replicator network with diverse interactions, including cooperation to help the replication of all other members. These results support the capability of molecular replicators to spontaneously develop complexity through Darwinian evolution, a critical step for the emergence of life.
This first paper describes an experiment in which the investigators constructed a strip of RNA containing, among other things, the instructions for a ribosome read/write head to construct a protein called Qβ replicase. Qβ replicase can replicate a strand of RNA from an already existing strand of RNA, so Qβ replicase is another read/write head that can essentially read a strip of RNA and write out a copy of the RNA. If the RNA copy also contains the RNA nucleotide "bits" for a ribosome to read and write out even more Qβ replicase protein, then this RNA self-replication process should be able to go on forever, so long as a supply of ribosomes and RNA nucleotides is provided. However, RNA is a single-track tape without a parity track that can allow for the correction of copying errors. So RNA replication is very prone to errors, and that can be a good thing for a parasite. For example, in recent years we have all been parasitized by the viral parasitic RNA that causes COVID-19. Because the COVID-19 virus uses RNA to encode the information required to build COVID-19 viruses instead of using a more stable strand of DNA, the COVID-19 virus has been mutating into many different strains in an attempt to evade our preventive measures.
Figure 9 – Above is the structure of the COVID-19 virus that carries COVID-19 RNA in its center.
Figure 10 - The investigators first took stretches of HL0 (Host Lineage Zero) RNA that encoded for the protein Qβ replicase and encapsulated them into micro-sized water-in-oil droplets. Then they added micro-sized water-in-oil droplets containing the translation system of the E. coli bacterium. The translation system consisted of E. coli ribosomes and a mixture of nucleotides containing A, C, G, and U nucleotides. Then the mixture was rapidly stirred to mix all of the micro-sized water-in-oil droplets so that they contained both the HL0 (Host Lineage Zero) RNA and the E. coli ribosomes with a mixture of A, C, G, and U nucleotides.
The investigators first took stretches of HL0 (Host Lineage Zero) RNA that encoded for the protein Qβ replicase and encapsulated them into micro-sized water-in-oil droplets. Then they added micro-sized water-in-oil droplets containing the translation system of the E. coli bacterium. The translation system consisted of E. coli ribosomes and a mixture of nucleotides containing A, C, G, and U nucleotides. Then the mixture was rapidly stirred to mix all of the micro-sized water-in-oil droplets so that they contained both the HL0 (Host Lineage Zero) RNA and the E. coli ribosomes with a mixture of A, C, G, and U nucleotides. Then they kept repeating this process for many iterations. What happened was quite remarkable. The original HL0 version of RNA continued on for many iterations, but around iteration 50, some mutated variations of the HL0 variant began to appear, such as variants HL1, HL2 and HL3. All of these variants were slightly different from the original parent HL0 variant, but they all contained functional variations of the Qβ replicase protein that could replicate their version of the original HL0 RNA. Some of the new variants generated Qβ replicase proteins that could replicate other host variants. For example, the HL1 variant might produce a Qβ replicase protein that could replicate the HL2 variant of RNA. More interestingly, around iteration 120, parasites began to evolve. Eventually, three parasitic lineages formed: PL1, PL2 and PL3. These parasites did not contain the instructions to build a functional Qβ replicase protein. Instead, the PL1, PL2 and PL3 parasites had to rely on the various Qβ replicase proteins produced by the HL1, HL2 and HL3 RNA strands. Figure 11 shows that the amounts of HL0, HL1, HL2, HL3, PL1, PL2 and PL3 RNA initially bounced up and down quite a bit over time. The original HL0 strain even went extinct before round 120!
But by round 240, the investigators found that the amounts of HL1, HL2, HL3, PL1, PL2 and PL3 RNA had reached an equilibrium, forming a stable network that could essentially continue on forever. The parasitic/symbiotic relationships amongst the HL1, HL2, HL3, PL1, PL2, and PL3 RNAs had reached an equilibrium. The paper concludes with a rather profound finding. Once a self-sustaining network of HL1, HL2, HL3, PL1, PL2, and PL3 RNAs had reached an equilibrium of interrelated parasitic/symbiotic relationships, removing any of the members of the coevolving network of RNAs caused unpredictable damage to the network, to the point of extinction for certain members!
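To see why the water-in-oil droplets matter, here is a toy simulation of my own (it is emphatically not the paper's model, and the gains, droplet counts and population sizes below are arbitrary): hosts ("H") encode a replicase, parasites ("P") replicate faster but only inside droplets that also contain a host, and each round the pooled products are diluted back down to a fixed population, mimicking the serial-transfer protocol described above. Compartmentalization means that droplets containing nothing but parasites cannot grow, which is the mechanism that can keep parasites from simply overrunning the hosts.

```python
import random
from collections import Counter

def transfer_round(pool, n_droplets, pop_size, host_gain=2, parasite_gain=4,
                   rng=None):
    """One serial-transfer round: encapsulate, replicate, pool, dilute.

    Toy model only: "H" molecules encode a replicase; "P" molecules are
    parasites that replicate faster, but only in droplets containing a host.
    """
    rng = rng or random.Random()
    droplets = [Counter() for _ in range(n_droplets)]
    for molecule in pool:                       # random encapsulation into droplets
        droplets[rng.randrange(n_droplets)][molecule] += 1
    new_pool = []
    for d in droplets:
        hosts, parasites = d["H"], d["P"]
        if hosts > 0:                           # replicase only exists where a host is
            hosts *= host_gain
            parasites *= parasite_gain          # parasites exploit the host's replicase
        new_pool += ["H"] * hosts + ["P"] * parasites
    rng.shuffle(new_pool)                       # dilution back down to pop_size
    return new_pool[:pop_size]

rng = random.Random(1)
pool = ["H"] * 90 + ["P"] * 10
for round_number in range(30):
    pool = transfer_round(pool, n_droplets=50, pop_size=100, rng=rng)
counts = Counter(pool)
print(counts)
```

Because the model is stochastic, the exact counts vary with the seed, but the compartmentalized dynamics illustrate how serial dilution into droplets can hold a host-parasite network together instead of letting the faster replicator take everything.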
Figure 11 - Above we see the original HL0 strain of RNA evolve over time into strains HL1, HL2 and HL3 that contained their own versions of the Qβ replicase protein that could replicate themselves or the other HL1, HL2 and HL3 strains of RNA. We also see the original HL0 strain of RNA go extinct before round 120 and the rise of parasitic strains PL1, PL2 and PL3 around round 120. The parasitic strains PL1, PL2 and PL3 did not contain instructions for a functional Qβ replicase protein, so they had to rely in a parasitic manner on the Qβ replicase proteins generated by the HL1, HL2 and HL3 strains of RNA.
This is the first time that an experimental team has been able to take a single strain of RNA and have it evolve, by means of the Darwinian processes of inheritance, innovation and natural selection, into a multitude of cooperating and parasitic strains of RNA that formed a stable self-sustaining network, demonstrating the true power of parasitic/symbiotic self-replicating information.
Now, this experiment was a beautiful and very important breakthrough in unraveling the mystery of the rise of carbon-based life on this planet, but it did rely heavily on one crutch that would not have been available on an early Earth with no already-existing form of carbon-based life. It relied on the read/write heads of ribosomes from E. coli bacteria to string together amino acids into proteins using the sequence specified by the strands of RNA, and ribosome read/write heads were not for sale prior to the rise of carbon-based life on the Earth because ribosomes are made of RNA and proteins stuck together. So once again, we are confronted by the "which came first, the chicken or the egg" riddle of biology. The next paper sheds some light on that riddle by suggesting that they both came first and coevolved with each other during the early history of the Earth.
A prebiotically plausible scenario of an RNA–peptide world
https://www.nature.com/articles/s41586-022-04676-3
Abstract
The RNA world concept is one of the most fundamental pillars of the origin of life theory. It predicts that life evolved from increasingly complex self-replicating RNA molecules. The question of how this RNA world then advanced to the next stage, in which proteins became the catalysts of life and RNA reduced its function predominantly to information storage, is one of the most mysterious chicken-and-egg conundrums in evolution. Here we show that non-canonical RNA bases, which are found today in transfer and ribosomal RNAs, and which are considered to be relics of the RNA world are able to establish peptide synthesis directly on RNA. The discovered chemistry creates complex peptide-decorated RNA chimeric molecules, which suggests the early existence of an RNA–peptide world from which ribosomal peptide synthesis may have emerged. The ability to grow peptides on RNA with the help of non-canonical vestige nucleosides offers the possibility of an early co-evolution of covalently connected RNAs and peptides, which then could have dissociated at a higher level of sophistication to create the dualistic nucleic acid–protein world that is the hallmark of all life on Earth.
This paper is a rather difficult read for the non-professional but, nonetheless, it presents a very important finding for those interested in the origins of carbon-based life on the Earth. First, we need to clarify some terms. An RNA nucleotide is one RNA "bit" as shown in Figure 6. It consists of a phosphate group, a ribose sugar and a base all chemically glued together. An RNA nucleoside is very similar, but it is just an RNA nucleotide without the attached phosphate group, so it consists only of a base chemically glued to a ribose sugar. A canonical nucleoside is made by simply stripping the phosphate group from one of the four canonical nucleotides that carry the bases adenine (A), cytosine (C), guanine (G) and uracil (U). A non-canonical nucleoside, on the other hand, consists of some other base chemically glued to a ribose sugar, again with no phosphate group attached. Non-canonical bases are not part of the four-letter alphabet that spells out the genetic message in modern RNA, but non-canonical nucleosides are still found today in transfer and ribosomal RNAs, and some of them are found attached to amino acids on the tRNA that is used to bring in amino acids one at a time as shown in Figure 7 above. Some biologists think that the non-canonical nucleosides found attached to tRNA might be fossil molecules from a time before the read/write ribosomes even existed. The paper then goes on to experimentally explore how such ancient non-canonical nucleosides could have performed the read/write functions of modern ribosomes before ribosomes came to be. This would allow ancient forms of self-replicating RNA to produce protein polypeptide chains of amino acids that could do useful things for RNA, like replicate RNA, as we saw the Qβ replicase protein do in the previous paper.
Figure 13 - In this experiment, two strands of RNA that terminate with a non-canonical nucleoside attached to an amino acid can form a growing polypeptide chain of amino acids chemically glued together.
In the experiment above we see how two stretches of RNA that have a non-canonical nucleoside attached to an amino acid can chemically glue amino acids together into a polypeptide chain. In the first step, an RNA molecule ending with a non-canonical nucleoside attached to the amino acid G (glycine) pairs up with a stretch of RNA ending with a non-canonical nucleoside attached to the amino acid V (valine) forming the polypeptide VG. In the next step, another G is brought in to get chemically pasted onto the already existing VG to form the polypeptide VGG. Then another G is brought in to form VGGG. Next an A (alanine) amino acid is brought in to form VGGGA. Finally, another G is brought in to form the polypeptide chain VGGGAG. This could continue on and on to form very long polypeptide chains that then fold up into protein molecules that can do useful things. This activity over millions of years would be very much like a huge bunch of monkeys pushing buttons on a keyboard that in very rare circumstances punched out the source code for a program that could actually compile and do useful things!
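The stepwise elongation described above can be sketched as a toy simulation. This is purely illustrative, with string concatenation standing in for the actual chemistry of pasting amino acids together:

```python
# Toy sketch of the stepwise peptide elongation described above.
# Each round appends one RNA-delivered amino acid to the growing chain,
# reproducing the VG -> VGG -> VGGG -> VGGGA -> VGGGAG sequence of Figure 13.

def elongate(chain, amino_acid):
    """One elongation step: paste a new amino acid onto the chain."""
    return chain + amino_acid

chain = "V"                                   # starts with valine (V)
for delivered in ["G", "G", "G", "A", "G"]:   # amino acids brought in one at a time
    chain = elongate(chain, delivered)
    print(chain)                              # VG, VGG, VGGG, VGGGA, VGGGAG
```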
Figure 14 - Two strands of RNA ending with a non-canonical nucleoside attached to a growing polypeptide chain of amino acids chemically glued together were experimentally found to be able to append one polypeptide chain to the end of another polypeptide chain of amino acids chemically glued together.
In the next experiment above, the team showed how two strands of RNA with already-existing polypeptide chains of amino acids could append one chain of amino acids to another chain of amino acids to create a very long polypeptide chain.
Figure 15 - Finally, it was experimentally shown that the non-canonical nucleosides with one or more attached polypeptide amino acid chains chemically glued together could be attached at multiple sites along an RNA molecule. This would allow for all of the above to operate all at the same time in parallel along existing RNA molecules, just as a single mRNA molecule can be read by multiple ribosome read/write heads all at the same time in modern living cells.
In this last experiment, the team showed that it was possible to attach a non-canonical nucleoside carrying an amino acid anywhere along the chain of RNA nucleotides. The non-canonical nucleoside did not have to be attached only at the end of the RNA molecule. That means that a single strand of RNA could build multiple long polypeptide chains in parallel, all at the same time! We see this happen in the modern world, where many ribosome read/write heads can read a single mRNA chain and construct many polypeptide chains in parallel, all at the same time.
The above experiments were all conducted using conditions compatible with the Hot Spring Origins Hypothesis of Dave Deamer and Bruce Damer out of the University of California at Santa Cruz as explained in the paper above:
In contrast to earlier investigations of the origin of translation, we used naturally occurring non-canonical vestige nucleosides and conditions compatible with aqueous wet-dry cycles.
So let us now imagine a time 4.0 billion years ago when networks of parasitic/symbiotic RNA molecules infected the Membrane-First protocells of Dave Deamer and Bruce Damer as they ran through billions of wet-dry cycles in the hydrothermal pools displayed in Figure 1. These networks of parasitic RNA molecules that could produce polypeptide chains of amino acids needed a source for those amino acids, for the canonical nucleotides A, C, G and U, and for the non-canonical nucleosides required to build the polypeptide chains. These components must have come from the interstellar molecular clouds from which the Sun and Earth formed and from the original autocatalytic pathways of organic molecules that had already taken up residence in the protocells. Hiding within such membranes during periods of drying would certainly be beneficial to the networks of parasitic/symbiotic RNA molecules and the polypeptide chains of amino acids that they generated. Now if this wet-dry cycle in hydrothermal pools should continue on for several million years, surely the monkeys at the RNA keyboard would be able to accidentally punch out many very useful polypeptide chains of amino acids that promoted the survival of the RNA that produced them. This would provide for membrane-enclosed protocells containing many coevolving autocatalytic pathways of organic molecules and many strains of self-replicating RNA churning out huge numbers of polypeptide chains of amino acids. This strange brew of parasitic/symbiotic organic molecules encased in membrane-enclosed protocells would slowly evolve, by the Darwinian processes of inheritance, innovation and natural selection, into something that might be seen by some as a living thing. If we had been there at the time, I am sure that we would all still be arguing about that today.
The Rise of DNA
Now, protocells using coevolving chains of RNA nucleotides and polypeptide chains of amino acids to get by would certainly make sense for a pure parasite that needs to evolve quickly and can live with the high replication error rates that RNA is prone to. A pure parasite needs random RNA errors to quickly explore the survival terrain for a protocell. This is what the RNA-based COVID-19 virus has been doing for the past couple of years.
Similarly, when developing and implementing software, it makes sense to allow for the development team to interact with the software user community in real-time to make rapid changes to the software under development. That is why Agile software development arose in the 1990s. For more on that see Agile vs. Waterfall Programming and the Value of Having a Theoretical Framework. But once a stable software product has been obtained, it also makes sense to slow down a bit with the software alterations. There is long-term value in stability in software products like Microsoft Word.
The problem with RNA is that RNA is a one-track tape that does not allow for error corrections when RNA is duplicated. In order to perform error corrections, you need a two-track tape and that is what DNA provides. In this view, DNA started off as a parasite feeding off the already-existing RNA nucleotides in protocells that already contained self-replicating RNA molecules that were also producing primitive proteins used to enhance the survival and self-replication of the RNA molecules. Over time, these parasitic DNA molecules then entered into a parasitic/symbiotic relationship with the RNA molecules and their generated proteins. Then each group of molecules settled down with the functions they were best suited for. DNA transitioned to become a two-track tape for long-term data storage. RNA transitioned to become a temporary one-track tape that could be read by ribosome read/write heads. RNA became an I/O buffer that could be used to temporarily cache the information stored in long-term DNA while it was read by multiple ribosomes in parallel. The generated proteins then went on to become the molecules that could perform the functions of carbon-based life. For more on the rise of DNA to predominance see An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer.
Figure 16 - RNA is a one-track tape, while DNA is a two-track tape. DNA has a data track and a parity track that allows for error corrections after DNA replicates. DNA uses a slightly different version of the ribose sugar and also uses the nucleotide of T (Thymine) instead of the U (Uracil) used by RNA.
To explore the advantages of using two-track DNA to store the information to build polypeptide chains of amino acids that then fold up into useful protein molecules, let us look back to the processing of batch tape jobs in IT during the 1960s and 1970s.
Tape Sequential Access Methods
One of the simplest and oldest tape sequential access methods is called QSAM - Queued Sequential Access Method:
Queued Sequential Access Method
http://en.wikipedia.org/wiki/Queued_Sequential_Access_Method
I did a lot of magnetic tape processing in the 1970s and early 1980s using QSAM. At the time we used 9-track tapes that were 1/2 inch wide and 2400 feet long on a reel with a 10.5-inch diameter. The tape had 8 data tracks and one parity track across the 1/2 inch tape width. That way we could store one byte across the 8 1-bit data tracks in a frame and use the parity track to check for errors. We used odd parity: if the 8 bits on the 8 data tracks in a frame added up to an even number of 1s, we put a 1 in the parity track to make the total number of 1s odd; if the 8 bits added up to an odd number of 1s, we put a 0 in the parity track to keep the total number of 1s odd. Originally, 9-track tapes had a density of 1600 bytes/inch of tape, with a data transfer rate of 15,000 bytes/second. Remember, a byte is 8 bits and can store one character, like the letter “A”, which we encode in the ASCII code set as A = “01000001”.
Figure 17 – A 1/2 inch wide 9-track magnetic tape on a 2400-foot reel with a diameter of 10.5 inches
Figure 18 – 9-track magnetic tape had 8 data tracks and one parity track using odd parity which allowed for the detection of bad bytes with parity errors on the tape.
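The odd-parity scheme described above is easy to sketch in code. This is just an illustration of the idea, not actual tape drive logic:

```python
# Sketch of the odd-parity scheme used on 9-track tape: 8 data bits per
# frame plus 1 parity bit chosen so the total number of 1s is always odd.

def parity_bit(data_bits):
    """Return the parity bit that makes the total count of 1s odd."""
    ones = sum(data_bits)
    return 0 if ones % 2 == 1 else 1

def frame_is_valid(data_bits, parity):
    """A frame is good if its 9 bits contain an odd number of 1s."""
    return (sum(data_bits) + parity) % 2 == 1

# The letter "A" = 01000001 in ASCII: two 1s (even), so parity must be 1.
a = [0, 1, 0, 0, 0, 0, 0, 1]
print(parity_bit(a))                       # -> 1
print(frame_is_valid(a, parity_bit(a)))    # -> True

# Flip one data bit to simulate a bad spot on the tape:
a[3] = 1
print(frame_is_valid(a, 1))                # -> False: parity error detected
```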
Later, 6250 bytes/inch tape drives became available, and I will use that density for the calculations that follow. Now suppose you had 50 million customers and the current account balance for each customer was stored on an 80-byte customer record. A record was like a row in a spreadsheet. The first field of the record was usually a CustomerID field that contained a unique customer ID like a social security number and was essentially the equivalent of a promoter region on the front end of a gene in DNA. The remainder of the 80-byte customer record contained fields for the customer’s name and billing address, along with the customer’s current account information. Between each block of data on the tape, there was a 0.5-inch gap of “junk” tape. This “junk” tape allowed for the acceleration and deceleration of the tape reel as it spun past the read/write head of a tape drive and perhaps occasionally reversed direction. Since an 80-byte record only came to 80/6250 = 0.0128 inches of tape, which is quite short compared to the overhead of the 0.5-inch gap of “junk” tape between records, it made sense to block many records together into a single block of data that could be read by the tape drive in a single I/O operation. For example, blocking 100 80-byte records increased the block size to 8000/6250 = 1.28 inches and between each 1.28-inch block of data on the tape, there was the 0.5-inch gap of “junk” tape. This greatly reduced the amount of wasted “junk” tape on a 2400-foot reel of tape. So each 100 record block of data took up a total of 1.78 inches of tape and we could get 16,180 blocks on a 2400-foot tape or the data for 1,618,000 customers per tape. The advantage of QSAM, over an earlier sequential access method known as BSAM, was that you could read and write an entire block of records at a time via an I/O buffer. In our example, a program could read one record at a time from an I/O buffer that contained the 100 records from a single block of data on the tape. 
When the I/O buffer was depleted of records, the next 100 records were read in from the next block of records on the tape. Similarly, programs could write one record at a time to the I/O buffer, and when the I/O buffer was filled with 100 records, the entire I/O buffer with 100 records in it was written as the next block of data on an output tape.
The use of a blocked I/O buffer provided a significant distinction between the way data was physically stored on tape and the way programs logically processed the data. The difference between the way things are physically implemented and the way things are logically viewed by software is a really big deal in IT. The history of IT over the past 81 years has really been a history of logically abstracting physical things through the increasing use of layers of abstraction, to the point where today, IT professionals rarely think of physical things at all. Everything just resides in a logical “Cloud”. I think that taking more of a logical view of things, rather than taking a physical view of things, would greatly help biologists at this point in the history of biology. Biologists should not get so hung up about where the information for biological software is physically located. Rather, biologists should take a cue from IT professionals, and start thinking more of biological software in logical terms, rather than physical terms.
Figure 19 – Between each record, or block of records, on a magnetic tape, there was a 0.5 inch gap of “junk” tape. The “junk” tape allowed for the acceleration and deceleration of the tape reel as it spun past the read/write head on a tape drive. Since an 80-byte record only came to 80/6250 = 0.0128 inches, it made sense to block many records together into a single block that could be read by the tape drive in a single I/O operation. For example, blocking 100 80-byte records increased the block size to 8000/6250 = 1.28 inches, and between each 1.28-inch block of data on the tape, there was a 0.5-inch gap of “junk” tape for a total of 1.78 inches per block.
Figure 20 – Blocking records on tape allowed data to be stored more efficiently.
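We can double-check the blocking arithmetic above with a few lines of code, using the figures from the text:

```python
# Back-of-the-envelope check of the blocking arithmetic above,
# using the numbers from the text (6250 bytes/inch, 0.5-inch gaps).

DENSITY = 6250          # bytes per inch
GAP = 0.5               # inches of "junk" tape between blocks
RECORD = 80             # bytes per customer record
TAPE = 2400 * 12        # a 2400-foot reel, in inches

# Unblocked: each 80-byte record (0.0128 in) pays a full 0.5-inch gap.
per_record = RECORD / DENSITY + GAP
print(int(TAPE / per_record))         # only about 56,000 customers per tape

# Blocked 100 records per block: 1.28 inches of data per 0.5-inch gap.
per_block = 100 * RECORD / DENSITY + GAP
blocks = TAPE / per_block
print(round(blocks))                  # about 16,180 blocks
print(round(blocks) * 100)            # about 1,618,000 customers per tape
```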
So it took 31 tapes to just store the rudimentary account data for 50 million customers. The problem was that each tape could only store 123 MB of data. Not too good, considering that today you can buy a 1 TB PC disk drive that can hold 8525 times as much data for about $100! Today, you could also store about 67 times as much data on a $7.00 8 GB thumb drive. So how could you find the data for a particular customer on 74,000 feet (14 miles) of tape? Well, you really could not do that reading one block of data at a time with the read/write head of a tape drive, so we processed data with batch jobs using lots of input and output tapes. Generally, we had a Master Customer File on 31 tapes and a large number of Transaction tapes with insert, update and delete records for customers. All the tapes were sorted by the CustomerID field, and our programs would read a Master tape and a Transaction tape at the same time and apply the inserts, updates and deletes on the Transaction tape to a new Master tape. So your batch job would read a Master and Transaction input tape at the same time and would then write to a single new Master output tape. These batch jobs would run for many hours, with lots of mounting and unmounting of dozens of tapes.
Figure 21 – Batch processing of 50 million customers took a lot of tapes and tape drives.
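The batch master-file update described above can be sketched as a simple merge. This is a hypothetical in-memory illustration of the logic, with Python lists standing in for the sorted Master and Transaction tapes:

```python
# Sketch of a batch master-file update: both "tapes" are sorted by
# CustomerID, and the Transaction tape's inserts, updates and deletes
# are applied to produce a new Master tape.

def update_master(master, transactions):
    """Apply a transaction list to a master list, returning the new master.

    master: list of (customer_id, balance) tuples, sorted by customer_id
    transactions: list of (customer_id, action, balance) tuples
    """
    trans = dict((cid, (action, bal)) for cid, action, bal in transactions)
    new_master = []
    for cid, balance in master:
        action, new_bal = trans.pop(cid, (None, None))
        if action == "delete":
            continue                           # drop this customer
        elif action == "update":
            new_master.append((cid, new_bal))  # replace the balance
        else:
            new_master.append((cid, balance))  # carry forward unchanged
    # Anything left over must be an insert of a brand-new customer.
    for cid, (action, bal) in trans.items():
        if action == "insert":
            new_master.append((cid, bal))
    return sorted(new_master)

old = [(101, 50.0), (102, 75.0), (104, 20.0)]
txn = [(102, "update", 80.0), (103, "insert", 10.0), (104, "delete", None)]
print(update_master(old, txn))
# -> [(101, 50.0), (102, 80.0), (103, 10.0)]
```

The real jobs of the era read both tapes sequentially with two read pointers rather than loading everything into memory, but the merge logic was the same.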
The Advantages of the DNA Access Method
Nearly all biological functions are performed by proteins. A protein is formed by combining 20 different amino acids into different sequences, and on average it takes about 400 amino acids strung together to form a functional protein. The information to do that is encoded in base pairs running along a strand of DNA. Each base can be in one of four states – A, C, G, or T, and an A will always be found to pair with a T, while a C will always pair with a G. So DNA is really a 2-track tape with one data track and one parity track. For example, if there is an A on the DNA data track, you will find a T on the DNA parity track. This allows not only for the detection of parity errors but also for the correction of parity errors in DNA by enzymes that run up and down the DNA tape looking for parity errors and correcting them.
Figure 22 – DNA is a two-track tape, with one data track and one parity track. This allows not only for the detection of parity errors but also for the correction of parity errors in DNA by enzymes that run up and down the DNA tape looking for parity errors and correcting them.
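The two-track parity scheme of DNA can be sketched in code as well. This is just an illustration of the idea; real repair enzymes use additional tricks, such as methylation patterns, to decide which track to trust:

```python
# Sketch of DNA's two-track parity scheme: the parity track should always
# hold the Watson-Crick complement of the data track, so a mismatch
# between the two tracks flags a "parity error" at that base.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def find_parity_errors(data_track, parity_track):
    """Return the positions where the two tracks fail to complement."""
    return [i for i, (d, p) in enumerate(zip(data_track, parity_track))
            if COMPLEMENT[d] != p]

def repair_parity_track(data_track):
    """Rebuild the parity track from the data track, like a repair enzyme."""
    return "".join(COMPLEMENT[d] for d in data_track)

data   = "ATGCGT"
parity = "TACGCA"                        # correct complement of the data track
print(find_parity_errors(data, parity)) # -> []

damaged = "TACCCA"                       # base 3 has been corrupted (G -> C)
print(find_parity_errors(data, damaged)) # -> [3]
print(repair_parity_track(data))         # -> TACGCA
```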
Now a single base pair can code for 4 different amino acids because a single base pair can be in one of 4 states. Two base pairs can code for 4 x 4 = 16 different amino acids, which is not enough. Three base pairs can code for 4 x 4 x 4 = 64 different amino acids, which is more than enough to code for 20 different amino acids. So it takes a minimum of three bases to fully encode the 20 different amino acids, leaving 44 combinations for redundancy. Biologists call these three base pair combinations a “codon”, but a codon really is just a biological byte composed of three biological bits, or base pairs, that code for an amino acid. Actually, three of the base pair combinations, or codons, are used as STOP codons – TAA, TAG and TGA, which are essentially end-of-file markers designating the end of a gene along the sequential file of DNA. As with magnetic tape, there is a section of “junk” DNA between genes along the DNA 2-track tape. According to Shannon’s equation, a DNA base contains 2 bits of information, so a codon can store 6 bits. For more on this see Some More Information About Information.
Figure 23 – Three bases combine to form a codon, or a biological byte, composed of three biological bits, and encodes the information for one amino acid along the chain of amino acids that form a protein.
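A few lines of code confirm the codon arithmetic above:

```python
# Quick check of the codon arithmetic: how many amino acids can n bases
# encode, and how many bits of Shannon information does a codon carry?

import math

BASES = 4
for n in (1, 2, 3):
    print(n, "base(s) can encode", BASES ** n, "amino acids")  # 4, 16, 64

print("spare combinations:", BASES ** 3 - 20)      # -> 44 left for redundancy
print("bits per base:", math.log2(BASES))          # -> 2.0
print("bits per codon:", 3 * math.log2(BASES))     # -> 6.0
```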
The beginning of a gene is denoted by a section of promoter DNA that identifies the beginning of the gene, like the CustomerID field on a record, and the gene is terminated by a STOP codon of TAA, TAG or TGA. Just as there was a 0.5-inch gap of “junk” tape between blocks of records on a magnetic computer tape, there is a section of “junk” DNA between each gene along the 6 feet of DNA tape found within human cells.
Figure 24 - On average, each gene is about 400 codons long and ends in a STOP codon TAA, TAG or TGA which are essentially end-of-file markers designating the end of a gene along the sequential file of DNA. As with magnetic tape, there is a section of “junk” DNA between genes which is shown in grey above.
In order to build a protein, genes are first transcribed to an I/O buffer called mRNA. The 2-track DNA file for a gene is first opened near the promoter of a gene and an enzyme called RNA polymerase then begins to copy the codons or biological bytes along the data track of the DNA tape to an mRNA I/O buffer. The mRNA I/O buffer is then read by a ribosome read/write head as it travels along the mRNA I/O buffer. The ribosome read/write head reads each codon or biological byte of data along the mRNA I/O buffer and writes out a chain of amino acids as tRNA brings in one amino acid after another in the sequence specified by the mRNA I/O buffer.
Figure 25 - In order to build a protein, genes are first transcribed to an I/O buffer called mRNA. The 2-track DNA file for a gene is first opened near the promoter of a gene and an enzyme called RNA polymerase then begins to copy the codons or biological bytes along the data track of the DNA tape to the mRNA I/O buffer. The mRNA I/O buffer is then read by a ribosome read/write head as it travels along the mRNA I/O buffer. The ribosome read/write head reads each codon or biological byte of data along the mRNA I/O buffer and writes out a chain of amino acids as tRNA brings in one amino acid after another in the sequence specified by the mRNA I/O buffer.
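The transcription and translation pipeline described above can be sketched as a toy program. The codon table below contains only a handful of the 64 real codon assignments, just enough for the example:

```python
# Toy sketch of the pipeline above: transcribe a gene's data track to an
# mRNA "I/O buffer", then let a ribosome "read/write head" consume the
# buffer one codon (biological byte) at a time until an end-of-file marker.

CODON_TABLE = {"ATG": "M", "GTT": "V", "GGT": "G", "GCA": "A"}
STOP_CODONS = {"TAA", "TAG", "TGA"}    # the end-of-file markers

def transcribe(dna_data_track):
    """RNA polymerase: copy the data track to mRNA (T becomes U)."""
    return dna_data_track.replace("T", "U")

def translate(mrna):
    """Ribosome: read the buffer codon by codon until a STOP codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")   # look up via DNA spelling
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE.get(codon, "?"))
    return "".join(protein)

gene = "ATGGTTGGTGCATAA"         # codons M, V, G, A, then the STOP codon TAA
mrna = transcribe(gene)
print(mrna)                      # -> AUGGUUGGUGCAUAA
print(translate(mrna))           # -> MVGA
```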
The Bizarre Nature of Eukaryotic DNA
From an IT perspective, I have always marveled at the dramatic change in the data storage technology used by the eukaryotes, compared to the simple DNA loops of the prokaryotes.
Figure 26 – The prokaryotic cell architecture of the bacteria and archaea is very simple and designed for rapid replication. Prokaryotic cells do not have a nucleus enclosing their DNA. Eukaryotic cells, on the other hand, store their DNA on chromosomes that are isolated in a cellular nucleus. Eukaryotic cells also have a very complex internal structure with a large number of organelles, or subroutine functions, that compartmentalize the functions of life within the eukaryotic cells such as mitochondria, chloroplasts, Golgi bodies, and the endoplasmic reticulum.
Prokaryotic cells essentially consist of a tough outer cell wall enclosing an inner cell membrane and contain a minimum of internal structure. The cell membrane is composed of phospholipids and proteins. The DNA within prokaryotic cells generally floats freely as a large loop of DNA, and their ribosomes used to help translate mRNA into proteins, float freely within the entire cell as well. The ribosomes in prokaryotic cells are not attached to membranes like they are in eukaryotic cells, which have membranes called the rough endoplasmic reticulum for that purpose. The chief advantage of prokaryotic cells is their simple design and the ability to thrive and rapidly reproduce even in very challenging environments, like little AK-47s that still manage to work in environments where modern tanks will fail. Eukaryotic cells, on the other hand, are found in the bodies of all complex organisms, from single-celled yeasts to you and me, and they divide up cell functions amongst a collection of organelles (functional subroutines), such as mitochondria, chloroplasts, Golgi bodies, and the endoplasmic reticulum.
It then took about another 2,500 million years of habitable climate for the Eukarya to appear, which used the much more complicated eukaryotic cell architecture found in all of the "higher" forms of carbon-based life on the Earth today. This was probably the most difficult step in producing a carbon-based form of life with Intelligence, and that is why it took so very long. The arrival of the eukaryotic cell architecture, with built-in mitochondrial power supplies on a planet with an atmosphere containing oxygen, was necessary to power multicellular "higher" forms of carbon-based life with Intelligence. After all, it requires a full 20 watts of power to run a human Mind and all that it can contemplate, and that is a lot of energy. For more on that see The Rise of Complexity in Living Things and Software. All of the "higher" forms of life that we are familiar with are simply made of aggregations of eukaryotic cells. Even the simple yeasts that make our bread, and get us drunk, consist of very complex eukaryotic cells. The troubling thing is that only an expert could tell the difference between a yeast eukaryotic cell and a human eukaryotic cell because they are so similar, while any school child could easily tell the difference between the microscopic images of a prokaryotic bacterial cell and a eukaryotic yeast cell - see Figure 26. The other thing about eukaryotic cells, as opposed to prokaryotic cells, is that eukaryotic cells are HUGE! They are something like 15,000 times larger by volume than prokaryotic cells! See Figure 27 for a true-scale comparison of the two.
Figure 27 – Not only are eukaryotic cells much more complicated than prokaryotic cells, they are also HUGE!
Figure 28 – Complex carbon-based multicellular life consisting of large numbers of eukaryotic cells all working together as a single organism did not arise until the Ediacaran Period 635 million years ago.
Complex multicellular life did not arise until just 635 million years ago during the Ediacaran Period. But very complex carbon-based multicellular life did not really take off until the Cambrian Explosion 541 million years ago. The Cambrian Explosion may have been initiated by the advancement of rudimentary forms of vision by certain Cambrian predators. See An IT Perspective of the Cambrian Explosion for more on that.
Figure 29 – Complex carbon-based multicellular life then really took off during the Cambrian Explosion 541 million years ago.
So prokaryotes are single-celled organisms, like bacteria, with very little internal structure. Eukaryotes are much larger cells with a great deal of internal structure and a nucleus that contains DNA wound up into chromosomes. For me, it has always seemed very reminiscent of the dramatic change in the data storage architecture of commercial software that took place when commercial software shifted from the batch processing of sequential files on magnetic tape to the interactive processing of data on disk drives using indexed access methods like ISAM, VSAM and relational databases. It seems that the genomes of eukaryotic multicellular organisms may have adopted hierarchical indexed access methods to locate groups of genes and to turn them on or off. So it is of value to see how IT transitioned from using sequential access methods on tape in the 1950s, 1960s and 1970s, like BSAM and QSAM, to using hierarchical indexed access methods like ISAM, VSAM and relational databases in the 1980s and beyond.
How Hierarchical Indexed Access Methods Work
Recall that, using simple tape sequential files, it took 31 tapes holding 14 miles of magnetic tape just to store the rudimentary account data for 50 million customers. Clearly, this technology could not work for a customer calling in and wanting to know his current account status at this very moment. The solution, in effect, was to allow for many read sites along the 14 miles of tape, much as DNA allows for multiple transcription sites. This was accomplished by moving the customer data to disk drives. A disk drive is like a stack of old phonograph records on a rapidly rotating spindle. Each platter has its own access arm, like the tone arm on an old turntable, that carries a read/write head. To quickly get to the data on a disk drive, IT invented new access methods that used indexes, like ISAM and VSAM. These hierarchical indexes work like this. Suppose you want to find one customer out of 50 million via their CustomerID. You first look up the CustomerID in a book that only contains an index of other books. The index entry for the particular CustomerID tells you which book to look in next. The next book also just consists of an index of other books. Finally, after maybe 4 or 5 reads, you get to a book that has an index of books with “leaf” pages. This index tells you which book to get next and on what “leaf” page you can find the customer record for the CustomerID that you are interested in. So instead of spending many hours reading through perhaps 14 miles of tape on 31 tapes, you can find the customer record in a few milliseconds and put it on a webpage. For example, suppose you have 200 customers instead of 50 million and you would like to find the information on customer 190. If the customer data were stored as a sequential file on magnetic tape, you would have to read through the first 189 customer records before you finally got to customer 190.
However, if the customer data were stored on a disk drive, using an indexed sequential access method like ISAM or VSAM, you could get to the customer after 3 reads that get you to the leaf page containing records 176 – 200, and you would only have to read 14 records on the leaf page before you got to record 190. For more on these indexed access methods see:
ISAM Indexed Sequential Access Method
http://en.wikipedia.org/wiki/ISAM
VSAM Virtual Storage Access Method
http://en.wikipedia.org/wiki/VSAM
Figure 30 – Disk drives allowed for indexed access methods like ISAM and VSAM to quickly access an individual record.
Figure 31 – To find customer 190 out of 200 on a magnetic tape would require sequentially reading 189 customer records. Using the above hierarchical index would only require 3 reads to get to the leaf page containing records 176 – 200. Then an additional 14 reads would get you to customer record 190.
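The worked example above can be sketched in code. This toy version uses a single flat index page instead of the several nested index levels of a real ISAM or VSAM file, but the idea is the same:

```python
# Sketch of an indexed lookup: 200 customer records stored in leaf pages
# of 25 records each, located through an index of highest-key-per-page.

import bisect

LEAF_SIZE = 25
records = list(range(1, 201))                  # CustomerIDs 1..200

# Leaf pages: [1-25], [26-50], ..., [176-200]
leaves = [records[i:i + LEAF_SIZE] for i in range(0, len(records), LEAF_SIZE)]
highest_key = [page[-1] for page in leaves]    # index: highest ID on each page

def find(customer_id):
    """Locate a customer: which leaf page, and how many records to skip."""
    # One bisect over the index page replaces a sequential scan; a real
    # ISAM/VSAM file would nest several such index levels (the 3 reads of
    # Figure 31) instead of this single flat index.
    page = bisect.bisect_left(highest_key, customer_id)
    skipped = leaves[page].index(customer_id)  # records read before the target
    return page, skipped

print(find(190))   # -> (7, 14): the 8th leaf page, after skipping 14 records
```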
The key advance that came with the ISAM and VSAM access methods over QSAM was that it allowed commercial software to move from batch processing to interactive processing in the 1970s and 1980s. That was a major revolution in IT.
Today we store all commercial data on relational databases, like IBM’s DB2 or Oracle’s database software, but these relational databases still use hierarchical indexing like VSAM under the hood. Relational databases logically store data on tables. A table is much like a spreadsheet and contains many rows of data that are formatted into a number of well-defined columns. A large number of indexes are then formed using combinations of data columns to get to a particular row in the table. Tables can also be logically joined together into composite tables with logical rows of data that contain all of the data on several tables merged together, and indexes can be created on the joined tables to allow programs to quickly access the data. For large-scale commercial software, these relational databases can become quite huge and incredibly complicated, with huge numbers of tables and indexes forming a very complicated nonlinear network of components, and the design of these huge networks of tables and indexes is crucial to processing speed and throughput. A large-scale relational database may contain several thousand tables and indexes, and a poorly designed relational database can be just as harmful to the performance of a high-volume corporate website as buggy software. A single corrupted index can easily bring a high-volume corporate website crashing down, resulting in the loss of thousands of dollars for each second of downtime.
Figure 32 – Modern relational databases store data on a large number of tables and use many indexes to quickly access the data in a particular row of a table or a row in a combination of joined tables. Large-scale commercial applications frequently have databases with several thousand tables and several thousand indexes.
But remember, under the hood, these relational databases are all based upon indexed access methods like VSAM, and VSAM itself is just a logical view of what is actually going on in the software controlling the disk drives themselves, so essentially we have a lengthy series of logical indexes of logical indexes, of logical indexes, of logical indexes…. The point is that in modern commercial software a great deal of information is stored in the network of components that determines how information is read and written. If you dig down deep into the files running a relational database, you can actually see things like the names and addresses of customers, but you will also find huge amounts of control information that lets programs get to those names and addresses efficiently, and if any of that control information gets corrupted, your website comes crashing down.
Biological Access Methods
Nearly all biological functions are performed by proteins. A protein is formed by combining 20 different amino acids into different sequences, and on average it takes about 400 amino acids strung together to form a functional protein. The information to do that is encoded in base pairs running along a strand of DNA. Each base can be in one of four states – A, C, G, or T, and an A will always be found to pair with a T, while a C will always pair with a G. So DNA is really a 2-track tape with one data track and one parity track. For example, if there is an A on the DNA data track, you will find a T on the DNA parity track. This allows not only for the detection of parity errors but also for the correction of parity errors in DNA by enzymes that run up and down the DNA tape looking for parity errors and correcting them.
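The parity-checking analogy above can be sketched in Python. The "repair enzyme" below simply rebuilds the parity track from the data track wherever the A-T and C-G pairing rule is violated; real DNA repair enzymes are, of course, vastly more sophisticated, and the six-base sequences are invented:

```python
# One strand as the data track, the other as the parity track.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def parity_errors(data_track, parity_track):
    """Return positions where the two strands fail to base-pair."""
    return [i for i, (d, p) in enumerate(zip(data_track, parity_track))
            if PAIR[d] != p]

def repair(data_track, parity_track):
    """Rebuild the parity track from the data track, like a repair enzyme."""
    return "".join(PAIR[d] for d in data_track)

data    = "ACGTAC"
damaged = "TGCATC"                     # last base fails to pair with C
print(parity_errors(data, damaged))    # [5]
print(repair(data, damaged))           # 'TGCATG'
```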
The prokaryotes store their genes in one large loop of DNA and in a number of smaller loops of DNA called plasmids. The plasmids are easier to share with other bacteria than the whole main loop of bacterial DNA and provide for a rudimentary form of sexual reproduction amongst prokaryotes when they exchange plasmids.
Figure 33 – Prokaryotes, like bacteria and archaea, store their DNA in simple loops like magnetic computer tape.
In contrast to the prokaryotes, in eukaryotic cells there are three general levels of chromatin organization:
1. DNA first wraps around histone proteins, like magnetic computer tape around little reels, forming nucleosomes.
2. Multiple nucleosomes wrap up into a 30 nm fiber consisting of compact nucleosome arrays.
3. When cells are dividing, the 30 nm fibers fold up into the familiar chromosomes that can be seen in optical microscopes.
In eukaryotic cells, the overall structure of chromatin depends upon the stage of its lifecycle that a cell is in. Between cell divisions, known as the interphase, the chromatin is more loosely structured to allow RNA and DNA polymerases to easily transcribe and replicate the DNA. During the interphase, the DNA of genes that are “turned on” is more loosely packaged than the DNA of the genes that have been “turned off”. This allows RNA polymerase enzymes to access the “turned on” DNA more easily and then transcribe it to an mRNA I/O buffer that can escape from the nucleus and be translated into a number of identical protein molecules by several ribosome read/write heads.
This very complex structure of DNA in eukaryotic cells, composed of chromosomes and chromatin, has always puzzled me. I can easily see how the organelles found within eukaryotic cells, like the mitochondria and chloroplasts, could have become incorporated into eukaryotic cells based on the Endosymbiosis theory of Lynn Margulis, which holds that the organelles of eukaryotic cells are the remnants of invading parasitic prokaryotic cells that took up residence within proto-eukaryotic cells and entered into a parasitic/symbiotic relationship with them. But where did the eukaryotic nucleus, chromosomes, and the complex structure of chromatin come from? Bacteria do not have histone proteins, but certain archaea do, so these complex structures in eukaryotic cells might have arisen from bacteria invading the cells of certain archaea. The archaea are known for their ability to live under extreme conditions of heat and salinity, so the origin of histone proteins and chromatin might go back to a need to stabilize DNA under the extreme conditions that archaea are fond of.
For more on the origin and functions of histone proteins see: http://en.wikipedia.org/wiki/Histone
Figure 34 – Eukaryotic DNA is wrapped around histone proteins like magnetic computer tape wrapped around little reels, forming nucleosomes, and then is packed into chromatin fibers that are then wound up into chromosomes.
Figure 35 – Chromatin performs the functions of the tape racks of old, allowing DNA to be highly compacted for storage and also allowing for the controlled expression of genes by means of the epigenetic factors in play. Each tape in a rack had an external label, known as a volume serial number, which identified the tape.
As of yet nobody really knows why the eukaryotes have such a complicated way of storing DNA, but my suspicion is that eukaryotic cells may have also essentially come up with a hierarchical indexing of DNA by storing genes in chromatin on chromosomes and by attaching proteins and other molecules to the compacted DNA, and by using other epigenetic techniques, to enhance or suppress transcription of genes into proteins. IT professionals always think of software in terms of a very large and complex network of logical operations, and not simply as a parts list of software components. In commercial software, it is the network of logical operations that really counts and not the individual parts. Now that biologists have sequenced the genomes of many species and found that large numbers of them essentially contain the same genes, they are also beginning to realize that biological software cannot simply be understood in terms of a list of genes. After all, what really counts in a multicellular organism is the kinds of proteins each cell generates and the amounts of those proteins that it generates after it has differentiated into a unique cell type. And biologists are discovering that biological software uses many tricks to control which proteins are generated and in what amounts, so the old model of “one gene produces one protein, which generates one function” has been replaced by the understanding that there are many tricky positive and negative feedback loops in operation that come into play as the information encoded in genes eventually becomes fully-formed 3-D protein molecules. For example, genes in eukaryotic cells are composed of sections of DNA known as exons that code for sequences of amino acids. The exons are separated by sections of “junk” DNA known as introns. The exons can be spliced together in alternative patterns, allowing a single gene to code for more than one protein. 
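The alternative splicing of exons described above can be sketched as a simple combinatorial exercise. The exon sequences below are invented, and real splicing machinery keeps exons in order and usually retains particular exons, but the sketch shows how even 3 exons can yield 7 different mRNAs:

```python
from itertools import combinations

exons = ["ATGGCC", "TTTAAA", "GGGCCC"]   # invented exon sequences

def splice_variants(exons):
    """All in-order exon combinations containing at least one exon."""
    variants = []
    for n in range(1, len(exons) + 1):
        for combo in combinations(range(len(exons)), n):
            variants.append("".join(exons[i] for i in combo))
    return variants

variants = splice_variants(exons)
print(len(variants))    # 7 distinct spliced mRNAs from 3 exons
```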
Also, eukaryotic cells have much more “junk” DNA between genes than do the prokaryotes and this “junk” DNA also gets transcribed to microRNA or miRNA. An average miRNA strand is about 22 bases long and it can base-pair with a complementary section of mRNA that has already been transcribed from a DNA gene. Once that happens the mRNA can no longer be read by a ribosome read/write head and the strand of mRNA becomes silenced and eventually degrades without ever producing a protein. The human genome contains more than 1,000 miRNAs that can target about 60% of human genes, so miRNAs play a significant role in regulating gene expression.
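Here is a toy sketch of miRNA silencing in Python. The sequences are invented and shortened for readability (real miRNAs run about 22 bases and tolerate partial matches), but the sketch shows the basic idea of a complementary match silencing a strand of mRNA:

```python
# In RNA, A pairs with U and C pairs with G.
PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}

def reverse_complement(rna):
    """The strand that would base-pair with the given strand."""
    return "".join(PAIR[base] for base in reversed(rna))

def silences(mirna, mrna):
    """True if the mRNA contains a site fully complementary to the miRNA,
    so the miRNA can base-pair with it and block the ribosome."""
    return reverse_complement(mirna) in mrna

mrna  = "AUGGCUUACGGAUCCGUAAGCCAUUAG"          # invented mRNA
mirna = reverse_complement("GGAUCCGUAAGCC")    # targets a site in mrna
print(silences(mirna, mrna))                   # True
```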
Figure 36 – Genes in eukaryotic cells are composed of sections of DNA known as exons that code for sequences of amino acids. The exons are separated by sections of “junk” DNA known as introns. The exons can be spliced together in alternative patterns, allowing a single gene to code for more than one protein.
The key point is that the DNA access methods used by eukaryotic cells are probably just as important to eukaryotic cells, or perhaps even more important, than the content of the individual genes themselves, because the network of genes probably contains as much information as the genes do.
The Impact of Social Media Software on the Cultures of the World
The appearance of the very first parasitic RNA molecules on the Earth, about 4.0 billion years ago, represented a dramatic turning point for the planet. It was the first time a new Replicator had appeared that threatened the predominance of the already-existing self-replicating autocatalytic metabolic pathways of organic molecules hiding in the phospholipid membranes of the protocells of the day. Suddenly, for the first time, a new form of self-replicating information was in play. This new revolutionary form of self-replicating information must have created a great deal of turmoil as it quickly parasitized its helpless predecessors. But eventually, RNA settled down to form a strong parasitic/symbiotic relationship with the self-replicating autocatalytic metabolic pathways of organic molecules that provided the feedstock of organic molecules for RNA survival. Successful parasites exploit their hosts, they do not kill them. For more on parasites and the potential for interstellar parasitic software to replicate across the galaxy see SETS - The Search For Extraterrestrial Software.
We are seeing this same turmoil today as software rises to predominance and forms strong parasitic/symbiotic relationships with the meme-complexes of the world. The most striking examples are the parasitic/symbiotic relationships that have formed between social media software and the cultural meme-complexes of the world, particularly the current world political memes. For example, looking back to the birth of the RNA-World might be of value in trying to understand the very complex parasitic/symbiotic relationships between the Fascist Alt-Right memes of the world and the social media software that now seem to be jointly sweeping across the world in a cooperative parasitic/symbiotic manner. This is most important for the United States of America because we now have a political movement advocating the violent overthrow of the government. When I took my very first job after college back in 1975, I actually had to take a loyalty oath swearing that I had never belonged to any such movement. I doubt that the American Alt-Right Fascist movement could have come this far without the amplification provided by social media software. For more on that see
The Perils of Software Enhanced Confirmation Bias, The Very Strange American Political Worldviews Fostered by Artificial Reality Software and The QAnon Phenomenon - Why Does Social Media Software Make People So Nutty?.
Figure 37 – During the Rebellion of 2021 the Capitol Building of the United States of America was breached for the first time by domestic insurrectionists.
Figure 38 – The 2021 insurrectionists desecrated many symbols of American democracy.
Figure 39 – The QAnon Shaman and other insurrectionists managed to reach the floor of the Senate Chamber.
Figure 40 – The madness of American Alt-Right Fascism.
Figure 41 – As a child of the 1950s, President Eisenhower was a national hero for me who had fought the European Fascists of the 1930s and 1940s. I wonder what he would think of American Alt-Right Fascism today.
Comments are welcome at
scj333@sbcglobal.net
To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/
Regards,
Steve Johnston