Monday, May 19, 2014

Software Embryogenesis

In my last posting, An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer, I proposed that the chromatin and chromosomes of multicellular eukaryotic organisms, like ourselves, might have arisen in nature as a common solution to the same problem that IT faced when trying to read through 14 miles of magnetic computer tape just to process the rudimentary account data for 50 million customers. The problem was: how do you quickly find the account information for a single customer on 14 miles of tape? Multicellular organisms faced this very same challenge when large-scale multicellular organisms first appeared during the Cambrian explosion 541 million years ago. How does each cell in a multicellular organism, consisting of billions or trillions of differentiated cells, find the genes that it needs in order to differentiate into the correct cell type for the tissue that it is forming? For example, humans are made up of about 100 trillion eukaryotic cells, and each of those cells contains about 23,000 protein-coding genes, stored on a small percentage of the 6 feet of DNA that is carefully wound up around a large number of histone proteins and packaged into 23 pairs of chromosomes within each cell, like the magnetic computer tape of yore that was wound up on 2,400-foot reels and carefully stored in the tape racks of times gone by. The 100 trillion eukaryotic cells of a human are composed of several hundred different cell types, and each cell at each location within a human body must somehow figure out what kind of cell to become. So how does each differentiating cell find the proper genes, at the proper time, so that a human baby can develop from a single fertilized egg cell? This is a very perplexing problem because each human being begins as a spherically symmetric fertilized egg cell.
How can it possibly grow and differentiate into 100 trillion cells, composed of several hundred different cell types, and ultimately forming the myriad varied tissues within a body that perform the functions of life? In biology, the study of this incredible feat is called embryogenesis or developmental biology, and this truly amazing process from a data processing perspective is certainly worthy of investigation from an IT perspective.

Human Embryogenesis
Most multicellular organisms follow a surprisingly similar sequence of steps to form a complex body, composed of billions or trillions of eukaryotic cells, from a single fertilized egg. This is a sure sign of some inherited code at work that has been tweaked many times to produce a multitude of complex body plans or phyla by similar developmental processes. Since many multicellular life forms follow a similar developmental theme let us focus, as always, upon ourselves and use the development of human beings as our prime example of how developmental biology works. For IT professionals and other readers not familiar with embryogenesis, it would be best now to view this short video before proceeding:

Medical Embryology - Difficult Concepts of Early Development Explained Simply https://www.youtube.com/watch?annotation_id=annotation_1295988581&feature=iv&src_vid=rN3lep6roRI&v=nQU5aKKDwmo

Basically, a fertilized egg, or zygote, begins to divide many times over, without the zygote really increasing in size at all. After a number of divisions, the zygote becomes a ball of undifferentiated cells that are all the same and is known as a morula. The morula then develops an interior hollow center called a blastocoel. The hollow ball of cells is known as a blastula and all the cells in the blastula are undifferentiated, meaning that they are all still identical in nature.

Figure 1 – A fertilized egg, or zygote, divides many times over to form a solid sphere of cells called a morula. The morula then develops a central hole to become a hollow ball of cells known as a blastula. The blastula consists of identical cells. When gastrulation begins some cells within the blastula begin to form three layers of differentiated cells – the ectoderm, mesoderm, and endoderm. The above figure does not show the amnion which forms just outside of the infolded cells that create the gastrula. See Figure 2 for the location of the amnion.

The next step is called gastrulation. In gastrulation one side of the blastula breaks symmetry and folds into itself and eventually forms three differentiated layers – the ectoderm, mesoderm and endoderm. The amnion forms just outside of the gastrulation infold.

Figure 2 – In gastrulation, cells infold and differentiate to form three layers of differentiated cells: the ectoderm, mesoderm and endoderm.

Figure 3 – Above is a close-up view showing the ectoderm, mesoderm and endoderm forming from the primitive streak.

The cells of the endoderm go on to differentiate into the internal organs or guts of a human being. The cells of the mesoderm, or the middle layer, go on to form the muscles and connective tissues that do most of the heavy lifting. Finally, the cells of the ectoderm go on to differentiate into the external portions of the human body, like the skin and nerves.

Figure 4 – Some examples of the cell types that develop from the endoderm, mesoderm and ectoderm.

Figure 5 – A human being develops from the cells in the ectoderm, mesoderm and endoderm as they differentiate into several hundred different cell types.

This is all incredibly impressive from a data processing perspective. The nagging question in biology has always been this: if the 100 trillion cells in the human body all have the very same 23,000 genes strung out along some small percentage of the 6 feet of DNA found in the 23 pairs of chromosomes of each cell, how do the 100 trillion cells figure out what to do?

Alan Turing’s Morphogenesis Model
In biology, the currently accepted paradigm for how the spherically symmetric cells of a blastula differentiate into the 100 trillion cells of the human body, forming very complex tissues and organs, stems from a paper that Alan Turing published in 1952 entitled The Chemical Basis of Morphogenesis. Yes, the very same Alan Turing of early computer science fame. In 1936 Alan Turing developed the mathematical concept of the Turing Machine in On Computable Numbers, with an Application to the Entscheidungsproblem that today underlies the architecture for all modern computers. Alan Turing’s work was completely conceptual in nature, and in the paper, he proposed the theoretical concept of a Turing Machine. A Turing Machine was composed of a read/write head and an infinitely long paper tape. On the paper tape was stored a sequential series of 1s and 0s, and the read/write head could move back and forth along the paper tape in a motion based upon the 1s and 0s that it read. The read/write head could also write 1s and 0s to the paper tape as well. In Turing’s paper, he mathematically proved that such an arrangement could be used to encode any mathematical algorithm, like multiplying two very large numbers together and storing the result on the paper tape. In many ways, a Turing Machine is much like a ribosome reading mRNA and writing out the amino acids of a polypeptide chain that eventually fold up into an operational protein.

Figure 6 - A Turing Machine had a read/write head and an infinitely long paper tape. The read/write head could read instructions on the tape that were encoded as a sequence of 1s and 0s, and could write the results of following those instructions back to the tape as a sequence of 1s and 0s.
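To make the read/write head and tape a little more concrete, here is a minimal sketch of a Turing Machine in Java. It is only an illustration under my own assumptions: the class name TuringMachineSketch, the single "carry" state and the choice of binary increment as the algorithm are all hypothetical, and the machine only ever needs to move left, whereas a general Turing Machine moves in both directions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a one-state Turing Machine that adds 1 to a binary number.
final class TuringMachineSketch {
    private static final char BLANK = '_';

    // Sparse tape: any cell never written reads as BLANK, which stands in
    // for the "infinitely long" paper tape.
    private final Map<Integer, Character> tape = new HashMap<>();
    private int head;

    TuringMachineSketch(String bits) {
        for (int i = 0; i < bits.length(); i++) {
            tape.put(i, bits.charAt(i));
        }
        head = bits.length() - 1; // start on the least significant bit
    }

    private char read() {
        return tape.getOrDefault(head, BLANK);
    }

    // Transition table for the single "carry" state:
    //   read 1          -> write 0, move left, stay in "carry"
    //   read 0 or blank -> write 1, halt
    void run() {
        while (true) {
            if (read() == '1') {
                tape.put(head, '0');
                head--;
            } else {
                tape.put(head, '1');
                return;
            }
        }
    }

    // Returns the written portion of the tape, left to right.
    String tapeContents() {
        int min = tape.keySet().stream().min(Integer::compare).orElse(0);
        int max = tape.keySet().stream().max(Integer::compare).orElse(-1);
        StringBuilder sb = new StringBuilder();
        for (int i = min; i <= max; i++) {
            sb.append(tape.getOrDefault(i, BLANK));
        }
        return sb.toString();
    }

    static String increment(String bits) {
        TuringMachineSketch machine = new TuringMachineSketch(bits);
        machine.run();
        return machine.tapeContents();
    }

    public static void main(String[] args) {
        System.out.println(increment("1011")); // prints 1100
        System.out.println(increment("111"));  // prints 1000
    }
}
```

The whole "program" is the two-line transition table inside run(); everything else is just the tape and the head.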

Figure 7 – A ribosome read/write head behaves much like the read/write head of a Turing Machine. The ribosome reads an mRNA tape that was transcribed earlier from a section of DNA tape that encodes the information in a gene. The ribosome read/write head then reads the A, C, G, and U nucleobases that code for amino acids three at a time. As each three-base codon is read on the mRNA tape (in effect a 6-bit byte, since each of the four bases carries 2 bits), the ribosome writes out an amino acid to a growing polypeptide chain, as tRNA units bring in one amino acid at a time. The polypeptide chain then goes on to fold up into a 3-D protein molecule.
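The ribosome-as-read-head idea can be sketched in a few lines of Java. This is a toy under stated assumptions: the class name RibosomeSketch is hypothetical, only a handful of the 64 codons of the standard genetic code are included, and reading simply starts at the first base rather than scanning for a start codon as a real ribosome does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: read an mRNA "tape" three bases at a time and
// write out a chain of amino acids.
final class RibosomeSketch {
    // A small excerpt of the standard genetic code (not the full 64-codon table).
    private static final Map<String, String> CODON_TABLE = Map.of(
        "AUG", "Met",
        "UUU", "Phe",
        "GGC", "Gly",
        "GCU", "Ala",
        "AAA", "Lys",
        "UAA", "STOP",
        "UAG", "STOP",
        "UGA", "STOP");

    static List<String> translate(String mrna) {
        List<String> polypeptide = new ArrayList<>();
        for (int i = 0; i + 3 <= mrna.length(); i += 3) {
            String codon = mrna.substring(i, i + 3);
            String aminoAcid = CODON_TABLE.get(codon);
            if (aminoAcid == null) {
                throw new IllegalArgumentException("Codon not in this partial table: " + codon);
            }
            if (aminoAcid.equals("STOP")) {
                break; // a stop codon ends the chain
            }
            polypeptide.add(aminoAcid);
        }
        return polypeptide;
    }

    public static void main(String[] args) {
        System.out.println(translate("AUGUUUGGCUAA")); // prints [Met, Phe, Gly]
    }
}
```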

In a sense, all modern computers are loosely based upon the concept of a Turing Machine. Turing did not realize it, but at the same time that he was formulating the concept of a Turing Machine back in 1936, Konrad Zuse was constructing his totally mechanical Z1 computer in the living room of his parents’ apartment in Germany, and the Z1 really did use a punched tape to store the program that it processed, much like a Turing Machine. Neither one of these early computer pioneers had any knowledge of the other at the time. For more about how Konrad Zuse independently developed a physical implementation of many of Alan Turing’s mathematical concepts, and put them into practice in the form of the world’s very first real computers, see the following article that was written in his own words:

http://ei.cs.vt.edu/~history/Zuse.html

Figure 8 - A reconstructed mechanical Z1 computer completed by Konrad Zuse in 1989. The Z1 was not a full-fledged modern computer, like Zuse’s Z3 computer that became operational in May of 1941, because it read its programs from a punched tape rather than storing them in the mechanical memory of the Z1. In that regard, the Z1 was more like a Turing Machine than are modern computers.

Now back to Turing and morphogenesis. The essential element of Turing’s model for the development of embryos was the concept of a morphogen. A morphogen is an organic molecule that is found in the cells of an embryo, or that diffuses between the cells of an embryo, and that can affect embryonic development. In Turing’s model, when a morphogen reached a critical concentration it could activate or inhibit some of the genes in the growing embryo that controlled the differentiation and migration of the embryonic cells. Today we call morphogens that are secreted by one cell and diffuse to other nearby cells paracrine factors, and they are primarily protein molecules that are generated by the cells of a developing embryo. The problem with morphogenesis was that if the cells in the hollow ball of cells that formed a blastula were all identical, how could embryonic development get initiated? Turing proposed that there would naturally be some slight variations in the concentrations of the morphogens from place to place along the surface of the blastula, and eventually, these variations, or instabilities, in the concentrations of the morphogen molecules would naturally cause the blastula to break spherical symmetry. It’s like trying to balance a pencil on its point. Initially, the pencil stands straight up and is totally symmetric with respect to all directions. But eventually the pencil will fall down, due to a slight instability, and then it will point in some specific direction, like due north.

Turing proposed that the spherical symmetry of the blastula could be broken in a similar manner, by means of varying diffusion rates of the morphogen molecules. For example, suppose the genes within a human being can generate two proteins A and B. Protein A can enhance the generation of protein A itself, and can also enhance the generation of another protein B by epigenetic means, like binding to the promoter sections of the DNA for the genes that make proteins A and B. Now suppose that protein B can also inhibit the production of protein A by similar means and that protein B is a smaller molecule that diffuses faster than protein A. A negative feedback loop will then develop between proteins A and B. If protein A increases, it will enhance the production of protein B in the nearby cells of the blastula, which will then inhibit the production of protein A in the local vicinity, and consequently, will help to keep the local production of protein A in check. Proteins A and B will then arrive at some equilibrium level that never changes due to the controlling negative feedback loops operating in the vicinity of the cells. But what if in one isolated section of the blastula an instability should develop, and the concentration of protein A spontaneously peaks far above normal? This will produce more of protein A in the neighboring cells, and also more of protein B too because protein A enhances the production of both proteins A and B. But because protein B can diffuse faster than protein A, the concentration level of protein B at some distance from the protein A peak will be higher than normal and will suppress the production of protein A in a surrounding ring centered upon the protein A peak. The end result will be a protein A peak surrounded by a ring of protein B, like the foothills around a mountain peak. This will break the spherical symmetry of the blastula because now we no longer have constant concentrations of protein A and B throughout the blastula. 
Once the spherical symmetry of the blastula has been broken, an explosive cascade of logical operations is unleashed as a torrent of morphogens, or paracrine factors, is released in a large number of parallel chain reactions that transform the spherically symmetric blastula into a highly nonsymmetrical human being with great rapidity.
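The symmetry-breaking argument for proteins A and B can be checked with a little arithmetic. The Java sketch below is a linearized toy model in the spirit of Turing's analysis, not his actual equations: the class name TuringInstabilitySketch, the four reaction rates and the diffusion constants are all made-up illustrative numbers. It asks whether a small ripple of wavenumber k in the two concentrations grows (positive rate) or dies away (negative rate).

```java
// Hypothetical sketch: linear stability of a two-protein activator/inhibitor system.
final class TuringInstabilitySketch {
    // Illustrative (made-up) reaction rates near the uniform equilibrium.
    static final double A_ON_A = 1.0;   // protein A enhances its own production
    static final double B_ON_A = -2.0;  // protein B inhibits production of A
    static final double A_ON_B = 2.0;   // protein A enhances production of B
    static final double B_ON_B = -2.0;  // protein B decays on its own

    // Real part of the fastest-growing eigenvalue of the 2x2 matrix
    //   [ A_ON_A - diffA*k^2      B_ON_A           ]
    //   [ A_ON_B                  B_ON_B - diffB*k^2 ]
    // A positive value means a ripple with wavenumber k is amplified.
    static double growthRate(double k, double diffA, double diffB) {
        double q = k * k;
        double m11 = A_ON_A - diffA * q;
        double m22 = B_ON_B - diffB * q;
        double trace = m11 + m22;
        double det = m11 * m22 - B_ON_A * A_ON_B;
        double disc = trace * trace - 4.0 * det;
        // Complex eigenvalues (disc < 0) share the real part trace / 2.
        return disc >= 0 ? (trace + Math.sqrt(disc)) / 2.0 : trace / 2.0;
    }

    public static void main(String[] args) {
        // Uniform state (k = 0): the negative feedback loop keeps A and B in check.
        System.out.println("k=0.0, B diffuses 20x faster: " + growthRate(0.0, 1.0, 20.0));
        // Intermediate ripple with fast-diffusing B: the ripple grows.
        System.out.println("k=0.7, B diffuses 20x faster: " + growthRate(0.7, 1.0, 20.0));
        // Same ripple when A and B diffuse equally fast: the ripple dies away.
        System.out.println("k=0.7, equal diffusion:       " + growthRate(0.7, 1.0, 1.0));
    }
}
```

With these assumed numbers the uniform state is stable on its own, and ripples only grow within a band of intermediate wavelengths when protein B diffuses much faster than protein A. That band is what would set the spacing between a protein A peak and its surrounding ring of protein B.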

Figure 9 – A spontaneous spike in the concentration of protein A can create a permanent peak of protein A surrounded by a foothill ring of protein B and break the spherical symmetry of the hollow ball of cells that form a blastula.

The huge expanse of logical operations that are encoded in the network of genes, combined with epigenetic information that is encoded within the structures of the chromosomes themselves, is quite remarkable because not only do they have to rapidly develop the embryo into a viable organism that can live on its own, but they also have to keep the growing embryo alive at all stages of development, as it greatly changes in size, shape and function.

Figure 10 – The cascades of morphogens, or paracrine factors, rapidly change the spherical blastula into a highly nonsymmetrical human being.

Many of the morphogen, or paracrine factor, cascades are very similar for all multicellular organisms, leading to very similar patterns of development, a sure sign that inherited reusable code is in action.

Figure 11 – The development of embryos across species is remarkably similar because of the reuse of the code found within the cascades of morphogen, or paracrine factors.

I recently finished the ninth edition of Developmental Biology (2010) by Scott F. Gilbert, a 711-page college textbook. Now that I am 62 years old, I frequently like to read current college textbooks from cover to cover, without the least worry about problem sets or exams. The striking realization that I came to from reading this textbook was that, for IT professionals struggling with the new SOA architecture, it is critical to focus upon the network of logical operations performed by the billions of Objects that the SOA architecture generates, and to focus less upon the individual methods within any given Object. There will be more on that in the next section.

The Development of Embryos in Commercial Software
With the advent of SOA (Service Oriented Architecture) about 10 years ago we have seen the evolution of a very similar pattern of embryogenesis in commercial software. For more about SOA please see:

Service-oriented architecture
http://en.wikipedia.org/wiki/Service-oriented_architecture

As I have mentioned in many previous softwarephysics postings, commercial software has been evolving about 100 million times faster than biological software over the last 70 years, or 2.2 billion seconds, ever since Konrad Zuse cranked up his Z3 computer in May of 1941, and the architectural history of commercial software has essentially recapitulated the evolutionary history of life on Earth over this same period of time through a process of convergence. Over the years, the architecture of commercial software has passed through a very lengthy period of prokaryotic architecture (1941 – 1972), followed by a period of single-celled eukaryotic architecture (1972 – 1992). Multicellular organization took off next with the Object-Oriented revolution of the early 1990s, especially with the arrival of Java in 1995. About 10 years ago, commercial software entered into a Cambrian explosion of its own with the advent of SOA (Service Oriented Architecture), in which large-scale multicellular applications first appeared, chiefly in the form of high-volume corporate websites. For more on this see the SoftwarePaleontology section of SoftwareBiology.

Object-Oriented Programming Techniques Allow for the Multicellular Organization of Software
Before proceeding with the development of embryos in commercial software, we first need to spend some time exploring how multicellular organization is accomplished in commercial software. Multicellular organization in commercial software is based upon the use of Object-Oriented programming languages. Object-Oriented programming actually began in 1962, but it did not catch on at first. In the late 1980s, the use of the very first significant Object-Oriented programming language, known as C++, started to appear in corporate IT, but Object-Oriented programming really did not become significant in IT until 1995 when both Java and the Internet Revolution arrived at the same time. The key idea in Object-Oriented programming is naturally the concept of an Object. An Object is simply a cell. Object-Oriented languages use the concept of a Class, which is a set of instructions for building an Object (cell) of a particular cell type in the memory of a computer. Depending upon whom you cite, there are several hundred different cell types in the human body, but in IT we generally use many thousands of cell types or Classes in commercial software. For a brief overview of these concepts go to the webpage below and follow the links by clicking on them.

Lesson: Object-Oriented Programming Concepts
http://docs.oracle.com/javase/tutorial/java/concepts/index.html

A Class defines the data that an Object stores in memory and also the methods that operate upon the Object data. Remember, an Object is simply a cell. Methods are like biochemical pathways that consist of many steps or lines of code. A public method is a biochemical pathway that can be invoked by sending a message to a particular Object, like using a ligand molecule secreted from one Object to bind to the membrane receptors on another Object. This binding of a ligand to a public method of an Object can then trigger a cascade of private internal methods within an Object or cell.

Figure 12 – A Class contains the instructions for building an Object in the memory of a computer and basically defines the cell type of an Object. The Class defines the data that an Object stores in memory and also the methods that can operate upon the Object data.

Figure 13 – Above is an example of a Bicycle Object. The Bicycle Object has three data elements - speed in mph, cadence in rpm, and a gear number. These data elements define the state of a Bicycle Object. The Bicycle Object also has three methods – changeGears, applyBrakes, and changeCadence that can be used to change the values of the Bicycle Object’s internal data elements.

Figure 14 – Above is some very simple Java code for a Bicycle Class. Real Class files have many data elements and methods and are usually hundreds of lines of code in length.
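For readers who want to see roughly what such a Class looks like, here is a minimal sketch of a Bicycle Class in Java, built from the three data elements and three methods named in Figure 13. The constructor, the getters and the speedUp method are my own additions for illustration, and the exact code in the figure may differ.

```java
// Hypothetical sketch of a Bicycle Class: the instructions for building
// one kind of Object (one "cell type").
final class Bicycle {
    // Data elements that define the state of a Bicycle Object.
    private int speedMph;
    private int cadenceRpm;
    private int gear;

    Bicycle(int speedMph, int cadenceRpm, int gear) {
        this.speedMph = speedMph;
        this.cadenceRpm = cadenceRpm;
        this.gear = gear;
    }

    // Methods that change the internal state of the Object.
    void changeGears(int newGear) {
        gear = newGear;
    }

    void changeCadence(int newCadenceRpm) {
        cadenceRpm = newCadenceRpm;
    }

    void applyBrakes(int decrementMph) {
        speedMph = Math.max(0, speedMph - decrementMph); // never below a standstill
    }

    // Not listed in Figure 13; added so that the speed can also go up.
    void speedUp(int incrementMph) {
        speedMph += incrementMph;
    }

    int getSpeedMph() { return speedMph; }
    int getCadenceRpm() { return cadenceRpm; }
    int getGear() { return gear; }

    public static void main(String[] args) {
        Bicycle bike = new Bicycle(15, 80, 3); // instantiate one Object from the Class
        bike.applyBrakes(5);
        bike.changeGears(2);
        System.out.println(bike.getSpeedMph() + " mph in gear " + bike.getGear()); // prints 10 mph in gear 2
    }
}
```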

Figure 15 – Many different Objects can be created from a single Class just as many cells can be created from a single cell type. The above List Objects are created by instantiating the List Class three times and each List Object contains a unique list of numbers. The individual List Objects have public methods to insert or remove numbers from the Objects and also an internal sort method that could be called whenever the public insert or remove methods are called. The internal sort method automatically sorts the numbers in the List Object whenever a number is added or removed from the Object.
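The List Objects described in Figure 15 might be sketched as follows. The class name SortedNumberList and its method names are hypothetical. The point is only that each Object instantiated from the Class carries its own data, and that the public insert and remove methods quietly call a private sort method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: one Class, many independent List Objects.
final class SortedNumberList {
    private final List<Integer> numbers = new ArrayList<>();

    // Public methods: the only way other Objects can change this list.
    public void insert(int number) {
        numbers.add(number);
        sort();
    }

    public boolean remove(int number) {
        boolean removed = numbers.remove(Integer.valueOf(number));
        sort();
        return removed;
    }

    // Internal method: cannot be called from outside the Object.
    private void sort() {
        Collections.sort(numbers);
    }

    public List<Integer> contents() {
        return new ArrayList<>(numbers); // hand out a copy, not the internal state
    }

    public static void main(String[] args) {
        SortedNumberList first = new SortedNumberList();
        SortedNumberList second = new SortedNumberList();
        first.insert(3);
        first.insert(1);
        first.insert(2);
        second.insert(42);
        System.out.println(first.contents());  // prints [1, 2, 3]
        System.out.println(second.contents()); // prints [42]
    }
}
```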

Figure 16 – Objects communicate with each other by sending messages. Really one Object calls the exposed public methods of another Object and passes some data to the Object it calls, like one cell secreting a ligand molecule that then plugs into a membrane receptor on another cell.

Figure 17 – In Turing’s model cells in a growing embryo communicate with each other by sending out ligand molecules called morphogens or paracrine factors that bind to membrane receptors on other cells.

Figure 18 – Calling a public method of an Object can initiate the execution of a cascade of private internal methods within the Object. Similarly, when a paracrine factor molecule plugs into a receptor on the surface of a cell, it can initiate a cascade of internal biochemical pathways. In the above figure, an Ag protein plugs into a BCR receptor and initiates a cascade of biochemical pathways or methods within a cell.
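A minimal Java sketch of this ligand-and-receptor style of messaging is shown below. Everything in it is hypothetical: the class name CellObject, the ligand string and the two internal "pathway" steps are stand-ins chosen for illustration, not a model of any real signaling pathway.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one Object "secretes a ligand" by calling the public
// method of another Object, which triggers a cascade of private methods.
final class CellObject {
    private final String name;
    private final List<String> pathwayLog = new ArrayList<>();
    private boolean differentiated = false;

    CellObject(String name) {
        this.name = name;
    }

    // Sending a message: this Object calls the exposed method of a neighbor.
    void secreteTo(CellObject neighbor, String ligand) {
        neighbor.receive(ligand);
    }

    // Public method = membrane receptor. Anything can bind here.
    public void receive(String ligand) {
        pathwayLog.add(name + " bound " + ligand);
        activateInternalSignal();
    }

    // Private methods = internal biochemical pathways, invisible from outside.
    private void activateInternalSignal() {
        pathwayLog.add(name + " relayed signal");
        switchOnGene();
    }

    private void switchOnGene() {
        pathwayLog.add(name + " switched on gene");
        differentiated = true;
    }

    boolean isDifferentiated() { return differentiated; }
    List<String> pathwayLog() { return new ArrayList<>(pathwayLog); }

    public static void main(String[] args) {
        CellObject sender = new CellObject("cell-1");
        CellObject receiver = new CellObject("cell-2");
        sender.secreteTo(receiver, "morphogen");
        receiver.pathwayLog().forEach(System.out::println);
    }
}
```

The calling Object only needs to know about receive(); the cascade behind it can be rewritten freely without the caller ever noticing.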

Embryonic Growth and Differentiation of a High-Volume Corporate Website
When a high-volume corporate website, consisting of many millions of lines of code running on hundreds of servers, starts up and begins taking traffic, billions of Objects (cells) begin to be instantiated in the memory of the servers in a matter of minutes and then begin to exchange messages with each other in order to perform the functions of the website. Essentially, when the website boots up, it quickly grows to a mature adult through a period of very rapid embryonic growth and differentiation, as billions of Objects are created and differentiated to form the tissues of the website organism. These Objects then begin exchanging messages with each other by calling public methods on other Objects to invoke cascades of private internal methods, which are then executed within the called Objects.

For example, today most modern high-volume corporate websites use the MVC pattern – the Model-View-Controller pattern. In the mid-1990s IT came upon the concept of application patterns. An application pattern is basically a phylum, a basic body plan for an application, and the MVC pattern is the most familiar. For example, when you order something from Amazon, you are using an MVC application. The Model is the endoderm or “guts” of the application that stores all of the data in tables in relational databases. A database table is like an Excel spreadsheet, containing many rows of data, and each table consists of a number of columns with differing datatypes and sizes. For example, there may be columns containing strings, numbers and dates of various sizes in bytes. Most tables will have a Primary Key, like a CustomerID, that uniquely identifies each row of data. By joining tables together via their columns it is possible to create composite rows of data. For example, by combining all the rows in the Customers table with the rows in the Orders table via the CustomerID column in each table, it is possible to find information about all of the orders a particular customer has placed. Amazon has a database Model consisting of thousands of tables of data that store all of the information about their products in millions of rows, like descriptions of the products and how many they have in stock, as well as tables on all of their orders and customers. The View in an MVC application comprises the ectoderm tissues of the application and defines how the application looks to the end-user from the outside. The View consists mainly of screens and reports. When you place an order with Amazon, you do so by viewing their products online and then filling out webpage screens with data. When you place an order, the View code takes in the data and validates it for errors.
Reports are static output, like the final webpage you see with your order information and the email you receive confirming your order. The Controller code of an MVC application forms the muscular mesoderm connective tissue that connects the View (ectoderm) layer to the Model (endoderm) layer and does most of the heavy lifting. The Controller code has to connect the data from the Model and format it into the proper View that the end-user sees on the surface. The Controller code also has to take the data from the View and create orders from it and send instructions to the warehouse to put the order together. The Controller has to also do things like debit your credit card. So as you can see, Controller code, like mesoderm, is the most muscular code and also the most costly to build.
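The joining of a Customers table to an Orders table via the CustomerID column, mentioned above for the Model layer, can be sketched in plain Java. This is only an in-memory illustration with hypothetical class names and made-up rows. A real Model layer would send SQL to a relational database instead, and the equivalent query is shown in a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of joining two tables on a shared CustomerID column.
// Roughly equivalent SQL:
//   SELECT c.Name, o.OrderID, o.Item
//   FROM Customers c JOIN Orders o ON c.CustomerID = o.CustomerID
final class JoinSketch {
    static final class Customer {
        final int customerId;
        final String name;
        Customer(int customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }
    }

    static final class Order {
        final int orderId;
        final int customerId;
        final String item;
        Order(int orderId, int customerId, String item) {
            this.orderId = orderId;
            this.customerId = customerId;
            this.item = item;
        }
    }

    // Builds one composite row per Order that has a matching Customer.
    static List<String> ordersWithCustomerNames(List<Customer> customers, List<Order> orders) {
        Map<Integer, String> nameById = new HashMap<>();
        for (Customer c : customers) {
            nameById.put(c.customerId, c.name); // CustomerID uniquely identifies a row
        }
        List<String> rows = new ArrayList<>();
        for (Order o : orders) {
            String name = nameById.get(o.customerId);
            if (name != null) {
                rows.add(name + " | " + o.orderId + " | " + o.item);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(new Customer(1, "Ann"), new Customer(2, "Bob"));
        List<Order> orders = Arrays.asList(
            new Order(100, 1, "book"), new Order(101, 2, "lamp"), new Order(102, 1, "pen"));
        ordersWithCustomerNames(customers, orders).forEach(System.out::println);
    }
}
```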

Figure 19 – The endoderm of an MVC application forms the “guts” of the application and consists of a large number of database Objects that hold the selected data from thousands of relational database tables.

Figure 20 – An online order screen is displayed by Objects in your browser that form the ectoderm layer of an MVC application. The information on the screen comes from HTML that is sent to your browser from the middleware (mesoderm) layer of an MVC application.

The mesoderm layer of website MVC applications runs on middleware that lies between the ectoderm Objects running on an end-user’s browser and the database Objects running on the endoderm layer. Figure 21 shows that the middleware (mesoderm) layer is composed of components that are protected by several layers of firewalls to ward off attacks from the outside. The middleware feeds HTML to the end-user’s browser, which kicks off Objects within the browser that display the HTML as webpages.

Figure 21 - The mesoderm layer of website MVC applications runs on middleware that lies between the ectoderm Objects running on an end-user’s browser and the database Objects running on the endoderm layer. The middleware (mesoderm) layer is composed of components that are protected by several layers of firewalls to ward off attacks from the outside. The middleware feeds HTML to the end-user’s browser, which kicks off Objects within the browser that display the HTML as webpages.

This is accomplished with mesoderm middleware running on J2EE Application Server software like IBM’s WebSphere or Oracle’s WebLogic. A J2EE Application Server contains a WEB Container that stores pools of Servlet Objects and an EJB Container that stores pools of EJB Objects (see Figure 22). The EJB Objects get data from relational databases (DB) and process the data and then pass the information on to Servlet Objects. The Servlet Objects generate HTML based upon the data processed by the EJB Objects and pass the HTML to HTTP webservers like Apache. The HTTP webservers then send out the HTML to the Objects in your browser to be displayed on your PC or smartphone. When you fill out an order screen on your PC to purchase an item, the flow of information is reversed and ultimately updates the data in the relational databases (DB). Each J2EE Application Server runs in its own JVM (Java Virtual Machine), and a modern high-volume corporate website might be powered by thousands of J2EE Application Servers in JVMs, running on dozens of physical servers, and each J2EE Application Server might contain millions of Objects.

With SOA (Service Oriented Architecture) some of the J2EE Application Servers run in Service Cells that provide basic services to other J2EE Application Servers running in Consumer Cells. The Objects in Service Cells perform basic functions, like looking up a customer’s credit score or current account information, and provide the information as a service via SOAP or REST calls to Objects in Consumer Cells. Essentially, the Objects in a Service Cell of J2EE Application Servers perform the kind of services that the cells in an organ, like the lungs or kidneys, perform for other somatic cells elsewhere in the body of an organism.
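The relationship between a Service Cell and a Consumer Cell can be sketched in Java as an interface with a provider on one side and a caller on the other. All names here are hypothetical, the approval threshold of 650 is made up, and an ordinary in-process method call stands in for what would really be a SOAP or REST call across the network.

```java
import java.util.HashMap;
import java.util.Map;

// The published contract of the service: all a consumer ever sees.
interface CreditScoreService {
    int creditScoreFor(String customerId);
}

// Hypothetical Service Cell: does one basic job for many consumers.
final class CreditScoreServiceCell implements CreditScoreService {
    private final Map<String, Integer> scores = new HashMap<>();

    void register(String customerId, int score) {
        scores.put(customerId, score);
    }

    @Override
    public int creditScoreFor(String customerId) {
        Integer score = scores.get(customerId);
        if (score == null) {
            throw new IllegalArgumentException("Unknown customer: " + customerId);
        }
        return score;
    }
}

// Hypothetical Consumer Cell: relies on the service instead of holding
// credit data of its own.
final class LoanApplicationConsumer {
    private static final int MINIMUM_SCORE = 650; // made-up threshold
    private final CreditScoreService creditScores;

    LoanApplicationConsumer(CreditScoreService creditScores) {
        this.creditScores = creditScores;
    }

    boolean approve(String customerId) {
        return creditScores.creditScoreFor(customerId) >= MINIMUM_SCORE;
    }

    public static void main(String[] args) {
        CreditScoreServiceCell service = new CreditScoreServiceCell();
        service.register("C-1", 720);
        service.register("C-2", 580);
        LoanApplicationConsumer consumer = new LoanApplicationConsumer(service);
        System.out.println(consumer.approve("C-1")); // prints true
        System.out.println(consumer.approve("C-2")); // prints false
    }
}
```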

Figure 22 - A J2EE Application Server contains a WEB Container that stores pools of Servlet Objects and an EJB Container that stores pools of EJB Objects. The EJB Objects get data from relational databases (DB), process the data, and then pass the information to Servlet Objects. The Servlet Objects generate HTML based upon the data processed by the EJB Objects and pass the HTML to HTTP webservers like Apache.

As you can see, the middleware mesoderm tissues, composed of billions of Objects running on thousands of J2EE Application Server JVMs, do most of the heavy lifting in the MVC applications running on high-volume corporate websites. This is accomplished by running the middleware software on banks of clustered servers with load balancers between each layer that spray traffic to the next layer. This allows for great flexibility and allows MVC applications to scale to any load by simply adding more servers to each layer to handle more Objects in the middleware mesoderm tissues.

Figure 23 – As you can see, the middleware mesoderm tissues, composed of billions of Objects, do most of the heavy lifting in the MVC applications running on high-volume corporate websites. This is accomplished by running the middleware software on banks of clustered servers with load balancers between each layer that spray traffic to the next layer. This allows for great flexibility and allows MVC applications to scale to any load by simply adding more servers to each layer to handle more Objects in the middleware mesoderm tissues.
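The way a load balancer sprays traffic across a bank of servers can be sketched with a simple round-robin rule. This is an assumption on my part: the class name RoundRobinBalancer and the server names are hypothetical, real load balancers offer several other policies, and this single-threaded toy ignores health checks and session stickiness.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: spray requests over a layer of servers in turn.
final class RoundRobinBalancer {
    private final List<String> servers = new ArrayList<>();
    private int requestsRouted = 0;

    // Scaling the layer is just a matter of adding another server.
    void addServer(String serverName) {
        servers.add(serverName);
    }

    // Returns the server that should handle the next request.
    String route() {
        if (servers.isEmpty()) {
            throw new IllegalStateException("No servers in this layer");
        }
        String chosen = servers.get(requestsRouted % servers.size());
        requestsRouted++;
        return chosen;
    }

    public static void main(String[] args) {
        RoundRobinBalancer balancer = new RoundRobinBalancer();
        balancer.addServer("appserver-1");
        balancer.addServer("appserver-2");
        balancer.addServer("appserver-3");
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + balancer.route());
        }
    }
}
```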

When you log in to a high-volume corporate website, many thousands of Objects are created for your particular session. These Objects consume a certain amount of memory on the banks of servers in each layer of middleware, and this can lead to problems. For example, one of the most common forms of software disease is called an OOM (Out Of Memory) condition. As I mentioned previously, there are several billion Objects (cells) running at any given time within the middleware mesoderm tissues of a major corporation. These Objects are created and destroyed as users log in and then later leave a corporate website. These Objects reside in the JVMs of J2EE Appservers. These JVMs run a “garbage collection” task every few minutes to release the memory used by the “dead” Objects left behind by end-users who have logged off the website. The garbage collection task frees up memory in the JVM so that new “live” Objects can be created for new end-users logging into the website. In biology, the programmed death of cells is called apoptosis. For example, between 50 and 70 billion cells die each day due to apoptosis in the average human adult, and when apoptosis fails, it can be a sign of cancer and the uncontrolled growth of tumor cells. Similarly, sometimes, for seemingly unknown reasons, Objects in the JVMs refuse to die and begin to proliferate in a neoplastic and uncontrolled manner, similar to the cells in a cancerous tumor, until the JVM finally runs out of memory and can no longer create new Objects. The JVM essentially dies at this point and generates a heap dump file. MidOps has a tool that allows us to look at the heap dump of the JVM that died. The tool is much like the microscope that my wife used to look at the frozen and permanent sections of a biopsy sample when she was a practicing pathologist. The heap dump will show us information about the tens of millions of Objects that were in the JVM at the time of its death.
Object counts in the heap dump will show us which Objects metastasized, but will not tell us why they did so. So after a lot of analysis by a lot of people, nobody can really figure out why the OOM event happened, and that does not make IT management happy. IT management always wants to know what the “root cause” of the problem was so that we can remove it. I keep trying to tell them that it is like trying to find the “root cause” of a thunderstorm! Yes, we can track the motions of large bodies of warm and cold air intermingling over the Midwest, but we cannot find the “root cause” of a particular thunderstorm over a particular suburb of Chicago, because the thunderstorm is an emergent behavior of a complex nonlinear system, just as an OOM event is an emergent behavior of a complex nonlinear network of software Objects. See Software Chaos for more details.
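To see how Objects can "refuse to die" at all, recall that the garbage collector can only reclaim Objects that nothing else still references. The Java sketch below shows one textbook way this goes wrong, a registry that forgets to drop its reference at logout. It is a hypothetical illustration with made-up class names, not a diagnosis of the OOM events described above, and its final count is the kind of per-Class tally a heap dump would show.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: session Objects that stay reachable after logout
// can never be garbage collected.
final class SessionRegistrySketch {
    static final class Session {
        final String user;
        final byte[] state = new byte[1024]; // stand-in for per-session data
        Session(String user) {
            this.user = user;
        }
    }

    private final Map<String, Session> sessions = new HashMap<>();
    private final boolean forgetsToRemove;

    SessionRegistrySketch(boolean forgetsToRemove) {
        this.forgetsToRemove = forgetsToRemove;
    }

    void login(String user) {
        sessions.put(user, new Session(user));
    }

    void logout(String user) {
        if (!forgetsToRemove) {
            sessions.remove(user); // drop the reference so the Session can "die"
        }
        // In the leaky variant the Session stays in the map and stays alive.
    }

    // The per-Class Object count a heap dump would report for Session.
    int reachableSessions() {
        return sessions.size();
    }

    static int simulate(boolean leaky, int users) {
        SessionRegistrySketch registry = new SessionRegistrySketch(leaky);
        for (int i = 0; i < users; i++) {
            String user = "user-" + i;
            registry.login(user);
            registry.logout(user);
        }
        return registry.reachableSessions();
    }

    public static void main(String[] args) {
        System.out.println("healthy registry: " + simulate(false, 1000)); // prints healthy registry: 0
        System.out.println("leaky registry:   " + simulate(true, 1000));  // prints leaky registry:   1000
    }
}
```

In a toy like this the count points straight at the culprit. In a JVM holding tens of millions of Objects wired together in a nonlinear network, the same count tells you what proliferated but not why.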

Do Biologists Have It All Wrong?
As we have seen, the evolution of the architecture of commercial software over the past 70 years, or 2.2 billion seconds, has closely followed the same architectural path that life on Earth followed over the past 4 billion years. The reason for this is that both commercial software and living things are forms of self-replicating information. For more on that see A Brief History of Self-Replicating Information. Because both commercial software and living things have converged upon very similar solutions to combat the second law of thermodynamics in a nonlinear Universe, I contend that the study of commercial software by biologists would provide just as much research value as studying any alien life forms that we might eventually find on Mars, or on the moons Europa, Enceladus or Titan, if we should ever get there. And why bother, when all you have to do is spend some time in the nearby IT department of a major corporation in the city where your university resides?

Based upon this premise, I would like to highlight some of Professor Dick Gordon’s work on embryogenesis which goes well beyond Alan Turing’s theory that gradients of morphogens are solely responsible for embryonic growth and differentiation. I recently attended the 2014 winter session of Professor Gordon’s Embryo Physics course which met every Wednesday afternoon at 4:00 PM CT in a Second Life virtual world session. I would highly recommend this course to all IT professionals willing to think a bit outside of the box. For more on this very interesting ongoing course please see:

Embryogenesis Explained
http://embryogenesisexplained.com/

Dick Gordon’s theory of embryogenesis is best explained by his The Hierarchical Genome and Differentiation Waves - Novel Unification of Development, Genetics and Evolution (1999). But here is a nice summary of his theory that he presented as a lecture to the students of the Embryo Physics class on March 20, 2012:

Cause and Effect in the Interaction between Embryogenesis and the Genome
http://embryogenesisexplained.com/files/presentations/Gordon2012.pdf

Basically, his theory of morphogenesis is that the genes in the genomes of multicellular organisms that control embryogenesis are organized in a hierarchical manner and that as mechanical differentiation waves pass through the cells of a growing embryo, they trigger cascades of epigenetic events within the cells of the embryo that cause them to split along differentiation trees. As more and more differentiation waves impinge upon the differentiating cells of an embryo, the cells continue down continuously bifurcating differentiation trees. This model differs significantly from Alan Turing’s model of morphogenesis that relies upon the varying diffusion rates of morphogens, creating chemical gradients that turn genes on and off. In Dick Gordon’s model it is the arrival of mechanical expansion and contraction waves at each cell that determines how the cell will differentiate by turning specific genes on and off in cascades, and consequently, the differentiation waves determine what proteins each cell ultimately produces and in what quantities. In his theory, each cell has a ring of microtubules, called a cell state splitter, that essentially performs the functions of a seismometer by sensing the passage of differentiation waves. When an expansion differentiation wave arrives, it causes the cell state splitter to expand, and when a contraction differentiation wave arrives, it causes the cell state splitter to contract. The expansion or contraction of the cell state splitter then causes the nucleus of the cell to distort in a similar manner.

Figure 24 – A circular ring of microfilaments performs the functions of a seismometer that senses the passage of expansion and contraction differentiation waves passing by and is called a cell state splitter. The cell state splitter then passes along the signal to the nucleus of the cell (From slide 75 of Dick Gordon’s lecture).

The distortion of the cell’s nucleus then causes one of two possible gene cascades to fire within the cell. Dick Gordon calls this binary logical operation the nuclear state splitter.

Figure 25 – Changes in the cell state splitter seismometer, caused by a passing contraction or expansion differentiation wave, trigger the nuclear state splitter to fire one of two possible gene cascades (From slide 96 of Dick Gordon’s lecture).

Figure 26 – Groups of cells of one cell type bifurcate along differentiation tree paths when a cell state splitter seismometer fires (From slide 51 of Dick Gordon’s lecture).

Figure 27 – As each contraction or expansion wave impinges upon a cell it causes the cell to split down one branch or the other of a differentiation tree by launching a gene cascade within the cell (From slide 52 of Dick Gordon’s lecture).
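
For IT readers, the binary logic described above can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the function names `differentiate` and `leaf_types`, the single-letter wave codes `E` (expansion) and `C` (contraction), and the use of the path itself as a stand-in for a cell type are all invented for illustration and are not taken from Dick Gordon’s lecture.

```python
def differentiate(waves, root="egg"):
    """Follow a binary differentiation tree, one branch per arriving wave.

    `waves` is a string of 'E' (expansion) and 'C' (contraction) characters
    in order of arrival. The returned label is simply the path taken,
    standing in for a cell type.
    """
    cell_type = root
    for wave in waves:
        if wave not in ("E", "C"):
            raise ValueError(f"unknown wave type: {wave!r}")
        # The nuclear state splitter: one of two gene cascades fires.
        cell_type = f"{cell_type}/{wave}"
    return cell_type


def leaf_types(depth):
    """Enumerate every cell type reachable after `depth` waves."""
    paths = [""]
    for _ in range(depth):
        paths = [path + wave for path in paths for wave in "EC"]
    return [differentiate(path) for path in paths]


print(differentiate("ECC"))     # egg/E/C/C
print(len(leaf_types(9)))       # 512
```

If every branch were used, nine successive binary decisions would already be enough to distinguish 512 cell types, which is more than the several hundred cell types of a human. Real differentiation trees need not be balanced like this one.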

The distinguishing characteristic of Dick Gordon’s model is that the information needed by a cell to differentiate properly is not biochemically passed from one cell to another. Rather the information is transmitted via differentiation waves. For example, in Alan Turing’s morphogen model, morphogenic proteins are passed from one cell to another as paracrine factors that diffuse from one cell to another along a gradient. Or the morphogenic proteins pass directly from one cell to another across their cell membranes. Or morphogen generating cells are transported to other sites within an embryo as the embryo grows and then do both of the above.

Figure 28 – In Alan Turing’s model for morphogenesis, the information necessary for a cell to differentiate properly is passed from cell to cell purely in a biochemical manner (From slide 92 of Dick Gordon’s lecture).

In Dick Gordon’s model, it is the passage of differentiation waves that transmits the information required for cells to differentiate. In Figure 29 we see that as a differentiation wave passes by and each cell gets squished, the cell state splitter of the cell launches a cascade of genes that generate internal morphogenic proteins within the cell that cause the cell to differentiate. Superficially, this leaves one with the false impression that there is a gradient of such morphogenic proteins in play.

Figure 29 - As a differentiation wave passes by, each cell gets squished, and the cell state splitter of the cell launches a cascade of genes that generate internal morphogenic proteins within the cell. (From slide 93 of Dick Gordon’s lecture).

I am certainly not a biologist, but from the perspective of an IT professional and a former exploration geophysicist, I find that Dick Gordon’s theory merits further investigation for a number of reasons.

1. From an IT perspective it seems that the genomes of eukaryotic multicellular organisms may have adopted hierarchical indexed access methods to locate groups of genes and to turn them on or off during embryogenesis
As I pointed out in An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer, IT professionals had a very similar problem when trying to find a particular customer record out of 50 million customer records stored on 14 miles of magnetic tape back in the 1970s. To overcome that problem, IT moved the customer data to disk drives and invented hierarchical indexed access methods, like ISAM and VSAM, to quickly find a single customer record. Today we store most commercial data in relational databases, but those relational databases still use hierarchical indexes under the hood. For example, suppose you have 200 customers, rather than 50 million, and would like to find the information on customer 190. If the customer data were stored as a sequential file on magnetic tape, you would have to read through the first 189 customer records before you finally got to customer 190. However, if the customer data were stored on a disk drive, using an indexed sequential access method like ISAM or VSAM, you could get to the leaf page containing records 176 – 200 after only 3 reads, and you would then only have to read 14 records on the leaf page before you got to record 190. Similarly, the differentiating cells within a growing embryo must have a difficult time finding the genes they need among the 23,000 genes of our genome that are stored on some small percentage of the 6 feet of DNA tape in each cell. So it is possible that the chromatin and chromosomes of the eukaryotic cells found within multicellular organisms provide a hierarchical indexed access method to locate groups of genes and individual genes, and that the passage of Dick Gordon’s differentiation waves provides the epigenetic means to initiate the hierarchical indexed access methods needed to differentiate the cells of a growing embryo. Comparing Figure 27 and Figure 30 you will find that they both form hierarchical tree structures. 
In Figure 30 if you think of the intermediate levels as being composed of gene cascades, rather than pointers to customer records, you essentially get an upside-down version of Dick Gordon’s Figure 27.

Figure 30 – To find customer 190 out of 200 on a magnetic tape would require sequentially reading 189 customer records. Using the above hierarchical B-Tree index would only require 3 reads to get to the leaf page containing records 176 – 200. Then an additional 14 reads would get you to customer record 190.
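
The read counts quoted above can be checked with a small Python sketch. The layout is an assumption on my part, since it is only meant to be consistent with the numbers in the text: leaf pages of 25 records, 4 leaf pages under each intermediate index page, and a single root page. The names `build_index`, `indexed_reads` and `sequential_reads` are hypothetical, and this is not how ISAM or VSAM are actually implemented.

```python
PAGE_SIZE = 25       # records per leaf page (assumed)
FANOUT = 4           # leaf pages per intermediate index page (assumed)
N_CUSTOMERS = 200


def sequential_reads(target):
    """Records read on tape before reaching the target record."""
    return target - 1


def build_index(n=N_CUSTOMERS):
    """Build root -> intermediate pages -> leaf pages over records 1..n."""
    leaves = [list(range(lo, min(lo + PAGE_SIZE, n + 1)))
              for lo in range(1, n + 1, PAGE_SIZE)]
    # The root is represented as the list of intermediate pages.
    return [leaves[i:i + FANOUT] for i in range(0, len(leaves), FANOUT)]


def indexed_reads(target, root):
    """Return (page reads to reach the leaf, records scanned before target)."""
    page_reads = 1                              # read the root page
    for intermediate in root:
        if intermediate[-1][-1] >= target:      # highest key under this page
            page_reads += 1                     # read the intermediate page
            for leaf in intermediate:
                if leaf[-1] >= target:
                    page_reads += 1             # read the leaf page
                    return page_reads, leaf.index(target)
    raise KeyError(target)


root = build_index()
print(sequential_reads(190))       # 189
print(indexed_reads(190, root))    # (3, 14)
```

With 50 million customers instead of 200, the sequential cost grows in proportion to the number of records while the indexed cost grows only with the depth of the tree, which is the whole point of a hierarchical index.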

2. Where is the system clock that controls embryogenesis?
The current paradigm seems to rely heavily upon Alan Turing's theory of chemical morphogenesis, where cascades of regulatory proteins are generated at different foci within a growing embryo, and the concentration gradients of these regulatory proteins turn genes on and off in neighboring cells by enhancing or suppressing the promoters of genes. Based upon a great deal of experimental work, I think much of this may be true at a superficial level. But based on my IT experience, I also think something is missing. Alan Turing's theory of chemical morphogenesis is very much like the arrays of logic gates on a CPU chip that come together to form a basic function in the instruction set of a CPU, like adding two binary numbers together in two registers. At the lowest level we have billions of transistor switches turning on and off in a synchronized manner to form AND, OR, and NOT logic gates which are then combined to form the instruction set of a CPU. So we have billions of transistor switches causing billions of other transistor switches to turn on or off in cascades, just as we have billions or trillions of genes turning on and off in cascades of morphogenic proteins. But where is the overall system clock in a growing embryo? We definitely have a system clock in all CPUs that synchronizes all of the switch cascades.

Figure 31 – All CPUs have a system clock that fires many billions of times each second. The system clock sends out an electromagnetic wave as a series of pulses to all of the billions of transistors on a CPU chip. As each pulse arrives at a particular transistor, the transistor must determine if it is to keep its current state of being a 1 or a 0, or to change its state based upon the current state of the logic gate that it finds itself in. Similarly, each cell in a growing embryo must make the same decision via its state splitter when a differentiation wave passes by.
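
As a rough sketch of what the system clock buys you, here is a tiny clocked circuit simulated in Python: a 3-bit binary counter. The function names `clock_tick` and `run` are my own, and the circuit is only an illustration of synchronized switching, not a model of any real CPU. On each clock pulse every bit works out its next value from the current state of the gates feeding it, and only then do all of the bits change together.

```python
def clock_tick(state):
    """Advance a 3-bit binary counter by one clock pulse.

    Every bit computes its next value from the *current* state, and only
    then are all bits replaced together, which is what the clock ensures.
    """
    b0, b1, b2 = state                  # b0 is the least significant bit
    next_b0 = not b0                    # NOT gate: toggles on every pulse
    next_b1 = b1 != b0                  # XOR: toggles when b0 is 1
    next_b2 = b2 != (b0 and b1)         # toggles when both lower bits are 1
    return (next_b0, next_b1, next_b2)


def run(pulses):
    """Return the counter value seen after each clock pulse."""
    state = (False, False, False)
    history = []
    for _ in range(pulses):
        state = clock_tick(state)
        history.append(sum(int(bit) << i for i, bit in enumerate(state)))
    return history


print(run(8))   # [1, 2, 3, 4, 5, 6, 7, 0]
```

Without the shared pulse, each bit would flip whenever its inputs happened to change, and the intermediate states would depend on the accidental timing of each gate.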

This is why I like Dick Gordon’s theory of differentiation waves that traverse a growing embryo and perform the same function as a system clock in a CPU by coordinating the turning on and off of genes across all of the cells in an embryo. In a sense, Dick Gordon’s theory can be used to view a growing embryo as a mechanical-chemical computer, using a mechanical system clock driven by mechanical differentiation waves to synchronize events, like Konrad Zuse’s mechanical Z1 computer, a binary machine that Zuse designed independently of Alan Turing’s conceptual Turing machine. Figure 32 shows a diagram from Konrad Zuse's May 1936 patent for a mechanical binary switching element, using mechanical flat sliding rods that were a fundamental component of the Z1, and which essentially performed the function of Dick Gordon’s cell state splitter. Ironically, Zuse’s patent dates from the same year that Alan Turing developed the mathematical concept of the Turing Machine in On Computable Numbers, with an Application to the Entscheidungsproblem. Turing and Zuse were not aware of each other’s work at the time and were soon to find themselves on opposite sides during World War II. For more on the Z1 see:

http://en.wikipedia.org/wiki/Z1_%28computer%29

Figure 32 – A diagram from Zuse's May 1936 patent for a binary switching element, using mechanical flat sliding rods that were a fundamental component of the Z1.

In commercial software, at the level of Objects communicating with other Objects by calling public methods, it superficially appears as though differentiated Objects are brought into existence by paracrine morphogens passing from one Object to another, but don’t forget that this is an abstraction of many abstractions. At the lowest level, it is all really controlled by the system clocks on thousands of CPU chips in hundreds of servers sending out electromagnetic pulses across the surfaces of the chips.

3. Differentiation waves enable the GPS units of growing embryos
As a former geophysicist, what I really like about Dick Gordon’s theory is that the sequence and arrival times of differentiation waves will depend upon a cell’s location in a growing 3-D embryo. In that regard, it is very much like seismology. When an earthquake occurs many different types of mechanical waves are generated and begin to propagate along the Earth’s surface and throughout the Earth’s body. P-waves are compressional longitudinal waves that have the highest velocities, and therefore, arrive first at all recording stations as primary or p-waves. S-waves are transverse waves that have a lower velocity than p-waves, and therefore, arrive later at all recording stations as secondary or s-waves. By measuring the number of seconds between the arrival of the first p-waves and the arrival of the first s-waves, it is possible to tell how far away the epicenter of the earthquake is from a recording station, and by timing the arrival of the first p-waves and the first s-waves from an earthquake at a number of recording stations, it is possible to triangulate the epicenter of the earthquake. For more on that see:

http://www.geo.mtu.edu/UPSeis/locating.html
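
The distance calculation is simple enough to sketch in Python. If the waves travel a distance d at constant velocities, the s-wave takes d/vs seconds and the p-wave takes d/vp seconds, so the lag between them is d/vs - d/vp, which gives d = lag × vp × vs / (vp - vs). The function name `epicentral_distance_km` and the default velocities of 6.0 km/s and 3.5 km/s are assumptions of mine, chosen as rough crustal values. Real seismologists use travel-time curves rather than a single pair of velocities.

```python
def epicentral_distance_km(sp_lag_seconds, vp_km_s=6.0, vs_km_s=3.5):
    """Estimate distance to an earthquake from the S-minus-P arrival lag.

    Assumes straight paths and constant wave velocities, which is only a
    rough approximation for nearby earthquakes.
    """
    if sp_lag_seconds < 0:
        raise ValueError("the s-wave cannot arrive before the p-wave")
    if vp_km_s <= vs_km_s:
        raise ValueError("p-waves must be faster than s-waves")
    return sp_lag_seconds * vp_km_s * vs_km_s / (vp_km_s - vs_km_s)


print(epicentral_distance_km(10))   # 84.0
```

One station only gives a distance, which is a circle of possible epicenters on the map. It takes at least three stations for the circles to intersect at a single point.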

For example, in Figure 35 we see that for the Kobe earthquake of January 17, 1995, a recording station in Manila received p-waves and s-waves from the earthquake first. Receiving stations in Stockholm and Honolulu both received p-waves and s-waves from the earthquake at later times, and the number of seconds between the arrival of the p-waves and s-waves at those distant locations was greater than it was for the Manila seismic station because Stockholm and Honolulu are both much farther away from Kobe than is Manila. By plotting the arrival times for all three recording stations, it was possible for geophysicists to triangulate the location of the epicenter of the earthquake.

GPS units work just the opposite. With a GPS system, we have a single recording station and electromagnetic waves coming from multiple “earthquakes” in the sky on board a number of GPS satellites that orbit the Earth with an orbital radius of 16,500 miles. Again, by comparing the arrival times of the electromagnetic waves from several GPS satellites, it is possible to triangulate the position of a GPS receiver on the Earth.

Similarly, because each seismic recording station on the Earth has a unique position on the Earth’s surface, the first p-waves and s-waves arrive at different times and in different sequences when several earthquakes at different locations on the Earth are all happening around the same time. Since there are always many little earthquakes going on all over the Earth all the time, the Earth’s natural seismicity can be used as a very primitive natural GPS system. Essentially, by comparing p-wave arrival times at one seismic station with the p-wave arrival times of surrounding seismic stations, you can also figure out the location of the seismic station relative to the others.

I imagine this would be a great way to create a closed feedback loop between the genes and a developing embryo. Since each cell in a growing embryo occupies a unique 3-D location, and thus will experience a unique timing and sequencing of differentiation waves, it’s a wonderful way for the genes in an individual cell to obtain the cell’s GPS location in an embryo and to differentiate accordingly by switching on certain genes, while switching other genes off. One could still maintain that all of this is still under the control of the genes, in the manner of Richard Dawkins’ The Extended Phenotype (1982). But communicating via waves seems like a much more efficient way to coordinate the growth and differentiation of an entire embryo, rather than trying to rely on morphogens diffusing across the bulk mass of an embryo that is essentially many light years in diameter at the molecular level. Indeed most information transmission in the Universe is accomplished via waves. It would only make sense that living things would stumble upon this fact at the cellular level. We certainly see macroscopic organisms using sound and light waves for communications, in addition to the primitive communications that are accomplished by the diffusion of scent molecules.
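
Here is a small Python sketch of the idea that arrival times alone can act as an address. Everything in it is an assumption made for illustration: the four wave origins in `sources`, the 5 x 5 x 5 grid of cell positions, the constant wave speed and the function name `arrival_times`. It says nothing about how real differentiation waves are launched or how fast they travel.

```python
import math
from itertools import product


def arrival_times(cell, sources, speed=1.0):
    """Arrival time at `cell` of a wave launched at t = 0 from each source."""
    return tuple(round(math.dist(cell, src) / speed, 6) for src in sources)


# Four hypothetical wave origins that do not all lie in one plane.
sources = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)]

# 125 hypothetical cell positions on a 5 x 5 x 5 grid.
cells = list(product(range(1, 6), repeat=3))

signatures = {cell: arrival_times(cell, sources) for cell in cells}
print(len(cells), len(set(signatures.values())))   # 125 125
```

Every one of the 125 cells ends up with a different set of arrival times, so a cell that could sense those times would, in principle, have enough information to tell where it is.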

Figure 33 – When built-up tectonic strain is released near the Earth’s surface, sudden rock motions release mechanical waves that propagate away from the earthquake focus. When these mechanical waves reach you, they are felt as an earthquake.

Figure 34 – P-waves have the highest velocity and therefore arrive at each recording station first.

Figure 35 – By recording the arrival times of the p-waves and s-waves at a number of recording stations, it is possible to triangulate the epicenter of an earthquake. For example, by comparing the arrival times of the p-waves and s-waves from an earthquake at recording stations in Stockholm, Manila and Honolulu, it was possible to pinpoint the epicenter of the January 17, 1995 earthquake in Kobe, Japan.

Figure 36 – GPS systems do just the opposite. You can find your location on the Earth by comparing the arrival times of electromagnetic waves from several different satellites at the same time. If you think of each arriving satellite signal as a separate earthquake, it is possible to use them to triangulate your position.

Figure 37 – A conceptual view of the seismic waves departing from an earthquake in Italy.

Figure 38 – An ectoderm contraction wave in amphibian embryos. At hourly intervals, the image was digitally subtracted from the one 5 minutes earlier, showing the moving ectoderm contraction wave. The arc-shaped wave moves faster at its ends than in the middle, reforming a circle which then vanishes at what will be the anterior (head) end of the embryo. (These sets of images are from three different embryos.) The Bar in slide 10 = 1 mm. (Reprint of slides 80 and 81 of Dick Gordon’s lecture).

4. Déjà vu all over again
The current model that biologists use for morphogenesis reminds me very much of the state of affairs that classical geology found itself in back in 1960, before the advent of plate tectonics. I graduated from the University of Illinois in 1973 with a B.S. in physics, only to find that the end of the Space Race and a temporary lull in the Cold War had left very few prospects for a budding physicist. So on the advice of my roommate, a geology major, I headed up north to the University of Wisconsin in Madison to obtain an M.S. in geophysics, with the hope of obtaining a job with an oil company exploring for oil. These were heady days for geology because we were at the very tail end of the plate tectonics revolution that totally changed the fundamental models of geology. The plate tectonics revolution peaked during the five year period 1965 – 1970. Having never taken a single course in geology during all of my undergraduate studies, I was accepted into the geophysics program with many deficiencies in geology, so I had to take many undergraduate geology courses to get up to speed in this new science. The funny thing was that the geology textbooks of the time had not yet had time to catch up with the new plate tectonics revolution of the previous decade, so they still embraced the “classical” geological models of the past which now seemed a little bit silly in light of the new plate tectonics model. But this was also very enlightening. It was like looking back at the prevailing thoughts in physics prior to Newton or Einstein. What the classical geological textbooks taught me was that over the course of several hundred years, the geologists had figured out what had happened, but not why it had happened. Up until 1960 geology was mainly an observational science relying upon the human senses of sight and touch, and by observing and mapping many outcrops in detail, the geologists had figured out how mountains had formed, but not why.

In classical geology, most geomorphology was thought to arise from local geological processes. For example, in classical geology, fold mountains formed off the coast of a continent when a geosyncline formed because the continental shelf underwent a dramatic period of subsidence for some unknown reason. Then very thick layers of sedimentary rock were deposited into the subsiding geosyncline, consisting of alternating layers of sand and mud that turned into sandstones and shales, intermingled with limestones that were deposited from the carbonate shells of dead sea life floating down or from coral reefs. Next, for some unknown reason, the sedimentary rocks were laterally compressed into folded structures that slowly rose from the sea. More compression then followed, exceeding the ability of the sedimentary rock to deform plastically, resulting in thrust faults that uplifted blocks of sedimentary rock even higher. As compression continued, some of the sedimentary rocks were forced down to great depths within the Earth, where they were subjected to great pressures and temperatures. These sedimentary rocks were then far from the thermodynamic equilibrium of the Earth’s surface where they had originally formed, and thus the atoms within them recrystallized into new metamorphic minerals. At the same time, for some unknown reason, huge plumes of granitic magma rose from deep within the Earth’s interior as granitic batholiths. Then over several hundred million years, the overlying folded sedimentary rocks slowly eroded away, revealing the underlying metamorphic rocks and granitic batholiths, allowing human beings to cut them into pretty rectangular slabs and to polish them for the purpose of slapping them up onto the exteriors of office buildings and onto kitchen countertops. 
In 1960, classical geologists had no idea why the above sequence of events, producing very complicated geological structures, seemed to happen over and over again many times over the course of billions of years. But with the advent of plate tectonics (1965 – 1970), all was suddenly revealed. It was the lateral movement of plates on a global scale that made it all happen. With plate tectonics, everything finally made sense. Fold mountains did not form from purely local geological factors in play. There was the overall controlling geological process of global plate tectonics making it happen. For a comparison of the geomorphology of fold mountains with the morphogenesis of an embryo, please take a quick look at the two videos down below:

Fold Mountains
http://www.youtube.com/watch?v=Jy3ORIgyXyk

Medical Embryology - Difficult Concepts of Early Development Explained Simply
https://www.youtube.com/watch?annotation_id=annotation_1295988581&feature=iv&src_vid=rN3lep6roRI&v=nQU5aKKDwmo

Figure 39 – Fold mountains occur when two tectonic plates collide. A descending oceanic plate first causes subsidence offshore of a continental plate which forms a geosyncline that accumulates sediments. When all of the oceanic plate between two continents has been consumed, the two continental plates collide and compress the accumulated sediments in the geosyncline into fold mountains. This is how the Himalayas formed when India crashed into Asia.

Now the plate tectonics revolution was really made possible by the availability of geophysical data. It turns out that most of the pertinent action of plate tectonics occurs under the oceans, at the plate spreading centers and subduction zones, far removed from the watchful eyes of geologists in the field with their notebooks and trusty hand lenses. Geophysics really took off after World War II, when universities were finally able to get their hands on cheap war surplus gear. By mapping variations in the Earth’s gravitational and magnetic fields and by conducting deep oceanic seismic surveys, geophysicists were finally able to figure out what was happening at the plate spreading centers and subduction zones. Actually, the geophysicist and meteorologist Alfred Wegener had figured much of this out in 1912 with his theory of Continental Drift, but at the time Wegener was ridiculed by the geological establishment. You see, Wegener had been an arctic explorer and had noticed that sometimes sea ice split apart, like South America and Africa, only later to collide again to form mountain-like pressure ridges. Unfortunately, Wegener died on the Greenland ice cap in 1930 while trying to provision some members of his last expedition, never knowing that one day he would finally be vindicated. In many ways, I suspect that Dick Gordon might be another Alfred Wegener and that his embryogenesis model built upon differentiation waves, cell state splitters and differentiation trees might just be the plate tectonics of embryology. Frankly, geophysicists would just love to know that geologists came from seismic waves traveling over the surfaces and through the bodies of growing embryos!

Final Thoughts
Based upon this posting and An IT Perspective on the Origin of Chromatin, Chromosomes and Cancer, together with my experience of watching commercial software evolve over the past 42 years, it may have gone down like this. About 4.0 billion years ago some very ancient prokaryotic archaea began to wrap their DNA around histone proteins to compact and stabilize the DNA under the harsh conditions that the archaea are so fond of. Basically, these ancient archaea accidentally discovered the value of using computer tape reels to store their DNA on tape racks. This innovation was found to be far superior to the simple free-floating DNA loops of normal prokaryotic bacteria that basically stored their DNA tape loosely sprawled all over the computer room floor. During this same interval of time, a number of parasitic bacteria took up residence within these archaea and entered into parasitic/symbiotic relationships with them to form the other organelles of eukaryotic cells in accordance with the Endosymbiosis theory of Lynn Margulis. These early single-celled eukaryotes then serendipitously discovered that storing DNA on tape reels that were systematically positioned on tape racks in identifiable locations in the form of chromosomes, allowed for epigenetic factors to control the kinds and amounts of proteins that were to be generated at any given time. This was a far superior access method for genes compared to the simple sequential access methods used by the prokaryotes. With time, this innovation that originally was meant to stabilize DNA under harsh conditions was further exapted into full-fledged indexed hierarchical B-Tree access methods like ISAM and VSAM. With the Cambrian explosion 541 million years ago these pre-existing indexed hierarchical B-Tree access methods were further exapted into Dick Gordon’s hierarchical differentiation trees by exapting the passage of differentiation waves through the body of developing embryos. 
I am guessing that originally the differentiation waves in a growing embryo served some other unrelated useful purpose, or perhaps they were just naturally occurring mechanical waves that arose to relieve the strains that accumulated in a growing embryo, like little earthquakes, or perhaps they were just a spandrel for some other totally unrelated function. However, once the cells in a growing embryo discovered the advantages of using wave communications to keep development in sync, there was no turning back.

For both biologists and IT professionals, the most perplexing thing about all of this is how all of that information can possibly be encoded within a single fertilized egg. Somehow it must be encoded in the stretches of DNA that we call genes, in the stretches of DNA that we do not call genes, in the complex structure of the chromatin and chromosomes of the single cell, and in the complex structure of the fertilized egg itself, creating a complex network of interacting logical operations that ultimately produces a newborn.

Figure 40 – The Star Child from Stanley Kubrick’s 2001: A Space Odyssey gazes down upon the planet from which it came and wonders how it all can be.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston
