Saturday, July 01, 2006

SoftwarePhysics

Have you ever wondered why your IT job is so difficult? Have you ever noticed that whenever we change software, performance can drastically decline? Have you observed that performance can drastically decline even when we don’t change software; that sometimes applications spontaneously get slow and then spontaneously return to normal response times without any intervention? Have you noticed that 50% of the time we never find a root cause for problems, and that we just start bouncing things at random until performance improves? Have you ever wondered why software behaves this way? Is there anything we can do about all this?

If you have had such thoughts, then you need to take a Twilight Zone trip into cyberspacetime and look at the material for my course for IT people called SoftwarePhysics 101 – The Physics of Cyberspacetime. Softwarephysics is an approach to software development, maintenance, and support based upon concepts from physics and biology that I have been using for over 25 years. The purpose of softwarephysics is to explain why IT is so difficult, to suggest possible remedies, and to provide a direction for thought. In addition, several universities now offer courses on Biologically Inspired Computing, which cover the biological aspects of softwarephysics, and the online content for some of these courses can be found by Googling for Biologically Inspired Computing.

SoftwarePhysics 101 – The Physics of Cyberspacetime is now available on Google Docs. Please note that some of the formulas do not render properly, especially exponents which do not display as superscripts, so please use your imagination.

Part 1 - Part 1 of the original PowerPoint document.
Part 2 - Part 2 of the original PowerPoint document.
Entropy – A spreadsheet referenced in Part 1
BSDE – A 1989 document describing how to use BSDE - the Bionic Systems Development Environment - to grow applications from genes and embryos within the maternal BSDE software.

Before proceeding further, let me describe a typical IT scenario that might help explain the utility of softwarephysics. I am currently in the Middleware Operations group supporting the website for a major corporation. The website comprises about 90 production servers – load balancers, firewalls, proxy servers, webservers, WebSphere Application Servers, CICS Gateway servers to mainframes, and mailservers. Every few weeks, the website will become extremely slow and all sorts of symptoms will begin to emerge on the servers. For example, I might initially see the number of connections to the Oracle databases on one of the eight WebSphere Application Servers suddenly rise and WebSphere begin to get timeouts for database connections. This symptom may then spread to the other seven WebSphere servers, causing the number of connections on an internal firewall to max out. Transactions begin to back up into the webservers and eventually max out the load balancers in front of the webservers. The whole architecture of 90 servers can spin out of control and grind to a halt within a period of several seconds to several hours, depending upon the situation. When this happens, our internal monitors will detect a mounting problem and will page out the appropriate people to jump onto the problem. A conference call will be convened, and perhaps 10 people will begin looking for a root cause for the problem. Was some new application code released into production last night? Were the WebSphere configuration parameters changed? Was Oracle upgraded with a patch from the vendor? Are we under a hacker attack? While this analysis is proceeding, perhaps 100,000 people cannot use the website to check their balance or pay their bill electronically. It can get worse: when I was at United Airlines supporting www.ual.com, we would begin losing about $150/second because people could not book flights online. While on the conference call, there is a tension between finding a root cause for the escalating problem and bouncing the servers having problems. Bouncing is a technical term for stopping and restarting a piece of software or a server to alleviate a problem, and anyone who owns a PC running the Windows operating system should be quite familiar with the process. The fear is that bouncing software may temporarily fix the problem, but the problem may eventually come back unless the root cause is determined. Also, it might take 30 minutes to bounce all of the affected servers, and there is the risk that the problem will immediately reappear when the servers come back up. However, about 50% of the time we never find a root cause, and delaying bouncing can cause additional servers to spin out of control, making recovery even more time-consuming. As Ilya Prigogine has pointed out, cause and effect get pretty murky down at the level of individual transactions. In the chemical reaction

A  +  B  ↔   C

at equilibrium, do A and B produce C, or does C dissociate into A and B? We have the same problem with cause and effect in IT when trying to troubleshoot a large number of servers that are all in trouble at the same time. Is a rise in database connections a cause or an effect? Unfortunately, it can be both, depending upon the situation. For the IT Operations department of a major corporation, the above scenario plays out across several major systems on a daily basis, and this sort of activity has been going on for about 50 years in IT.

Since software and hardware are a product of human intelligence, IT people naturally think they understand what is going on at a reductionist metaphysical level. But understanding that software eventually translates into machine instructions playing quantum mechanical tricks with iron atoms on disk drives and silicon atoms in chips does not help with making day-to-day decisions in IT. What was needed was a pragmatic effective theory of software, and that is what softwarephysics is all about.

Outlined below are some of my adventures as a softwarephysicist exploring the fascinating physics of cyberspacetime. Softwarephysics is a fun, but useful, combination of physics, biology, and computer science that I have been using for more than 25 years. It might be of interest to you, if not entirely in a serious manner, then perhaps at least in an entertaining one. Softwarephysics is a simulated science for the simulated software universe that we are all immersed in.

The Origin of Softwarephysics
In 1979, I was a happy exploration geophysicist exploring for oil in the Gulf of Suez out of Amoco’s Chicago exploration office. At the time, I was writing FORTRAN code for geophysical models to simulate the observed data from seismic surveys. Then one sad day, I learned that the whole Chicago exploration office was being transferred to Houston. Just six months prior to this announcement, I had left a really great job in Shell’s exploration office in Houston in order to return my family to our hometown of Chicago. To get myself out of this mess, I finagled my way into Amoco’s IT department supporting geophysical application software. After a couple of months in the IT department, I came to two conclusions:

1. Doing IT for a living was a lot harder than being an exploration geophysicist.

2. It seemed like these strange IT people had created their own little universe, or at least a pretty good computer simulation of their own little universe.

It seemed like I was trapped in a frantic computer simulation, buried in punch card decks and fan-fold listings. I began to think of this computer-simulated universe as the "Software Universe". The Software Universe is a bit like Edwin A. Abbott's Flatland, only a little more tangible. Flatland is a delightful little story, written in 1884, about a charming little 2-dimensional universe called Flatland and its strange inhabitants, and you can read it on the Internet. I soon noticed that the Software Universe was not as chaotic as it first appeared. It seemed to follow some "laws", and it seemed like I had seen many of these "laws" before when studying physics. The longer I studied the Software Universe, the more I became convinced that the IT community had accidentally produced a pretty decent computer simulation of the physical universe, just like I used to do on purpose when writing FORTRAN code to simulate geophysical models of seismic data. Soon I realized that I could use this accidental simulation in reverse. By understanding how the physical universe behaved, I could predict how the Software Universe would respond to stimuli. So to help myself cope with the daily mayhem of life in IT, I developed a scientific approach to software during my first few years that I called softwarephysics. I figured that if you could apply physics to geology, why not apply physics to software? This was a bit of a role reversal. In physics, we use software to simulate the behavior of the universe, while in softwarephysics we use the universe to simulate the behavior of software.

The Software Universe is a 2-dimensional cyberspacetime universe consisting of a time dimension and a processor space dimension. Cyberspacetime is not a continuum - both dimensions are quantized. The cyberspace dimension is defined by the 1 – 10 trillion currently active discrete microprocessors, wherever they might be, and the individual system clocks of each microprocessor quantize the time dimension. Cyberspacetime is not fixed and absolute because processes can jump from one processor to another, and the system clocks all measure local time and have different drift rates. The Software Universe is not made of things; it consists only of processes and flows of information between discrete events in the two dimensions of cyberspacetime. There is the illusion that the Software Universe is filled with real things, such as files and databases with purchase orders and inventory levels on them, but this is an illusion. For example, when you edit a file with Notepad, you are not interacting with a file; you are interacting with a process. Just press CTRL-ALT-DEL to open the Windows Task Manager, click on the Processes tab, and look for notepad.exe. The PID is the process ID for the notepad process you are interacting with. The CPU Time is the distance the notepad process has traveled along the time dimension of cyberspacetime, measured in CPU cycles dedicated to the notepad process. Nothing ever interacts with "things" in cyberspacetime because the "things" have to first be read into the memory of a computer and placed under the control of a process. Only the processes interact with each other in cyberspacetime. Therefore files and databases on disk drives, tapes, and CDs are not part of the Software Universe; they are part of the physical universe. In the Software Universe, we have two processes called "read" and "write" which allow us to pass information into and out of cyberspacetime. The Software Universe began about 1.9 billion seconds (60 years) ago as a few bytes of machine code on a single computer, and has expanded and evolved into the complex Software Universe we see today, consisting of millions of terabytes of software residing on trillions of microprocessors. In softwarephysics, we use this unintentional simulation in reverse to help us understand the nature of software by observing how the physical universe operates.
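
If you would rather not hunt through the Task Manager by hand, the same information can be pulled out programmatically. Here is a minimal sketch in Python, assuming the third-party psutil package is installed (pip install psutil); it is illustrative only, and simply lists any running notepad.exe processes with their PIDs and accumulated CPU time.

# A minimal sketch, assuming the third-party psutil package is installed.
# It lists the notepad.exe processes currently alive in cyberspacetime,
# showing each one's PID and the CPU time it has accumulated so far
# along the time dimension.
import psutil

for proc in psutil.process_iter(attrs=["pid", "name"]):
    if proc.info["name"] and proc.info["name"].lower() == "notepad.exe":
        times = proc.cpu_times()  # user + system CPU seconds for this process
        print(f"PID {proc.info['pid']}: {times.user + times.system:.2f} CPU seconds so far")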

Currently the physics community is hard at work trying to unify general relativity, the physics of large velocities and large masses, with quantum mechanics, the physics of tiny things. One of the promising approaches is called loop quantum gravity. Loop quantum gravity proposes that space and time are not continuous, but are quantized into tiny discrete chunks and that all of the particles and forces in the physical universe are really illusions. Loop quantum gravity postulates that all tangible things are merely a set of interacting processes that run with a clock speed of 10^43 Hz. So the Software Universe seems to do a fair job of simulating a universe built on loop quantum gravity only with a clock speed of about 10^9 Hz.

Softwarephysics is the study of cyberspacetime. Softwarephysics begins by visualizing software as a virtual substance in the Software Universe. You then follow the normal steps a theoretical physicist would take when studying any new substance or phenomenon. You gather some observations on how software seems to behave, develop a model that explains that behavior, and then try to test your model as best you can. Nearly everybody in IT has developed a model, or an intuitive feeling, for how software seems to behave in the universe. Softwarephysics is simply an explicit articulation of a physical model for software behavior. During the early 1980s, I developed a model that described software behavior in terms of a nonlinear system of quantum particles besieged by the second law of thermodynamics.

Physicists call the study of the macroscopic properties of matter thermodynamics. For example, a gas in a cylinder has certain macroscopic properties such as a volume, a pressure, and a temperature. Software can also be viewed macroscopically as the functions the software performs, the speed with which those functions are performed, and the stability and reliability of its performance. Physicists call the study of the microscopic properties of matter quantum mechanics. Quantum mechanics is what makes the transistors in your PC work and predicts that all matter exists as particles in discrete quantum states of energy and momentum. Quantum mechanics tells us that the universe is digital and not analog. Similarly, software can also be viewed microscopically as the interaction of a large number of bytes, each byte existing in one of 256 discrete quantum microstates. As with quantum mechanics, the macroscopic properties of software can be thought of as the sum total of the microscopic interactions of a large number of bytes in fixed quantum states. For a practical example, let us compare the college student’s favorite molecule, ethyl alcohol, with a line of code.

  H H
  │ │
H-C-C-OH        for (i=0; i<10;i++)
  │ │
  H H

Each atom in the molecule of ethyl alcohol has a set of electrons in fixed quantum states. Physicists express this concept in terms of a quantum wavefunction for each atom. Similarly, the line of code consists of a set of characters (atoms), with each character in one of 256 possible quantum ASCII states. The atoms in the ethyl alcohol molecule bond together into a molecule, which has its own quantum wavefunction. Large numbers of the resulting molecules interact with the "operating system" of the physical universe through the electromagnetic force to produce a clear liquid with a low boiling point that has the desired effect when placed into the hands of a college student. After compilation, the properly formed line of code also interacts with an operating system to produce a desired effect.

Because of these close similarities to matter in the real world, software is also subject to the equivalent of a second law of thermodynamics - software naturally tends to a state of increased entropy, or disorder, and an attendant decrease in information content whenever software is worked on by programmers. The entropy of software has a tendency to increase because the incorrect versions of a piece of software vastly outnumber the correct versions. Going back to the above example, you can compare a methyl alcohol molecule to a line of code with a bug in it.

  H
  │
H-C-OH        for (i=0; i<0;i++)
  │
  H

(Note: the correct line of code will execute a loop of code 10 times, while the line of code with the bug in it will not execute the loop at all, because the test i<0 fails immediately - the programmer left out the "1" in the "10".)

The methyl alcohol molecule also produces a clear liquid with a low boiling point and the desired effect of inebriation, but it makes you go blind. Living things have to form complex organic molecules from simple atoms to perform the functions of life. In a similar fashion, programmers must assemble complex patterns of characters into lines of code to instruct a computer to perform useful functions. Both must deal with the second law of thermodynamics, which demands that the total entropy (disorder) of the universe must increase whenever a change is made. However, both living things and programmers are able to construct low-entropy molecules and lines of code by dumping entropy into disordered kinetic energy, also known as heat. Living things, though, do this on a much grander scale and with much greater accuracy than any programmer.
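
To get a feel for why the incorrect versions so vastly outnumber the correct ones, here is a minimal back-of-the-envelope sketch in Python; the alphabet size of 95 printable ASCII characters is an illustrative assumption, and only a tiny handful of the counted variants would even compile, let alone do what was intended.

# Back-of-the-envelope estimate (illustrative assumptions, not measurements):
# a 19-character line of code, with each character drawn from roughly 95
# printable ASCII characters, has 95**19 possible "microstates", of which
# only a tiny handful compile and do what was intended.
import math

line = "for (i=0; i<10;i++)"        # the correct line from the example above
alphabet = 95                       # printable ASCII characters (assumption)
microstates = alphabet ** len(line) # every possible line of the same length

print(f"Characters in the line: {len(line)}")
print(f"Possible variants     : {microstates:.3e}")
print(f"Equivalent entropy    : about {math.log2(microstates):.0f} bits")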

Software also seems to exhibit nonlinear behavior because small changes to software, or to the processing load that the software handles, can cause extreme fluctuations in the macroscopic behavior of the software. If you have ever driven on an expressway at rush hour, you have enjoyed the experience of dealing with a nonlinear system. Drivers watching a man change a tire 200 feet away across the median strip can cause a massive traffic jam on your side of the highway. Thirty minutes after the man with the flat has left the scene, the traffic jam still persists, and the luckless victims wonder what the root cause of the traffic jam was all about. Linear systems are like a highway with very little traffic. Small changes in the traffic density will cause small changes in the average traffic velocity when traffic is light. As the traffic density rises, there comes a tipping point, when the highway goes from linear to nonlinear behavior. After the tipping point, small changes to traffic density can cause drastic changes to throughput. The traffic becomes chaotic and lurches forward in spasms like beer going glug.. glug.. glug.. out of a bottle tipped over too far. Because nonlinear differential equations generally cannot be solved with calculus, scientists and engineers largely ignored nonlinear systems until the 1970s, when computers appeared that could provide numerical solutions for nonlinear differential equations. Only then did scientists and engineers begin to appreciate the strange behavior of nonlinear systems through the development of chaos theory.
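
The tipping point is easy to see in a toy simulation. Below is a minimal sketch of a single-lane traffic model (a simplified Nagel-Schreckenberg cellular automaton); the road length, speed limit, slowdown probability, and densities are arbitrary assumptions chosen only to show the flow collapsing once the density crosses a critical value.

# A toy single-lane traffic simulation (a simplified Nagel-Schreckenberg
# cellular automaton) showing the nonlinear tipping point: below a critical
# car density, throughput rises with density; above it, throughput collapses.
# Road length, speed limit, slowdown probability, and densities are arbitrary.
import random

def flow(density, road_len=1000, v_max=5, p_slow=0.3, steps=500):
    # road[i] is None for an empty cell, or the velocity of the car in cell i
    road = [None] * road_len
    for cell in random.sample(range(road_len), int(density * road_len)):
        road[cell] = 0
    total_moved = 0
    for _ in range(steps):
        new_road = [None] * road_len
        for i, v in enumerate(road):
            if v is None:
                continue
            gap = 1                                   # distance to the car ahead
            while gap <= v_max and road[(i + gap) % road_len] is None:
                gap += 1
            v = min(v + 1, v_max, gap - 1)            # accelerate, but keep headway
            if v > 0 and random.random() < p_slow:    # random human slowdown
                v -= 1
            new_road[(i + v) % road_len] = v
            total_moved += v
        road = new_road
    return total_moved / (steps * road_len)           # average cars passing a cell per step

for density in (0.05, 0.10, 0.15, 0.20, 0.30, 0.50):
    print(f"density {density:.2f} -> flow {flow(density):.3f}")

With these arbitrary settings, the flow climbs with density at first and then collapses once the road becomes congested, just like the expressway at rush hour.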

Scientific Models
So what is the value in having a model for software behavior? The value is that a model allows you to make decisions about software, and it provides a direction for thought. Scientific models are analogies or approximations of reality that explain observations and make predictions. Most physicists gave up searching for true reality at the close of the 19th century, when they discovered that Newtonian mechanics and classical electrodynamics were only approximations of reality and did not work at very large velocities or for very small things like atoms. Instead, they adopted a worldview called positivism, in which physics only seeks out models of reality - not reality itself. All models are approximations - all models are "wrong". But knowing that you are "wrong" gives you a great advantage over people who know that they are "right", because knowing that you are "wrong" allows you to seek improved models of reality. However, having an approximate model that is "wrong" is better than no model at all. After all, Newtonian mechanics did allow us to put men on the Moon. So how do you put a model for software behavior to use? Here is an example. Softwarephysics suggests that you should not be too timid about bouncing software. On a conference call for a website outage, there is always a tension between bouncing software to restore functionality and searching for the root cause of the problem in an effort to prevent future occurrences. Softwarephysics predicts that, like the man with the flat tire, many times the root cause of a problem may be long gone even though its impact still remains.

All of the above leads us to the fundamental problem of software:

The Fundamental Problem of Software

1. The second law of thermodynamics tends to introduce small bugs into software that are never detected through testing.

2. Because software is inherently nonlinear, these small bugs cause general havoc when they reach production.

3. But even software that is absolutely bug-free can reach a critical tipping point and cross over from linear to nonlinear behavior.

So what can we do about the fundamental problem of software? Softwarephysics proposes that since the most complex systems in the physical universe that deal well with the second law of thermodynamics and nonlinearity are living things, we should take a biological approach to software. In fact, we already have.

The Software Universe is populated by living things. In my youth, we called these living things "computer systems", but today we call them "Applications". The Applications exist by exchanging information with each other, and sadly, are parasitized by viruses and worms and must also struggle with the second law of thermodynamics and nonlinearity. Since the beginning of the Software Universe, the architecture of the Applications has evolved through a process of innovation and natural selection that has followed a path very similar to the path followed by living things on Earth. I believe this has been due to what evolutionary biologists call convergence. For example, as Richard Dawkins has pointed out, the surface of the Earth is awash in a sea of visible photons, and the concept of the eye has independently evolved more than 40 times on Earth over the past 600 million years to take advantage of them. An excellent treatment of the significance that convergence has played in the evolutionary history of life on Earth, and possibly beyond, can be found in Life’s Solution (2003) by Simon Conway Morris. Programmers and living things both have to deal with the second law of thermodynamics and nonlinearity and there are only a few optimal solutions. Programmers try new development techniques and the successful techniques tend to survive and spread throughout the IT community, while the less successful techniques are slowly discarded. Over time, the population distribution of software techniques changes.

Unstructured Period (1945 – 1972)
During the Unstructured Period, programs were monolithic structures with lots of branch or GOTO statements and very little internal structure. These programs were similar to the early prokaryotic bacteria that appeared over 4,000 million years ago on Earth and lacked internal structure. Bacteria consist essentially of a polysaccharide cell wall filled with a lot of spaghetti code organic molecules. Just as bacteria still flourish today, many of these programs are still in production. At United Airlines, my former employer, many spaghetti code FORTRAN II applications that were optimized in the late 1960s are still in production.

Structured Period (1972 – 1992)
During the Structured Period, structured programming techniques were adopted by the IT community, and the GOTO statements were replaced by subroutines and indented code with lots of internal structure like the eukaryotic structure of modern cells that appeared about 1,500 million years ago. Eukaryotic cells are found in the bodies of all complex organisms from single-celled yeasts to you and me and divide up cell functions amongst a collection of organelles (subroutines) such as mitochondria, chloroplasts, Golgi bodies, and the endoplasmic reticulum.

Object-Oriented Period (1992 – Present)
During the Object-Oriented Period, programmers adopted a multicellular organization for software in which programs consisted of many instances of objects (cells) that were surrounded by membranes studded with exposed methods (membrane receptors). Multicellular organisms appeared about 900 million years ago and send messages between cells (objects) by secreting organic molecules that bind to the membrane receptors on other cells and induce those cells to execute exposed methods. For example, your body consists of about 100 trillion independently acting cells, and not a single cell in the collection knows that the other cells even exist. In an object-oriented manner, each cell just responds to the organic molecules that bind to its membrane receptors, and in turn, sends out its own set of chemical messages that bind to the membrane receptors of other cells in your body. When you wake to the sound of breaking glass in the middle of the night, your adrenal glands secrete the hormone adrenaline (epinephrine) into your bloodstream, which binds to the getScared() receptors on many of your cells. In an act of object-oriented polymorphism, your liver cells secrete glucose into your bloodstream and your heart cells contract harder when their getScared() methods are called.
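
For the non-biologists, here is a playful little sketch of that polymorphism in Python; the Cell classes and the getScared() method simply mirror the analogy above and are not taken from any real library.

# A playful sketch of the polymorphism described above: the same getScared()
# message produces different behavior in different kinds of cells. The class
# and method names mirror the analogy and are purely illustrative.
class Cell:
    def getScared(self):
        raise NotImplementedError   # each cell type responds in its own way

class LiverCell(Cell):
    def getScared(self):
        return "secreting glucose into the bloodstream"

class HeartCell(Cell):
    def getScared(self):
        return "contracting harder"

# the adrenaline message is broadcast to whatever cells it reaches
for cell in (LiverCell(), HeartCell()):
    print(type(cell).__name__, "->", cell.getScared())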

Distributed Objects Period
Currently we are entering the Distributed Objects Period, which is very similar to the Cambrian Explosion. During the Cambrian Explosion, 541 million years ago, complex body plans first evolved, which allowed cells in multicellular organisms to make RMI calls on the cells of remote organs to accomplish biological purposes. In the Distributed Objects Period, we are using common EJB components in J2EE appservers to create Applications with complex body plans. The J2EE appservers perform the functions of organs like kidneys, lungs, and livers. I am discounting CORBA here as a failed precursor because CORBA never became as ubiquitous as EJB seems to be becoming. In the evolution of any form of self-replicating information, there frequently are many failed precursors. There is a growing body of evidence beginning to support the geological "Snowball Earth" hypothesis that the Earth went through a period of 100 million years of extreme climatic fluctuations just prior to the Cambrian Explosion. During this period, the Earth seesawed between being completely covered with a thick layer of ice and being a hothouse with a mean temperature of 140 °F. It has been suggested that the resulting stress on the Earth's ecosystems sparked the Cambrian Explosion. In 1995, the IT community was hit with a similar cataclysmic event called Java, which put extreme stress on it and eventually sparked EJB.
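
To make the organ analogy a little more concrete, here is a minimal sketch of one process exposing a method that a remote caller can invoke, written with Python's standard xmlrpc modules rather than EJB or RMI; the "kidney" service and its filterBlood() method are invented purely for the analogy.

# A minimal sketch of the distributed objects idea, using Python's standard
# xmlrpc modules in place of EJB/RMI. The "kidney" service and its
# filterBlood() method are invented purely for the organ analogy above.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def filterBlood(substances):
    # the remote "organ" does its work and returns the result to the caller
    return [s for s in substances if s != "urea"]

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(filterBlood)
threading.Thread(target=server.serve_forever, daemon=True).start()

# elsewhere in the body, another "organ" makes the remote call
kidney = ServerProxy("http://localhost:8000")
print(kidney.filterBlood(["urea", "glucose"]))   # -> ['glucose']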

It has taken the IT community nearly 60 years to develop a distributed objects architecture based upon multicellular organization. This was achieved through a slow evolutionary process via innovation and natural selection performed by millions of independently acting programmers. Granted, this occurred much faster than the three billion years nature took to come up with the same architecture, but we could have done this back in the 1960s if we had known better – after all, the object-oriented language Simula was developed in 1965. Softwarephysics proposes that we use concepts from biology to skip to solutions directly.

Applying biological concepts to software is not always obvious. The remainder of this posting is a rather long-winded case study from the 1980s that might get a bit tedious, but since it is not fully covered in SoftwarePhysics 101, I will tell the tale here. In 1985, while at Amoco, I figured that since the most complex nonlinear systems in the universe that dealt well with the second law of thermodynamics were living things, a biological or "bionic" approach to software development was the best approach to take. This did not happen all at once, as I shall explain later, but in 1985 I began developing an early mainframe IDE (Integrated Development Environment) called BSDE (Bionic Systems Development Environment) that developed software based upon concepts from molecular biology. During the 1980s, BSDE, which was a direct application of softwarephysics, put several million lines of code into production at Amoco. BSDE generated a 10,000-line "embryo" for an application based upon the application's "genes". The genes for an application were the DDL (Data Definition Language) statements used to create its DB2 relational database. (Note: In the 1980s, IBM's relational database was called SQL/DS on the VM/CMS operating system and DB2 on the MVS operating system – I will refer to them both simply as DB2 for the sake of clarity.) Each embryo grew and differentiated into a unique individual application within BSDE by having a programmer turn on and off its set of unique genes to generate screens, reports, and SQL code. The first language generated was REXX; later, BSDE also generated PL/I and COBOL. Today these techniques are called wizards. The BSDE.txt file mentioned above describes the typical life cycle of an application developed in the 1980s using BSDE. It is best to view this file with a Fixedsys font so that the line printer graphics align properly.

Because BSDE generated applications using ISPF Dialog Manager screens and REXX, I was able to use BSDE to generate code for itself and have BSDE evolve through small incremental changes using concepts from evolutionary biology. The next generation of BSDE was grown inside of its maternal release. Over a period of seven years, from 1985 – 1992, more than 1,000 generations of BSDE were generated, and BSDE slowly evolved into a very sophisticated tool through small incremental changes.

The evolution of BSDE had an interesting issue. Like the origin of life on Earth, I had to deal with the bootstrap problem – how do you get started? BSDE began as a few simple ISPF edit macros running under ISPF edit. ISPF is the software tool that mainframe programmers still use today to interface to the IBM MVS and VM/CMS mainframe operating systems and contains an editor that can be greatly enhanced through the creation of edit macros written in REXX. I began BSDE by writing a handful of ISPF edit macros that could automate some of the editing tasks that a programmer needed to do when working on a program that used a DB2 database. These edit macros would read a Control File which contained the DDL statements to create the DB2 tables and indexes. The CREATE TABLE statements in the Control File were the equivalent of genes and the Control File itself performed the functions of a chromosome. For example, a programmer would retrieve a skeleton COBOL program, with the bare essentials for a COBOL/DB2 program, from a stock of reusable BSDE programs. The programmer would then position their cursor in the code to generate a DB2 SELECT statement and hit a PFKEY. The REXX edit macro would read the genes in the Control File and would display a screen listing all of the DB2 tables for the application. The programmer would then select the desired tables from the screen, and the REXX edit macro would then copy the selected genes to an array (mRNA). The mRNA array was then sent to a subroutine that inserted lines of code (tRNA) into the COBOL program. The REXX edit macro would also declare all of the SQL host variables in the DATA DIVISION of the COBOL program and would generate code to check the SQLCODE returned from DB2 for errors and take appropriate actions. A similar REXX ISPF edit macro was used to generate screens. These edit macros were also able to handle PL/I and REXX/SQL programs. They could have been altered to generate the syntax for any programming language such as C, C++, or Java. As time progressed, BSDE took on more and more functionality via ISPF edit macros. Finally, there came a point where BSDE took over and ISPF began to run under BSDE. This event was very similar to the emergence of the eukaryotic architecture for cellular organisms. BSDE consumed ISPF like the first eukaryotic cells that consumed prokaryotic bacteria and used them as mitochondria and chloroplasts. With continued small incremental changes, BSDE continued to evolve.

I noticed that I kept writing the same kinds of DB2 applications, with the same basic body plan, over and over. From embryology, I got the idea of using BSDE to read the Control File for an application and to generate an "embryo" for the application based upon its unique set of genes. The embryo would perform all of the things I routinely programmed over and over for a new application. Once the embryo was generated for a new application from its Control File, the programmer would then interactively "grow" code and screens for the application. With time, each embryo differentiated into a unique individual application until the fully matured application was delivered into production by BSDE. At this point, I realized that I could use BSDE to generate code for itself, and that is when I started using BSDE to generate the next generation of BSDE. This technique really sped up the evolution of BSDE because I had a positive feedback loop going. The more powerful BSDE became, the faster I could add improvements to the next generation of BSDE through the accumulated functionality inherited from previous generations.

Embryos were grown within BSDE using an ISPF split-screen mode. The programmer would start up a BSDE session and run Option 4 – Interactive Systems Development from the BSDE Master Menu. This option would look for an embryo, and if it did not find one, would offer to generate an embryo for the programmer. Once an embryo was implanted, the option would turn the embryo on, and the embryo would run inside of the BSDE session with whatever functionality it currently had. The programmer would then split his screen with PF2 and another BSDE session would appear in the lower half of his terminal. The programmer could easily toggle control back and forth between the upper and lower sessions with PF9. The lower session of BSDE was used to generate code and screens for the embryo on the fly, while the embryo in the upper BSDE session was fully alive and functional. This was possible because BSDE generated applications that used ISPF Dialog Manager for screen navigation, which was an interpretive environment, so compiles were not required for screen changes. If your logic was coded in REXX, you did not have to do compiles for logic changes either, because REXX was an interpretive language. If PL/I or COBOL were used for logic, BSDE had facilities to easily compile code for individual programs after a coding change, and ISPF Dialog Manager would simply load the new program executable when that part of the embryo was exercised. These techniques provided a tight feedback loop so that programmers could immediately see the effects of a change as the embryo grew and differentiated.

These were still the early days of relational databases in IT, and I naively bought into the prevalent idea of the day that in the future we would be storing all the data for a corporation on a large number of shared DB2 tables, rather than having thousands of individual applications that all used their own stand-alone set of files. So I came up with the idea of the "corporate gene pool". I used BSDE to generate code for a new generation of BSDE that could read the DB2 System Catalog tables by splicing the genes for those catalog tables into BSDE's own Control File. This allowed BSDE to retrieve the genes for any existing DB2 application from the DB2 System Catalog. Now BSDE could create an embryo for a new application that was an amalgam of genes from existing applications, with the addition of some new genes for the new DB2 tables required by the new application, and I actually started to generate applications that shared data, rather than relying on inter-application communication via extract files. Sadly, even today, we still usually write applications that "own" their own set of stand-alone relational tables.
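
For flavor, pulling an application's genes back out of the DB2 System Catalog amounts to a query against the standard SYSIBM catalog tables along the lines sketched below; the exact catalog columns vary by DB2 version, and the ORDERS table and the sample rows are invented for illustration.

# A sketch of how BSDE-style tooling could pull an application's "genes"
# back out of the DB2 System Catalog. The SYSIBM catalog tables are standard
# DB2, but the exact columns vary by DB2 version, and the ORDERS table and the
# sample rows below are invented for illustration.
catalog_query = """
    SELECT NAME, COLTYPE, LENGTH
    FROM   SYSIBM.SYSCOLUMNS
    WHERE  TBNAME = 'ORDERS'
    ORDER  BY COLNO
"""
print(catalog_query)   # hand this to your DB2 client of choice

# rows like these (invented sample values) can then be spliced back into a
# CREATE TABLE gene for the Control File
rows = [("ORDER_ID", "INTEGER", 4), ("CUST_NAME", "CHAR", 30), ("AMOUNT", "DECIMAL", 9)]
cols = ", ".join(f"{n} {t}({l})" if t == "CHAR" else f"{n} {t}" for n, t, l in rows)
print(f"CREATE TABLE ORDERS ({cols})")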

BSDE was originally developed for my own use, but fellow programmers soon took note, and over time, an underground movement of BSDE programmers developed at Amoco. Encouraged by my supervisor, I began marketing BSDE and softwarephysics within Amoco. Amoco being a rather conservative oil company, this was a hard sell, and I am not much of a salesman. By now it was late 1986, and I was calling our programmers "software engineers" and was pushing the unconventional ideas of applying physics and biology to software. Fortunately for me, all of Amoco's income came from a direct application of geology, physics, or chemistry, so many of our business partners were geologists, geophysicists, chemists, or chemical, petroleum, electrical, or industrial engineers, and that was a big help. Computer viruses began to appear about this time, and they provided some independent corroboration for the idea of applying biological concepts to software. In 1987, we also started to hear some things about artificial life from Chris Langton out of Los Alamos, but there was still a lot of resistance to the idea back at Amoco. One manager insisted that I call genes "templates". I used the term "bionic" in the product name after I checked in the dictionary and found that bionic meant applying concepts from biology to solve engineering problems, and that was exactly what I was trying to achieve. There were also a couple of American television programs in the 1970s that introduced the term to the American public, The Six Million Dollar Man and The Bionic Woman, which featured superhuman characters performing astounding feats. In my BSDE road shows, I would demo BSDE in action spewing out perfect code in real time and about 20 times faster than normal human programmers could achieve, so I thought the term "bionic" was fitting. I am still not sure that using the term "bionic" was the right thing to do. IT was not "cool" in the 1980s, as it is in today's ubiquitous Internet Age, and was dominated by very serious and conservative people with backgrounds primarily in accounting. However, BSDE was pretty successful, and by 1989 BSDE was finally recognized by Amoco IT management and was made available to all Amoco programmers. A BSDE class was developed, and about 20 programmers became active BSDE programmers. A BSDE COI (Community of Interest) was formed, and I used to send out weekly email newsletters about applying physics and biology to software to the COI members. In October 1991, an article on BSDE appeared as the cover story of the Enterprise Systems Journal.

Figure 1 – BSDE appeared as the cover story of the October 1991 issue of the Enterprise Systems Journal

During the period of 1985-1989, BSDE only ran under the IBM VM/CMS operating system. The VM/CMS operating system was an interactive operating system that IBM developed in the 1970s, which was somewhat similar to the Unix operating system being developed at Bell Labs at the same time. The other major IBM operating system was MVS, a batch operating system that was the direct descendant of the original 1965 IBM OS/360 operating system. In the 1970s, IBM added a time-sharing option to MVS called TSO. Now it just happened that MVS/TSO had all the software infrastructure components I needed for an MVS/TSO version of BSDE, but there were subtle syntax differences between the two operating systems. I realized that with some gradual incremental changes to the VM/CMS version of BSDE, I could evolve an MVS/TSO version. So I took the code for the VM/CMS version of BSDE and copied it over to MVS/TSO in a more or less brute force manner. This was very much like the first lungfish to crawl up onto the land. BSDE VM/CMS did not run very well under MVS/TSO because of the syntax differences, but it did just barely survive the transition. At this point the two versions of BSDE began to diverge. I kept evolving BSDE MVS/TSO to correct the syntax errors as they arose during use, and slowly BSDE MVS/TSO adapted to the new operating system. At the same time, I began to incrementally modify BSDE VM/CMS so that programmers could develop applications on VM/CMS and then port them to MVS/TSO. The computer charge rate for VM/CMS was substantially less than for MVS/TSO, so it was much cheaper to grow an embryo within BSDE VM/CMS and then port the nearly fully grown larval stage embryo to MVS/TSO for the final delivery into production. BSDE MVS/TSO could then be used to maintain the fully-grown adult MVS/TSO application. So BSDE generated MVS/TSO applications took on a two-stage life cycle. The bulk of their development took place within BSDE VM/CMS as a larval stage embryo in an environment where computer charges were cheap and the living was easy. The adult stage application fluttered about on MVS/TSO fulfilling its purpose in life.

Because all BSDE-generated applications had the same biochemistry down at the coding level, the effort to maintain BSDE-generated applications was much less than for normal applications. Maintenance is a big problem in IT because 80% of your budget goes to supporting old applications that may be 10 years old and are littered with the historical programming styles of dozens of programmers who have long since gone. Once a programmer became familiar with one BSDE-generated application, it was very easy for the programmer to support other BSDE applications, for the same reason that we do not have hundreds of different kinds of veterinarians taking care of domestic animals.

I wish that I could claim that I was smart enough to have sat down and thought up all of this stuff from first principles, but that is not what happened. It all just happened through small incremental changes over a very long period of time, and most of the design work was done subconsciously, if at all. Even the initial BSDE ISPF edit macros happened through serendipity. When I first started programming DB2 applications, I found myself copying the DDL CREATE TABLE statements from the file I used to create the DB2 database into the program that I was working on. This file, with the CREATE TABLE statements, later became the Control File used by BSDE to store the genes for an application. I would then go through a series of editing steps on the copied-in data to transform it from a CREATE TABLE statement into a DB2 SELECT, INSERT, UPDATE, or DELETE statement. I would do the same thing all over again to declare the host variables for the program. Being a lazy programmer, I realized that there was really no thinking involved in these editing steps and that an ISPF edit macro could do the job just as well, only very quickly and without error, so I went ahead and wrote a couple of ISPF edit macros to automate the process. I still remember the moment when it hit me. For me it was very much like the scene in 2001 - A Space Odyssey, when the man-ape picks up the wildebeest thighbone and starts to pound the ground with it. My ISPF edit macros were doing the same thing that happens when the information in a DNA gene is transcribed and then translated into a protein! A flood of biological ideas poured into my head over the next few days, because at last I had a solution for my pent-up ideas about nonlinear systems and the second law of thermodynamics that were making my life so difficult as a commercial software developer. We needed to "grow" code – not write code!

So that is how BSDE got started by accident. Since I did not have any budget for BSDE development, and we charged out all of our time at Amoco to various projects, I was forced to develop BSDE through small incremental enhancements that always made my job on the billable projects a little easier. I knew that was how evolution worked, so I was not too concerned. In retrospect, this was a fortunate thing. In the 1980s, the accepted theory of the day was that you needed to prepare a very thick user requirements document for a new application and then you would code from the requirements as a blueprint. This was before the days of prototyping and development through small incremental releases that you find today. In the mid-1980s, prototyping was still a heretical proposition in IT. But I don't think that I could have built BSDE using the traditional blueprint methodology of the day.

Unfortunately, the early 1990s saw the downfall of BSDE. The distributed computing model hit with full force, and instead of deploying applications on mainframe computers, we began to distribute applications across a network of servers and client PCs. Since BSDE generated applications for mainframe computers, it could not compete, and BSDE quickly went extinct in the minds of Amoco IT management. I was left with just a theory and no tangible product, and it became much harder to sell softwarephysics at that point. So after a decade of being considered a little "strange", I decided to give it a rest. But I made a promise to myself. I would let 25 years go by, until the year 2004, and if I did not see softwarephysics appear elsewhere in the IT community, I would give it another shot.

Anyway, over the years, I have found softwarephysics to be very useful and also a lot of fun, and maybe you might too.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston