Wednesday, November 14, 2007

The Demon of Software

Last time we explored the macroscopic properties of software behavior using the thermodynamics of steam engines as a guide. We saw that in thermodynamics there is a mysterious fluid at work called entropy which measures how run down a system is. You can think of entropy as depreciation; it always increases with time and never decreases. As I mentioned previously, the second law of thermodynamics can be expressed in many forms. Another statement goes as:

dS/dt > 0

This is a simple differential equation that states that the entropy S of an isolated system always increases with time for irreversible processes. All of the other effective theories of physics can also be expressed as differential equations with time as a factor. However, all of these other differential equations have a "=" sign in them, meaning that the interactions can be run equally forwards or backwards in time. For example, a movie of two billiard balls colliding can be run forwards or backwards in time, and you cannot tell which is which. Such a process is called a reversible process because it can be run backwards to return the Universe to its original state, like backing out a bad software install for a website. But if you watch a movie of a cup falling off a table and breaking into a million pieces, you can easily tell which is which. For all the other effective theories of physics, a broken cup can spontaneously jump back onto a table and reassemble itself into a whole cup if you give the broken shards the proper energy kick. This would clearly be a violation of the second law of thermodynamics because the entropy of the cup fragments would spontaneously decrease. The second law of thermodynamics is the only effective theory in physics that has a ">" sign in it, which many physicists consider to be the arrow of time. With the second law of thermodynamics, you can easily tell the difference between the past and the future because the future will always contain more entropy or disorder.

Energy too was seen to be subject to the second law of thermodynamics and subject to entropy increases as well. We saw that a steam engine can convert the energy in high temperature steam into useful mechanical energy by dumping a portion of the steam energy into a reservoir at a lower temperature. Because not all of the energy in steam can be converted into useful mechanical energy, steam engines, like all other heat engines, can never be 100% efficient. We also saw that software too tends to increase in entropy whenever maintenance is performed upon it. Software tends to depreciate by accumulating bugs.

The Second Law of Thermodynamics
The second law does not mean that we can never create anything of value; it just puts some severe limits on the process. Whenever we create something of value, like a piece of software or pig iron, the entropy of the entire Universe must always increase. For example, a piece of pig iron can be obtained by heating iron ore with coke in a blast furnace, but if you add up the entropy decrease of the pig iron with the entropy increase of degrading the chemical energy of the coke into heat energy, you will find that the net amount of entropy in the Universe has increased. The same goes with software. It is possible to write perfect bug-free software by degrading chemical energy in a programmer’s brain into heat and allowing the heat to cool off to room temperature. A programmer on a 2400 calorie diet (2400 kcal/day) produces about 100 watts of heat sitting at her desk and about 20 – 30 watts of that heat comes from her brain. So the next time your peers comment that you are as dim-witted as a 40 watt light bulb after a code review, please be sure to take that as a compliment!

Kinetic Theory of Gasses and Statistical Mechanics
This time we will drill down deeper and use another couple of effective theories from physics – the kinetic theory of gasses and statistical mechanics. We will see that both of these effective theories can be related to software source code at the line of code level. Recall that in 1738 Bernoulli proposed that gasses are really composed of a very large number of molecules bouncing around in all directions. Gas pressure in a cylinder was simply the result of a huge number of molecular impacts from individual gas molecules striking the walls of a cylinder, and heat was just a measure of the kinetic energy of the molecules bouncing around in the cylinder. Bernoulli’s kinetic theory of gasses was not well received by the physicists of the day because many physicists in the 18th and 19th centuries did not believe in atoms or molecules. The idea of the atom goes back to the Ancient Greeks, Leucippus and his student Democritus, about 450 B.C. and was formalized by John Dalton in 1803, when he showed that in chemical reactions the relative weights of elemental chemical reactants were always the same and proportional to integer multiples. But many physicists had problems with thinking of matter being composed of a “void” filled with “atoms” because that meant they had to worry about the forces that kept the atoms together and that repelled atoms apart when objects “touched”. To avoid this issue, many physicists simply considered “atoms” to be purely a mathematical trick used by chemists to do chemistry. This held sway until 1897 when J. J. Thomson successfully isolated electrons in a cathode ray beam by deflecting them with electric and magnetic fields in a vacuum. It seems that physicists don’t have a high level of confidence in things until they can bust them up into smaller things.

Now imagine a container consisting of two compartments. We fill the left compartment with pure oxygen gas molecules (white dots) and the right compartment with pure nitrogen gas molecules (black dots).

Figure 1

Next we perforate the divider between the compartments and allow the molecules of oxygen and nitrogen to mingle.

Figure 2

After a period of time we will find that the two compartments now contain a gas that is a uniform mixture of oxygen and nitrogen molecules.

Figure 3

This is an example of a spatial entropy increase. The reverse process, that of a mixture of oxygen and nitrogen spontaneously separating into one compartment of pure oxygen and another compartment of pure nitrogen is never observed to occur. Such a process would be a violation of the second law of thermodynamics.

In 1859, physicist James Clerk Maxwell took Bernoulli’s idea one step further. He combined Bernoulli’s idea of a gas being composed of a large number of molecules with the new mathematics of statistics. Maxwell reasoned that the molecules in a gas would not all have the same velocities. Instead, there would be a distribution of velocities; some molecules would move very quickly while others would move more slowly, with most molecules having a velocity around some average velocity. Now imagine that the two preceding compartments (see Figure 1) are filled with nitrogen gas, but that this time the left compartment is filled with cold slow moving nitrogen molecules (white dots), while the right compartment is filled with hot fast moving nitrogen molecules (black dots). If we again perforate the partition between compartments, as in Figure 2 above, we will observe that the fast moving hot molecules on the right will mix with and collide with the slow moving cold molecules on the left and will give up kinetic energy to the slow moving molecules. Eventually both containers will be found to be at the same temperature (see Figure 3), but we will always find some molecules moving faster than the average (black dots), and some molecules moving slower than the average (white dots) just as Maxwell had determined. This is called a state of thermal equilibrium and demonstrates a thermal entropy increase. Just as with the previous example, we never observe a gas in thermal equilibrium suddenly dividing itself into hot and cold segments (the gas can go from Figure 2 to Figure 3 but never the reverse). Such a process would also be a violation of the second law of thermodynamics.

In 1867, Maxwell proposed a paradox along these lines known as Maxwell’s Demon. Imagine that we place a small demon at the opening between the two compartments and install a small trap door at this location. We instruct the demon to open the trap door whenever he sees a fast moving molecule in the left compartment approach the opening to allow the fast moving molecule to enter the right compartment. Similarly, when he sees a slow moving molecule from the right compartment approach the opening, he opens the trap door to allow the low temperature molecule to enter the left compartment. After some period of time, we will find that all of the fast moving high temperature molecules are in the right compartment and all of the slow moving low temperature molecules are in the left compartment. Thus the left compartment will become colder and the right compartment will become hotter in violation of the second law of thermodynamics (the gas would go from Figure 3 to Figure 2 above). With the aid of such a demon, we could run a heat engine between the two compartments to extract mechanical energy from the right compartment containing the hot gas as we dumped heat into the colder left compartment. This really bothered Maxwell, and he never found a satisfactory solution to his paradox. This paradox also did not help 19th century physicists become more comfortable with the idea of atoms and molecules.
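To make the demon's sorting concrete, here is a minimal toy simulation. It is only a sketch under stated assumptions: the speed distribution, the "faster than average" cutoff, and the names run_demon and mean_kinetic_energy are all hypothetical choices made for illustration, not anything Maxwell specified.

```python
import random

def mean_kinetic_energy(speeds):
    # Kinetic energy per molecule is proportional to speed squared; the mass
    # and the factor of 1/2 are dropped because only comparisons matter here.
    return sum(v * v for v in speeds) / len(speeds)

def run_demon(n_molecules=10_000, seed=0):
    rng = random.Random(seed)
    # Assumed speed distribution for this sketch (not Maxwell's actual formula).
    speeds = [abs(rng.gauss(0.0, 1.0)) for _ in range(n_molecules)]
    cutoff = sum(speeds) / len(speeds)  # the demon calls anything above average "fast"

    # Thermal equilibrium: every molecule sits in a randomly chosen compartment.
    left, right = [], []
    for v in speeds:
        (left if rng.random() < 0.5 else right).append(v)
    before = (mean_kinetic_energy(left), mean_kinetic_energy(right))

    # The demon lets fast molecules pass only left-to-right and slow ones only
    # right-to-left, so after enough time the compartments are fully sorted.
    sorted_left = [v for v in speeds if v <= cutoff]
    sorted_right = [v for v in speeds if v > cutoff]
    after = (mean_kinetic_energy(sorted_left), mean_kinetic_energy(sorted_right))
    return before, after

before, after = run_demon()
print("before (left, right):", before)
print("after  (left, right):", after)
```

Before the demon goes to work, the two compartments have nearly the same mean kinetic energy. Afterwards the left compartment is colder and the right compartment is hotter, which is exactly the temperature difference a heat engine could exploit.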

Beginning in 1866, Ludwig Boltzmann began work to extend Maxwell’s statistical approach. Boltzmann’s goal was to be able to explain all the macroscopic thermodynamic properties of bulk matter in terms of the statistical analysis of microstates. Boltzmann proposed that the molecules in a gas occupied a very large number of possible energy states called microstates, and for any particular energy level of a gas there were a huge number of possible microstates producing the same macroscopic energy. The probability that the gas was in any one particular microstate was assumed to be the same for all microstates. In 1872, Boltzmann was able to relate the thermodynamic concept of entropy to the number of these microstates with the formula:

S = k ln(N)

S = entropy
N = number of microstates
k = Boltzmann’s constant

These ideas laid the foundations of statistical mechanics.

The Physics of Poker
Boltzmann’s logic might be a little hard to follow, so let’s use an example to provide some insight by delving into the physics of poker. For this example, we will bend the formal rules of poker a bit. In this version of poker, you are dealt 5 cards as usual. The normal rank of the poker hands still holds and is listed below. However, in this version of poker all hands of a similar rank are considered to be equal. Thus a full house consisting of a Q-Q-Q-9-9 is considered to be equal to a full house consisting of a 6-6-6-2-2 and both hands beat any flush. We will think of the rank of a poker hand as a macrostate. For example, we might be dealt 5 cards, J-J-J-3-6, and end up with the macrostate of three of a kind. The particular J-J-J-3-6 that we hold, including the suit of each card, would be considered a microstate. Thus for any particular rank of hand or macrostate, such as three of a kind, we would find a number of microstates. For example, for the macrostate of three of a kind there are 54,912 possible microstates or hands that constitute the macrostate of three of a kind.
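If you are wondering where a number like 54,912 comes from, the microstates of a macrostate can be counted with a little combinatorics. The sketch below uses the standard counting arguments for a 52 card deck; the function names are my own and purely illustrative.

```python
from math import comb

def three_of_a_kind():
    # Pick the rank of the triple, 3 of its 4 suits, then 2 different
    # ranks for the other cards and a suit for each of them.
    return 13 * comb(4, 3) * comb(12, 2) * 4**2

def one_pair():
    # Pick the rank of the pair, 2 of its 4 suits, then 3 different
    # ranks for the other cards and a suit for each of them.
    return 13 * comb(4, 2) * comb(12, 3) * 4**3

def full_house():
    # Pick the rank and suits of the triple, then the rank and suits of the pair.
    return 13 * comb(4, 3) * 12 * comb(4, 2)

print(three_of_a_kind())  # 54912
print(one_pair())         # 1098240
print(full_house())       # 3744
```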

Rank of Poker Hands
Royal Flush - A-K-Q-J-10 all the same suit

Straight Flush - All five cards are of the same suit and in sequence

Four of a Kind - Such as 7-7-7-7

Full House - Three cards of one rank and two cards of another such as K-K-K-4-4

Flush - Five cards of the same suit, but not in sequence

Straight - Five cards in sequence, but not same suit

Three of a Kind - Such as 5-5-5-7-3

Two Pair - Such as Q-Q-7-7-4

One Pair - Such as Q-Q-3-J-10

Next we create a table using Boltzmann’s equation to calculate the entropy of each hand. For this example, we set Boltzmann’s constant k = 1, since k is just a “fudge factor” used to get the units of entropy using Boltzmann’s equation to come out to those used by the thermodynamic formulas of entropy.

Thus for three of a kind where N = 54,912 possible microstates or hands:

S = ln(N)
S = ln(54,912) = 10.9134872

Hand | Number of Microstates N | Probability | Entropy = ln(N) | Information Change = Initial Entropy - Final Entropy
Royal Flush | 4 | 1.54 x 10^-6 | 1.3862944 | 13.3843291
Straight Flush | 40 | 1.54 x 10^-5 | 3.6888795 | 11.0817440
Four of a Kind | 624 | 2.40 x 10^-4 | 6.4361504 | 8.3344731
Full House | 3,744 | 1.44 x 10^-3 | 8.2279098 | 6.5427136
Flush | 5,108 | 1.97 x 10^-3 | 8.5385632 | 6.2320602
Straight | 10,200 | 3.92 x 10^-3 | 9.2301430 | 5.5404805
Three of a Kind | 54,912 | 2.11 x 10^-2 | 10.9134872 | 3.8571363
Two Pairs | 123,552 | 4.75 x 10^-2 | 11.7244174 | 3.0462061
Pair | 1,098,240 | 4.23 x 10^-1 | 13.9092195 | 0.8614040
High Card | 1,302,540 | 5.01 x 10^-1 | 14.0798268 | 0.6907967
Total Hands | 2,598,964 | 1.00 | 14.7706235 | 0.0000000
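The probability and entropy columns can be reproduced with a few lines of Python. This is a minimal sketch that takes the microstate counts from the table as given; the names HAND_COUNTS and entropy are illustrative. One caveat: the table's total of 2,598,964 is the sum of its rows, and because the 40 straight flushes already include the 4 royal flushes, that sum comes out 4 higher than the 2,598,960 distinct five card hands in a 52 card deck. The difference is far too small to matter for the entropies.

```python
from math import log

# Microstate counts copied from the table above.
HAND_COUNTS = {
    "Royal Flush": 4,
    "Straight Flush": 40,
    "Four of a Kind": 624,
    "Full House": 3_744,
    "Flush": 5_108,
    "Straight": 10_200,
    "Three of a Kind": 54_912,
    "Two Pairs": 123_552,
    "Pair": 1_098_240,
    "High Card": 1_302_540,
}

def entropy(n_microstates, k=1.0):
    """Boltzmann entropy S = k ln(N), with k set to 1 as in the text."""
    return k * log(n_microstates)

total = sum(HAND_COUNTS.values())  # 2,598,964, the table's total
for hand, n in HAND_COUNTS.items():
    print(f"{hand:16s} {n:>10,d}  {n / total:.2e}  {entropy(n):.7f}")
print(f"{'Total Hands':16s} {total:>10,d}  {1.0:.2e}  {entropy(total):.7f}")
```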

Examine the above table. Note that higher ranked hands have more order, less entropy, and are less probable than the lower ranked hands. For example, a straight flush with all cards the same color, same suit, and in numerical order has an entropy = 3.6889, while a pair with two cards of the same value has an entropy = 13.909. A hand that is a straight flush appears more orderly than a hand that contains only a pair and is certainly less probable. A pair is more probable than a straight flush because there are more microstates that produce the macrostate of a pair (1,098,240) than there are microstates that produce the macrostate of a straight flush (40). In general, probable things have lots of entropy and disorder, while improbable things, like perfectly bug-free software, have little entropy or disorder. In thermodynamics, entropy is a measure of the depreciation of a macroscopic system like how well mixed two gases are, while in statistical mechanics entropy is a measure of the microscopic disorder of a system, like the microscopic mixing of gas molecules. A pure container of oxygen gas will mix with a pure container of nitrogen gas because there are more arrangements or microstates for the mixture of the oxygen and nitrogen molecules than there are arrangements or microstates for one container of pure oxygen and the other of pure nitrogen molecules. In statistical mechanics, a neat room tends to degenerate into a messy room and increase in entropy because there are more ways to mess up a room than there are ways to tidy it up.

In statistical mechanics, the second law of thermodynamics results because systems with lots of entropy and disorder are more probable than systems with little entropy or disorder, so entropy naturally tends to increase with time.
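The same counting argument can be tried on the two-compartment gas. The sketch below assumes a deliberately crude toy model that is not in the original discussion: each compartment is a row of n sites, each site holds exactly one molecule, and there are n oxygen and n nitrogen molecules in total. The function name microstates is hypothetical.

```python
from math import comb, log

def microstates(n, oxygen_on_left):
    """Number of arrangements with `oxygen_on_left` oxygen molecules on the left.

    Toy model: each compartment has n sites, and there are n oxygen and
    n nitrogen molecules, one molecule per site.
    """
    k = oxygen_on_left
    # Choose which left sites hold oxygen, then which right sites hold
    # the remaining n - k oxygen molecules; nitrogen fills the rest.
    return comb(n, k) * comb(n, n - k)

n = 50
for k in (50, 40, 25):
    w = microstates(n, k)
    print(f"oxygen on left = {k:2d}  microstates = {w:.3e}  entropy = {log(w):.2f}")
```

With all 50 oxygen molecules on the left there is exactly one arrangement and the entropy is zero. The evenly mixed macrostate has by far the most arrangements, so a gas shuffled at random is overwhelmingly likely to be found mixed.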

How the Demon Unleashed the Atomic Bomb
For nearly 100 years physicists struggled with Maxwell’s Demon to no avail. Our next stop will be to see how the concept of information in physics helped to solve the problem, and along the way to finding a solution to Maxwell's Demon, physicists dramatically changed the history of the world. But before we go on, as an IT professional you deal with information all day long, but have you ever really stopped to think what information is? As a start let’s begin with a very simplistic definition.

Information – Something you know

and then see how trying to figure out what information really is accidentally led to the development of the atomic bomb.

Like many of today’s college graduates, Albert Einstein could not initially find a job in physics after graduating from college, so he went to work as a clerk in the Swiss Patent Office from 1902 – 1908, with the hope of one day obtaining a position as a professor of physics at a university. In 1905, he published four very significant papers in the Annalen der Physik, one of the most prestigious physics journals of the time, on the photoelectric effect, Brownian motion, special relativity, and the equivalence of matter and energy. At the time, Einstein rightly figured that these papers would be his ticket out of the patent office, but to his dismay, the 1905 papers were nearly totally ignored by the physics community. This changed when Max Planck, one of the most influential physicists of the time, took interest in Einstein’s work. With Max Planck’s favorable remarks, Einstein was invited to lecture at international meetings and rapidly rose in academia. Beginning in 1908, he embarked upon a series of positions at increasingly prestigious institutions, including the University of Zürich, the University of Prague, the Swiss Federal Institute of Technology, and finally the University of Berlin, where he served as director of the Kaiser Wilhelm Institute for Physics from 1913 to 1933.

At the University of Berlin, Einstein taught a young Hungarian student named Leo Szilárd. In 1922 Leo Szilárd earned his Ph.D. in physics from the University of Berlin with a thesis on thermodynamics. Now Leo Szilárd had a great knack for applying physics to practical problems and coming up with inventions. In 1926 Szilárd decided to go into the refrigerator business by coming up with an improved design. The problem with the home refrigerators of the day was that they did not use the compression and expansion of inert gasses like Freon to do the cooling. Instead, they used very poisonous and noxious gasses like ammonia for that purpose, and the seal where the spinning electric motor axle entered the compressor could leak those poisonous gasses into a kitchen. Szilárd had read a newspaper story about a Berlin family who had been killed when the seal in their refrigerator broke and leaked poisonous fumes into their home. Szilárd figured that a refrigerator without moving parts would eliminate the potential for seal failure because there would be no compressor motor at all, and he began to explore practical applications for different refrigeration cycles with no moving parts. But how do you launch such an enterprise? Szilárd figured what could be better than to enlist the support of the most famous physicist in the world, who just also happened to have been a patent clerk too, namely Albert Einstein. From 1926 - 1933 Einstein and Szilárd collaborated on ways to improve home refrigeration technology. The two were eventually granted 45 patents in their names for three different models. The way these refrigerators worked was that you simply heated one side of the device with a flame, electric coil, or even concentrated sunlight, and the other side of the device got cold. There were no moving parts so there was no need for a compressor motor with a possibly leaky seal. You can actually buy such refrigerators today. 
Just search for "natural gas refrigerators" on the Internet. They are primarily used for RVs that need to keep food cold when no electricity is available.

Figure 5 – Diagram from Szilárd’s Dec 16, 1927 refrigerator with no moving parts. Notice that Szilárd wisely gave Einstein top billing.

In 1927 the Electrolux Vacuum Cleaner Company bought one of their patents, freeing up Szilárd to take up the new branch of physics known as nuclear physics. Szilárd became an instructor and researcher at the University of Berlin. There, in 1929, he published a paper titled “On the Decrease of Entropy in a Thermodynamic System by the Intervention of Intelligent Beings”.

Figure 6 – In 1929 Szilárd published a paper in which he explained that the process of the Demon knowing which side of a cylinder a molecule was in must produce some additional entropy to preserve the second law of thermodynamics.

In Szilárd's 1929 paper, he proposed that using Maxwell’s Demon, you could indeed build a 100% efficient steam engine in conflict with the second law of thermodynamics. Imagine a cylinder with just one water molecule bouncing around in it (Figure 6a). First, the Demon figures out if the water molecule is in the left half or the right half of the cylinder. If he sees the water molecule in the right half of the cylinder (Figure 6b), he quickly installs a piston connected to a weight via a cord and pulley. As the water molecule bounces off the piston (Figure 6c), and moves the piston to the left, it slowly raises the weight and does some useful work upon it. In the process of moving the piston to the left, the water molecule must lose kinetic energy in keeping with the first law of thermodynamics and slow down to a lower velocity and temperature than the atoms in the surrounding walls of the cylinder. When the piston has finally reached the far left end of the cylinder it is removed from the cylinder in preparation for the next cycle of the engine. The single water molecule then bounces around off the walls of the cylinder (Figure 6a), and in the process picks up additional kinetic energy from the jiggling atoms in the walls of the cylinder as they kick the water molecule back into the cylinder each time it bounces off the cylinder walls. Eventually, the single water molecule will once again be in thermal equilibrium with the jiggling atoms in the walls of the cylinder and will be on average traveling at the same velocity it originally had before it pushed the piston to the left. So this proposed engine takes the ambient high-entropy thermal energy of the cylinder’s surroundings and converts it into the useful low-entropy potential energy of a lifted weight. Notice that the first law of thermodynamics is preserved. 
The engine does not create energy; it simply converts the high-entropy thermal energy of the random motions of the atoms in the cylinder walls into useful low-entropy potential energy, but that does violate the second law of thermodynamics. Szilárd's solution to this paradox was simple. He proposed that the process of the Demon figuring out if the water molecule was in the left hand side of the cylinder or the right hand side of the cylinder must cause the entropy of the Universe to increase. So “knowing” which side of the cylinder the water molecule was in must come with a price; it must cause the entropy of the Universe to increase. Recall that our first attempt at defining information was simply to say that information was “something you know”. More aptly, we should have said that useful information was simply “something you know”.

On July 4, 1934 Leo Szilárd filed the first patent application for the method of producing a nuclear chain reaction that could produce a nuclear explosion. This patent included a description of "neutron induced chain reactions to create explosions", and the concept of critical mass. In 1938 Otto Hahn, Lise Meitner and Otto Frisch in Germany succeeded in demonstrating that U-235 could indeed fission and possibly form the basis for such a chain reaction. At the time of this discovery, Szilárd and Enrico Fermi were both at Columbia University, and they conducted a very simple experiment that showed significant neutron multiplication when uranium fissioned, proving that the chain reaction was indeed possible and could form the basis for nuclear weapons. Based upon this experiment and the fear that Nazi Germany would soon be embarking on a program to develop an atomic bomb, Leo Szilárd drafted a letter to President Franklin D. Roosevelt for his old business partner and fellow refrigerator designer, Albert Einstein, to sign. The letter, dated August 2, 1939, explained that nuclear weapons were indeed possible and warned of similar Nazi work on such weapons and recommended that the United States immediately begin a development program of its own. This famous letter resulted in the creation of the Manhattan Project which led to the atomic bombs that ended World War II six years later, almost to the day, with the surrender of Japan on August 15, 1945.

Figure 7 – Leo Szilárd’s letter to President Franklin D. Roosevelt signed by Albert Einstein on August 2, 1939.

Information and the Solution to Maxwell’s Demon
Finally in 1950 Leon Brillouin used quantum mechanics to show that Maxwell’s Demon required information to tell if a molecule was moving slowly or quickly. Brillouin defined information as negative entropy and found that information about the velocities of the oncoming molecules could only be obtained by the demon by bouncing photons off the moving molecules. Bouncing photons off the molecules increased the total entropy of the entire system whenever the demon determined if a molecule was moving slowly or quickly. So Maxwell's Demon was really not a paradox after all, since even the Demon could not violate the second law of thermodynamics.

For Brillouin entropy was a lack of information about a system; a measure of ignorance. Brillouin proposed that information is the elimination of microstates that a system can be found to exist in. From the above analysis, information is then the difference between the initial and final entropies of a system after a determination about the system has been made.

Information = Si - Sf
Si = initial entropy
Sf = final entropy

Going back to our poker example, let’s compute the amount of information you convey when you tell your opponent what hand you hold. When you tell your opponent that you have a straight flush, you eliminate more microstates than when you tell him that you have a pair, so telling him that you have a straight flush conveys more information than telling him you hold a pair. For example, there are a total of 2,598,964 possible poker hands or microstates for a 5 card hand, but only 40 hands or microstates constitute the macrostate of a straight flush.

Straight Flush Information = Si – Sf = ln(2,598,964) – ln(40) = 11.082

For a pair we get:

Pair Information = Si – Sf = ln(2,598,964) – ln(1,098,240) = 0.8614040

When you tell your opponent that you have a straight flush you deliver 11.082 units of information, while when you tell him that you have a pair you only deliver 0.8614040 units of information. Clearly when your opponent knows that you have a straight flush, he knows more about your hand than if you tell him that you have a pair.
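Here is a minimal sketch of that calculation, assuming the 2,598,964 total used in this posting; the names TOTAL_HANDS and information are illustrative. Because ln(a) - ln(b) = ln(a/b), the information delivered is also just the negative logarithm of the probability of the hand, which is why improbable hands carry more information.

```python
from math import log

TOTAL_HANDS = 2_598_964  # total number of microstates used in this posting

def information(n_after, n_before=TOTAL_HANDS):
    """Brillouin information = Si - Sf = ln(N before) - ln(N after), with k = 1."""
    return log(n_before) - log(n_after)

straight_flush = information(40)
pair = information(1_098_240)
print(f"Straight flush: {straight_flush:.7f}")
print(f"Pair:           {pair:.7f}")

# The same number falls out of the probability of the hand.
probability = 40 / TOTAL_HANDS
print(f"-ln(p) for a straight flush: {-log(probability):.7f}")
```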

Application to Software
Software (source code, config files, lookup tables, etc.) exists as a set of bytes. Each byte can be in one of 256 microstates if we allow for all possible ASCII states of a byte. Some programmers might object that we do not use all 256 possible ASCII characters in programs, but I am going for the most general case here. Since there are hundreds of programming languages, all using a variety of different character sets, the approximation of using all 256 possible ASCII characters is not too bad. For example, there actually is a programming language called whitespace that only uses non-displayed characters such as spaces, tabs and newlines for its source code character set. Such programs appear as blank whitespace in a normal editor, adding an extra layer of source code security. More on whitespace is available at:

Now consider a program that is M bytes long. M bytes of software can be in N = 256^M microstates, which will be a very large number for any M of appreciable size. That means there are 256^M versions of a program that is M bytes long. Consider a medium size program of 30,000 bytes. The number of versions or microstates of a 30,000 byte program is:

N = 256^30,000 = 158 x 10^72,245

That is a 158 with 72,245 zeroes behind it! Unfortunately, nearly all of these potential programs will just be a meaningless jumble of characters. In this huge mix, you will also find car ads, the wedding invitations for every couple ever married in the western world, and the Gettysburg Address. If we narrow our search down in the mix to just the true programs, we will find all 30,000 byte programs that ever have been written, or ever will be written, in every programming language that ever has been or ever will be devised using the ASCII character set. Your job as a programmer is to find one of the small set of 30,000 byte programs in the mix that performs the task you desire with a decent response time.

Later we will explore the biological aspects of softwarephysics, but to skip ahead and borrow a biological concept now, we can consider the 158 x 10^72,245 possible versions of a 30,000 byte program as the DNA Landscape of the program. DNA stores information in 4 nucleotide or base pair sequences abbreviated as A, C, T, and G. Each base pair in a living thing can be considered a biological bit and can exist in one of the four states A, C, T, or G. Note that living things use base 4 arithmetic as opposed to the binary or base 2 arithmetic common to all of today's computing devices. Today's computers use bits that can be in one of two states "1" or "0". A simple bacterium, like E. coli, contains about 4 million base pairs. Thus the DNA Landscape of E. coli is the set of all DNA sequences that can be made with 4 million base pairs.

N = 4^4,000,000
N ≈ 1.0 x 10^2,408,240

which is a 1 with 2,408,240 zeroes behind it. Just as with the Landscape of our 30,000 byte program, nearly all of these DNA sequences lead to a dead bacterium that does not work.
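Numbers this large overflow ordinary floating point arithmetic, but their size is easy to check with logarithms. The sketch below is one way to do it; the helper name sci_notation is hypothetical.

```python
from math import floor, log10

def sci_notation(base, length):
    """Return (mantissa, exponent) such that base**length ≈ mantissa x 10**exponent."""
    log_n = length * log10(base)   # log10 of the full number
    exponent = floor(log_n)        # whole powers of ten
    mantissa = 10 ** (log_n - exponent)
    return mantissa, exponent

# All possible 30,000 byte files: mantissa about 1.58 with exponent 72,247,
# which is the same number as 158 x 10^72,245.
print(sci_notation(256, 30_000))

# All possible DNA sequences of 4 million base pairs: mantissa about 9.2
# with exponent 2,408,239, or roughly 1.0 x 10^2,408,240.
print(sci_notation(4, 4_000_000))
```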

The entropy of a piece of software can be measured using the same simplified Boltzmann’s equation with k = 1 that we used for poker hands.

S = ln(N)
N = Number of microstates

The entropy of all possible programs of length M bytes is:

S = ln(256^M) = M ln(256) = 5.5452 M

For example, the entropy of all possible 30,000 byte programs comes to

S = 5.5452 M = (5.5452) (30,000) = 166,356

Now of the 158 x 10^72,245 possible 30,000 byte programs, we know that only an infinitesimal number of them will provide the desired functionality with a decent response time. Since 158 x 10^72,245 is such a huge number, we might as well make the simplifying approximation that there is only one of the possible programs that does the job. Of course this is not true, but even if there were a billion billion suitable programs in the mix they would pale to insignificance relative to 158 x 10^72,245.

So if we assume that there is only one correct version of the program, we can calculate its information content if we remember that the ln(1) = 0:

Information of program = Si - Sf = ln(N) - ln(1) = ln(N)

Information of program = ln(N) = ln(256^M) = M ln(256)
Information of program = 5.5452 M

Since the number of correct versions of a program is always much less than the number of possible versions, our approximation that there is only one correct version of a program is not too bad. For configuration files, frequently there only is one correct version of a file. Notice in the above formula that the information content of a piece of software is proportional to the number of bytes M in the file. This prediction makes intuitive sense.

For our 30,000 byte program the information content comes to:

Information in 30,000 bytes = ln(N) = ln(256^30,000)
= 30,000 ln(256) = (5.5452) (30,000)

Information = 166,356

Thus a 30,000 byte program contains 166,356 units of information, which is a huge amount of information when you compare it to the information in a straight flush that weighs in with only 11.082 units of information. It turns out that a mere 2 bytes of correct software conveys 11.090 units of information, which is about that of a straight flush. That means the odds of getting two bytes of software correct by sheer chance are about the same as drawing a straight flush in poker! This is why programming is so difficult.
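These figures are easy to check. The following is a minimal sketch that assumes, as above, 256 equally likely states per byte and a single correct version of the file; the function name software_information is illustrative. Using the unrounded value of ln(256) gives about 166,355 for the 30,000 byte program, and the 166,356 quoted above comes from multiplying by the rounded factor 5.5452.

```python
from math import log

def software_information(num_bytes, states_per_byte=256):
    """Information = ln(states_per_byte ** num_bytes) - ln(1) = num_bytes * ln(states_per_byte)."""
    return num_bytes * log(states_per_byte)

print(f"30,000 bytes: {software_information(30_000):,.0f} units")
print(f"2 bytes:      {software_information(2):.3f} units")

# Odds of getting 2 bytes right by chance versus being dealt a straight flush,
# using the hand counts from the poker table in this posting.
p_two_bytes = 1 / 256**2
p_straight_flush = 40 / 2_598_964
print(f"two bytes: {p_two_bytes:.3e}   straight flush: {p_straight_flush:.3e}")
```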

When you write a program or other piece of software, you are creating information by eliminating microstates that a file could exist in. Essentially you eliminate all of the programs that do not work, like a sculptor who creates a statue from a block of marble by removing the marble that is not part of the statue. The number of possible microstates of a file that constitute the macrostate of being a "correct" version of a program is quite small relative to the vast number of "buggy" versions or microstates that do not work, and consequently has a very low level of entropy and correspondingly a very large amount of information. According to the second law of thermodynamics, software will naturally seek a state of maximum disorder (entropy) whenever you work on software because there are many more “buggy” versions of a file than there are “correct” versions of a file. Large chunks of software will always contain a small number of residual bugs no matter how much testing is performed. Continued maintenance of software tends to add functionality and more residual bugs. Either the second law of thermodynamics directly applies to software, or we have created a very good computer simulation of the second law of thermodynamics - the effects are the same in either case.

An Alternative Concept of Information
Before proceeding, I must mention that in this discussion we are exclusively dealing with Leon Brillouin’s formulation for the concept of information. Unfortunately, there are now several other concepts of information and entropy floating around in science and engineering, and this can cause a great deal of confusion. For example, people who use Information Theory to analyze the flow of information over transmission networks use a slightly different approach to information, and this approach also uses the terms of information and entropy, but in a different manner than we have discussed. In fact, in Information Theory people actually equate entropy to the amount of useful information in a message! In Information Theory people calculate the entropy, or information content of a message, by mathematically determining how much “surprise” there is in a message. For example, in Information Theory, if I transmit a binary message consisting only of 1s or only of 0s, I transmit no useful information because the person on the receiving end only sees a string of 1s or a string of 0s, and there is no “surprise” in the message. For example, the messages “1111111111” or “0000000000” are both equally boring and predictable, with no real “surprise” or information content at all. Consequently, the entropy, or information content, of each bit in these messages is zero, and the total information of all the transmitted bits in the messages is also zero because they are both totally predictable and contain no “surprise”. On the other hand, if I transmit a signal containing an equal number of 1s and 0s, there can be lots of “surprise” in the message because nobody can really tell in advance what the next bit will bring, and each bit in the message then has an entropy, or information content, of one full bit of information. For more on this see Some More Information About Information. 
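To make the Information Theory idea of “surprise” concrete, here is a rough sketch in Python that estimates the entropy per bit of a message from nothing more than how often each symbol appears in it. This is only the simplest possible estimate and it assumes each bit is independent of its neighbors; the function name is hypothetical.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(message: str) -> float:
    """Estimate Shannon entropy per symbol from symbol frequencies:
    the sum over symbols of p * log2(1/p)."""
    counts = Counter(message)
    total = len(message)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

boring = entropy_bits_per_symbol("1111111111")   # all 1s, no surprise
mixed = entropy_bits_per_symbol("1011001010")    # five 1s and five 0s

print(boring, mixed)
```

The all-1s message scores zero bits per symbol and the evenly mixed message scores one full bit per symbol. Note that this frequency-only estimate would also give one bit per symbol to a perfectly regular message like “0101010101”, so a fuller treatment has to look at how predictable each bit is given the bits before it.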
This concept of entropy and information content is very useful for people who work with transmission networks and on error detection and correction algorithms for those networks, but it is not very useful for our discussion. For example, suppose you had a 10 bit software configuration file and the only “correct” configuration for your particular installation consisted of 10 1s in a row like this “1111111111”. In Information Theory that configuration file contains no information because it contains no “surprise”. However, in Leon Brillouin’s formulation of information there would be a total of N = 2^10 possible microstates or configuration files for the 10 bit configuration file, and since the only “correct” version of the configuration file for your installation is “1111111111” there is only N = 1 microstate that meets that condition. Using the formulas above we can now calculate the entropies of our single “correct” 10 bit configuration file and the entropy of all possible 10 bit configuration files as:

Sf = ln(1) = 0

Si = ln(2^10) = ln(1024) = 6.93147

So using Leon Brillouin’s formulation for the concept of information the Information content of a single “correct” 10 bit configuration file is:

Si - Sf = 6.93147 – 0 = 6.93147

which, if you look at the above table, contains a little more information than drawing a full house in poker without drawing any additional cards and would be even less likely for you to stumble upon by accident than drawing a full house.
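The same calculation can be sketched in a few lines of Python. The function below is a hypothetical helper, not anything from a library; it simply takes the difference between the entropy of all possible files and the entropy of the “correct” ones.

```python
import math

def brillouin_information(total_microstates: int, correct_microstates: int = 1) -> float:
    """Information gained by narrowing total_microstates down to
    correct_microstates: I = Si - Sf = ln(N_total) - ln(N_correct)."""
    s_initial = math.log(total_microstates)
    s_final = math.log(correct_microstates)
    return s_initial - s_final

# A 10 bit configuration file with exactly one correct setting.
config_info = brillouin_information(2 ** 10)
chance_of_guessing = 1 / 2 ** 10

print(f"Information = {config_info:.5f}")
print(f"Odds of guessing it by accident = 1 in {2 ** 10}")
```

If an installation happened to tolerate more than one correct configuration, passing a larger correct_microstates would lower the information content accordingly.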

So in Information Theory, a very “buggy” 10 MB executable program file would contain just as much information as a bug-free 10 MB executable program file, and would require just as many network resources to transmit. Clearly, the Information Theory formulations for the concepts of information and entropy are less useful for IT professionals than are Leon Brillouin’s formulations for the concepts of information and entropy.

The Conservation of Information Does Not Prevent the Destruction of Useful Information
Another consequence of Brillouin’s reformulation is that the second law of thermodynamics, which requires that the amount of entropy in the Universe must always increase when anything is changed:

dS/dt > 0

implies that the amount of information in the Universe must also decrease whenever something is changed:

dI/dt < 0

Whenever you do something, like work on a piece of software, the total amount of disorder in the Universe must increase, the total amount of information must decrease, and the most likely candidate for that to take place is in the software that you are working on! Truly a sobering thought for any programmer. It seems that the Universe is constantly destroying information because that’s how the Universe tells time!

But the idea of destroying information causes some real problems for physicists, and as we shall see, the solution to that problem is that we need to make a distinction between useful information and useless information. Here is the problem that physicists have with destroying information. Recall that a reversible process is a process that can be run backwards in time to return the Universe to the state that it had before the process even began, as if the process had never happened in the first place. For example, the collision between two molecules at low energy is a reversible process that can be run backwards in time to return the Universe to its original state because Newton’s laws of motion are reversible. Knowing the position of each molecule at any given time and also its momentum, a combination of its speed, direction, and mass, we can predict where each molecule will go after a collision between the two, and also where each molecule came from before the collision using Newton’s laws of motion. For a reversible process such as this, the information required to return a system back to its initial state cannot be destroyed, no matter how many collisions might occur, in order for it to be classified as a reversible process that is operating under reversible physical laws.

Figure 8 – The collision between two molecules at low energy is a reversible process because Newton’s laws of motion are reversible

Currently, all of the effective theories of physics, what many people mistakenly now call the “laws” of the Universe, are indeed reversible, except for the second law of thermodynamics, but that is because, as we saw above, the second law is really not a fundamental “law” of the Universe at all. Now in order for a law of the Universe to be reversible it must conserve information. That means that two different initial microstates cannot evolve into the same microstate at a later time. For example, in the collision between the blue and pink molecules in Figure 8, the blue and pink molecules both begin with some particular position and momentum one second before the collision and end up with different positions and momenta at one second after the collision. In order for the process to be reversible and Newton’s laws of motion to be reversible too, this pairing of a starting state with an ending state has to be unique. A different set of identical blue and pink molecules starting out with different positions and momenta one second before the collision could not end up with the same positions and momenta one second after the collision as the first set of blue and pink molecules. If that were to happen, then one second after the collision, we would not be able to tell what the original positions and momenta of the two molecules were one second before the collision, since there would now be two possible alternatives, and we would not be able to uniquely reverse the collision. We would not know which set of positions and momenta the blue and pink molecules originally had one second before the collision, and the information required to reverse the collision would be destroyed. And because all of the current effective theories of physics are time reversible in nature, that means that information cannot be destroyed. So if someday information were indeed found to be destroyed in an experiment, the very foundations of physics would collapse, and consequently, all of science would collapse as well.
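One way to picture this uniqueness requirement is with a toy universe that has only eight possible microstates. The sketch below is purely illustrative: the two “laws” are made-up functions, not real physics, and a law counts as reversible only if no two starting microstates ever land on the same ending microstate.

```python
def is_reversible(law, states):
    """A law conserves information if distinct initial microstates
    always evolve into distinct final microstates."""
    final_states = {law(state) for state in states}
    return len(final_states) == len(states)

microstates = list(range(8))

def shift_law(state):
    # Every state moves 3 steps around a ring of 8; nothing ever collides.
    return (state + 3) % 8

def halving_law(state):
    # States 0 and 1 both end up at 0, 2 and 3 both end up at 1, and so on.
    return state // 2

print(is_reversible(shift_law, microstates))    # True
print(is_reversible(halving_law, microstates))  # False
```

Under the halving law, a final state of 1 could have come from either 2 or 3, so the information needed to run the movie backwards is gone. That is exactly the situation that the reversible effective theories of physics rule out.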

So if information cannot be destroyed, but Leon Brillouin’s reformulation of the second law of thermodynamics does imply that the total amount of information in the Universe must decrease (dS/dt > 0 implies that dI/dt < 0), what is going on? The solution to this problem is that we need to make a distinction between useful information and useless information. Recall that the first law of thermodynamics maintains that energy, like information, can be neither created nor destroyed by any process. Energy can only be converted from one form into another. For example, when you drive to work, you convert all of the low-entropy chemical energy in gasoline into an equal amount of useless waste heat energy by the time you hit the parking lot of your place of employment, but during the entire process of driving to work, none of the energy in the gasoline is destroyed; it is only converted into an equal amount of waste heat that simply diffuses away into the environment as your car cools down to be in thermal equilibrium with the environment. So why can you not simply drive home later in the day using the ambient energy found around your parking spot? The reason you cannot do that is that pesky old second law of thermodynamics. You simply cannot turn the useless high-entropy waste heat of the molecules bouncing around near your parked car into useful low-entropy energy to power your car home at night. And the same goes for information. Indeed, the time reversibility of all the current effective theories of physics may maintain that you cannot destroy information, but that does not mean that you cannot change useful information into useless information.

But for all practical purposes from an IT perspective, turning useful information into useless information is essentially the same as destroying information. For example, suppose you take the source code file for a bug-free program and scramble its contents. Theoretically, the scrambling process does not destroy any information because theoretically it can be reversed. But in practical terms you will be turning a low-entropy file into a useless high-entropy file that only contains useless information. So effectively you will have destroyed all of the useful information in the bug-free source code file. Here is another example. Suppose you are dealt a full house, K-K-K-4-4, but at the last moment a misdeal is declared and your K-K-K-4-4 gets shuffled back into the deck! Now the K-K-K-4-4 still exists as scrambled hidden information in the entropy of the entire deck, and so long as the shuffling process can be reversed, the K-K-K-4-4 can be recovered, and no information is lost, but that does not do much for your winnings. Since all the current laws of physics are reversible, including quantum mechanics, we should never see information being destroyed. In other words, because entropy must always increase and never decrease, the hidden information of entropy cannot be destroyed.
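The scrambled source file can be sketched in Python. This is only an illustration under one big assumption: that we happen to know exactly how the file was scrambled, which is modeled here by a seed for Python's random number generator. The function names and the sample source line are made up.

```python
import random

def scramble(data: bytes, seed: int) -> bytes:
    """Shuffle the bytes of a file using a permutation fixed by the seed."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    return bytes(data[i] for i in order)

def unscramble(scrambled: bytes, seed: int) -> bytes:
    """Rebuild the same permutation from the seed and run it backwards."""
    order = list(range(len(scrambled)))
    random.Random(seed).shuffle(order)
    restored = bytearray(len(scrambled))
    for position, original_index in enumerate(order):
        restored[original_index] = scrambled[position]
    return bytes(restored)

source = b'print("Hello, World!")'
mess = scramble(source, seed=42)
recovered = unscramble(mess, seed=42)

print(mess)
print(recovered)
```

Every byte of the original is still present in the scrambled file, and with the seed in hand the original comes back intact. Without the seed, the scrambled file is just one more of the vast number of “buggy” microstates, which is why scrambling is as good as destruction from an IT perspective.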

Entropy, Information and Black Holes
In recent years, the idea that information cannot be destroyed has caused quite a battle in physics as outlined in Leonard Susskind’s The Black Hole War – My Battle With Stephen Hawking To Make The World Safe For Quantum Mechanics (2008). It all began in 1972, when Jacob Bekenstein suggested that black holes must have entropy. A black hole is a mass that is so concentrated that its surrounding gravitational field will allow nothing to escape, not even light. At the time, it was thought that a black hole only had three properties: mass, electrical charge, and angular momentum – a measure of its spin. Now imagine two identical black holes. Into the first black hole we start dropping a large number of card decks fresh from the manufacturer that are in perfect sort order. Into the second black hole we drop a similar number of shuffled card decks. When we are finished, both black holes will have increased in mass by exactly the same amount, and will remain identical because there will be no change to their electrical charge or angular momentum either. The problem is that the second black hole picked up much more entropy by absorbing all those shuffled card decks than the first black hole, which absorbed only fresh card decks in perfect sort order. If the two black holes truly remain identical after absorbing different amounts of entropy, then we would have a violation of the second law of thermodynamics because some entropy obviously disappeared from the Universe. Bekenstein proposed that in order to preserve the second law of thermodynamics, black holes must have a fourth property – entropy. Since the only thing that changes when a black hole absorbs decks of cards, or anything else, is the radius of the black hole’s event horizon, then the entropy of a black hole must be proportional to the area of its event horizon. The event horizon of a black hole essentially defines the size of a black hole. 
At the heart of a black hole is a singularity, a point-sized pinch in spacetime with infinite density, where all the current laws of physics break down. Surrounding the singularity is a spherical event horizon. The black hole essentially sucks spacetime down into its singularity with increasing speed as you approach the singularity, and the event horizon is simply where spacetime is being sucked down into the black hole at the speed of light. Because nothing can travel faster than the speed of light, nothing can escape from within the event horizon of a black hole because everything within the event horizon is carried along by the spacetime being sucked down into the singularity faster than the speed of light.

In Bekenstein’s model, the event horizon of a black hole is densely packed with bits, or pixels, of information to account for all the information, or entropy, that has fallen past the black hole’s event horizon. Like in a computer, the pixelated bits on the event horizon of a black hole can be in a state of “1”, or “0”. Each pixel is one Planck unit in area, about 10^-70 square meters. In Quantum Software we will learn how Max Planck started off quantum mechanics in 1900 with the discovery of Planck’s constant h = 4.136 x 10^-15 eV sec. Later, Planck proposed that, rather than using arbitrary units of measure like meters, kilograms, and seconds that were simply thought up by certain human beings, we should use the fundamental constants of nature - c (the speed of light), G (Newton’s gravitational constant), and h (Planck’s constant) to define the basic units of length, mass, and time. When you combine c, G, and h into a formula that produces a unit of length, called the Planck length, it comes out to about 10^-35 meters, so a Planck unit of area is the square of a Planck length or about 10^-70 square meters. Now a Planck length is a very small distance indeed, about 10^25 times smaller than an atom, so a square Planck length is a very small area, which means the data density of black holes is quite large, and in fact, black holes have the maximum data density allowed in the Universe. They would make great disk drives!
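The Planck length and Planck area are easy to reproduce. The sketch below uses the usual modern convention, which is an assumption on my part relative to the description above: it combines c and G with the reduced Planck constant (h divided by 2 pi) as sqrt(hbar G / c^3). The constants are rounded values in SI units.

```python
import math

C = 2.998e8          # speed of light, m/s (rounded)
G = 6.674e-11        # Newton's gravitational constant, m^3 kg^-1 s^-2 (rounded)
H_BAR = 1.0546e-34   # reduced Planck constant h / (2 pi), J s (rounded)

# Assumed convention: Planck length = sqrt(hbar * G / c^3)
planck_length = math.sqrt(H_BAR * G / C ** 3)
planck_area = planck_length ** 2

atom_size = 1e-10    # rough size of an atom in meters, for comparison only

print(f"Planck length: {planck_length:.3e} m")
print(f"Planck area:   {planck_area:.3e} square meters")
print(f"An atom is about {atom_size / planck_length:.0e} Planck lengths across")
```

Using h itself instead of the reduced constant changes the length by a factor of about 2.5, which still rounds to the same order of magnitude of 10^-35 meters.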

In 1974, Stephen Hawking calculated the exact formula for the entropy of a black hole’s event horizon, and also determined that black holes must have a temperature and, consequently, radiate energy into space. Recall that entropy was originally defined in terms of heat flow. This meant that because black holes are constantly losing energy by radiating Hawking radiation, they must be slowly evaporating and will one day totally disappear in a brilliant flash of light. This would take a very long time; for example, a black hole with a mass equal to that of the Sun would evaporate in about 2 x 10^67 years. But what happens to all the information or entropy that black holes absorb over their lifetimes? Does it disappear from the Universe as well? This is exactly what Hawking, and the other relativists, believed – information, or entropy, could be destroyed as black holes evaporated.
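The evaporation time quoted above can be checked with a back-of-the-envelope calculation. The sketch below assumes the standard textbook estimate for an isolated, non-rotating black hole that absorbs nothing and radiates only photons, t = 5120 pi G^2 M^3 / (hbar c^4). That formula is not given in this posting, so treat it as an outside assumption; the constants are rounded SI values.

```python
import math

C = 2.998e8               # speed of light, m/s (rounded)
G = 6.674e-11             # Newton's gravitational constant (rounded)
H_BAR = 1.0546e-34        # reduced Planck constant, J s (rounded)
SOLAR_MASS = 1.989e30     # mass of the Sun, kg (rounded)
SECONDS_PER_YEAR = 3.156e7

def evaporation_time_years(mass_kg: float) -> float:
    """Assumed textbook estimate: t = 5120 pi G^2 M^3 / (hbar c^4)."""
    seconds = 5120 * math.pi * G ** 2 * mass_kg ** 3 / (H_BAR * C ** 4)
    return seconds / SECONDS_PER_YEAR

sun_years = evaporation_time_years(SOLAR_MASS)
print(f"A solar mass black hole lasts about {sun_years:.1e} years")
```

Because the lifetime goes as the cube of the mass, a black hole ten times heavier than the Sun would take a thousand times longer still.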

This idea was quite repugnant to people like Leonard Susskind, and other string theorists grounded in quantum mechanics. In future postings, we shall see that the general theory of relativity is a very accurate effective theory that makes very accurate predictions for large masses separated by large distances, but does not work very well for small objects separated by small atomic-sized distances. Quantum mechanics is just the opposite; it works for very small objects separated by very small distances, but does not work well for large masses at large distances. Physics has been trying to combine the two into a theory of quantum gravity for nearly 80 years to no avail. Fortunately, the relativists and quantum people work on very different problems in physics and never even have to speak to each other for the most part. These two branches of physics are only forced to confront each other at the extrema of the earliest times following the Big Bang and at the event horizon of black holes.

In The Black Hole War – My Battle With Stephen Hawking To Make The World Safe For Quantum Mechanics (2008), Susskind describes the ensuing 30-year battle he had with Stephen Hawking over this issue. The surprising solution to all this is the Holographic Principle. The Holographic Principle states that any amount of 3-dimensional stuff in our physical Universe, like an entire galaxy, can be described by the pixelated bits of information on a 2-dimensional surface surrounding the 3-dimensional stuff, just like the event horizon of a black hole. It is called the Holographic Principle because, like the holograms that you see in science museums, the 2-dimensional wiggles of an interference pattern generated by a laser beam bouncing off 3-dimensional objects and recorded on a 2-dimensional film can regenerate a 3-dimensional image of the objects that you can walk around, and which appears just like the original 3-dimensional objects. In the late 1990s, Susskind and other investigators demonstrated with string theory and the Holographic Principle that black holes are covered with a huge number of strings attached to the event horizon which can break off as photons or other fundamental particles. String theory has been an active area of investigation for the past 30 years in physics and contends that the fundamental particles of nature, such as photons, electrons, quarks, and neutrinos are really very small vibrating strings or loops of energy about one Planck length in size. In this model, the pixelated bits on the event horizon of a black hole are really little strings with both ends firmly attached to the event horizon. Every so often, the strings can twist upon themselves and form a complete loop that breaks free of the event horizon to become a photon, or any other fundamental particle, that is outside of the event horizon and which, therefore, can escape from the black hole. 
As it does so, it carries away the entropy or information it encoded on the black hole event horizon. This is the string theory explanation of Hawking radiation. In fact, the Holographic Principle can be extended to the entire observable Universe, which means that all the 3-dimensional stuff in our Universe can be depicted as a huge number of bits of information on a 2-dimensional surface surrounding our Universe, like a gargantuan quantum computer calculating how to behave. As I mentioned in So You Want To Be A Computer Scientist, the physics of the 20th century has led many physicists and philosophers to envision our physical Universe simply consisting of information, running on a huge quantum computer. Please keep this idea in mind, while reading the remainder of the postings in this blog.

09/25/2016 Update
For a very interesting update on recent work on black holes and information consider taking the Master Class The Black Hole Information Paradox by Samir Mathur at the World Science U at

So what happened to Boltzmann and his dream of a statistical mechanical explanation for the observed laws of thermodynamics? Boltzmann had to contend with a great deal of resistance by the physics establishment of the day, particularly from those like Ernst Mach who had adopted an extreme form of logical positivism. This group of physicists held that it was a waste of time to create theories based upon things like molecules that could not be directly observed. After a lifelong battle with depression and many years of scorn from his peers, Boltzmann tragically committed suicide in 1906. On his tombstone is found the equation:

S = k ln(N)

Scientists and engineers have developed a deep respect for the second law of thermodynamics because it is truly "spooky".

Next time we will further explore the nature of information and its role in the Universe, and surprisingly, see how it helped to upend all of 19th century physics.

Comments are welcome at

To see all posts on softwarephysics in reverse order go to:

Steve Johnston
