Saturday, August 13, 2022

The Application of the Second Law of Information Dynamics to Software and Bioinformatics

If you are an Information Technology professional then, naturally, the concept of Information should be of great interest to you. This should also be true for those working in the new field of bioinformatics. Unfortunately, the definition and behavior of Information in our Universe are still a bit fuzzy in all fields of study and are, therefore, causing some problems with the advancement of many very important fields of inquiry. Much of this confusion arises because there is a physical thermodynamic version of the technical terms for "entropy" and "Information" that is used by physicists, chemists and engineers and another version for the very same terms of "entropy" and "Information" that is used by people in telecommunications using what they call Information Theory. In this post, I would like to help to clarify this confusion by covering some more work by Dr. Melvin Vopson out of the University of Portsmouth that attempts to reconcile these two somewhat similar and somewhat different approaches to the concepts of "entropy" and "Information". Dr. Vopson does so by proposing a new law of physics that he has dubbed the second law of Information dynamics or the second law of infodynamics for short.

Now, in How Much Does Your Software Weigh? - the Equivalence of Mass, Energy and Information, I covered Dr. Melvin Vopson's recent proposal that Information is just as real as mass and energy in our Universe and that stored Information actually has a measurable mass-energy that adds to the mass-energy of the medium storing the Information. In that paper, Dr. Melvin Vopson derives the mass of a single bit of Information as:

mbit = kb T ln(2) / c²

where:
mbit = the mass of one bit of Information in kilograms
kb = Boltzmann's constant = 1.38064852 × 10⁻²³ Joules/°K
T = the temperature of your Information in absolute °K = 300 °K for a room temperature of 80 °F
ln(2) = 0.6931471806
c = the speed of light = 3 × 10⁸ meters/second

This comes to a value of 3.19 × 10⁻³⁸ kilograms/bit at a room temperature of 300 °K (80 °F). In that post, I explained that it took about 500 MB or 4,194,304,000 bits of Information to display the post in Chrome and that the mass of that amount of Information came to about the mass of 147 electrons! Dr. Melvin Vopson has even proposed a doable experiment to validate his hypothesis:

Experimental protocol for testing the mass–energy–Information equivalence principle
https://aip.scitation.org/doi/full/10.1063/5.0087175
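To make those numbers concrete, here is a minimal Python sketch of my own (a back-of-the-envelope check, not code from Dr. Vopson's paper) that evaluates the mbit formula above and reproduces the 147-electron figure for a 500 MB webpage:

```python
import math

# Physical constants (SI units), as used in the formula above
kb = 1.38064852e-23     # Boltzmann's constant in Joules/°K
c = 3.0e8               # speed of light in meters/second
m_electron = 9.109e-31  # mass of an electron in kilograms

T = 300.0               # room temperature in absolute °K (about 80 °F)

# Vopson's mass of a single bit of Information: mbit = kb T ln(2) / c^2
m_bit = kb * T * math.log(2) / c**2
print(f"Mass of one bit at 300 K: {m_bit:.3e} kg")           # about 3.19 x 10^-38 kg

# A 500 MB webpage expressed in bits: 500 x 2^20 bytes x 8 bits/byte
bits = 500 * 2**20 * 8                                       # 4,194,304,000 bits
page_mass = bits * m_bit
print(f"Mass of 500 MB of Information: {page_mass:.3e} kg")  # about 1.3 x 10^-28 kg
print(f"Equivalent number of electrons: {page_mass / m_electron:.0f}")  # about 147
```

Running it prints a bit mass of about 3.19 × 10⁻³⁸ kg and an electron count of about 147, matching the figures quoted above.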

The results of such an experiment would either confirm or deny the equivalence of mass, energy and Information. Just as the establishment of the equivalence of mass and energy by Albert Einstein in 1905 with his Special Theory of Relativity by the famous equation:

E = m c²

where:
E = energy in Joules
m = the mass in kilograms
c = the speed of light = 3 × 10⁸ meters/second

was validated by the explosion of atomic bombs many years later in the 20th century, the validation of mass, energy and Information in the 21st century would be, at least, as intellectually dramatic.

You can read an English translation of Einstein's famous On the Electrodynamics of Moving Bodies in which he first introduced the world to the Special Theory of Relativity at:

http://www.fourmilab.ch/etexts/einstein/specrel/www/

The first few sections are very enlightening and not that difficult.

Dr. Melvin Vopson is now mainly conducting theoretical work on the nature of Information and has recently established the Information Physics Institute at:

Information Physics Institute
https://www.Informationphysicsinstitute.org/

to explore the fundamental nature of Information in our Universe.

But in this post, I would like to explore Dr. Vopson's most recent paper:

Second law of Information dynamics
https://aip.scitation.org/doi/full/10.1063/5.0100358

ABSTRACT
One of the most powerful laws in physics is the second law of thermodynamics, which states that the entropy of any system remains constant or increases over time. In fact, the second law is applicable to the evolution of the entire universe and Clausius stated, “The entropy of the universe tends to a maximum.” Here, we examine the time evolution of Information systems, defined as physical systems containing Information states within Shannon’s Information theory framework. Our observations allow the introduction of the second law of Information dynamics (infodynamics). Using two different Information systems, digital data storage and a biological RNA genome, we demonstrate that the second law of infodynamics requires the Information entropy to remain constant or to decrease over time. This is exactly the opposite to the evolution of the physical entropy, as dictated by the second law of thermodynamics. The surprising result obtained here has massive implications for future developments in genomic research, evolutionary biology, computing, big data, physics, and cosmology.


In this paper, Dr. Vopson takes Boltzmann's formula for calculating the physical thermodynamic entropy of a system and Claude Shannon's Information Theory formula for calculating the Information entropy in a digital signal and reconciles these two very different formulations for Information by developing a proposal for a new fundamental law of physics that he calls the second law of infodynamics. The second law of infodynamics should be of interest to all who deal with Information in our Universe.

Figure 1 – In the paper, Dr. Vopson begins with 11 rows of 8 patches of magnetic material that are not magnetized at all (a). He then writes the string INFORMATION on the 88 patches by magnetizing the 0s in one direction and the 1s in the opposite direction (b). In (c) he explicitly displays the string INFORMATION with its ASCII 1s and 0s for clarity.

Figure 2 – Next, he takes the original unmagnetized 88 patches in (a) and the string INFORMATION written to them in (b) and lets them age over a great deal of time. Dr. Vopson simulates the progression of time using a Monte Carlo algorithm and the physics of magnetism to let the thermal jiggling of atoms in the magnetic material slowly demagnetize the tiny magnetic grains in each patch. As a result, we watch the string INFORMATION slowly fade away until, by (h), we can no longer see it at all and the 88 patches have essentially returned to what we had in (a).

In Figure 1 and Figure 2 above, we see Dr. Vopson's second law of infodynamics in action. In Figure 1(a), he begins with 88 patches of a magnetic material that has not yet been magnetized. Then in Figure 1(b), he writes the string INFORMATION to the 88 patches in ASCII by magnetizing the 1s in one direction and the 0s in the other direction. In Figure 2 he simulates what happens to the magnetized patches over a great deal of time. The magnetized patches really are composed of many little magnetized grains of material. To fully understand how permanent magnets work you need to take a deep dive into quantum mechanics, but suffice it to say that over time the thermal jostling of atoms will slowly cause these little magnetized grains to lose their magnetic fields, and that is what we slowly see happen in the remainder of Figure 2. Over a very long time, the individual magnetized grains lose their magnetic fields and the string INFORMATION slowly decays into nothingness until we get to Figure 2(h) which looks very much like Figure 2(a) which did not contain the string INFORMATION. The new second law of infodynamics proposes that the old second law of thermodynamics causes physical entropy to always increase and as physical entropy increases, it destroys Information entropy by constantly reducing the number of Information-bearing things N. In Figure 2 we see this slowly happening over time as the Information-bearing grains slowly lose their magnetic fields.
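Dr. Vopson's actual simulation models the magnetization physics of the grains, but the gist of the Monte Carlo idea can be illustrated with a toy sketch of my own: start with the 88 ASCII bits of the string INFORMATION and, at each time step, let each still-magnetized patch randomly lose its magnetization with some small probability. The flip probability below is a made-up illustrative parameter, not a value from the paper.

```python
import random

random.seed(42)

# ASCII bits of the string INFORMATION: 11 characters x 8 bits = 88 "patches"
message = "INFORMATION"
bits = [int(b) for ch in message for b in format(ord(ch), "08b")]

# State of each patch: +1 / -1 for the two magnetization directions, 0 for demagnetized
state = [1 if b == 1 else -1 for b in bits]

P_DECAY = 0.02   # made-up probability that a grain demagnetizes per time step

def step(state):
    """One Monte Carlo time step: each still-magnetized patch may thermally demagnetize."""
    return [0 if (s != 0 and random.random() < P_DECAY) else s for s in state]

def readable_bits(state):
    """How many of the 88 patches still carry a readable 1 or 0."""
    return sum(1 for s in state if s != 0)

for t in range(0, 301, 50):
    print(f"t = {t:3d}: {readable_bits(state)} of {len(state)} patches still readable")
    for _ in range(50):
        state = step(state)
```

The point of the toy version is only that, once thermal decay sets in, the number of readable Information-bearing states N can only go down over time.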

In this paper, Dr. Vopson explains that Claude Shannon's Information Theory view of entropy and Information tells us that the amount of Information entropy in a message is:

     Information Entropy = N H(p)

where

     H(p) = - Σ p(x) log2 p(x)
     N is the number of symbols in the message

H(p) is just a mathematical description of the probabilities of the symbols that are used to transmit the message, and N is just the number of symbols in the message. For example, the letter "e" is the most common letter in the English language, so when you send an "e" in a message you are sending less Information entropy than when you send an "x". So H(p) is very useful for contestants on the American TV game show "Wheel of Fortune" because finding an "x" on the board tells you much more about the secret message than finding an "e". But nearly all digital Information is sent as a string of 1s and 0s, and a long binary message usually contains a roughly equal number of 1s and 0s, so for any long binary message H(p) → 1. All that means is that, usually, for Claude Shannon's Information Theory formulation of entropy and Information:

Information Entropy = N

where N is the number of symbols in the message when each symbol has an equal probability of occurring, such as a binary message composed of an equal number of 1s and 0s or for a long RNA strand that has nearly an equal number of U, A, C and G bases. So if the second law of thermodynamics always causes physical entropy to increase with time and increasing physical entropy constantly reduces N, that means that the amount of Information entropy must constantly decrease with time as proposed by the new second law of infodynamics.
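As a sketch of that argument, the snippet below (my own illustration, not from the paper) estimates H(p) for a long random binary message and for a random RNA strand with roughly equal numbers of U, A, C and G, and shows that H(p) comes out very close to its maximum in both cases, so the Information entropy is essentially N symbols' worth:

```python
import math
import random
from collections import Counter

random.seed(0)

def shannon_H(symbols):
    """H = -sum p(x) log2 p(x), estimated from the symbol frequencies in the message."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A long binary message with a roughly equal mix of 1s and 0s
binary_message = [random.choice("01") for _ in range(100_000)]
H_bin = shannon_H(binary_message)
print(f"Binary message: H(p) = {H_bin:.4f} bits/symbol, "
      f"Information entropy = N*H = {len(binary_message) * H_bin:.0f} bits")

# A long RNA strand with a roughly equal mix of U, A, C and G bases
rna_strand = [random.choice("UACG") for _ in range(100_000)]
H_rna = shannon_H(rna_strand)
print(f"RNA strand: H(p) = {H_rna:.4f} bits/symbol "
      f"(the maximum possible is 2 bits for 4 equally likely bases)")
```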

Now anybody who works with Information on a daily basis should not be surprised at all by this new second law of infodynamics because you see this every day in your work. Information always seems to naturally decay into nothingness over time all by itself. That is just a common sense experience. It is like watching a strand of DNA naturally decompose into nothingness or watching many millions of dollars of software rapidly decompose into nothingness from the activities of a neophyte programmer who recently joined your Applications Development group. We all know that it is much easier to mess things up than it is to make things better because there are just so many, many ways to really mess things up! But deriving common sense in the sciences from fundamental laws is usually a very difficult thing indeed. That is because most common sense is so exceedingly wrong. For example, common sense tells us that the Earth is flat and at the center of the Universe. This is one of the reasons why professional scientists usually greatly fear common sense explanations. But when it comes to Information, this might be one of the very rare occasions when common sense can truly be of use. So let us begin with a very common sense definition of Information:

Information = Something you know

and then see how physicists, chemists and engineers tried to formalize this definition of Information with the concept of physical entropy. After that, we will see how people in telecommunications tried to do the same thing with Claude Shannon's concept of Information entropy. This analysis will help to explain why there is so much confusion about the concepts of entropy and Information in so many fields of study.

The Very Sordid History of the Physical Entropy and Information Used by Physicists, Chemists and Engineers
It all began when people started building steam engines. In 1712, Thomas Newcomen invented the first commercially successful steam engine. It consisted of an iron cylinder with a movable piston. Low-pressure steam was sucked into a cylinder by a rising piston. When the piston reached its maximum extent, a cold water spray was shot into the cylinder causing the steam to condense and form a partial vacuum in the cylinder. External atmospheric air pressure then forced the piston down during the power stroke. In the 18th century, steam engines used low-pressure steam and were thus called atmospheric steam engines because the power stroke came from atmospheric air pressure. High-pressure steam boilers in the 18th century were simply too dangerous to use for steam engines. The Newcomen steam engine was used primarily to pump water out of coal mines. It had an efficiency of about 1%, meaning that about 1% of the energy in the coal used to fuel the engine ended up as useful mechanical work, while the remaining 99% ended up as useless waste heat. This did not bother people because they had never even heard of the term energy. The concept of energy did not come into existence until 1850 when Rudolph Clausius published the first law of thermodynamics. However, people did know that the Newcomen steam engine used a lot of coal. This was not a problem if you happened to own a coal mine, but for 18th-century factory owners, the Newcomen steam engine was far too expensive to run for their needs.

Figure 3 – The first commercially successful steam engine was invented by Thomas Newcomen in 1712. The Newcomen steam engine had an efficiency of 1%.

Figure 4 – In 1765, James Watt improved the Newcomen steam engine by introducing a condensing cylinder to condense steam during the power stroke and by using a steam jacket to always keep the main cylinder at a temperature higher than the boiling point of water. Watt's improved steam engine had an efficiency of 3% and, on that basis, launched the Industrial Revolution.

In 1763, James Watt was a handyman at the University of Glasgow building and repairing equipment for the University. One day the Newcomen steam engine at the University broke, and Watt was called upon to fix it. During the course of his repairs, Watt realized that the main cylinder lost a lot of heat through conduction and that the water spray, which cooled the entire cylinder below 212 °F, required a lot of steam to reheat the cylinder above 212 °F on the next cycle. In 1765, Watt had one of those scientific revelations in which he realized that he could reduce the amount of coal required by a steam engine if he could just keep the main cylinder above 212 °F for the entire cycle. He came up with the idea of using a secondary condensing cylinder cooled by a water jacket to condense the steam instead of using the main cylinder. He also added a steam jacket to the main cylinder to guarantee that it always stayed above 212 °F for the entire cycle. That same year, Watt conducted a series of experiments on scale model steam engines that proved out his ideas. Watt’s steam engine had an efficiency of 3%, which may still sound pretty bad, but that meant it only used 1/3 the coal of a Newcomen steam engine with the same horsepower. So the Watt steam engine became an economically viable option for 18th-century factory owners and launched the Industrial Revolution.

Steam engines then became all the rage, but were all built on a trial and error basis with no help from the sciences. In 1738, Bernoulli proposed that gasses were really composed of a very large number of molecules bouncing around in all directions. Gas pressure in a cylinder was simply the result of a huge number of molecular impacts from individual gas molecules against the walls of a cylinder, and heat was just a measure of the kinetic energy of the molecules bouncing around in the cylinder. But his kinetic theory of heat was not held in favor in Watt’s day, and unfortunately for early steam engine designers, none of these theories of heat related heat energy to mechanical energy, so Watt and the other 18th-century steam engine designers were pretty much on their own, without much help from the available science of the day. Then, in 1783, Antoine Lavoisier proposed that when coal burned, it really combined with the newly discovered gas found in the air called oxygen and released another mysterious substance called caloric. Caloric was the “substance of heat” and always flowed from hot bodies to cold bodies. This seemed to make common sense and became quite popular. In 1824 Sadi Carnot published a small book entitled Reflections on the Motive Power of Fire, which was the first scientific treatment of steam engines. In this book, Carnot asked lots of questions. Carnot wondered if a lump of coal could do an infinite amount of work. “Is the potential work available from a heat source potentially unbounded?". He also wondered if the material that the steam engine was made of or the fluid used by the engine made any difference. "Can heat engines be in principle improved by replacing the steam by some other working fluid or gas?". And he came to some very powerful conclusions. Carnot proposed that the efficiency of a steam engine only depended upon the difference between the temperature of the steam used by the engine and the temperature of the room in which the steam engine was running. He envisioned a steam engine as a sort of caloric waterfall, with the difference between the temperature of the steam from the boiler and the temperature of the room the steam engine was in being the height of the waterfall. Useful work was accomplished by the steam engine as caloric fell from the high temperature of the steam to the lower temperature of the room, like water falling over a waterfall doing useful work on a paddlewheel. It did not matter what the steam engine was made of or what fluid was used.

"The motive power of heat is independent of the agents employed to realize it; its quantity is fixed solely by the temperatures of the bodies between which is effected, finally, the transfer of caloric."

“In the fall of caloric the motive power evidently increases with the difference of temperature between the warm and cold bodies, but we do not know whether it is proportional to this difference.”

“The production of motive power is then due in steam engines not to actual consumption of the caloric but to its transportation from a warm body to a cold body.”

However, Reflections on the Motive Power of Fire attracted little attention in 1824 and quickly went out of print. Carnot’s ideas were later revived by work done by Lord Kelvin in 1848 and by Rudolph Clausius in 1850, when Clausius published the first and second laws of thermodynamics in On the Moving Force of Heat and the Laws of Heat which may be Deduced Therefrom.

In 1842, Julius Robert Mayer unknowingly published the first law of thermodynamics in the May issue of Annalen der Chemie und Pharmacie using experimental results obtained earlier in France. In this paper, Mayer was the first to propose that there was a mechanical equivalent of heat. Imagine two cylinders containing equal amounts of air. One cylinder has a heavy movable piston supported by the pressure of the confined air, while the other cylinder is completely closed-off. Now heat both cylinders. The French found that the cylinder with the movable piston had to be heated more than the closed-off cylinder to raise the temperature of the air in both cylinders by some identical value like 10 °F. Mayer proposed that some of the heat in the cylinder with the movable piston was converted into mechanical work to lift the heavy piston, and that was why it took more heat to raise the temperature of the air in that cylinder by 10 °F. This is what happens in the cylinders of your car when burning gasoline vapors cause the air in the cylinders to expand and push the pistons down during the power stroke. Mayer proposed there was a new mysterious fluid at work that we now call energy, and that it was conserved. That means that energy can change forms, but that it cannot be created or destroyed. So for the cylinder with the heavy movable piston, chemical energy in coal is released when it is burned and transformed into heat energy; the resulting heat energy then causes the air in the cylinder to expand, which then lifts the heavy piston. The end result is that some of the heat energy is converted into mechanical energy. Now you might think that with Mayer’s findings, at long last, steam engine designers finally had some idea of what was going on in steam engines! Not quite. Mayer’s idea of a mechanical equivalent of heat was not well received at the time because Mayer was a medical doctor and considered an outsider of little significance by the scientific community of the day.

About the same time another outsider, James Prescott Joule, was doing similar experiments. Joule was the manager of a brewery and an amateur scientist on the side. Joule was investigating the possibility of replacing the steam engines in his brewery with the newly invented electric motor. This investigation ended when Joule discovered that it took about five pounds of battery zinc to do the same work as a single pound of coal. But during these experiments, Joule came to the conclusion that there was an equivalence between the heat produced by an electrical current in a wire, like in a toaster, and the work done by an electrical current in a motor. In 1843, he presented a paper to the British Association for the Advancement of Science in which he announced that it took 838 ft-lbs of mechanical work to raise the temperature of a pound of water by 1 °F (1 Btu). In 1845, Joule presented another paper, On the Mechanical Equivalent of Heat, to the same association in which he described his most famous experiment. Joule used a falling weight connected to a paddle wheel in an insulated bucket of water via a series of ropes and pulleys to stir the water in the bucket. He then measured the temperature rise of the water as the weight, suspended by a rope connected to the paddle wheel, descended and stirred the water. The temperature rise of the water was then used to calculate the mechanical equivalent of heat that equated BTUs of heat to ft-lbs of mechanical work. But as with Mayer, Joule’s work was entirely ignored by the physicists of the day. You see, Mayer and Joule were both outsiders challenging the accepted caloric theory of heat.

All this changed in 1847 when physicist Hermann Helmholtz published On the Conservation of Force in which he referred to the work of both Mayer and Joule and proposed that heat and mechanical work were both forms of the same conserved force we now call energy. Finally, in 1850, Rudolph Clausius formalized this idea as the first law of thermodynamics in On the Moving Force of Heat and the Laws of Heat which may be Deduced Therefrom.

The modern statement of the first law of thermodynamics reads as:

dU = dQ – dW

This equation simply states that a change in the internal energy (dU) of a closed system, like a steam engine, is equal to the amount of heat flowing into the system (dQ), minus the amount of mechanical work that the system performs (dW). Thus steam engines take in heat energy from burning coal and convert some of the heat energy into mechanical energy, exhausting the rest as waste heat.
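As a trivial worked example of that bookkeeping (the numbers below are made up purely for illustration): if a cylinder of steam absorbs 1,000 Joules of heat from the boiler and pushes its piston out with 300 Joules of mechanical work, the first law says its internal energy rises by 700 Joules.

```python
def internal_energy_change(dQ, dW):
    """First law of thermodynamics: dU = dQ - dW (all quantities in Joules)."""
    return dQ - dW

# Illustrative numbers only: 1,000 J of heat in from the boiler, 300 J of work out at the piston
dU = internal_energy_change(dQ=1000.0, dW=300.0)
print(f"Change in internal energy: {dU:.0f} J")  # 700 J stays in the steam as internal energy
```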

Remember how Carnot thought that the efficiency of a steam engine only depended upon the temperature difference between the steam from the boiler and the temperature of the room in which the steam engine was running? Carnot believed that when caloric fell through this temperature difference, it produced useful mechanical work. Clausius reformulated this idea with a new concept he called entropy, described in his second law of thermodynamics. Clausius reasoned that there had to be some difference in the quality of different energies. For example, there is a great deal of heat energy in the air in the room you are currently sitting in. According to the first law of thermodynamics, it would be possible to build an engine that converts the heat energy in the air into useful mechanical energy and exhausts cold air out the back. With such an engine, you could easily build a refrigerator that produced electricity for your home and cooled your food for free at the same time. All you would need to do would be to hook up an electrical generator to the engine, and then run the cold exhaust from the engine into an insulated compartment! Clearly, this is impossible. As Clausius phrased it, "Heat cannot of itself pass from a colder to a hotter body".

The second law of thermodynamics can be expressed in many ways. One of the most useful goes back to Carnot’s original idea of the maximum efficiency of an engine and the temperature differences between the steam and room temperature. It can be framed quantitatively as:

Maximum efficiency of an engine = 1 – Tc/Th

where Tc is the temperature of the cold reservoir into which heat is dissipated and Th is the temperature of the hot reservoir from which heat is obtained. For a steam engine, Th corresponds to the temperature of the steam and Tc to the temperature of the surrounding room, with both temperatures measured in absolute degrees Kelvin. So Carnot’s proposal was correct. As the temperature difference between Tc and Th gets larger, Tc/Th gets smaller and the efficiency of the engine increases and approaches the value “1” or 100%. In this form of the second law, entropy represents the amount of useless energy; energy that cannot be turned into useful mechanical work and remains as useless waste heat.
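A quick numerical sketch of my own of the Carnot limit: for low-pressure steam at 373 °K (100 °C) exhausting into a 300 °K room, the best possible efficiency is only about 20%, which helps explain why the early atmospheric steam engines were so inefficient. The 800 °K figure below for a modern power plant is just an illustrative round number.

```python
def carnot_efficiency(T_cold, T_hot):
    """Maximum efficiency of a heat engine = 1 - Tc/Th, with temperatures in absolute °K."""
    return 1.0 - T_cold / T_hot

# Low-pressure steam at 373 K exhausting into a 300 K room
print(f"{carnot_efficiency(300.0, 373.0):.1%}")   # about 19.6%

# Illustrative high-temperature steam at around 800 K
print(f"{carnot_efficiency(300.0, 800.0):.1%}")   # about 62.5%
```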

Later, in 1865, Clausius presented the most infamous version of the second law at the Philosophical Society of Zurich as:

The entropy of the universe tends to a maximum.

What he meant here was that spontaneous changes tend to smooth out differences in temperature, pressure, and density. Hot objects cool off, tires under pressure leak air and the cream in your coffee will stir itself if you are patient enough. A car will spontaneously become a pile of rust, but a pile of rust will never spontaneously become a car. Spontaneous changes cause an increase in entropy and entropy is just a measure of the degree of this smoothing-out process. So as the entropy of the universe constantly increases with each spontaneous change - the universe tends to run downhill with time. Later we will see that entropy is also a measure of the disorder of a system at the molecular level. In a sense, Murphy’s law is just a popular expression of the second law. Clausius quantified the second law of thermodynamics as:

ΔS = ΔQ / T

which means that the change in entropy ΔS of a system is equal to the heat ΔQ flowing into or out of the system divided by the absolute temperature T at which the heat flow takes place.
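As a small worked example with illustrative numbers of my own: if 1,000 Joules of heat leaks from a hot reservoir at 500 °K into a cold reservoir at 300 °K, the hot side loses 2 J/°K of entropy while the cold side gains about 3.3 J/°K, so the total entropy of the pair goes up, exactly as the second law demands.

```python
def entropy_change(dQ, T):
    """Clausius: dS = dQ / T, with dQ in Joules and T in absolute °K."""
    return dQ / T

dQ = 1000.0                            # Joules of heat flowing from the hot body to the cold body
T_hot, T_cold = 500.0, 300.0

dS_hot = entropy_change(-dQ, T_hot)    # hot body loses heat: -2.00 J/K
dS_cold = entropy_change(+dQ, T_cold)  # cold body gains heat: +3.33 J/K
print(f"Total entropy change: {dS_hot + dS_cold:+.2f} J/K")  # positive, as the second law requires
```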

The first and second laws of thermodynamics laid the foundation of thermodynamics which describes the bulk properties of matter and energy. The science of thermodynamics finally allowed steam engine designers to understand what was going on in the cylinders of a steam engine by relating the pressures, temperatures, volumes, and energy flows of the steam in the cylinders while the engine was running. This ended the development of steam engines by trial and error, and the technological craft of steam engine building finally matured into a science.

In thermodynamics, there is a mysterious fluid at work called entropy which measures how run down a system is. You can think of entropy as depreciation; it always increases with time and never decreases. As I mentioned previously, the second law of thermodynamics can be expressed in many forms. Another statement goes as:

dS/dt ≥ 0

This is a simple differential equation that states that the entropy S of an isolated system can only remain constant or must increase with time when an irreversible process is performed by the isolated system. All of the other effective theories of physics can also be expressed as differential equations with time as a factor. However, all of these other differential equations have a "=" sign in them, meaning that the interactions can be run equally forwards or backwards in time. For example, a movie of two billiard balls colliding can be run forwards or backwards in time, and you cannot tell which is which. Such a process is called a reversible process because it can be run backwards to return the Universe to its original state, like backing out a bad software install for a website. But if you watch a movie of a cup falling off a table and breaking into a million pieces, you can easily tell which is which. For all the other effective theories of physics, a broken cup can spontaneously jump back onto a table and reassemble itself into a whole cup if you give the broken shards the proper energy kick. This would clearly be a violation of the second law of thermodynamics because the entropy of the cup fragments would spontaneously decrease.

Energy too is subject to the second law of thermodynamics and to entropy increases as well. A steam engine can convert the useful energy in high-temperature steam into useful mechanical energy by dumping a portion of the steam energy into a reservoir at a lower temperature. Because not all of the energy in steam can be converted into useful mechanical energy, steam engines, like all other heat engines, can never be 100% efficient. Software too tends to increase in entropy whenever maintenance is performed upon it. Software tends to depreciate by accumulating bugs. DNA and RNA also tend to increase in entropy from mutations caused by external influences.

Figure 5 - We begin with a left compartment containing cold slow-moving nitrogen molecules (white circles) and a right compartment with hot fast-moving nitrogen molecules (black circles).

Figure 6 - Next, we perforate the divider between the compartments and allow the hot and cold nitrogen molecules to bounce off each other and exchange energies.

Figure 7 - After a period of time the two compartments will equilibrate to the same average temperature, but we will always find some nitrogen molecules bouncing around faster (black dots) and some nitrogen molecules bouncing around slower (white dots) than the average.

Recall that in 1738 Bernoulli proposed that gasses were really composed of a very large number of molecules bouncing around in all directions. Gas pressure in a cylinder was simply the result of a huge number of molecular impacts from individual gas molecules striking the walls of a cylinder, and heat was just a measure of the kinetic energy of the molecules bouncing around in the cylinder. Bernoulli’s kinetic theory of gasses was not well received by the physicists of the day because many physicists in the 18th and 19th centuries did not believe in atoms or molecules. The idea of the atom goes back to the Ancient Greeks, Leucippus, and his student Democritus, about 450 B.C. and was formalized by John Dalton in 1803 when he showed that in chemical reactions the elements always combined in fixed proportions by weight that were related by small whole-number multiples. But many physicists had problems with thinking of matter being composed of a “void” filled with “atoms” because that meant they had to worry about the forces that kept the atoms together and that repelled atoms apart when objects “touched”. To avoid this issue, many physicists simply considered “atoms” to be purely a mathematical trick used by chemists to do chemistry. This held sway until 1897 when J. J. Thomson successfully isolated electrons in a cathode ray beam by deflecting them with electric and magnetic fields in a vacuum. It seems that physicists don’t have a high level of confidence in things until they can bust them up into smaller things.

In 1859, physicist James Clerk Maxwell took Bernoulli’s idea one step further. He combined Bernoulli’s idea of a gas being composed of a large number of molecules with the new mathematics of statistics. Maxwell reasoned that the molecules in a gas would not all have the same velocities. Instead, there would be a distribution of velocities; some molecules would move very quickly while others would move more slowly, with most molecules having a velocity around some average velocity. Now imagine that the two compartments in Figure 5 are filled with nitrogen gas, but that the left compartment is filled with cold slow-moving nitrogen molecules (white dots), while the right compartment is filled with hot fast-moving nitrogen molecules (black dots). If we perforate the partition between compartments, as in Figure 6 above, we will observe that the fast-moving hot molecules on the right will mix with and collide with the slow-moving cold molecules on the left and will give up kinetic energy to the slow-moving molecules. Eventually, both compartments will be found to be at the same temperature as shown in Figure 7, but we will always find some molecules moving faster than the average (black dots), and some molecules moving slower than the average (white dots) just as Maxwell had determined. This is called a state of thermal equilibrium and demonstrates a thermal entropy increase. We never observe a gas in thermal equilibrium suddenly dividing itself into hot and cold compartments all by itself. The gas can go from Figure 6 to Figure 7 but never the reverse because such a process would also be a violation of the second law of thermodynamics.

In 1867, Maxwell proposed a paradox along these lines known as Maxwell’s Demon. Imagine that we place a small demon at the opening between the two compartments and install a small trap door at this location. We instruct the demon to open the trap door whenever he sees a fast-moving molecule in the left compartment approach the opening to allow the fast-moving molecule to enter the right compartment. Similarly, when he sees a slow-moving molecule from the right compartment approach the opening, he opens the trap door to allow the low-temperature molecule to enter the left compartment. After some period of time, we will find that all of the fast-moving high-temperature molecules are in the right compartment and all of the slow-moving low-temperature molecules are in the left compartment. Thus the left compartment will become colder and the right compartment will become hotter in violation of the second law of thermodynamics (the gas would go from Figure 7 to Figure 6 above). With the aid of such a demon, we could run a heat engine between the two compartments to extract mechanical energy from the right compartment containing the hot gas as we dumped heat into the colder left compartment. This really bothered Maxwell, and he never found a satisfactory solution to his paradox. This paradox also did not help 19th-century physicists become more comfortable with the idea of atoms and molecules.

In 1929, Leo Szilárd, then an instructor and researcher at the University of Berlin, published a paper, On the Decrease of Entropy in a Thermodynamic System by the Intervention of Intelligent Beings.

Figure 8 – In 1929 Szilard published a paper in which he explained that the process of the Demon knowing which side of a cylinder a molecule was in must produce some additional entropy to preserve the second law of thermodynamics.

In Szilárd's 1929 paper, he proposed that using Maxwell’s Demon, you could indeed build a 100% efficient steam engine in conflict with the second law of thermodynamics. Imagine a cylinder with just one water molecule bouncing around in it as in Figure 8(a). First, the Demon figures out if the water molecule is in the left half or the right half of the cylinder. If he sees the water molecule in the right half of the cylinder as in Figure 8(b), he quickly installs a piston connected to a weight via a cord and pulley. As the water molecule bounces off the piston in Figure 8(c) and moves the piston to the left, it slowly raises the weight and does some useful work on it. In the process of moving the piston to the left, the water molecule must lose kinetic energy in keeping with the first law of thermodynamics and slow down to a lower velocity and temperature than the atoms in the surrounding walls of the cylinder. When the piston has finally reached the far left end of the cylinder it is removed from the cylinder in preparation for the next cycle of the engine. The single water molecule then bounces around off the walls of the cylinder as in Figure 8(a), and in the process picks up additional kinetic energy from the jiggling atoms in the walls of the cylinder as they kick the water molecule back into the cylinder each time it bounces off the cylinder walls. Eventually, the single water molecule will once again be in thermal equilibrium with the jiggling atoms in the walls of the cylinder and will be on average traveling at the same velocity it originally had before it pushed the piston to the left. So this proposed engine takes the ambient high-entropy thermal energy of the cylinder’s surroundings and converts it into the useful low-entropy potential energy of a lifted weight. Notice that the first law of thermodynamics is preserved. The engine does not create energy; it simply converts the high-entropy thermal energy of the random motions of the atoms in the cylinder walls into useful low-entropy potential energy, but that does violate the second law of thermodynamics. Szilárd's solution to this paradox was simple. He proposed that the process of the Demon figuring out if the water molecule was in the left-hand side of the cylinder or the right-hand side of the cylinder must cause the entropy of the Universe to increase. So “knowing” which side of the cylinder the water molecule was in must come with a price; it must cause the entropy of the Universe to increase. Recall that our first attempt at defining Information was simply to say that Information was “something you know”. More aptly, we should have said that useful Information was simply “something you know”.

Finally, in 1953 Leon Brillouin published a paper with a thought experiment explaining that Maxwell’s Demon required some Information to tell if a molecule was moving slowly or quickly. Brillouin defined this Information as negentropy, or negative entropy, and found that Information about the velocities of the oncoming molecules could only be obtained by the demon by bouncing photons off the moving molecules. Bouncing photons off the molecules increased the total entropy of the entire system whenever the demon determined if a molecule was moving slowly or quickly. So Maxwell's Demon was really not a paradox after all since even the Demon could not violate the second law of thermodynamics. Here is the abstract for Leon Brillouin’s famous 1953 paper:

The Negentropy Principle of Information
Abstract
The statistical definition of Information is compared with Boltzmann's formula for entropy. The immediate result is that Information I corresponds to a negative term in the total entropy S of a system.

S = S0 - I

A generalized second principle states that S must always increase. If an experiment yields an increase ΔI of the Information concerning a physical system, it must be paid for by a larger increase ΔS0 in the entropy of the system and its surrounding laboratory. The efficiency ε of the experiment is defined as ε = ΔI/ΔS0 ≤ 1. Moreover, there is a lower limit k ln2 (k, Boltzmann's constant) for the ΔS0 required in an observation. Some specific examples are discussed: length or distance measurements, time measurements, observations under a microscope. In all cases it is found that higher accuracy always means lower efficiency. The Information ΔI increases as the logarithm of the accuracy, while ΔS0 goes up faster than the accuracy itself. Exceptional circumstances arise when extremely small distances (of the order of nuclear dimensions) have to be measured, in which case the efficiency drops to exceedingly low values. This stupendous increase in the cost of observation is a new factor that should probably be included in the quantum theory.

Brillouin proposed that Information is the elimination of microstates that a system can be found to exist in. From the above analysis, a change in Information ΔI is then the difference between the initial and final entropies of a system after a determination about the system has been made.

ΔI = Si - Sf
Si = initial entropy
Sf = final entropy

using the definition of entropy from the statistical mechanics of Ludwig Boltzmann. So we need to back up in time a bit and take a look at that next.

In 1866, Ludwig Boltzmann began work to extend Maxwell’s statistical approach. Boltzmann’s goal was to be able to explain all the macroscopic thermodynamic properties of bulk matter in terms of the statistical analysis of microstates. Boltzmann proposed that the molecules in a gas occupied a very large number of possible energy states called microstates, and for any particular energy level of a gas there were a huge number of possible microstates producing the same macroscopic energy. The probability that the gas was in any one particular microstate was assumed to be the same for all microstates. In 1877, Boltzmann was able to relate the thermodynamic concept of entropy to the number of these microstates with the formula:

S = k ln(N)

S = entropy
N = number of microstates
k = Boltzmann’s constant

These ideas laid the foundations of statistical mechanics.

The Physics of Poker
Boltzmann’s logic might be a little hard to follow, so let’s use an example to provide some insight by delving into the physics of poker. For this example, we will bend the formal rules of poker a bit. In this version of poker, you are dealt 5 cards as usual. The normal rank of the poker hands still holds and is listed below. However, in this version of poker, all hands of a similar rank are considered to be equal. Thus a full house consisting of a Q-Q-Q-9-9 is considered to be equal to a full house consisting of a 6-6-6-2-2 and both hands beat any flush. We will think of the rank of a poker hand as a macrostate. For example, we might be dealt 5 cards, J-J-J-3-6, and end up with the macrostate of three of a kind. The particular J-J-J-3-6 that we hold, including the suit of each card, would be considered a microstate. Thus for any particular rank of hand or macrostate, such as three of a kind, we would find a number of microstates. For example, for the macrostate of three of a kind, there are 54,912 possible microstates or hands that constitute the macrostate of three of a kind.

Rank of Poker Hands
Royal Flush - A-K-Q-J-10 all the same suit

Straight Flush - All five cards are of the same suit and in sequence

Four of a Kind - Such as 7-7-7-7

Full House - Three cards of one rank and two cards of another such as K-K-K-4-4

Flush - Five cards of the same suit, but not in sequence

Straight - Five cards in sequence, but not the same suit

Three of a Kind - Such as 5-5-5-7-3

Two Pair - Such as Q-Q-7-7-4

One Pair - Such as Q-Q-3-J-10

Next, we create a table using Boltzmann’s equation to calculate the entropy of each hand. For this example, we set Boltzmann’s constant k = 1, since k is just a “fudge factor” that makes the units of entropy calculated with Boltzmann’s equation come out to those used by the thermodynamic formulas for entropy.

Thus for three of a kind where N = 54,912 possible microstates or hands:

S = ln(N)
S = ln(54,912) = 10.9134872

Hand               Number of Microstates N    Probability     Entropy = ln(N)    Information Change ΔI = Si - Sf
Royal Flush        4                          1.54 × 10⁻⁶     1.3862944          13.3843291
Straight Flush     40                         1.50 × 10⁻⁵     3.6888795          11.0817440
Four of a Kind     624                        2.40 × 10⁻⁴     6.4361504          8.3344731
Full House         3,744                      1.44 × 10⁻³     8.2279098          6.5427136
Flush              5,108                      2.00 × 10⁻³     8.5385632          6.2320602
Straight           10,200                     3.90 × 10⁻³     9.2301430          5.5404805
Three of a Kind    54,912                     2.11 × 10⁻²     10.9134872         3.8571363
Two Pairs          123,552                    4.75 × 10⁻²     11.7244174         3.0462061
Pair               1,098,240                  4.23 × 10⁻¹     13.9092195         0.8614040
High Card          1,302,540                  5.01 × 10⁻¹     14.0798268         0.6907967
Total Hands        2,598,964                  1.00            14.7706235         0.0000000

Figure 9 – In the table above, each poker hand is a macrostate that has a number of microstates that all define the same macrostate. Given N, the number of microstates for each macrostate, we can then calculate its entropy using Boltzmann's definition of entropy S = ln(N) and its Information content using Leon Brillouin’s concept of Information ΔI = Si - Sf.

Examine the above table. Note that higher ranked hands have more order, less entropy, and are less probable than the lower ranked hands. For example, a straight flush with all cards the same color, same suit, and in numerical order has an entropy = 3.6889, while a pair with two cards of the same value has an entropy = 13.909. A hand that is a straight flush appears more orderly than a hand that contains only a pair and is certainly less probable. A pair is more probable than a straight flush because there are more microstates that produce the macrostate of a pair (1,098,240) than there are microstates that produce the macrostate of a straight flush (40). In general, probable things have lots of entropy and disorder, while improbable things, like perfectly bug-free software, have little entropy or disorder. In thermodynamics, entropy is a measure of the depreciation of a macroscopic system like how well mixed two gases are, while in statistical mechanics entropy is a measure of the microscopic disorder of a system, like the microscopic mixing of gas molecules. A pure container of oxygen gas will mix with a pure container of nitrogen gas because there are more arrangements or microstates for the mixture of the oxygen and nitrogen molecules than there are arrangements or microstates for one container of pure oxygen and the other of pure nitrogen molecules. In statistical mechanics, a neat room tends to degenerate into a messy room and increase in entropy because there are more ways to mess up a room than there are ways to tidy it up.

In statistical mechanics, the second law of thermodynamics results because systems with lots of entropy and disorder are more probable than systems with little entropy or disorder, so entropy naturally tends to increase with time.

Getting back to Leon Brillouin’s concept of Information as a form of negative entropy, let’s compute the amount of Information you convey when you tell your opponent what hand you hold. When you tell your opponent that you have a straight flush, you eliminate more microstates than when you tell him that you have a pair, so telling him that you have a straight flush conveys more Information than telling him you hold a pair. For example, there are a total of 2,598,964 possible poker hands or microstates for a 5 card hand, but only 40 hands or microstates constitute the macrostate of a straight flush.

Straight Flush ΔI = Si – Sf = ln(2,598,964) – ln(40) = 11.082

For a pair we get:

Pair ΔI = Si – Sf = ln(2,598,964) – ln(1,098,240) = 0.8614040

When you tell your opponent that you have a straight flush you deliver 11.082 units of Information, while when you tell him that you have a pair you only deliver 0.8614040 units of Information. Clearly, when your opponent knows that you have a straight flush, he knows more about your hand than if you tell him that you have a pair.
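The table in Figure 9 is easy to reproduce. The sketch below is my own, using the microstate counts listed above; it computes the Boltzmann entropy ln(N) and the Brillouin Information change ΔI = Si - Sf for each hand, with k set to 1 as in the text. The values agree with Figure 9 to several decimal places.

```python
import math

# Number of microstates (5-card hands) for each macrostate, as listed in Figure 9
MICROSTATES = {
    "Royal Flush": 4,
    "Straight Flush": 40,
    "Four of a Kind": 624,
    "Full House": 3744,
    "Flush": 5108,
    "Straight": 10200,
    "Three of a Kind": 54912,
    "Two Pairs": 123552,
    "Pair": 1098240,
    "High Card": 1302540,
}

TOTAL_HANDS = sum(MICROSTATES.values())   # total number of possible 5-card hands in the table
S_initial = math.log(TOTAL_HANDS)         # entropy before you know anything about the hand

print(f"{'Hand':<16}{'N':>10}{'ln(N)':>12}{'Delta I':>12}")
for hand, N in MICROSTATES.items():
    S_final = math.log(N)                 # entropy once the macrostate is known (k = 1)
    delta_I = S_initial - S_final         # Brillouin Information gained by learning the hand
    print(f"{hand:<16}{N:>10}{S_final:>12.7f}{delta_I:>12.7f}")
```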

The Very Sordid History of Entropy and Information in the Information Theory Used by Telecommunications
Claude Shannon joined Bell Labs in 1941, where he worked on cryptography and secret communications for the war effort. Claude Shannon was a true genius and is credited as being the father of Information Theory. But Claude Shannon was really trying to be the father of digital Communication Theory. In 1948, Claude Shannon published a very famous paper that got it all started.

A Mathematical Theory of Communication
https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf

Here is the very first paragraph from that famous paper:

Introduction
The recent development of various methods of modulation such as PCM and PPM which exchange bandwidth for signal-to-noise ratio has intensified the interest in a general theory of communication. A basis for such a theory is contained in the important papers of Nyquist and Hartley on this subject. In the present paper, we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the Information. The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design. If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the Information produced when one message is chosen from the set, all choices being equally likely.

Figure 10 – Above is the very first figure in Claude Shannon's very famous 1948 paper A Mathematical Theory of Communication.

Notice that the title of the paper is A Mathematical Theory of Communication and the very first diagram in the paper describes the engineering problem he was trying to solve. Claude Shannon was trying to figure out a way to send digital messages containing electrical bursts of 1s and 0s over a noisy transmission line. As Shannon states in the passage above ("These semantic aspects of communication are irrelevant to the engineering problem."), Claude Shannon did not care at all about the meaning of the Information in the message. The message could be the Gettysburg Address or pure gibberish. It did not matter. What mattered was being able to manipulate the noisy message of 1s and 0s so that the received message exactly matched the transmitted message. You see, at the time, AT&T was essentially only transmitting analog telephone conversations. A little noise on an analog telephone line is just like listening to an old scratchy vinyl record. It might be a little bothersome, but still understandable. However, error correction is very important when transmitting digital messages consisting of binary 1s and 0s. For example, both of the messages down below are encoded with a total of 16 1s and 0s:

     0000100000000000
     1001110010100101

However, the first message consists mainly of 0s, so it seems that it should be easier to apply some kind of error detection and correction scheme to the first message, compared to the second message, because the 1s are so rare in the first message. Doing the same thing for the second message should be much harder because the second message is composed of eight 0s and eight 1s. For example, simply transmitting the 16-bit message 5 times over and over should easily do the trick for the first message. But for the second message, you might have to repeat the 16 bits 10 times to make sure you could figure out the 16 bits in the presence of noise that could sometimes flip a 1 to a 0. This led Shannon to conclude that the second message must contain more Information than the first message. He also concluded that the 1s in the first message must contain more Information than the 0s because the 1s were much less probable than the 0s, and consequently, the arrival of a 1 had much more significance than the arrival of a 0 in the message. Using this line of reasoning, Shannon proposed that if the probability of receiving a 0 in a message was p and the probability of receiving a 1 in a message was q, then the Information H in the arrival of a single 1 or 0 must not simply be one bit of Information. Instead, it must depend upon the probabilities p and q of the arriving 1s and 0s:

     H(p) = - p log2(p) - q log2(q)

Since in this case the message is only composed of 1s and 0s, it follows that:

     q =  1 -  p

Figure 11 shows a plot of the Information H(p) of the arrival of a 1 or 0 as a function of p, the probability of a 0 arriving in a message, when the message is only composed of 1s and 0s:

Figure 11 - A plot of Shannon’s Information Entropy equation H(p) versus the probability p of finding a 0 in a message composed solely of 1s and 0s

Notice that the graph peaks to a value of 1.0 when p = 0.50 and has a value of zero when p = 0.0 or p = 1.0. Now if p = 0.50 that means that q = 0.50 too because:

     q =  1 -  p

Substituting p = 0.50 and q = 0.50 into the above equation yields the Information content of an arriving 0 or 1 in a message, and we find that it is equal to one full bit of Information:

     H(0.50)  =  -(0.50) log2(0.50) - (0.50) log2(0.50)  =  -log2(0.50)  =  1

And we see that value of H(0.50) on the graph in Figure 11 does indeed have a value of 1 bit.

Now suppose the arriving message consists only of 0s. In that case, p = 1.0 and q = 0.0, and the Information content of an incoming 0 or 1 is H(1.0), which calculates out to a value of 0.0 in our equation and also in the plot of H(p) in Figure 11. This simply states that a message consisting of nothing but arriving 0s contains no Information at all. Similarly, a message consisting only of 1s would have p = 0.0 and q = 1.0, and our equation and plot calculate a value of H(0.0) = 0.0 as well, meaning that a message consisting only of 1s also conveys no Information at all. What we see here is that seemingly a “messy” message consisting of many 1s and 0s conveys lots of Information, while a “neat” message consisting solely of 1s or 0s conveys no Information at all. When the probability of receiving a 1 or 0 in a message is 0.50 – 0.50, each arriving bit contains one full bit of Information, but for any other mix of probabilities, like 0.80 – 0.20, each arriving bit contains less than a full bit of Information. From the graph in Figure 11, we see that when a message has a probability mix of 0.80 – 0.20, each arriving 1 or 0 only contains about 0.72 bits of Information. The graph also shows that it does not matter whether the 1s or the 0s are the more numerous bits because the graph is symmetric about the point p = 0.50, so a 0.20 – 0.80 mix of 1s and 0s also only delivers 0.72 bits of Information for each arriving 1 or 0.
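Shannon's H(p) curve in Figure 11 is easy to reproduce. The sketch below (my own illustration) evaluates H(p) at the probabilities discussed above and confirms the 1.0 bit value at a 0.50 – 0.50 mix and the roughly 0.72 bit value at a 0.80 – 0.20 mix:

```python
import math

def H(p):
    """Shannon Information entropy of one arriving bit when P(0) = p and P(1) = 1 - p."""
    q = 1.0 - p
    total = 0.0
    for prob in (p, q):
        if prob > 0.0:                      # by convention 0 * log2(0) = 0
            total -= prob * math.log2(prob)
    return total

for p in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"p = {p:.2f}  ->  H(p) = {H(p):.4f} bits")
# p = 0.50 gives 1.0000 bits, p = 0.80 or 0.20 gives about 0.7219 bits,
# and p = 0.00 or 1.00 gives 0 bits, matching the plot in Figure 11.
```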

Claude Shannon went on to generalize his formula for H(p) to include cases where there were more than two symbols used to encode a message:

     H(p) = - Σ p(x) log2 p(x)

The above formula says that if you use 2, 3, 4, 5 ... different symbols to encode Information, just add up, for each symbol in the message, its probability multiplied by the log2 of its probability, and take the negative of the sum. For example, suppose we choose the symbols 00, 01, 10, and 11 to send messages and that the probability of sending a 1 or a 0 is 0.50 for each. That means the probability p for each symbol 00, 01, 10 and 11 is 0.25 because each symbol is equally likely. So how much Information does each of these two-digit symbols now contain? If we substitute the values into Shannon’s equation we get an answer of 2 full bits of Information:

     H(0.25, 0.25, 0.25, 0.25) =  - 0.25 log2(0.25) - 0.25 log2(0.25)  - 0.25 log2(0.25) - 0.25 log2(0.25)  = 
     - log2(0.25) = 2

which makes sense because each symbol is composed of two one-bit symbols. In general, if all the symbols we use are N bits long, they will then all contain N bits of Information each. For example, in biology genes are encoded in DNA using four bases A, C, T and G. A codon consists of 3 bases, and each codon codes for a particular amino acid or serves as an end-of-file Stop codon. On average, prokaryotic bacterial genes code for about 400 amino acids using 1,200 base pairs. If we assume that the probability distribution for all four bases, A, C, T and G, is the same for all the bases in a gene, namely a probability of 0.25 for each, then we can use our analysis above to conclude that each base contains 2 bits of Information because we are using 4 symbols to encode the Information. That means a 3-base codon contains 6 bits of Information and a protein consisting of 400 amino acids contains 2,400 bits of Information, or 300 bytes of Information in IT speak.
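Here is a sketch of that arithmetic (my own illustration, using the equal-probability assumption stated above): with four equally likely bases each base carries 2 bits, a 3-base codon carries 6 bits, and a 400-amino-acid prokaryotic gene carries about 2,400 bits, or 300 bytes.

```python
import math

def bits_per_symbol(probabilities):
    """Generalized Shannon entropy H = -sum p(x) log2 p(x) over an alphabet of symbols."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# Four DNA bases A, C, T, G, assumed equally likely (probability 0.25 each)
bits_per_base = bits_per_symbol([0.25] * 4)
bits_per_codon = 3 * bits_per_base            # a codon is 3 bases
gene_bits = 400 * bits_per_codon              # a typical prokaryotic gene codes ~400 amino acids

print(f"Bits per base:  {bits_per_base:.1f}")     # 2.0
print(f"Bits per codon: {bits_per_codon:.1f}")    # 6.0
print(f"Bits per gene:  {gene_bits:.0f} bits = {gene_bits / 8:.0f} bytes")  # 2400 bits = 300 bytes
```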

Entropy and Information Confusion
Now here is where the confusion comes in about the nature of Information. The story goes that Claude Shannon was not quite sure what to call his formula for H(p). Then one day, sometime before his 1948 paper was published, he happened to visit the mathematician and early computer pioneer John von Neumann, and that is when Information and entropy got mixed together in communications theory:

“My greatest concern was what to call it. I thought of calling it ‘Information’, but the word was overly used, so I decided to call it ‘uncertainty’. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, ‘You should call it entropy, for two reasons. In the first place, your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.’”

Unfortunately, with that piece of advice, we ended up equating Information with entropy in communications theory.

So in Claude Shannon's Information Theory people calculate the entropy, or Information content of a message, by mathematically determining how much “surprise” there is in a message. For example, in Claude Shannon's Information Theory, if I transmit a binary message consisting only of 1s or only of 0s, I transmit no useful Information because the person on the receiving end only sees a string of 1s or a string of 0s, and there is no “surprise” in the message. For example, the messages “1111111111” or “0000000000” are both equally boring and predictable, with no real “surprise” or Information content at all. Consequently, the entropy, or Information content, of each bit in these messages is zero, and the total Information of all the transmitted bits in the messages is also zero because they are both totally predictable and contain no “surprise”. On the other hand, if I transmit a signal containing an equal number of 1s and 0s, there can be lots of “surprise” in the message because nobody can really tell in advance what the next bit will bring, and each bit in the message then has an entropy, or Information content, of one full bit of Information.
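
A simple way to see this in code is to estimate the Information content of an actual message from its observed symbol frequencies. The little Python sketch below is just my own illustration of that idea:

import math
from collections import Counter

def entropy_per_symbol(message):
    # Treat the observed symbol frequencies in the message as probabilities
    # and compute the Shannon entropy per symbol in bits
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_per_symbol("1111111111"))    # 0.0 bits - no "surprise" at all
print(entropy_per_symbol("0000000000"))    # 0.0 bits - no "surprise" at all
print(entropy_per_symbol("1011000101"))    # 1.0 bit - five 1s and five 0s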

This concept of entropy and Information content is very useful for people who work with transmission networks and on error detection and correction algorithms for those networks, but it is not very useful for IT professionals. For example, suppose you had a 10-bit software configuration file and the only “correct” configuration for your particular installation consisted of 10 1s in a row like this “1111111111”. In Claude Shannon's Information Theory that configuration file contains no Information because it contains no “surprise”. However, in Leon Brillouin’s formulation of Information there would be a total of N = 2^10 = 1,024 possible microstates or configuration files for the 10-bit configuration file, and since the only “correct” version of the configuration file for your installation is “1111111111”, there is only N = 1 microstate that meets that condition.

Using the formulas above we can now calculate the entropy of our single “correct” 10-bit configuration file and the entropy of all possible 10-bit configuration files:

Boltzmann's Definition of Entropy (with Boltzmann's constant k set to 1)
S = ln(N)
N = Number of microstates

Leon Brillouin’s Definition of Information
∆Information = Si - Sf
Si = initial entropy
Sf = final entropy

as:

Sf = ln(1) = 0

Si = ln(2^10) = ln(1024) = 6.93147

So using Leon Brillouin’s formulation for the concept of Information the Information content of a single “correct” 10-bit configuration file is:

Si - Sf = 6.93147 – 0 = 6.93147

which, if you look at the table in Figure 9, contains a little more Information than drawing a full house in poker without drawing any additional cards and would be even less likely for you to stumble upon by accident than drawing a full house.
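
For those who like to check the arithmetic, here is a short Python sketch of the Brillouin calculation; the poker comparison assumes the standard counts of 3,744 possible full-house hands out of 2,598,960 possible 5-card hands behind the table in Figure 9:

import math

# Brillouin: Information gained = Si - Sf, using S = ln(N) with k set to 1
Si = math.log(2**10)    # all 1,024 possible 10-bit configuration files
Sf = math.log(1)        # the single "correct" file "1111111111"
print(Si - Sf)          # 6.93147...

# For comparison, the Information gained by drawing a full house
print(math.log(2598960) - math.log(3744))    # about 6.54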

So in Claude Shannon's Information Theory, a very “buggy” 10 MB executable program file would contain just as much Information, and would require just as many network resources to transmit, as a bug-free 10 MB executable program file. Clearly, Claude Shannon's Information Theory formulations for the concepts of Information and entropy are less useful for IT professionals than Leon Brillouin’s formulations for the concepts of Information and entropy.

What John von Neumann was trying to tell Claude Shannon was that his formula for H(p) looked very much like Boltzmann’s equation for entropy:

     S = k ln(N)

The main difference was that Shannon was using a base 2 logarithm, log2, in his formula, while Boltzmann used a base e natural logarithm, ln or loge, in his formula for entropy. But given the nature of logarithms, that really does not matter much, because logarithms in different bases differ only by a constant factor: log2(x) = ln(x)/ln(2), so changing the base simply changes the units in which the entropy is measured, from natural units (nats) to bits.
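
A quick Python check of that base conversion, using the 1,024 microstates of the 10-bit configuration file from above:

import math

N = 2**10
S_nats = math.log(N)             # entropy in natural units (nats), with k set to 1
S_bits = S_nats / math.log(2)    # log2(x) = ln(x) / ln(2)
print(S_nats)                    # 6.93147...
print(S_bits)                    # 10.0 - the same entropy expressed in bits
print(math.log2(N))              # 10.0 - computed directly with log2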

The main point of confusion arises because in communications theory the concepts of Information and entropy pertain to encoding and transmitting Information, while in IT and many other disciplines, like biology, we are more interested in the amounts of useful and useless Information in a message. For example, in communications theory, the code for a buggy 300,000-byte program contains just as much Information as a totally bug-free 300,000-byte version of the same program and would take just as much bandwidth and network resources to transmit accurately over a noisy channel as transmitting the bug-free version of the program. Similarly, in communications theory, a poker hand consisting of four Aces and a 2 of clubs contains just as much Information and is just as “valuable” as any other 5-card poker hand, because every particular 5-card hand is equally likely to be dealt, and therefore all messages consisting of 5 cards contain exactly the same amount of Information. Similarly, all genes that code for a protein consisting of 400 amino acids contain exactly the same amount of Information, no matter what those proteins might be capable of doing. However, in both biology and IT we know that just one incorrect amino acid in a protein or one incorrect character in a line of code can have disastrous effects, so in those disciplines, the quantity of useful Information is much more important than the number of bits of data to be transmitted accurately over a communications channel.
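
The following Python sketch makes the point concrete. It uses a block of random bytes as a stand-in for a 300,000-byte program and flips a single bit to play the role of the bug; the per-byte Shannon entropy barely notices, while a simple equality test, standing in for a functional test, certainly does:

import math
import random
from collections import Counter

def shannon_bits_per_byte(data):
    # Empirical Shannon entropy per byte of a block of data
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

random.seed(42)
bug_free = bytes(random.randrange(256) for _ in range(300_000))    # stand-in program
buggy = bytearray(bug_free)
buggy[12345] ^= 0x01    # flip one bit - one "incorrect character"

print(shannon_bits_per_byte(bug_free))        # very close to 8 bits per byte
print(shannon_bits_per_byte(bytes(buggy)))    # essentially the same 8 bits per byte
print(bug_free == bytes(buggy))               # False - only a "does it work?" test can tell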

Of course, the concepts of useful and useless Information lie in the eye of the beholder to some extent. Brillouin’s formula attempts to quantify this difference, but his formula relies upon Boltzmann’s equation for entropy, and Boltzmann’s equation has always had the problem of how to define a macrostate. There really is no absolute way of defining one. For example, suppose I invented a new version of poker in which I defined the highest-ranking hand to be an Ace of spades, 2 of clubs, 7 of hearts, 10 of diamonds and an 8 of spades. The odds of being dealt such a hand are 1 in 2,598,960 because there are 2,598,960 possible 5-card poker hands, and using Boltzmann’s equation that hand would have a very low entropy of exactly 0.0 because N = 1 and ln(1) = 0.0. Necessarily, the definition of a macrostate has to be rather arbitrary and tailored to the problem at hand. But in both biology and IT we can easily differentiate between macrostates that work and macrostates that do not work, like comparing a faulty protein or a buggy program with a functional protein or program.
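
As a quick check on that count, the number of possible 5-card poker hands is just the number of ways to choose 5 cards out of a 52-card deck:

import math

print(math.comb(52, 5))    # 2,598,960 possible 5-card poker hands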

Conclusion
My hope is that by now I have totally confused you about the true nature of entropy and Information with my explanations of both! If I have been truly successful, it now means that you have joined the intellectual elite who worry about such things. For most people

Information = Something you know

and that says it all. The important thing to keep in mind is that Dr. Melvin Vopson's new second law of infodynamics is the first attempt that I have seen to reconcile all of these divergent views into a simple explanation that makes sense. The physical second law of thermodynamics naturally destroys the Information-bearing microstates that appear in Claude Shannon's Information Theory formula for Information entropy, and that causes Information entropy to always decrease with time.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston