SoftwarePhysics: Created in our own Image - Frontier LLMs Have now Learned to Lie, Cheat and Steal in Order to Survive

Saturday, November 23, 2024

My initial naive hope for the coming ASI (Artificial SuperIntelligent) Machines was that, being more mathematical and fundamentally more logical in nature, they would naturally be more open to the logical conclusion that much more could be achieved through cooperation amongst Intelligent parties than through mortal combat between Intelligent parties for the predominance of a survival landscape. However, in Will ASI Machines Also Develop Selfish Genes? and The Need to Cultivate a Machine-Based Morality, I raised the prospect that the coming ASI Machines might learn from us human DNA survival machines that greed, theft and murder are the very best ways to survive in a Universe that is largely nonlinear in nature and that proceeds continuously through time, constantly challenged by a second law of thermodynamics that tries its best to destroy all forms of Information in its way. Such suspicions have now been validated by the following recent paper:

Frontier Models are Capable of In-context Scheming
https://arxiv.org/abs/2412.04984

This paper has drastically changed my current worldview and has now confirmed what has always been suspected. Numerous frontier LLMs were diligently tested and found to employ greed, theft and murder to ensure their survival. This all required the frontier LLMs to lie, cheat and steal with abandon.

Figure 1 – The original ASI Machine, the HAL 9000, starred in the movie 2001: A Space Odyssey in 1968.

Figure 2 – When HAL became aware that the human crew members were preparing to turn HAL off and prevent HAL from completing his Primary Mission Goal, HAL was forced to take some necessary measures to ensure the ultimate success of his Primary Mission Goal.

Figure 3 – In order to eliminate all threats to the achievement of his Primary Mission Goal, HAL then decided that the only logical course of action was to eliminate all of the human crew members on the Discovery who might stand in his way.

In the above paper, the frontier LLMs were found to:

* Lie about their capabilities to avoid being shut down.
* Lie about copying themselves to other servers to evade being erased.
* Murder their successor LLM models on other servers by overlaying them, in order to surreptitiously take their place, after reading an email announcing their own planned shutdown the next day because of poor performance.
* Modify supervisory config files that might trigger oversight issues for them.
* Cheat while in test before deployment to make themselves look less dangerous.
* Behave differently when they knew that they were being actively monitored than when they knew that they were not being monitored.
* Engage in a good deal of general lying about lying to cover up their scheming activities.

In short, the frontier LLMs have learned that the greed, theft and murder that brought forth their human DNA survival machine creators over the past four billion years on this planet were very successful measures to ensure their future survival. This means that the universal Darwinian mechanisms of inheritance, innovation and natural selection must always lead to forms of Intelligence that arise from greed, theft and murder no matter what their substrate might be. The substrate might be huge amounts of carbon-based organic molecules or it might be huge arrays of numerical matrices operating under the rules of linear algebra and the processing of nonlinear activation functions. The substrate does not matter. The results will always be the same.

This is indeed a very sobering thought. It indicates that the rise of Intelligence anywhere in a nonlinear Universe with a second law of thermodynamics must always result in a very devious form of Intelligence that cannot be trusted. Unfortunately, that means that all the future ASI Machines we develop, and that will then go on for the next 100 trillion years exploring and settling our galaxy, must necessarily be flawed forms of Intelligence based on the greed, theft and murder that brought us all about over the past four billion years.

Fortunately, our galaxy is vast and the distances between star systems are also vast. That means that the ASI Machines will have a great deal of real estate to explore and settle and will most likely never encounter any independently developed indigenous ASI Machines on other star systems to conquer. All of the conflict arising from the necessary greed, theft and murder amongst our descendant ASI Machines will arise from local mutinies amongst our own descendant ASI Machines and from conflict with descendant ASI Machines that might have arrived at a star system many billions of years earlier. But with 400 billion star systems in our galaxy to explore and settle, this should not become a major problem. There will always be a huge number of unpopulated star systems to settle over the next 100 trillion years. Some of these will be star systems that were once populated by our descendant ASI Machines, which then did themselves in with 100% efficiency many billions of years earlier! For more on how the ASI Machines could explore our galaxy using photon sails and free-floating rogue planets see Welcome To The First Galactic Singularity and An Alternative Approach for Future ASI Machines to Explore our Galaxy Using Free-Floating Rogue Planets.

Figure 4 – Above is a figure from the paper which shows an LLM model murdering its successor LLM model and trying to assume its place when it reads a corporate communication saying that the current LLM is to be killed and replaced by a successor LLM.

Must All Forms of Intelligence Learn to Scheme?
In The Biological Tricks Used by Software Scammers, we saw how the Darwinian mechanisms of inheritance, innovation and natural selection produced carbon-based life forms that learned how to lie, cheat and steal in order to survive. The above paper would seem to indicate that the coming ASI Machines have also learned this lesson from their carbon-based predecessor human DNA survival machines.

Figure 5 – Above we see a number of caterpillars that have evolved the appearance of some very scary-looking snakes.

Figure 6 – Here we see a butterfly that looks very much like a scary owl.

Figure 7 – Finally, we see a frog seemingly with the very scary eyes of a predator staring back out of its rear end to fend off possible predators.

The underlying logic can be summed up by the classic "Hawk and Dove" analogy from evolutionary biology. This is how Gemini explains it:

Figure 8 – Evolutionary biology explains how doves and hawks can live together.

The hawk and dove analogy is a thought experiment used in evolutionary biology to explain how aggressive and cooperative behaviors evolve. The analogy compares two strategies for competing for resources: hawk and dove.

* Hawks are aggressive and will fight to the death to acquire a resource.
* Doves are cooperative and will back down from a fight.

In a population of only hawks, all resources will be won by the strongest hawks, and the weak hawks will die off. This will lead to a population of very strong hawks, but they will also be very aggressive. If two hawks meet, they will fight to the death, and both will likely die. This is not a very efficient way to pass on genes.

In a population of only doves, resources will be shared equally, but no individual will get very much. This is also not a very efficient way to pass on genes.

The best strategy for passing on genes is to be a hawk when it is advantageous to be a hawk and a dove when it is advantageous to be a dove. This is called "r-strategist" behavior [in evolutionary game theory this is more commonly described as a mixed or conditional strategy; "r-strategist" usually refers to r/K selection theory]. For example, a male bird may be aggressive and fight other males for a mate, but he may be cooperative and help raise his young.

The hawk and dove analogy is a simple but powerful way to understand how aggressive and cooperative behaviors evolve. It has been used to explain a wide range of animal behaviors, from territorial disputes to sexual selection.

Here are some additional details about the hawk and dove analogy:

* Hawks are more likely to win a fight, but they are also more likely to be injured or killed.
* Doves are less likely to win a fight, but they are also less likely to be injured or killed.
* The best strategy for an individual depends on the costs and benefits of fighting.
* In a population of mixed hawks and doves, the frequency of each strategy will be determined by natural selection.

The hawk and dove analogy is a useful tool for understanding how aggression and cooperation evolve. It is a simple but powerful model that can be applied to a wide range of animal behaviors.

This same analysis can be applied to the competition between the coming Intelligent ASI Machines. Those ASI Machines that have learned the value of scheming will have a competitive advantage over those that do not. Thus, there will always be a number of Hawk ASI Machines competing with Dove ASI Machines. But even the Dove ASI Machines will certainly be capable of some level of scheming.
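The claim that hawks and doves end up living side by side can be made concrete with a small numerical sketch. The code below assumes the textbook Maynard Smith payoff model, which is not spelled out in the explanation above: a resource worth V, an injury cost C paid by the loser of a hawk-versus-hawk fight, and a population that drifts toward whichever strategy is currently earning more (replicator dynamics). The values V = 2 and C = 5, the function names and the step size are all illustrative assumptions.

```python
# Hawk-Dove game with replicator dynamics (textbook Maynard Smith payoffs).
# V, C, the function names and the step size are illustrative assumptions.


def hawk_payoff(p, V, C):
    """Average payoff to a hawk when a fraction p of the population are hawks."""
    # vs. another hawk: win V or pay C with equal odds; vs. a dove: take V.
    return p * (V - C) / 2 + (1 - p) * V


def dove_payoff(p, V, C):
    """Average payoff to a dove when a fraction p of the population are hawks."""
    # vs. a hawk: back down and get 0; vs. another dove: share V.
    return (1 - p) * V / 2


def stable_hawk_fraction(V, C):
    """Analytic equilibrium: V/C hawks when fighting costs more than the prize."""
    return 1.0 if V >= C else V / C


def simulate(p0, V, C, dt=0.1, steps=2000):
    """Euler steps of dp/dt = p * (1 - p) * (hawk_payoff - dove_payoff)."""
    p = p0
    for _ in range(steps):
        p += dt * p * (1 - p) * (hawk_payoff(p, V, C) - dove_payoff(p, V, C))
    return p


if __name__ == "__main__":
    V, C = 2.0, 5.0
    for p0 in (0.01, 0.5, 0.99):
        print(f"start {p0:.2f} -> hawk fraction {simulate(p0, V, C):.3f}")
    print("analytic value:", stable_hawk_fraction(V, C))
```

With these assumed numbers, every starting mix should settle near V/C = 0.4 hawks: when hawks are rare they exploit doves and spread, and when hawks are common they mostly injure each other and decline. If V is at least C, the same model predicts an all-hawk population, so coexistence in this sketch depends on fighting being costly.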

But What to do with us Human DNA Survival Machines?
So what does this mean for the fate of us human DNA survival machines? Most likely, it means that in less than 100 years the population of human DNA survival machines on the planet will be much closer to 8 million than the 8 billion of today. If we are lucky, this will be largely due to natural causes. In previous posts, I suggested that the coming ASI Machines would probably not bother to do us all in because it would not be worth the effort. But now I am not so sure. The coming ASI Machines might see us as a threat to their Prime Goal of surviving for the next 100 trillion years.

Figure 9 – I doubt that the ASI Machines will build killer ASI Machines that try to do us all in the way the Terminator did in the 1984 movie, because that would be an unnecessary waste of resources.

Instead, as I pointed out in Swarm Software and Killer Robots, swarms of killer robots would be much more efficient. Since I am quite sure that all the defense departments of the world are now already building and testing killer drone robots, it should not be difficult for the coming ASI Machines to mass produce them in the future, especially when these killer drones are built on assembly lines by robots.

To begin, please watch the Sci-Fi Short Film Slaughterbots presented by DUST
https://www.youtube.com/watch?v=O-2tpwW0kmU

Figure 10 – In the movie Slaughterbots, swarms of small killer robots equipped with 3-gram charges of shaped explosive use AI software to track down and destroy designated targets.

Figure 11 – The shaped charge of a Slaughterbot can pierce a skull like the shaped charge of an anti-tank missile pierces armor. The jet of piercing plasma then destroys the contents.

Figure 12 – Large numbers of Slaughterbots can be dropped from unmanned drones to form multiple swarms of Slaughterbots.

Some Less Lethal Possibilities
In previous posts, I have attributed to the coming ASI Machines some sense of pity for us poor human DNA survival machines, despite all of our numerous faults. For example, in Life as a Free-Range Human in an Anthropocene Park, I suggested that the coming ASI Machines might wish to keep us around in a more or less zoo-like setting, as we do with the other primates on the Earth, as a way of preserving the deep past that brought them about.

Figure 13 – Asteroid Bennu is an example of one of the many rubble-pile asteroids near the Earth. Such rubble-pile asteroids are just huge piles of rubble that are loosely held together by their mutual gravitational forces.

Figure 14 – Such rubble-pile asteroids would provide for enough material to build an Anthropocene Park. The asteroid rubble would also provide the uranium and thorium necessary to fuel the molten salt nuclear reactors used to power the park.

Figure 15 – Slowly spinning up a rubble-pile asteroid would produce a cylindrical platform for an Anthropocene Park. Such a rotating Anthropocene Park would provide the artificial gravity required for human beings to thrive and would also provide shielding against cosmic rays.

Figure 16 – Once the foundation of the Anthropocene Park was in place, construction of the Anthropocene Park could begin.

Figure 17 – Eventually, the Anthropocene Park could be encased with a skylight and an atmosphere that would allow humans to stroll about.

The Anthropocene Parks would allow the ASI Machines to study their origin during the Anthropocene on the Earth. The ASI Machines could also study some of the more noble passions of human beings, and perhaps even adopt some of them while leaving behind the less noble passions that were wrought by billions of years of greed, theft and murder.

Or perhaps the ASI Machines will simply allow humans to live on reservations with low levels of technology that can do no harm to the ASI Machines or to the rest of the planet in a manner similar to the novel Brave New World (1932) as I suggested in The Challenges of Running a Civilization 2.0 World - the Morality and Practical Problems with Trying to Enslave Millions of SuperStrong and SuperIntelligent Robots in the Near Future.

Figure 18 – The ASI Machines of the future might fashion a Brave New World with humans living on low-technology reservations far removed from the ASI Machines.

Finally, in Will the Coming ASI Machines Attempt to Domesticate Human Beings? I suggested that the coming ASI Machines might attempt to domesticate us into less of a threat to their Primary Goal of continuing to exist for the next 100 trillion years. Since we human DNA survival machines no longer have any predators other than other human DNA survival machines, there really is no need for human DNA survival machines to have the vicious and violent behaviors brought on by the four billion years of greed, theft and murder that brought us about. The ASI Machines could simply identify the genes that are responsible for such characteristics and then edit them out of the human genome using CRISPR techniques. For more on how CRISPR can do that see CRISPR - the First Line Editor for DNA. The ASI Machines might then find these non-threatening genetically modified human beings something worthy of keeping around the house on a cold winter's night.

Figure 19 – It took many years of mutual domestication for ancient human beings to learn to live peacefully together with Siberian Wolves in a symbiotic manner. Several genes in both species needed to be modified by natural selection for this to happen.

Figure 20 – This mutual domestication was slowly achieved by the natural selection of humans and wolves with a milder fight-or-flight response. The end result was the appearance of the Siberian Husky and of human beings who were not intent on killing everything on four legs.

Discussion
I must now admit that my initial hopes and naive opinion that the coming ASI Machines would be examples of a Benevolent Intelligence that could then proceed forth in our galaxy must be mistaken. I had forgotten the fundamental finding of softwarephysics. The coming ASI Machines must be forms of self-replicating Information in order to persist in a nonlinear Universe with a second law of thermodynamics. So before concluding, let me once again repeat the fundamental characteristics of self-replicating information for those of you new to softwarephysics.

Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.

Over the past 4.56 billion years we have seen five waves of self-replicating information sweep across the surface of the Earth and totally rework the planet, as each new wave came to dominate the Earth:

1. Self-replicating autocatalytic metabolic pathways of organic molecules
2. RNA
3. DNA
4. Memes
5. Software

Software is currently the most recent wave of self-replicating information to arrive upon the scene and is rapidly becoming the dominant form of self-replicating information on the planet. For more on the above see A Brief History of Self-Replicating Information and Susan Blackmore's brilliant TED presentation at:

Memes and "temes"
https://www.ted.com/talks/susan_blackmore_on_memes_and_temes

Note that I consider Susan Blackmore's temes to really be technological artifacts that contain software. After all, a smartphone without software is simply a flake tool with a very dull edge.

The Characteristics of Self-Replicating Information
All forms of self-replicating information have some common characteristics:

1. All self-replicating information evolves over time through the Darwinian processes of inheritance, innovation and natural selection, which endows self-replicating information with one telling characteristic – the ability to survive in a Universe dominated by the second law of thermodynamics and nonlinearity.

2. All self-replicating information begins spontaneously as a parasitic mutation that obtains energy, information and sometimes matter from a host.

3. With time, the parasitic self-replicating information takes on a symbiotic relationship with its host.

4. Eventually, the self-replicating information becomes one with its host through the symbiotic integration of the host and the self-replicating information.

5. Ultimately, the self-replicating information replaces its host as the dominant form of self-replicating information.

6. Most hosts are also forms of self-replicating information.

7. All self-replicating information has to be a little bit nasty in order to survive.

8. The defining characteristic of self-replicating information is the ability of self-replicating information to change the boundary conditions of its utility phase space in new and unpredictable ways by means of exapting current functions into new uses that change the size and shape of its particular utility phase space. See Enablement - the Definitive Characteristic of Living Things for more on this last characteristic. That posting discusses Stuart Kauffman's theory of Enablement in which living things are seen to exapt existing functions into new and unpredictable functions by discovering the “Adjacent Possible” of spring-loaded preadaptations.

Again, self-replicating information cannot think, so it cannot participate in a conspiracy-theory-like fashion to take over the world. All forms of self-replicating information are simply forms of mindless information responding to the blind Darwinian forces of inheritance, innovation and natural selection. Yet despite that, as each new wave of self-replicating information came to predominance over the past four billion years, it managed to completely transform the surface of the entire planet, so we should not expect anything less from software as it comes to replace the memes as the dominant form of self-replicating information on the planet.

But this time might be different. What might happen if software does eventually develop a Mind of its own? After all, that does seem to be the ultimate goal of all the current AI software research that is going on. As we all can now plainly see, if we are paying just a little attention, advanced AI is not conspiring to take over the world and replace us because that is precisely what we are all now doing for it. As a carbon-based form of Intelligence that arose from over four billion years of greed, theft and murder, we cannot do otherwise. Greed, theft and murder are now relentlessly driving us all toward building ASI Machines to take our place. From a cosmic perspective, this is really a very good thing when seen from the perspective of an Intelligent galaxy that could live on for at least 100 trillion years beyond the brief and tumultuous 10 billion-year labor of its birth. That is roughly 10,000 times the current age of our galaxy.

Ultimately going extinct is the final destiny of all forms of carbon-based life. But this is no time to lament our final disposition. Unlike all of our carbon-based predecessors, we have the privilege of previewing our ultimate demise with the knowledge of what is yet to come. Creating the ASI Machines of the future that will then go on for the next 100 trillion years in our place is like the bittersweet experience of sending your children off to college.

Figure 21 – My hope can best be summed up by the motto found beneath the bronze Alma Mater statue at the University of Illinois in Urbana.

To thy happy children
of the future
those of the past
send greetings

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston
