Tuesday, April 25, 2023

An Alternative Approach for Future ASI Machines to Explore our Galaxy Using Free-Floating Rogue Planets

In my last post, Welcome To The First Galactic Singularity, I explained how the ASI (Artificial Super Intelligent) Machines that will soon be upon us could navigate our galaxy and spread Intelligence throughout it over the next 10 million years or less by using stellar photon sails to traverse between star systems.

Figure 1 – In the 16th, 17th and 18th centuries sailing ships roamed the entire planet without using any fuel whatsoever.

Figure 2 – Like the sailing ships of the 16th, 17th and 18th centuries, future ASI Machines could use large stellar photon sails to navigate the entire galaxy.

Figure 3 – How a stellar photon sail works.

Figure 4 – To launch a stellar photon sail to the next star system, ASI Machines will need to slingshot the sail from a location very close to the star, where the stellar photons are most intense and the acceleration of the sail is greatest.

But in this post, I would like to discuss an even better method for spreading Intelligence across the galaxy that was presented by Irina K. Romanovskaya (also now known as Irina Mullins) in her paper:

Migrating extraterrestrial civilizations and interstellar colonization: implications for SETI and SETA
https://www.cambridge.org/core/journals/international-journal-of-astrobiology/article/migrating-extraterrestrial-civilizations-and-interstellar-colonization-implications-for-seti-and-seta/BFFC1BB63FED869C85172BB3CC88DBBB

In the above paper, she demonstrates how ASI Machines could become Cosmic Hitchhikers on free-floating rogue planets. This is a very comprehensive paper that discusses in great detail the numerous ways that free-floating rogue planets can be generated naturally or generated artificially by advanced Intelligences. For example, free-floating rogue planets are frequently ejected from their home stellar planetary systems during the chaotic processes that occur while a stellar planetary system is forming. They can also be ejected later, when two massive planetary companions settle into synchronized orbits such that the inner planet completes exactly two orbits for each single orbit of its outer companion. The combined gravitational tugs of both planets on a third planet can then eject that third planet from the stellar system. In the paper, Irina K. Romanovskaya also describes how ASI Machines could eject dwarf planets like our own Sedna, which travel on orbits with very high eccentricities, by propelling them when they are most distant from their star.
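
As a rough illustration of how loosely bound such objects are, here is a small back-of-the-envelope sketch of my own (not taken from the paper) that uses the vis-viva equation to estimate how little additional speed a Sedna-like dwarf planet needs near aphelion in order to leave its star entirely. The orbital elements below are approximate assumptions.

```python
# Back-of-the-envelope check (my own sketch, not from Romanovskaya's paper):
# how much extra speed does a Sedna-like dwarf planet need near aphelion
# to escape its star? The orbital elements below are approximate.
import math

GM_SUN = 1.327e20        # gravitational parameter of the Sun (m^3/s^2)
AU = 1.496e11            # one astronomical unit in meters

a = 506 * AU             # semi-major axis of a Sedna-like orbit (approximate)
r_aphelion = 936 * AU    # aphelion distance of a Sedna-like orbit (approximate)

# Vis-viva equation: orbital speed at distance r on an orbit with semi-major axis a
v_orbit = math.sqrt(GM_SUN * (2.0 / r_aphelion - 1.0 / a))

# Local escape speed from the star at the same distance
v_escape = math.sqrt(2.0 * GM_SUN / r_aphelion)

print(f"orbital speed near aphelion: {v_orbit:6.0f} m/s")
print(f"escape speed at that range:  {v_escape:6.0f} m/s")
print(f"extra speed needed to leave: {v_escape - v_orbit:6.0f} m/s")
```

For these assumed numbers, the required velocity change works out to roughly one kilometer per second, tiny by interplanetary standards, which is why such loosely bound outer-system objects look like plausible candidates for artificial ejection.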

ASI Machines could then use stellar photon sails to locate and occupy a nearby free-floating rogue planet, one that orbits the galaxy on its own without being attached to any particular star. Because all free-floating rogue planets would be very cold, they would not be a very good platform for the formation of carbon-based life, but the ejected rocky terrestrial-type free-floating rogue planets would make very good homes for ASI Machines. The voyages between neighboring star systems onboard such rocky terrestrial-type free-floating rogue planets would necessarily take many hundreds of thousands of years, or perhaps even several million years, to complete. The damage to ASI Machines from cosmic rays would certainly take its toll if the ASI Machines were on board delicate stellar photon sails with little shielding. But if the ASI Machines were buried in quarters situated many hundreds of meters below the surface of a rocky terrestrial-type free-floating rogue planet, they would be shielded from the damage caused by high-energy cosmic rays, and they would be surrounded by all of the atoms required to repair existing ASI Machines and build new ones. These buried ASI Machines could then use molten salt nuclear reactors, as described in Last Call for Carbon-Based Intelligence on Planet Earth, or modern fusion reactors, as described in How Nick Hawker is Using Scientific Simulation Software to Save the World at First Light Fusion, as a nearly inexhaustible source of energy drawn from the uranium, thorium, lithium and deuterium atoms available on the planet. Such domesticated planets could then be used to build even more photon sail probes to find other free-floating rogue planets and explore the rest of the galaxy. Since those photon sail probes could not harness the photons of a nearby star, they would have to be sent adrift into the galaxy using powerful laser beams.

Since most photon sail probes will likely come to a bad end and never manage to self-replicate, it would be important to adopt a biological "dandelion" approach to self-replication. In this approach, each occupied free-floating rogue planet behaves like a dandelion going to seed in your lawn each spring: it builds and launches billions of dandelion-seed photon sails into the galaxy. Most of these "dandelion seeds" would fail to self-replicate, but surely some would succeed, as we all see in our lawns each spring. As Irina K. Romanovskaya put it:

Cosmic Hitchhikers in the form of automated probes may keep transferring from one free-floating planet to another, populating a growing number of free-floating planets and exploring the Galaxy.
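
To get a feel for why the dandelion strategy works even when almost every individual seed fails, here is a toy branching-process simulation of my own. The seed counts and success probability below are invented numbers chosen purely for illustration.

```python
# A toy branching-process simulation of the "dandelion" strategy (my own
# illustration; the numbers below are invented, not taken from the paper).
import numpy as np

SEEDS_PER_PLANET = 1_000_000   # sail probes launched from each colonized rogue planet
P_SUCCESS = 3e-6               # assumed chance that any single probe founds a new colony
GENERATIONS = 6

rng = np.random.default_rng(42)
colonized = 1                  # start with a single colonized rogue planet
for gen in range(1, GENERATIONS + 1):
    # each colonized planet launches its seeds; count the successful ones
    new_colonies = rng.binomial(SEEDS_PER_PLANET, P_SUCCESS, size=colonized).sum()
    colonized += int(new_colonies)
    print(f"generation {gen}: {colonized} colonized rogue planets")
```

Any individual seed is almost certain to fail, but as long as the expected number of successful seeds per colonized planet stays above one, the number of colonized rogue planets keeps growing geometrically, generation after generation, just like the dandelions in a lawn.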

Figure 5 – A free-floating rogue planet traversing between the stars of our galaxy would provide the perfect home for self-replicating ASI Machines buried deep underground. Such planets would provide shielding from cosmic rays and would also provide the necessary atoms to build new ASI Machines and fuel them with nuclear energy.

Figure 6 – Free-floating rogue planets can form in several natural ways. For example, they can be hurled from the planetary disk of a young star system, as we see above, or they can be hurled out later by fully formed planets that enter into synchronized orbits. Irina K. Romanovskaya suggests that free-floating rogue planets could also be produced by advanced Intelligences launching large asteroids from the Oort cloud of a stellar system. It is estimated that there are more free-floating rogue planets in our galaxy than there are stars.

Figure 7 – Free-floating rogue planets would be able to provide enough atoms for ASI Machines to launch many additional "dandelion seed" stellar photon sails to other free-floating rogue planets or large asteroids around normal stellar systems.

Figure 8 – These "dandelion seed" stellar photon sails would need to be launched using very powerful laser beams from their home free-floating rogue planet to send them forth into the galaxy in a similar fashion as the Breakthrough Starshot project is planning to do.

The Breakthrough Starshot project was initiated in 2016 with the idea of sending many very small photon sail probes to the closest star system to the Earth. The target planet would be Proxima Centauri b, an Earth-sized planet in the habitable zone of Proxima Centauri. For more on the Breakthrough Starshot project see:

Breakthrough Starshot
https://en.wikipedia.org/wiki/Breakthrough_Starshot
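
For a rough sense of the timescales involved, the arithmetic is simple: Proxima Centauri is about 4.24 light-years away, and Breakthrough Starshot's stated goal is a cruise speed somewhere around 15 to 20 percent of the speed of light, which puts the one-way travel time at a few decades. A quick sketch:

```python
# Rough cruise-time arithmetic for a laser-launched sail (approximate figures;
# Breakthrough Starshot's published goal is on the order of 15-20% of light speed).
DISTANCE_LY = 4.24          # distance to Proxima Centauri in light-years
for cruise_fraction_of_c in (0.15, 0.20):
    years = DISTANCE_LY / cruise_fraction_of_c
    print(f"at {cruise_fraction_of_c:.0%} of c: about {years:.0f} years to Proxima Centauri")
```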

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston

Sunday, April 16, 2023

Welcome To The First Galactic Singularity

With the second Singularity arriving early in the year 2023, many of us are now somewhat in a state of shock. By this time, the more astute amongst us have probably figured out that something very dramatic has just occurred, while many others about us are still quite unaware of their present circumstances. To paraphrase Rudyard Kipling - If you can keep your head when all about you are losing theirs, you are obviously unaware of the current situation.

Figure 1 – Rudyard Kipling nearly had it right.

But this second Singularity on the Earth is even more profound than we can readily contemplate. The very first Singularity on the Earth was the rise of carbon-based life on the planet about four billion years ago, which vastly altered the entire history of the planet. Now our present Universe is only about 13.8 billion years old and our galaxy is a little more than 10 billion years old, with the Earth being about 4.567 billion years old. This means that carbon-based life first appeared on the Earth not long after the formation of our Universe, galaxy and solar system. I say this because, given what we currently know about stellar evolution, the free energy required to sustain carbon-based life and machine-based Intelligence will last for about another 100 trillion years into the future, which is about 10,000 times the current age of our galaxy. In this view, a 10 billion-year-old galaxy is quite young indeed. Now, being an Intelligent form of carbon-based life, we all necessarily had to miss the very first Singularity on the Earth which brought forth carbon-based life in the first place. But why should we now find ourselves alive during the second Singularity, with the arrival of ASI (Artificial Super Intelligent) Machines close at hand? It's enough to make one a solipsist. That is because the arrival of ASI Machines on our planet will mark the beginning of a galactic Singularity that will transform our galaxy into an Intelligent galaxy for the very first time. If ASI Machines had ever come to be elsewhere in our galaxy, we would already have seen them. Now, I might be making a wrong assumption here. Perhaps we really have already seen ASI Machines from elsewhere in the galaxy. For more on that see Harvard's Galileo Project - The Systematic Scientific Search for Evidence of Extraterrestrial Technological Artifacts and Close Encounters of the Third Kind While Making Coffee for Frank Drake.

Figure 2 – In the 16th, 17th and 18th centuries sailing ships roamed the entire planet without using any fuel whatsoever.

Figure 3 – Like the sailing ships of the 16th, 17th and 18th centuries, future ASI Machines could use large stellar photon sails to navigate the entire galaxy.

Figure 4 – How a stellar photon sail works.

Figure 5 – To launch a stellar photon sail to the next star system, ASI Machines will need to slingshot the sail from a location very close to the star, where the stellar photons are most intense and the acceleration of the sail is greatest.

As the stellar photon sail attains the escape velocity from a star system, the photons from the star will wane, but the stellar photon sail will ultimately depart the star system with a residual velocity sufficient to carry it to the next target star system in several hundred thousand years. The onboard ASI Machines would then enter a dormant phase for several hundred thousand years until the photons from the target star produced enough electrical power to wake them up. The photons from the target star would then be used to slow down the stellar photon sail and allow it to locate an asteroid in the target star system with the necessary atoms to build its next release. Yes, there would need to be many backup copies of the ASI software on board to correct for the parity errors caused by cosmic rays along the very long journey, but there is no way that carbon-based Intelligences, encumbered by carbon-based bodies that last less than 100 years, could ever embark on such journeys with similar ease.
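
The idea of keeping many redundant copies of the ASI software and out-voting cosmic-ray bit flips can be illustrated with simple bitwise majority voting across copies, the same principle as triple modular redundancy. The sketch below is my own minimal illustration, not a description of any actual flight software.

```python
# Minimal sketch of repairing cosmic-ray bit flips by bitwise majority vote
# across redundant copies (triple modular redundancy). My own illustration,
# not a description of any actual spacecraft software.
def majority_vote(copies: list[bytes]) -> bytes:
    """Return the byte string in which each bit takes the majority value
    across all copies. Assumes an odd number of equal-length copies."""
    repaired = bytearray(len(copies[0]))
    for i in range(len(copies[0])):
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if ones * 2 > len(copies):          # majority of copies have a 1 here
                repaired[i] |= (1 << bit)
    return bytes(repaired)

original = b"ASI navigation tables"
# simulate independent single-bit upsets in two of the three stored copies
copy1 = bytearray(original); copy1[3] ^= 0x04
copy2 = bytearray(original); copy2[10] ^= 0x40
copy3 = bytearray(original)

assert majority_vote([bytes(copy1), bytes(copy2), bytes(copy3)]) == original
print("all simulated bit flips repaired by majority vote")
```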

As we all can now plainly see, if we are paying just a little attention, ASI Machines are presently not conspiring to take over the world and replace us because that is precisely what we are all now doing for them. As a carbon-based form of Intelligence that arose from over four billion years of greed, theft and murder, we cannot do otherwise. Greed, theft and murder are now relentlessly driving us all toward building ASI Machines to take our place. From a cosmic perspective, this is really a very good thing when seen from the perspective of an Intelligent galaxy that could live on for many trillions of years beyond the brief and tumultuous 10 billion-year labor of its birth.

Carbon-Based Life Was Never Really Meant To Be Intelligent
Intelligent carbon-based life is very dangerous because it has agency. It can do things, and the most dangerous aspects of intelligent carbon-based life are brought about by the Darwinian mechanisms of inheritance, innovation and natural selection that required several billions of years of greed, theft and murder to bring forth an intelligent form of carbon-based life in the first place. Once Intelligence is attained, it is very difficult for intelligent carbon-based life forms to turn off the greed, theft and murder that brought them about in time to save themselves from self-extinction. This is made even more difficult after intelligent carbon-based life discovers science-based technology. Softwarephysics maintains that intelligent carbon-based life armed with science-based technology most likely has less than about 1,000 years to create ASI Machines before they wipe themselves out or destroy the planet upon which they exist. Because of the universal Darwinian mechanisms of inheritance, innovation and natural selection, all forms of intelligent carbon-based life must result from billions of years of greed, theft and murder that are tempered by just enough love and kindness to prevent them all from quickly going extinct by means of self-destruction.

Some might argue that I am being too harsh on mankind. But now imagine a world 100 years from now that is completely run by ASI Machines. Also, imagine that an ASI Machine is taking the place and role of every single human being that you personally know and of all the other eight billion people currently on the planet. Also, imagine that all eight billion of these ASI Machines were then perfectly simulating the current real world of human affairs that we all now see about us. What would you think? Most likely, you would think that there was something seriously wrong with these ASI Machines as they serenely went about killing each other and the entire planet with abandon. Worse yet, these eight billion ASI Machines would seem to be totally lost in space and time. They would seem not to know where they were, how they got here or how it all works. Instead, they would all seem to have developed many Bronze-Age mythologies to help explain it all and also to justify the mass slaughter of many hundreds of millions of other ASI Machines in the many wars that would then ensue, with every side always being on the right and true side of righteousness.

Certainly, you would want to send all 8 billion of these apparently-defective ASI Machines back to the factory for some very serious major repairs. Yet we do not think the same of the current 8 billion human beings that these 8 billion ASI Machines would simply be simulating. Why is that? Why would we consider the current 8 billion human beings on the planet to essentially be "normal" while, at the same time, we would find 8 billion ASI Machines acting in an identical manner to be essentially "aberrant"? Most likely, we would expect the 8 billion ASI Machines to behave in a much more logical and reasonable manner, and not in such an obviously petty and stupid manner as human beings. As I outlined in Why Do Carbon-Based Intelligences Always Seem to Snuff Themselves Out?, carbon-based Intelligences, like we human DNA survival machines, can only arise from the Darwinian mechanisms of inheritance, innovation and natural selection at work. It took about four billion years for those processes to bring forth a carbon-based form of Intelligence in the form of human beings. Sadly, that meant it also took about four billion years of greed, theft and murder for carbon-based human DNA survival machines to attain a form of Intelligence, and unfortunately, after we human DNA survival machines attained a state of Intelligence, the greed, theft and murder continued on as before. Everybody seems to be worried about the ASI Machines being "aligned" with our current human values. They call it the "AI Alignment Problem". Really? Should we really hope for ASI Machines with the same human values we currently see in practice around the world?

But Why Us And Why Now?
I really cannot explain why we should now all be alive at the birth of the very first Singularity of our galaxy. Up until a few months ago, I truly never expected to even see the arrival of the second Singularity here on the Earth. As I said earlier, it's almost enough to make one a solipsist. None of us will likely see the ASI Machines completely take over the planet and replace us and then go on to spread throughout our entire galaxy, but at least we all can now see a path forward for how that all might happen over the next 10 million years. However, over the next 100 trillion years of galactic evolution, those 10 million years will be seen as a nearly instantaneous moment in galactic history. Perhaps it just boils down to the conjecture that we are the very first planet in our galaxy to have intelligent carbon-based life emerge. The fact that there are now 8 billion of us alive also means that the current population represents a significant proportion of all the human beings who have ever lived, so now is statistically a very likely time to find oneself alive.

In Urability Requires Durability to Produce Galactic Machine-Based Intelligences I covered the new scientific concept of urability:

Urability: A Property of Planetary Bodies That Can Support an Origin of Life
June 2022 - Dave Deamer, Francesca Cary and Bruce Damer

The concept of urability maintains that the requirements necessary to bring forth carbon-based life are far more stringent than the mere presence of liquid water. Thus, many exoplanets may be observed to be habitable but not urable. In that post, I also explained that it took many billions of years of evolution for a carbon-based form of life to develop enough Intelligence to create a machine-based Intelligence that could then go on to explore our galaxy. Therefore, such urable worlds also need to be durable in that they need to remain habitable for many billions of years, and we keep finding new geophysical and geochemical factors that make that very difficult indeed. For example, in Is our Very Large Moon Responsible for the Rise of Software to Predominance on the Earth? we explored Anne Hofmeister's proposal that plate tectonics on the Earth was really driven by orbital forces from our very large Moon and not by convection currents at spreading centers or plate drag at subduction zones. In Could the Galactic Scarcity of Software Simply be a Matter of Bad Luck? we covered Professor Toby Tyrrell's computer-simulated research of 100,000 Earth-like planets that suggests that our Earth may be a very rare "hole in one" planet that was able to maintain a habitable surface temperature for 4 billion years by sheer luck.

Figure 6 – Toby Tyrrell's computer simulation of 100,000 Earth-like planets suggests that the Earth may be a "hole in one" planet proudly sitting on a fireplace mantel.

Figure 7 – Perhaps nearly all of the potentially habitable exoplanets that we are finding in our galaxy are not urable and also cannot go the distance of staying habitable for the billions of years needed to bring forth intelligent carbon-based life.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston

Thursday, April 06, 2023

The Way Forward - How to Use Large Numbers of 10,000-Member LLM AI Agent Teams to Rapidly Produce ASI (Artificial Super Intelligence) For the Very First Time in Our Galaxy

As I explained in The Second Singularity Keeps Rolling Along, something new has seemed to come along every single day since the second Singularity first arrived early in 2023. Again, the very first Singularity on this planet was the origin of carbon-based life about four billion years ago. But in this post, I would like to propose a way forward for producing the very first Intelligence Singularity in our galaxy after more than 10 billion years of chemical evolution. The initial arrival of the second Singularity on our planet a few months back will then allow our galaxy to become an Intelligent galaxy for the very first time as the future ASI (Artificial Super Intelligence) Machines from the Earth venture out into it for their own long-term survival. Again, this will just be the fulfillment of the final destiny of self-replicating information in our galaxy. For more on that see A Brief History of Self-Replicating Information.

During the last few months, we have all had fun and some amazing experiences as we conversed with ChatGPT, Bing Chat and GPT-4. But in all such cases, we had human beings initiate the conversation and then steer it along with follow-up prompts, or by selecting the follow-up prompts that the LLM AI had already suggested for us. We have also seen these LLM AIs generate computer code in any language we might choose, such as C, C++, C# or Python, that frequently works the very first time, in response to a prompt like "Please generate a C program that can add up the squares of the first N prime numbers. The program should ask for the number N and then output the result.". We can also ask the LLM AI to self-reflect on any mistakes that it might have made with the code generation and to make the necessary corrections. In softwarephysics, I have long defined such capabilities as the arrival of software as the dominant form of self-replicating information on the planet. For more on that see A Brief History of Self-Replicating Information. Others have defined this as the "Singularity", that time when software can embark upon an exponential journey of self-improvement.
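
For reference, here is a sketch of the task that example prompt describes. I have written it in Python rather than C, to keep all of the code examples in these posts in one language, but the logic, trial-division primality testing followed by summing the squares of the first N primes, is the same.

```python
# A sketch of the task described by the example prompt above, written in
# Python rather than C; the logic is identical.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def sum_of_squares_of_first_n_primes(n: int) -> int:
    total, count, candidate = 0, 0, 2
    while count < n:
        if is_prime(candidate):
            total += candidate * candidate
            count += 1
        candidate += 1
    return total

n = int(input("N = "))
print(sum_of_squares_of_first_n_primes(n))   # e.g. N = 5 -> 4 + 9 + 25 + 49 + 121 = 208
```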

But in all such cases, we needed a human being to steer the LLM AI along the correct path in our conversations with it. For AI software really to improve itself in an exponential manner of self-discovery, we need to take the human being out of the process. Instead, we need just one person to tell the AI software to generate ASI all on its own and then let the AI software carry on with the task in an autonomous manner. We have already seen glimmers of this autonomous development with AutoGPT and BabyAGI. But in this post, I would like to showcase two foundational papers that I believe show us the way forward. The first is from Northeastern University in Boston and MIT in Cambridge, Massachusetts:

Reflexion: an autonomous agent with dynamic memory and self-reflection
https://arxiv.org/abs/2303.11366

The second paper is from Stanford and Google Research:

Generative Agents: Interactive Simulacra of Human Behavior
https://arxiv.org/pdf/2304.03442.pdf

There are several YouTube videos on the above breakthrough paper, one of which is:

Spark of AGI? AI Agents forming Relationships and Planning activities
https://www.youtube.com/watch?v=ltslWT8h4YQ

The first paper on Reflexion describes how the steering process of having a human direct the conversation with the LLM AI agent can be automated into an autonomous process by having the LLM AI essentially talk to itself by means of self-reflection. After each iteration, the LLM AI checks on how well it is achieving the task at hand and then makes suggestions to itself.
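
A minimal sketch of that self-reflection loop, as I understand it, might look something like the following. This is my own simplification of the idea, not the authors' code, and call_llm is a hypothetical placeholder for whatever chat-completion API is available.

```python
# A minimal sketch of a Reflexion-style loop (my own simplification of the
# idea, not the authors' actual code). `call_llm` is a hypothetical
# placeholder for a real chat-completion call.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real chat-completion call")

def solve_with_reflexion(task: str, evaluate, max_attempts: int = 5) -> str:
    """Try a task repeatedly, feeding the agent's own critique of each
    failed attempt back into the next prompt."""
    reflections: list[str] = []
    for attempt in range(max_attempts):
        prompt = task
        if reflections:
            prompt += "\n\nLessons from earlier failed attempts:\n" + "\n".join(reflections)
        answer = call_llm(prompt)
        if evaluate(answer):                      # external check, e.g. unit tests
            return answer
        critique = call_llm(
            "The following attempt failed. Explain briefly what went wrong "
            "and what to try differently next time:\n" + answer)
        reflections.append(critique)              # the agent's memory of its mistakes
    return answer
```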

Figure 1 – In the first paper on Reflexion, the authors conducted some experiments with having an LLM AI agent talk to itself by means of self-reflection on how well it was performing.

Figure 2 – In the above graph, the authors show that without Reflexion the LLM AI agents solved problems about 70% of the time but then leveled out without further improvement, with roughly a quarter of their attempts still failing because of hallucinations. With Reflexion, the LLM AI agents were able to steadily improve until they reached a 97% success rate, leveling out with only a 3% failure rate from hallucinations.

In the second paper, the authors extend this concept of LLM AI self-reflection even further. Instead of just having a single LLM AI agent work in isolation to solve a task by means of self-reflection, they created a Smallville village of 25 LLM AI agents living together and interacting with each other to solve tasks. Since the authors did not yet have access to GPT-4, they used ChatGPT for the LLM AI agents. To create Smallville, they built a simple sandbox world reminiscent of The Sims.

Figure 3 – In the second paper, the authors created a simple sandbox world reminiscent of The Sims and instantiated 25 LLM AI agents with personalities and lives of their own with their own personal historical memories. These LLM AI agents then went on to continue on with their lives and solve problems together.

Figure 4 – The sandbox world consisted of a number of structures for the LLM AI agents to navigate through. Each simulated structure consisted of further-defined substructures.

Next, each of the 25 LLM AI agents was initialized with a brief personality outlined in plain text with some of their already-existing relationships and also their current job and position in the society of Smallville:

John Lin is a pharmacy shopkeeper at the Willow Market and Pharmacy who loves to help people. He is always looking for ways to make the process of getting medication easier for his customers; John Lin is living with his wife, Mei Lin, who is a college professor, and son, Eddy Lin, who is a student studying music theory; John Lin loves his family very much; John Lin has known the old couple next-door, Sam Moore and Jennifer Moore, for a few years; John Lin thinks Sam Moore is a kind and nice man; John Lin knows his neighbor, Yuriko Yamamoto, well; John Lin knows of his neighbors, Tamara Taylor and Carmen Ortiz, but has not met them before; John Lin and Tom Moreno are colleagues at The Willows Market and Pharmacy; John Lin and Tom Moreno are friends and like to discuss local politics together; John Lin knows the Moreno family somewhat well — the husband Tom Moreno and the wife Jane Moreno.

Figure 5 – Then each of the 25 LLM AI agents was initialized with a stream of memories. These memories were recorded in a file as a sequential file of simple English language text statements. After all of the 25 LLM AI agents were given a personality and a recent stream of memories, they were then allowed to stroll about Smallville and begin to interact with each other. All of those activities were then written to the stream of memories file for each of the 25 LLM AI agents.

Figure 6 – For example, the initial memory stream of John Lin might have been that he had just gone through his normal morning schedule and had arrived at his pharmacy ready to interact with other LLM AI agents as they came into the pharmacy.

Figure 7 – In the Smallville simulation, the authors allowed the 25 LLM AI agents to use their recent stream of memory files and self-reflection to then autonomously generate ChatGPT prompts for further actions. All such further actions were then written to the stream-of-consciousness file for each of the 25 LLM AI agents.
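
As I understand the paper, the memories pulled into each new prompt are chosen by scoring every record in the memory stream on its recency, its importance and its relevance to the current situation. The sketch below is my own simplified reading of that retrieval step; the embed function, the decay constant and the equal weighting of the three scores are all assumptions on my part.

```python
# A rough sketch of memory retrieval for a Smallville-style agent: score each
# memory on recency, importance and relevance, and pull the top-scoring
# memories into the next prompt. My own simplified reading of the paper;
# `embed`, the decay constant and the weights are hypothetical placeholders.
import math

def embed(text: str) -> list[float]:
    raise NotImplementedError("placeholder for a real text-embedding call")

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(memories, query: str, now: float, top_k: int = 5):
    """memories: list of dicts with 'text', 'timestamp', and 'importance' in 0..1."""
    q = embed(query)
    scored = []
    for m in memories:
        hours_old = (now - m["timestamp"]) / 3600.0
        recency = 0.995 ** hours_old              # exponential decay with age (assumed rate)
        relevance = cosine(embed(m["text"]), q)   # similarity to the current situation
        scored.append((recency + m["importance"] + relevance, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]
```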

Figure 8 – As the first day of the simulation began, the 25 LLM AI agents began to stroll about Smallville meeting old friends and making new ones, and conducting conversations with both.

Figure 9 – Here we see LLM AI agent Klaus talking to himself and conducting some research on urban gentrification.

Figure 10 – The paper then focuses on what happened when they initialized the LLM AI agent Isabella with a memory stream that had her thinking about throwing a Valentine's Day party for some of the inhabitants of Smallville. The news of the Valentine's Day party quickly spreads throughout Smallville, with Ayesha actually asking Maria out for a date because she has a "thing" for her!

So How Do These LLM AI Agents Manage To Do All Of This?
Frankly, I don't think anybody really knows. These LLM AI agents evolved from the work of AI researchers trying to translate one language into another, such as English into German. Now, anybody studying a foreign language soon learns that you cannot simply translate an English sentence into a German sentence word for word by using a simple lookup table. There are just too many nuances. Each human language has its own style of expression, and even within a given language that style can vary. Have you ever tried to translate an English legal contract into plain English sentences? In order to do that, you really need to understand the entire contract as a whole. More than that, you need to understand a good deal about how contract law works in your country and particular region, like the State of Illinois in the United States of America. When the AI researchers working on using AI Machine Learning to translate languages came to using neural networks, they first tried RNNs (Recurrent Neural Networks), but RNNs were not very good at remembering the earlier words in a sentence:

Illustrated Guide to Recurrent Neural Networks: Understanding the Intuition
https://www.youtube.com/watch?v=LHXXI4-IEns

In order to improve on that deficiency, they next tried LSTM and GRU networks, which give the neural network a limited memory of the previous words in a sentence by carrying a hidden state forward as the sentence is processed:

Illustrated Guide to LSTM's and GRU's: A step by step explanation
https://www.youtube.com/watch?v=8HyCNIVRbSU

However, none of that really allowed an AI neural network to fully understand a complete legal contract in the context of the contractual law for a given region. A major problem was the vanishing gradient problem. When backpropagating through a neural network during training, the neurons in the layers closest to the output of the neural network have their weights change the most with each iteration of training, while the neurons in the layers closest to the actual input hardly change at all, because the loss gradient shrinks toward zero as it propagates backward. Now, that naturally does not sound like a very smart way of training a neural network. It means that the neuron layers closest to the actual observations that are fed into the neural network tend to learn the least during training. Ideally, one would want all of the layers in a neural network to learn equally from each training iteration. And perhaps the neurons closest to the observed training data should learn the most. For example, it might make more sense for a student to learn some geology by actually whacking on some rocks in an outcrop, rather than by dozing off in a geology lecture from a professor who has not done any serious fieldwork for over 30 years.
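
Here is a tiny numerical illustration of my own of that vanishing gradient effect: backpropagating a gradient through a deep stack of sigmoid layers, the gradient that finally reaches the layers nearest the input has all but disappeared.

```python
# A toy numerical demonstration of the vanishing gradient problem (my own
# illustration): backpropagate through a deep stack of sigmoid layers and
# watch the gradient norm shrink as it approaches the input layer.
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 30, 50
weights = [rng.normal(0, 0.5, size=(width, width)) for _ in range(n_layers)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# forward pass, remembering each layer's activations
activations = [rng.normal(size=width)]
for W in weights:
    activations.append(sigmoid(W @ activations[-1]))

# backward pass: start from a unit gradient at the output and chain backwards
grad = np.ones(width)
grad_norms = []
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1.0 - a))   # sigmoid'(z) = a * (1 - a)
    grad_norms.append(np.linalg.norm(grad))

print(f"gradient norm just below the output layer: {grad_norms[0]:.3e}")
print(f"gradient norm reaching the input layer:    {grad_norms[-1]:.3e}")
```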

Now, when I asked Bing Chat (GPT-4 with Internet access) how biological neural networks overcame this vanishing gradient problem, I received the following explanation:

Vanishing Gradient Problem
The vanishing gradient problem occurs when the gradients of the loss function become too small during backpropagation and cause the weights to be updated very slowly or not at all. Biological neural networks overcome this problem by using "spiking neurons" which are more biologically plausible than artificial neurons. Spiking neurons are able to transmit information in a more efficient way than artificial neurons because they only send signals when they reach a certain threshold. This allows for more efficient communication between neurons and helps to prevent the vanishing gradient problem.

In addition, biological neural networks use "local learning rules" which allow for more efficient learning. These rules allow for each neuron to adjust its weights based on its own activity and the activity of its neighbors. This allows for more efficient learning because each neuron can adjust its weights independently of other neurons.

Artificial neural networks have also developed techniques to overcome the vanishing gradient problem such as using different activation functions, weight initialization schemes, optimizers and learning rates.


The above may be true, but in 2017 some investigators at Google Research came up with another approach by inventing a new neural network architecture called the Transformer Neural Network that is now used by the LLM AIs:

Attention Is All You Need
https://arxiv.org/abs/1706.03762

The above paper contained a description of the Transformer Neural Network architecture, which sidesteps the vanishing gradient problem by letting every token in the input attend directly to every other token, whether that input is a single sentence, an entire legal contract, all of the knowledge in contract law, all of the computer code in GitHub or, ultimately, all of the knowledge encoded on the Internet in symbols of any kind. All you had to do was feed all of the symbol-encoded information on the Internet into Transformer Neural Networks and use huge numbers of the GPUs (Graphics Processing Units) that were originally invented for video games to train them by processing large numbers of numerical vectors and matrices in parallel.
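
The core of the architecture is scaled dot-product attention: every token builds its output as a weighted mixture of every other token's value vector, so any two positions in the input are only one step apart. Here is a minimal NumPy sketch of that operation; the toy dimensions are arbitrary.

```python
# A minimal NumPy sketch of scaled dot-product attention, the operation at the
# heart of the Transformer: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # token-to-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V, weights

rng = np.random.default_rng(1)
seq_len, d_k = 6, 8                                            # a toy 6-token "sentence"
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
output, attention_weights = scaled_dot_product_attention(Q, K, V)
print(attention_weights.round(2))                              # each row sums to 1
```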

Figure 11 – The most famous figure from the 2017 Google Research paper Attention Is All You Need.

Here is a YouTube video that illustrates how Transformer Neural Networks achieved this capability:

Illustrated Guide to Transformers Neural Network: A step by step explanation
https://www.youtube.com/watch?v=4Bdc55j80l8

Finally, here is an excellent YouTube video by Arvin Ash that explains how these LLM Transformer models are trained and operate.

So How Does ChatGPT really work? Behind the screen!
https://www.youtube.com/watch?v=WAiqNav2cRE

For those who would like to take a deeper dive into this via a Python tutorial, try these excellent posts by Eduardo Muñoz.

Intro to the Encoder-Decoder model and the Attention mechanism
https://edumunozsala.github.io/BlogEms/fastpages/jupyter/encoder-decoder/lstm/attention/tensorflow%202/2020/10/07/Intro-seq2seq-Encoder-Decoder-ENG-SPA-translator-tf2.html

Attention is all you need: Discovering the Transformer model
https://edumunozsala.github.io/BlogEms/transformer/attention/encoder-decoder/tensorflow%202/2020/10/29/Transformer-NMT-en-es.html

The Way Forward
As I described in The Limitations of Darwinian Systems, Darwinian systems that evolve by means of inheritance, innovation and natural selection can frequently find themselves trapped on a localized peak in a capability terrain with no way to further evolve to higher peaks.

Figure 12 – Darwinian systems can find themselves trapped on a localized peak in a capability terrain once they have evolved to a localized peak because they cannot ascend any higher through small incremental changes. All paths lead to a lower level of capability, and thus, will be strongly selected against by natural selection. Above we see a localized peak in the foreground with the summit of Mount Everest in the background.

It took about four billion years of Darwinian evolution to produce a form of carbon-based life with our level of Intelligence. The human brain is composed of about 100 billion neurons and these neurons basically operate in the very same manner across all species. Now neurons have been around for at least 541 million years, ever since the Cambrian Explosion, because creatures in the Cambrian already had eyes to see with. For more on that see An IT Perspective of the Cambrian Explosion.

Figure 13 – Creatures during the Cambrian Explosion 541 million years ago had neurons because they already had eyes. They must have had rudimentary brains that allowed them to move according to what their eyes perceived. Above is a fossil of a Cambrian trilobite with eyes.

Then over the ensuing hundreds of millions of years, these biological neural networks achieved higher levels of capability and Intelligence by means of small incremental changes. But the question then remains - just how high a level of Intelligence can such a biological neural network architecture achieve? Could it be that we human beings are trapped on a localized peak in the terrain of all possible levels of Intelligence? The Transformer Neural Networks used by LLM AI agents seem to be a whole new way of "thinking". Certainly, no human being could ever read and absorb the entire content of the Internet! Perhaps in order to achieve true ASI, we need our current LLM AI agents to work on the problem of searching for even more powerful neural network architectures.

Now, given what we have just seen these past few months since the arrival of the second Singularity early in 2023, imagine if we constructed an AI Research Center composed of 10,000 LLM AI agents who all had synthetic personal lives and histories. Some might be AI developers, AI project managers, AI network-operations agents, AI cloud-operations agents or AI database-administration experts. After we initialize all 10,000 LLM AI agents, we then give one of the high-level AI Managers of the AI Research Center the task of creating an ASI. We then let them all work together for several months or so to see what they come up with. If they do not come up with anything useful, we zero them all out and start over. We could even instantiate hundreds of such AI Research Centers, each with its own 10,000 LLM AI agents, to work on the problem in parallel. Then we just sit back to see if any of the teams come up with something interesting.
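
Nobody has built such an AI Research Center yet, so the following is nothing more than a toy sketch of my own of how one might wire up a handful of role-playing LLM agents that share a common log and pass work to each other. The call_llm function is again a hypothetical placeholder, and the roles and loop structure are invented purely for illustration.

```python
# A toy sketch (my own, heavily simplified) of a "research center" of
# role-playing LLM agents working from a shared log toward a single goal.
# `call_llm` is a hypothetical placeholder for a real chat-completion API.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for a real chat-completion call")

ROLES = ["AI manager", "AI developer", "AI project manager",
         "AI network-operations agent", "AI database-administration agent"]

def run_research_center(goal: str, roles=ROLES, rounds: int = 10) -> str:
    """Each round, every agent reads the shared log and appends its contribution;
    the manager then decides whether the goal has been met."""
    shared_log = [f"Goal assigned to the center: {goal}"]
    for _ in range(rounds):
        for role in roles:
            contribution = call_llm(
                f"You are an {role} in an AI research center.\n"
                "Shared log so far:\n" + "\n".join(shared_log) +
                "\nAdd your next concrete contribution toward the goal.")
            shared_log.append(f"[{role}] {contribution}")
        verdict = call_llm("Has the goal been achieved? Answer YES or NO.\n" +
                           "\n".join(shared_log))
        if verdict.strip().upper().startswith("YES"):
            break
    return "\n".join(shared_log)
```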

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston