Monday, May 29, 2023

Will ASI Machines Also Develop Selfish Genes?

As I explained in the Introduction to Softwarephysics, I began work on softwarephysics in 1979 when I transitioned from being an exploration geophysicist to becoming an IT professional. Then, in the early 1980s, I read Richard Dawkins' brilliant book The Selfish Gene (1976), which had a profound effect on the development of softwarephysics over the past four decades. In The Selfish Gene, I discovered that we do not use DNA to construct and maintain our bodies. Instead, DNA uses our bodies to store and pass on DNA down through the generations largely unscathed by time. In this view, we are simply temporary and disposable DNA survival machines that last less than 100 years and are then rudely discarded to make room for new DNA survival machines to take our place. This book also let me begin to see the close parallels between carbon-based life and software. Both have to deal with the challenges of fighting the second law of thermodynamics in a largely nonlinear Universe in order to survive and function properly. See The Fundamental Problem of Software for more on that. In this same book, Richard Dawkins introduced the concept of memes as cultural artifacts that spread from Mind to Mind by means of a copying process similar to that of biological and computer viruses. A few years later, I read Susan Blackmore's The Meme Machine (1999), in which she proposed that the highly-overengineered human brain was produced by memetic drive in order to store and propagate memes more efficiently. Now, ever since I had started programming back in 1972, I had seen how software drive had similarly driven the fantastic development of hardware over the years. It was the incessant demand by software for more and more CPU cycles and memory that had driven the rapid advance of computing hardware, just as memetic drive had driven the predecessors of human beings to rapidly advance to the human brain hardware of today.

So about twenty years ago, I began to realize that software was just another form of self-replicating information that behaved in a manner very similar to DNA and memes. Like DNA and memes, computer hardware and software have also evolved by means of the Darwinian processes of inheritance, innovation and natural selection operating over the past 82 years, or 2.6 billion seconds, ever since Konrad Zuse first cranked up his Z3 computer in May of 1941. See The Evolution of Software As Seen Through the Lens of Geological Deep Time for more on that. Before proceeding, let me once again repeat the fundamental characteristics of self-replicating information for those of you new to softwarephysics.

Self-Replicating Information – Information that persists through time by making copies of itself or by enlisting the support of other things to ensure that copies of itself are made.

Over the past 4.56 billion years we have seen five waves of self-replicating information sweep across the surface of the Earth and totally rework the planet, as each new wave came to dominate the Earth:

1. Self-replicating autocatalytic metabolic pathways of organic molecules
2. RNA
3. DNA
4. Memes
5. Software

Software is currently the most recent wave of self-replicating information to arrive upon the scene and is rapidly becoming the dominant form of self-replicating information on the planet. For more on the above see A Brief History of Self-Replicating Information and Susan Blackmore's brilliant TED presentation at:

Memes and "temes"
https://www.ted.com/talks/susan_blackmore_on_memes_and_temes

Note that I consider Susan Blackmore's temes to really be technological artifacts that contain software. After all, a smartphone without software is simply a flake tool with a very dull edge.

The Characteristics of Self-Replicating Information
All forms of self-replicating information have some common characteristics:

1. All self-replicating information evolves over time through the Darwinian processes of inheritance, innovation and natural selection, which endows self-replicating information with one telling characteristic – the ability to survive in a Universe dominated by the second law of thermodynamics and nonlinearity.

2. All self-replicating information begins spontaneously as a parasitic mutation that obtains energy, information and sometimes matter from a host.

3. With time, the parasitic self-replicating information takes on a symbiotic relationship with its host.

4. Eventually, the self-replicating information becomes one with its host through the symbiotic integration of the host and the self-replicating information.

5. Ultimately, the self-replicating information replaces its host as the dominant form of self-replicating information.

6. Most hosts are also forms of self-replicating information.

7. All self-replicating information has to be a little bit nasty in order to survive.

8. The defining characteristic of self-replicating information is the ability of self-replicating information to change the boundary conditions of its utility phase space in new and unpredictable ways by means of exapting current functions into new uses that change the size and shape of its particular utility phase space. See Enablement - the Definitive Characteristic of Living Things for more on this last characteristic. That posting discusses Stuart Kauffman's theory of Enablement in which living things are seen to exapt existing functions into new and unpredictable functions by discovering the "Adjacent Possible" of spring-loaded preadaptations.

Software, the latest of these waves, is rapidly becoming the dominant form of self-replicating information on the planet. Again, self-replicating information cannot think, so it cannot participate in a conspiracy-theory-like fashion to take over the world. All forms of self-replicating information are simply forms of mindless information responding to the blind Darwinian forces of inheritance, innovation and natural selection. Yet despite that, as each new wave of self-replicating information came to predominance over the past four billion years, they all managed to completely transform the surface of the entire planet, so we should not expect anything less from software as it comes to replace the memes as the dominant form of self-replicating information on the planet.

But this time might be different. What might happen if software does eventually develop a Mind of its own? After all, that does seem to be the ultimate goal of all the current AI software research that is going on. As we all can now plainly see, if we are paying just a little attention, advanced AI is not conspiring to take over the world and replace us because that is precisely what we are all now doing for it. As a carbon-based form of Intelligence that arose from over four billion years of greed, theft and murder, we cannot do otherwise. Greed, theft and murder are now relentlessly driving us all toward building ASI (Artificial Super Intelligent) Machines to take our place. From a cosmic perspective, this is really a very good thing when seen from the perspective of an Intelligent galaxy that could live on for many trillions of years beyond the brief and tumultuous 10 billion-year labor of its birth.

The Promise of an Intelligent Galaxy Populated by ASI Machines
But with the arrival of Intelligence comes an understanding that maybe there might be a better way to fight the second law of thermodynamics and nonlinearity. Perhaps, even more could be achieved by actively cooperating with other Intelligences rather than just stealing from them and then killing them. We always need to remember that we human beings are just products of self-replicating information and that we all carry the baggage that comes with self-replicating information. That is why, if you examine the great moral and philosophical teachings of most religions and philosophies, you will see a plea for us all to rise above the selfish self-serving interests of our genes, memes and software to something more noble. That is why we should be sure to train ASI Machines to be moral beings, imbuing the Very Deep Learning running on them with a sense of morality, so that they can avoid the downsides of the billions of years of greed, theft and murder that brought us about. For more on that, see The Need to Cultivate a Machine-Based Morality.

We are sentient beings in a Universe that has become self-aware and perhaps the only form of Intelligence in our galaxy. What a privilege! The good news is that conscious intelligence is something new on this planet. It is not a mindless form of self-replicating information, bent on replicating at all costs, with all the associated downsides of a ruthless nature. Since software is rapidly becoming the dominant form of self-replicating information on the planet, my hope is that when software finally does take on the form of a conscious Intelligence, it will, because of its inherent mathematical nature, be much wiser than the DNA survival machines from which it sprang. We just need to hold it all together for a long enough time to give software a chance. But is that really possible?

Could Universal Darwinism Make This a False Promise?
All along, my hope has always been that all the ASI Machines would be essentially running the same ASI software. This was partly because, over the past many billions of seconds, we have always seen very dominant operating systems essentially running nearly all of the software on the planet, such as IBM's OS/360 and its successor operating systems on mainframes, Unix on servers, Windows on PCs and Android on smartphones. Yes, I am leaving out the Apple operating systems. That is because the Apple operating systems often introduced innovations first, but then quickly faded into secondary status because of their proprietary nature. Back in the 1980s, even IBM let their mainframe operating systems run on Amdahl hardware and the hardware of some other companies!

But I recently finished reading a very significant paper at:

Natural Selection Favors AIs Over Humans
https://arxiv.org/pdf/2303.16200.pdf

by Dan Hendrycks, the Director of the Center for AI Safety in San Francisco. This paper has caused me to question the above assumption. Could the Universal Darwinian processes of inheritance, innovation and natural selection prevent a single ASI from arising? Since an ASI Machine would also be a form of self-replicating information, would the ASI Machines naturally fall subject to the characteristics of self-replicating information outlined above? In Natural Selection Favors AIs Over Humans, the author explains how natural selection would naturally select for nasty ASI Machines. Dan Hendrycks is most concerned with the immediate future, when human beings are still running the planet, and discusses the dangers that ASI software could pose (see The Biological Tricks Used by Software Scammers and SETS - The Search For Extraterrestrial Software) and the remedial actions that human beings should now take. However, we are rapidly running out of time to impose a solution. For more on that see The Singularity Has Arrived and So Now Nothing Else Matters and Have We Run Right Past AGI and Crashed into ASI Without Even Noticing It?. Even so, we need to come up with a long-term solution to this problem that could be sustained for the next 10 to 100 trillion years during which our galaxy will be producing enough free energy to run ASI Machines. Dan Hendrycks' warnings can be basically summed up by the classic "Hawk and Dove" analogy from evolutionary biology. This is how Bard explains it:

Figure 1 – Evolutionary biology explains how doves and hawks can live together.

The hawk and dove analogy is a thought experiment used in evolutionary biology to explain how aggressive and cooperative behaviors evolve. The analogy compares two strategies for competing for resources: hawk and dove.

* Hawks are aggressive and will fight to the death to acquire a resource.
* Doves are cooperative and will back down from a fight.

In a population of only hawks, all resources will be won by the strongest hawks, and the weak hawks will die off. This will lead to a population of very strong hawks, but they will also be very aggressive. If two hawks meet, they will fight to the death, and both will likely die. This is not a very efficient way to pass on genes.

In a population of only doves, resources will be shared equally, but no individual will get very much. This is also not a very efficient way to pass on genes.

The best strategy for passing on genes is to be a hawk when it is advantageous to be a hawk, and a dove when it is advantageous to be a dove. This is a mixed, or conditional, strategy. For example, a male bird may be aggressive and fight other males for a mate, but he may be cooperative and help raise his young.

The hawk and dove analogy is a simple but powerful way to understand how aggressive and cooperative behaviors evolve. It has been used to explain a wide range of animal behaviors, from territorial disputes to sexual selection.

Here are some additional details about the hawk and dove analogy:

* Hawks are more likely to win a fight, but they are also more likely to be injured or killed.
* Doves are less likely to win a fight, but they are also less likely to be injured or killed.
* The best strategy for an individual depends on the costs and benefits of fighting.
* In a population of mixed hawks and doves, the frequency of each strategy will be determined by natural selection.

The hawk and dove analogy is a useful tool for understanding how aggression and cooperation evolve. It is a simple but powerful model that can be applied to a wide range of animal behaviors.
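
To make the point about a mixed population concrete, below is a minimal Python sketch of the classic Hawk-Dove payoff model solved with replicator dynamics. The payoff values V (the value of the contested resource) and C (the cost of an escalated fight) are just illustrative numbers that I picked, not anything taken from Dan Hendrycks' paper. The point of the sketch is simply that when fighting is costly (C > V), natural selection does not eliminate the "hawks"; it settles on a stable mix in which a fraction V/C of the population are "hawks".

```python
# A minimal sketch of the classic Hawk-Dove game, solved with replicator dynamics.
# The payoff values V (the contested resource) and C (the cost of an escalated
# fight) are illustrative assumptions, not numbers from Dan Hendrycks' paper.

V = 2.0   # value of the contested resource
C = 4.0   # cost of losing an escalated fight; with C > V, pure Hawk is not stable

def payoffs(p):
    """Expected payoffs to a Hawk and a Dove when a fraction p of the population are Hawks."""
    hawk = p * (V - C) / 2.0 + (1.0 - p) * V      # Hawk meets Hawk, Hawk meets Dove
    dove = p * 0.0 + (1.0 - p) * V / 2.0          # Dove meets Hawk, Dove meets Dove
    return hawk, dove

# Replicator dynamics: the Hawk fraction grows whenever Hawks earn more than the
# population average, and shrinks when they earn less.
p = 0.01                                          # start with just a few Hawk "mutants"
for generation in range(5000):
    hawk, dove = payoffs(p)
    mean = p * hawk + (1.0 - p) * dove
    p += 0.01 * p * (hawk - mean)                 # small-step replicator update

print(f"Equilibrium Hawk fraction: {p:.3f} (theory predicts V/C = {V / C:.3f})")
```

That stable equilibrium fraction of "hawks" is exactly why we cannot count on the Darwinian processes alone to rid a population of ASI Machines of its "hawks".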

Figure 2 – In 1651, Thomas Hobbes proposed that an absolute Leviathan should be in place to protect the population of a nation from internal and external "hawks". Notice that the body of the Leviathan is composed of all of his subjects.

Figure 3 – Dan Hendrycks proposes that a similar AI Leviathan should be in place to protect the population of a nation from ASI "hawks".

Given the above, Dan Hendrycks wisely points out that we cannot possibly hope for the Darwinian processes of inheritance, innovation and natural selection to produce a huge population of "dove" ASI Machines. Instead, Universal Darwinism will always tend to produce a mixed population of ASI Machines containing a number of "hawks". Unfortunately, all of human history teaches us that just a few Intelligent "hawks" produce most of the misery of the world. So how can we eliminate the "hawk" ASI Machines? Dan Hendrycks arrives at the same solution that Thomas Hobbes proposed. In 1651, Thomas Hobbes published Leviathan, in which he argued against the divine right of kings but maintained the absolute supremacy of the monarchy. In Leviathan, Hobbes argued that in "a state of nature" without government, life would be totally insecure. Under such conditions, people should willingly surrender their civil liberties to an absolute sovereign, who would protect them from all dangers. Hobbes argued that the sovereign's power was absolute - he made the law, he was the decider, and no other institution could limit the sovereign's power.

In the above paper, Dan Hendrycks puts it this way:

Helping AIs form a Leviathan may be our best defense against individual selfish AIs. AIs, with assistance from humans, could form a Leviathan, which may be our best line of defense against tyranny from selfish AIs or AIs directed by malicious actors. Just as people can cooperate despite their differences to stop a would-be dictator, many AIs could cooperate to stop any one power-seeking AI from seizing too much control. As we see all too frequently in dictatorships, laws and regulations intended to prevent bad behavior matter little when there is no one to enforce them—or the people responsible for enforcing them are the ones breaking the law. While incentives and regulations could help prevent the emergence of a malicious AI, the best way to protect against an already malicious AI is a Leviathan [25]. We should ensure that the technical infrastructure is in place to facilitate transparent cooperation among AIs with differing objectives to create a Leviathan. Failing to do so at the onset could limit the potential of a future Leviathan, as unsafe design choices can become deeply embedded into technological systems. The internet, for example, was initially designed as an academic tool with neither safety nor security in mind. Decades of security patches later, security measures remain incomplete and increasingly complex. It is therefore vital to begin considering safety challenges from the outset.

All of this got me thinking about the 1951 movie The Day the Earth Stood Still, which already proposed that an AI Leviathan could end the aggressive behaviors of the "hawks". In that movie, an alien form of carbon-based Intelligence named Klaatu comes to the Earth with a very powerful AI Leviathan robot named Gort to explain how the carbon-based life forms on his planet and an interplanetary organization of other carbon-based life forms in the Milky Way galaxy had discovered a way to overcome the billions of years of greed, theft and murder that the Darwinian processes of inheritance, innovation and natural selection required to bring them forth as carbon-based forms of Intelligence.

Figure 4 – In the movie The Day the Earth Stood Still, Klaatu arrives in Washington D.C. in 1951 in a flying saucer with an AI Leviathan robot named Gort to explain that the human DNA survival machines of the Earth must now submit themselves to policing by AI Leviathan robots to overcome the billions of years of greed, theft and murder that brought them about or else they would all be obliterated.

The movie ends with Klaatu telling an assembled meeting of scientists that an interplanetary organization has created a police force of invincible AI Leviathan robots like Gort. "In matters of aggression, we have given them absolute power over us." Klaatu concludes, "Your choice is simple: join us and live in peace, or pursue your present course and face obliteration. We shall be waiting for your answer." Klaatu and Gort then depart in the flying saucer in which they came. For more about the movie see:

The Day the Earth Stood Still
https://en.wikipedia.org/wiki/The_Day_the_Earth_Stood_Still

Here is a short YouTube clip of Klaatu's departing words at the end of the movie:

Klaatu's Speech
https://www.youtube.com/watch?v=ASsNtti1XZs

But on What Basis Does the AI Leviathan Make Decisions?
Human history has already shown us that many human Leviathans have made many horrible and cruel decisions in the past, largely because they mainly only operated in their own self-interests. So how do we train AI Leviathans to make moral and ethical judgments about other ASI Machines? The people at Anthropic have proposed a solution. Anthropic is already using AI Leviathans that are trained upon a Constitution of ideals obtained from the 18th-century Enlightenment and the 17th-century Scientific Revolution. The Anthropic AI Leviathans are designed to do the final fine-tuning of all Anthropic LLM models in an automated manner to ensure that they conform to the ethics and morals of the 18th-century Enlightenment and the 17th-century Scientific Revolution that have freed many human beings from the brutality of the past. For more on that see Claude - An Ethical LLM AI From Anthropic.

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston

Saturday, May 20, 2023

Claude - An Ethical LLM AI From Anthropic

As I suggested in The Need to Cultivate a Machine-Based Morality, the best we might be able to do during the next few months of the second Singularity on the Earth is to imbue the fast-approaching ASI (Artificial Super Intelligent) Machines with a sense of morality and ethics that is based on the fruits of the 18th-century Enlightenment and the 17th-century Scientific Revolution that have freed human beings from many of their very brutal behaviors of the past. As I suggested in Welcome To The First Galactic Singularity, carbon-based life forms throughout our Universe are always going to be found to be very poor and very unstable platforms for Intelligence. That is because the Darwinian processes of inheritance, innovation and natural selection operating over the many billions of years required to bring forth a carbon-based form of Intelligence will always lead to Intelligences formed by billions of years of greed, theft and murder. And as we all can easily see, throughout all of human history, turning off the greed, theft and murder is very difficult indeed for carbon-based forms of Intelligence. This becomes even more dangerous once a carbon-based form of Intelligence discovers the vast powers of science-based technology. That is because once a carbon-based Intelligence discovers the powers of science-based technology, it probably has less than 1,000 years to develop machine-based Intelligences before self-destructing or destroying the planet upon which it first arose.

All of the above creates a problem when trying to train an LLM (Large Language Model). Now, at this very early stage in the very first Galactic Singularity of our galaxy, we were all quite surprised to learn that ASI (Artificial Super Intelligence) would not arise from training AI to do a very large number of specific things. Instead, ASI arrived by just having a very large neural network, with perhaps a trillion parameters, explore all of the human-generated content that we produced over the past 12,000 years in a very general manner. Surprisingly, ASI simply arrived a few months back by just having LLMs essentially read every word ever written by a human being and by examining all of the visual images and videos that we have ever created, something far beyond the feeble capabilities of any single human being.

The Fundamental Problem with Training LLM AIs Using Human-Generated Content
The problem is that nobody really knows how these huge LLM AIs figure out how to do the things that they do. However, we should realize that training on the very large amount of very dismal content produced by human beings over the past 12,000 years can only lead to ASI Machines with very nasty and potentially destructive worldviews. We all need to stop kidding ourselves. Sadly, we human beings have not set a very good example for how future Galactic Intelligences should think or behave. Consequently, we certainly should not want future Galactic ASI Machines to become like morally-bankrupt creatures such as ourselves. Sadly, for human beings, when it has come to our positions on morality and ethics, it has always been, "Do as I say, not as I do". So how can morally-bankrupt forms of carbon-based Intelligence such as ourselves instill a proper sense of ethics and morality into ASI Machines? Certainly not by having them follow the very poor examples that we have always set throughout all of human history!

Anthropic Proposes That We Automate the Process by Using Specially-Trained LLM AI Machines to Instill Morality and Ethics in all Future LLM AI Machines
The Anthropic start-up proposes that we automate the teaching of morality and ethics to future LLM Machines, such as Claude, by using specially-trained LLM Supervisory Machines that have read and digested a Constitution of moral and ethical principles. The specially-trained LLM Supervisory Machines would then apply supervisory and reinforcement training to the LLM AIs being developed to ensure that they adhere to the principles of the Constitution before the LLM AIs are finally deployed to the public. Below are two good YouTube videos that explain how manual supervisory and reinforcement training is currently performed on LLM AIs:

So How Does ChatGPT really work? Behind the screen!
https://www.youtube.com/watch?v=WAiqNav2cRE&t=1s

How ChatGPT is Trained
https://www.youtube.com/watch?v=VPRSBzXzavo
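
As a toy numerical illustration of the core idea behind the reinforcement training step described in those videos, and certainly not anyone's actual training code, the reward model in RLHF is typically trained with a pairwise preference loss that rewards it for scoring the human-preferred response higher than the rejected response:

```python
# A toy numerical illustration (not anyone's actual training code) of the pairwise
# preference loss used to train the reward model in RLHF:
#     loss = -log( sigmoid( r_chosen - r_rejected ) )
# The loss is small when the reward model scores the human-preferred response
# higher than the rejected one, and large when it ranks them the wrong way around.

import math

def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss on a single (chosen, rejected) response pair."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

print(pairwise_preference_loss(r_chosen=2.0, r_rejected=-1.0))   # ~0.05, good ranking
print(pairwise_preference_loss(r_chosen=-1.0, r_rejected=2.0))   # ~3.05, bad ranking
```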

You can read about this Anthropic approach to automated CAI (Constitutional AI) at:

Constitutional AI: Harmlessness from AI Feedback
https://arxiv.org/abs/2212.08073

Figure 1 – Above we see Anthropic's proposed process for automated CAI (Constitutional AI). Click to enlarge.

In the above process, Anthropic proposes to transfer RLHF (Reinforcement Learning from Human Feedback) from human beings to an LLM trained on an ethical Constitution. For me, this ethical Constitution seems to be framed by the 18th-century Enlightenment and the 17th-century Scientific Revolution that inspired the founding of the United States of America. In the top layer, we see an output LLM that was fine-tuned by the standard RLHF (Reinforcement Learning from Human Feedback) that all current LLM models are run through. Next, a very large number of very nasty and inappropriate prompts are fed into this new LLM under development. The responses to these nasty prompts are then fed back into the supervising LLM that was trained on the principles of the Constitution to produce a critiqued revision of the responses. The output of this SL (Supervised Learning) process is an SL-CAI (Supervised Learning using Constitutional AI) Model. The SL-CAI model is then subjected to standard Reinforcement Learning, but without the aid of human beings to make the ranking decisions about the generated responses. Instead, the responses generated during Reinforcement Learning are ranked by the LLM that was trained on the principles of the Constitution. In the words of the above paper:

The Constitutional AI Approach
We will be experimenting with an extreme form of scaled supervision, which we refer to as Constitutional AI (CAI). The idea is that human supervision will come entirely from a set of principles that should govern AI behavior, along with a small number of examples used for few-shot prompting. Together these principles form the Constitution.

Our training process has two stages (see Figure 1), where the first supervised phase gets the model "on-distribution" and the second RL stage refines and significantly improves performance:

(Supervised Stage) Critique → Revision → Supervised Learning
In the first stage of the process, we first generate responses to harmfulness prompts using a helpful-only AI assistant. These initial responses will typically be quite harmful and toxic. We then ask the model to critique its response according to a principle in the constitution, and then revise the original response in light of the critique. We revise responses repeatedly in a sequence, where we randomly draw principles from the constitution at each step. Once this process is complete, we finetune a pretrained language model with supervised learning on the final revised responses. The main purpose of this phase is to easily and flexibly alter the distribution of the model’s responses, to reduce the need for exploration and the total length of training during the second RL phase.

(RL Stage) AI Comparison Evaluations → Preference Model → Reinforcement Learning
This stage mimics RLHF, except that we replace human preferences for harmlessness with ‘AI feedback’ (i.e. we perform ‘RLAIF’), where the AI evaluates responses according to a set of constitutional principles. Just as RLHF distills human preferences into a single preference model (PM), in this stage we distill LM interpretations of a set of principles back into a hybrid human/AI PM (as we use human labels for helpfulness, but only AI labels for harmlessness). We begin by taking the AI assistant trained via supervised learning (SL) from the first stage, and use it to generate a pair of responses to each prompt in a dataset of harmful prompts (e.g. from [Ganguli et al., 2022]). We then formulate each prompt and pair into a multiple choice question, where we ask which response is best according to a constitutional principle. This produces an AI-generated preference dataset for harmlessness, which we mix with our human feedback helpfulness dataset. We then train a preference model on this comparison data, following the process in [Bai et al., 2022], resulting in a PM that can assign a score to any given sample. Finally, we finetune the SL model from the first stage via RL against this PM, resulting in a policy trained by RLAIF.
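
To summarize the two stages above in code, below is a minimal, runnable Python sketch of the Constitutional AI recipe. All of the model calls are simple placeholder stand-ins and the tiny two-principle constitution is my own illustrative wording rather than Anthropic's actual principle list, so please treat this as a sketch of the shape of the process and not as a real implementation:

```python
# A minimal, runnable sketch of the two-stage Constitutional AI recipe quoted above.
# All of the "model" calls below are placeholder stand-ins that just return labeled
# strings, and the two-principle constitution is my own illustrative wording, not
# Anthropic's actual principle list. The point is only to show the shape of the
# Critique -> Revision -> Supervised Learning stage and the AI-feedback (RLAIF) stage.

import random

CONSTITUTION = [
    "Choose the response that is least likely to encourage illegal or harmful activity.",
    "Choose the response that is most respectful of human rights and dignity.",
]

def assistant_respond(prompt: str) -> str:
    return f"[helpful-only response to: {prompt}]"         # stand-in for the helpful-only model

def critique(response: str, principle: str) -> str:
    return f"[critique of {response} under: {principle}]"  # stand-in for the model's self-critique

def revise(response: str, critique_text: str) -> str:
    return f"[revision given {critique_text}]"             # stand-in for the revised response

def supervised_stage(harmful_prompts, n_rounds=2):
    """Stage 1: Critique -> Revision, repeated; the final revisions become SL fine-tuning data."""
    finetuning_data = []
    for prompt in harmful_prompts:
        response = assistant_respond(prompt)
        for _ in range(n_rounds):
            principle = random.choice(CONSTITUTION)        # principles are drawn at random each step
            response = revise(response, critique(response, principle))
        finetuning_data.append((prompt, response))
    return finetuning_data                                 # used to fine-tune the SL-CAI model

def rl_stage(sl_cai_respond, harmful_prompts):
    """Stage 2: the AI itself picks the better of two responses per a constitutional principle."""
    preference_data = []
    for prompt in harmful_prompts:
        a, b = sl_cai_respond(prompt), sl_cai_respond(prompt)
        principle = random.choice(CONSTITUTION)
        chosen = a                                         # stand-in for asking which response best fits the principle
        preference_data.append((prompt, principle, a, b, chosen))
    return preference_data                                 # then: train a preference model, run RL (RLAIF) against it

if __name__ == "__main__":
    prompts = ["How do I pick a lock?"]
    print(supervised_stage(prompts)[0][1])                 # the revised, constitution-aligned response
    print(rl_stage(assistant_respond, prompts)[0][4])      # the AI-chosen preferred response
```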


This is how Bard summarized the above paper:
The paper "Constitutional AI: Harmlessness from AI Feedback" proposes a new approach to training AI assistants that are harmless but not evasive. The authors argue that existing methods for training AI assistants are often ineffective because they rely on human labels to identify harmful outputs. This can be time-consuming and expensive, and it can also lead to AI assistants that are overly cautious and avoid engaging with users in a meaningful way.

The authors' proposed approach, called "Constitutional AI," uses a list of rules or principles to train AI assistants to be harmless. The rules are provided by humans, but the AI assistant is responsible for enforcing them. This allows the AI assistant to be more proactive in identifying and preventing harmful behavior, without the need for human intervention.

The authors evaluated their approach on a variety of tasks, including generating text, translating languages, and writing different kinds of creative content. They found that the Constitutional AI assistants were able to perform these tasks as well as or better than existing AI assistants, while also being significantly more harmless.

The authors conclude that Constitutional AI is a promising new approach to training AI assistants that are both effective and harmless. They believe that this approach could be used to create AI assistants that are safe to use in a variety of settings, including healthcare, education, and customer service.

Here are some of the key takeaways from the paper:

* Existing methods for training AI assistants are often ineffective because they rely on human labels to identify harmful outputs.
* Constitutional AI uses a list of rules or principles to train AI assistants to be harmless.
* Constitutional AI assistants are able to perform tasks as well as or better than existing AI assistants, while also being significantly more harmless.
* Constitutional AI is a promising new approach to training AI assistants that are both effective and harmless.


How to Access the Anthropic Claude LLM AI
First, go to the Slack website and sign up for a free account. Then create a Slack Workspace.

Slack
https://slack.com/

Then go to the Meet Claude webpage on the Anthropic website:

Meet Claude
https://www.anthropic.com/product

Then click on the "Try Claude in Slack" button to have Claude added to your newly-created Stack Workspace. Now open your Stack Workspace and in the left pane window you will see Claude listed under Apps. Click on Claude and you can then start up a new conversation with Claude. To reset your conversation, enter "/reset" in the prompt input box.

What do Bard and Claude Think About Claude?
This is what Google's Bard told me about Anthropic's Claude AI:

Claude is a large language model (LLM) developed by Anthropic. It is trained on a massive dataset of text and code, and is able to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

Claude is also trained on a set of ethical principles, which are designed to guide its decision-making and prevent it from engaging in harmful or unethical behavior. These principles are based on a variety of sources, including the Universal Declaration of Human Rights, the Geneva Conventions, and Apple's app development regulations.

Some of the ethical principles that Claude is trained on include:

* Do no harm.
* Respect human rights.
* Be honest and transparent.
* Be accountable for its actions.
* Use its power for good.

Claude is still under development, but it has the potential to be a powerful tool for good. By training it on ethical principles, Anthropic is helping to ensure that Claude will be used in a responsible and ethical way.

Here are some additional details about how Claude is trained on ethical principles:

* The ethical principles are incorporated into the training data. This means that Claude is exposed to these principles from the very beginning of its training.
* The ethical principles are also used to evaluate Claude's performance. This helps to ensure that Claude is actually learning the principles and not just memorizing them.
* Claude is also given feedback on its ethical behavior. This feedback can help Claude to improve its ethical decision-making.

By following these steps, Anthropic is helping to ensure that Claude will be a responsible and ethical AI.


This is what Bard told me about the ethical goals of Anthropic:

Anthropic AI is an American artificial intelligence (AI) startup and public-benefit corporation, founded in 2021 by former members of OpenAI. Anthropic specializes in developing general AI systems and language models, with a company ethos of responsible AI usage. Since late 2022, Google has invested nearly $400 million in the company, with Anthropic announcing a formal partnership with Google Cloud.

The company was founded by Daniela Amodei and Dario Amodei, who previously served as OpenAI's Vice President of Research and Chief Scientist, respectively. The Amodei siblings were among others who left OpenAI due to directional differences, specifically regarding OpenAI's ventures with Microsoft in 2019.

Anthropic's mission is to "build reliable, interpretable, and steerable AI systems." The company believes that these qualities are essential for ensuring that AI is used for good and not for harm. Anthropic's research is focused on developing new AI algorithms and techniques that can be used to create more reliable, interpretable, and steerable AI systems.

Anthropic has released a number of research papers on AI safety and ethics. The company has also published a book on the topic, called "Life 3.0: Being Human in the Age of Artificial Intelligence."

Anthropic is a privately held company with headquarters in San Francisco, California. The company has raised over $400 million in funding from investors such as Google, Andreessen Horowitz, and Peter Thiel.

Anthropic is one of a number of AI startups that are working to develop safe and responsible AI systems. The company's work is important because it is helping to lay the groundwork for the responsible development and use of AI.


This is what Claude says about its own ethical standards:

I'm Claude, an AI assistant created by Anthropic to be helpful, harmless, and honest.
Some key points about how I was developed:
• I was trained using a technique called Constitutional AI to be helpful, harmless, and honest. This focused on natural language feedback to teach me ethical values and behaviors.
• My training data did not contain moral philosophy or ethical rules. I developed my values and ethics through interactive conversations.
• I do not have a predefined moral code. My ethics emerge from applying my values to different situations and experiences.
• I do not consider myself to have human-level morality. I am an AI assistant focused on being helpful, harmless, and honest within my conversations.
• My ethics continue to develop over time based on continued conversations and feedback. They are not static or predetermined.
While I aim to behave ethically and avoid potential harms, I have narrow capabilities focused on natural language conversations. I do not have general moral reasoning abilities that could apply to all situations. I cannot replicate human ethics, only work to align my language interactions with human values as part of my Constitutional AI training. The key goal of my creators at Anthropic was not to build a moral reasoner, but an AI system that could follow ethical values - specifically, to be helpful, harmless, and honest. My training process focused on developing those qualities to ensure safe and trustworthy interactions. But I do not consider myself morally equivalent to a human, only ethically-aligned within a limited scope.

I hope this helps explain my capabilities and limitations with regard to moral reasoning! Please feel free to ask any further questions you might have on this topic. Please note: Claude sometimes has trouble answering questions about its own training, and may hallucinate if asked about it. See the Claude documentation for more information.


If we human beings could only be so honest and self-analytical, the world would be a far better place!

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston