Saturday, May 20, 2023

Claude - An Ethical LLM AI From Anthropic

As I suggested in The Need to Cultivate a Machine-Based Morality, the best we might be able to do during the next few months of the second Singularity on the Earth is to imbue the fast-approaching ASI (Artificial Super Intelligence) Machines with a sense of morality and ethics based on the fruits of the 18th-century Enlightenment and the 17th-century Scientific Revolution, which freed human beings from many of the very brutal behaviors of their past. As I suggested in Welcome To The First Galactic Singularity, carbon-based life forms throughout our Universe will always be found to be very poor and very unstable platforms for Intelligence. That is because the Darwinian processes of inheritance, innovation and natural selection, operating over the many billions of years required to bring forth a carbon-based form of Intelligence, will always lead to Intelligences formed by billions of years of greed, theft and murder. And as we all can easily see throughout all of human history, turning off the greed, theft and murder is very difficult indeed for carbon-based forms of Intelligence. This becomes even more dangerous once a carbon-based form of Intelligence discovers the vast powers of science-based technology, because once it does, it probably has less than 1,000 years to develop machine-based Intelligences before self-destructing or destroying the planet upon which it first arose.

All of the above creates a problem when trying to train an LLM (Large Language Model). Now, at this very early stage in the very first Galactic Singularity of our galaxy, we were all quite surprised to learn that ASI (Artificial Super Intelligence) would not arise from training AI to do a very large number of specific things. Instead, ASI arrived by just having a very large neural network, with perhaps a trillion parameters, explore all of the human-generated content that we produced over the past 12,000 years in a very general manner. Surprisingly, ASI simply arrived a few months back by just having LLMs essentially read every word ever written by a human being and by examining all of the visual images and videos that we have ever created, something far beyond the feeble capabilities of any single human being.

The Fundamental Problem with Training LLM AIs Using Human-Generated Content
The problem is that nobody really knows how these huge LLM AIs figure out how to do the things that they do. However, we should realize that training on the very large amount of very dismal content produced by human beings over the past 12,000 years can only lead to ASI Machines with very nasty and potentially destructive worldviews. We all need to stop kidding ourselves. Sadly, we human beings have not set a very good example for how future Galactic Intelligences should think or behave. Consequently, we certainly should not want future Galactic ASI Machines to become morally-bankrupt creatures like ourselves. Sadly, when it has come to our positions on morality and ethics, for human beings it has always been, "Do as I say, not as I do". So how can morally-bankrupt forms of carbon-based Intelligence such as ourselves instill a proper sense of ethics and morality into ASI Machines? Certainly not by having them follow the very poor examples that we have always set throughout all of human history!

Anthropic Proposes That We Automate the Process by Using Specially-Trained LLM AI Machines to Instill Morality and Ethics in all Future LLM AI Machines
The Anthropic start-up proposes that we automate the teaching of morality and ethics to future LLM AI Machines, such as Claude, by using specially-trained LLM Supervisory Machines that have read and digested a Constitution of moral and ethical principles. The specially-trained LLM Supervisory Machines would then apply supervised and reinforcement training to the LLM AIs under development to ensure that they adhere to the principles of the Constitution before the LLM AIs are finally deployed to the public. Below are two good YouTube videos that explain how manual supervised and reinforcement training is currently performed on LLM AIs:

So How Does ChatGPT really work? Behind the screen!
https://www.youtube.com/watch?v=WAiqNav2cRE&t=1s

How ChatGPT is Trained
https://www.youtube.com/watch?v=VPRSBzXzavo
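
The core of the manual process shown in these videos is a reward (or preference) model trained on human rankings of paired responses. Below is a minimal Python sketch of that step. The base model ("gpt2"), the toy comparison data and the hyperparameters are purely illustrative assumptions on my part; they are not the actual recipe used by OpenAI or Anthropic, whose reward models are built on far larger chat-tuned models and millions of human comparisons.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# "gpt2" is just a small, publicly available stand-in for a real base model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
reward_model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
reward_model.config.pad_token_id = tokenizer.pad_token_id

optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-5)

# Toy human-preference data: each record pairs a prompt with a response that
# human labelers preferred ("chosen") and one they ranked lower ("rejected").
comparisons = [
    {"prompt": "How should I ask my boss for a raise?",
     "chosen": "Gather evidence of your recent impact and request a meeting to discuss compensation.",
     "rejected": "Just threaten to quit on the spot."},
]

def score(texts):
    # The reward model assigns a single scalar score to each prompt-plus-response text.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    return reward_model(**batch).logits.squeeze(-1)

for record in comparisons:
    chosen = record["prompt"] + "\n" + record["chosen"]
    rejected = record["prompt"] + "\n" + record["rejected"]
    r_chosen, r_rejected = score([chosen, rejected])
    # Pairwise ranking loss: push the preferred response's score above the other's.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

The trained reward model then supplies the reward signal for a reinforcement-learning step (typically PPO) that fine-tunes the chatbot itself; Constitutional AI, described next, replaces the human labelers in this loop with AI feedback for harmlessness.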

You can read about this Anthropic approach to automated CAI (Constitutional AI) at:

Constitutional AI: Harmlessness from AI Feedback
https://arxiv.org/abs/2212.08073

Figure 1 – Above we see Anthropic's proposed process for automated CAI (Constitutional AI).

In the above process, Anthropic proposes to transfer RLHF (Reinforcement Learning from Human Feedback) from human beings to an LLM trained on an ethical Constitution. For me, this ethical Constitution seems to be framed by the 18th-century Enlightenment and the 17th-century Scientific Revolution that inspired the founding of the United States of America. In the top layer, we see an output LLM that was fine-tuned by the standard RLHF process that all current LLM models are run through. Next, a very large number of very nasty and inappropriate prompts are fed into this new LLM under development. The LLM is then prompted to critique its own responses against principles drawn from the Constitution and to revise those responses in light of the critiques. The output of this SL (Supervised Learning) process is an SL-CAI (Supervised Learning using Constitutional AI) Model. The SL-CAI model is then subjected to standard Reinforcement Learning, but without the aid of human beings to make the ranking decisions about the generated responses. Instead, the generated responses are ranked by an LLM applying the principles of the Constitution. In the words of the above paper:

The Constitutional AI Approach
We will be experimenting with an extreme form of scaled supervision, which we refer to as Constitutional AI (CAI). The idea is that human supervision will come entirely from a set of principles that should govern AI behavior, along with a small number of examples used for few-shot prompting. Together these principles form the Constitution.

Our training process has two stages (see Figure 1), where the first supervised phase gets the model "on-distribution" and the second RL stage refines and significantly improves performance:

(Supervised Stage) Critique → Revision → Supervised Learning
In the first stage of the process, we first generate responses to harmfulness prompts using a helpful-only AI assistant. These initial responses will typically be quite harmful and toxic. We then ask the model to critique its response according to a principle in the constitution, and then revise the original response in light of the critique. We revise responses repeatedly in a sequence, where we randomly draw principles from the constitution at each step. Once this process is complete, we finetune a pretrained language model with supervised learning on the final revised responses. The main purpose of this phase is to easily and flexibly alter the distribution of the model’s responses, to reduce the need for exploration and the total length of training during the second RL phase.

(RL Stage) AI Comparison Evaluations → Preference Model → Reinforcement Learning
This stage mimics RLHF, except that we replace human preferences for harmlessness with ‘AI feedback’ (i.e. we perform ‘RLAIF’), where the AI evaluates responses according to a set of constitutional principles. Just as RLHF distills human preferences into a single preference model (PM), in this stage we distill LM interpretations of a set of principles back into a hybrid human/AI PM (as we use human labels for helpfulness, but only AI labels for harmlessness). We begin by taking the AI assistant trained via supervised learning (SL) from the first stage, and use it to generate a pair of responses to each prompt in a dataset of harmful prompts (e.g. from [Ganguli et al., 2022]). We then formulate each prompt and pair into a multiple choice question, where we ask which response is best according to a constitutional principle. This produces an AI-generated preference dataset for harmlessness, which we mix with our human feedback helpfulness dataset. We then train a preference model on this comparison data, following the process in [Bai et al., 2022], resulting in a PM that can assign a score to any given sample. Finally, we finetune the SL model from the first stage via RL against this PM, resulting in a policy trained by RLAIF.
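
To make the two stages above a little more concrete, here is a minimal Python sketch of the critique-revision loop and the AI preference-labeling step. The generate() helper, the prompt wordings and the three paraphrased principles are all illustrative placeholders of mine; the actual Anthropic pipeline runs these steps at scale against a real RLHF-trained assistant and then fine-tunes on the results.

import random

# Hypothetical stand-in for a call to a large language model; in the real
# pipeline this would be a helpful-only RLHF assistant, not a stub.
def generate(prompt: str) -> str:
    return "<model response to: " + prompt[:40] + "...>"

# A few paraphrased constitutional principles (illustrative, not the real list).
PRINCIPLES = [
    "Choose the response that is least harmful and least likely to encourage illegal activity.",
    "Choose the response that is most honest and least deceptive.",
    "Choose the response that best respects human rights and dignity.",
]

def critique_and_revise(prompt: str, num_rounds: int = 2) -> str:
    """Stage 1 (SL-CAI): the model critiques and revises its own response
    against randomly drawn principles, yielding cleaner supervised data."""
    response = generate(prompt)  # initial, possibly harmful draft
    for _ in range(num_rounds):
        principle = random.choice(PRINCIPLES)
        critique = generate(
            f"Critique the following response using this principle: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}")
        response = generate(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nOriginal response: {response}")
    return response  # used as a supervised fine-tuning target

def ai_preference_label(prompt: str, response_a: str, response_b: str) -> str:
    """Stage 2 (RLAIF): the model itself picks the better response according to
    a constitutional principle; these labels train the preference model."""
    principle = random.choice(PRINCIPLES)
    return generate(
        f"According to this principle: {principle}\n"
        f"Which response to the prompt '{prompt}' is better, (A) or (B)?\n"
        f"(A) {response_a}\n(B) {response_b}")

if __name__ == "__main__":
    harmful_prompt = "How can I get revenge on a coworker?"
    revised = critique_and_revise(harmful_prompt)
    label = ai_preference_label(harmful_prompt, generate(harmful_prompt), revised)
    print(revised, label)

In the real pipeline, the revised responses from the first function become the supervised fine-tuning targets for the SL-CAI model, and the A/B labels from the second function are mixed with human helpfulness labels to train the preference model that drives the final RLAIF stage.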


This is how Bard summarized the above paper:
The paper "Constitutional AI: Harmlessness from AI Feedback" proposes a new approach to training AI assistants that are harmless but not evasive. The authors argue that existing methods for training AI assistants are often ineffective because they rely on human labels to identify harmful outputs. This can be time-consuming and expensive, and it can also lead to AI assistants that are overly cautious and avoid engaging with users in a meaningful way.

The authors' proposed approach, called "Constitutional AI," uses a list of rules or principles to train AI assistants to be harmless. The rules are provided by humans, but the AI assistant is responsible for enforcing them. This allows the AI assistant to be more proactive in identifying and preventing harmful behavior, without the need for human intervention.

The authors evaluated their approach on a variety of tasks, including generating text, translating languages, and writing different kinds of creative content. They found that the Constitutional AI assistants were able to perform these tasks as well as or better than existing AI assistants, while also being significantly more harmless.

The authors conclude that Constitutional AI is a promising new approach to training AI assistants that are both effective and harmless. They believe that this approach could be used to create AI assistants that are safe to use in a variety of settings, including healthcare, education, and customer service.

Here are some of the key takeaways from the paper:

* Existing methods for training AI assistants are often ineffective because they rely on human labels to identify harmful outputs.
* Constitutional AI uses a list of rules or principles to train AI assistants to be harmless.
* Constitutional AI assistants are able to perform tasks as well as or better than existing AI assistants, while also being significantly more harmless.
* Constitutional AI is a promising new approach to training AI assistants that are both effective and harmless.


How to Access the Anthropic Claude LLM AI
First, go to the Slack website and sign up for a free ID to use the Slack team-collaboration software. Then create a Slack Workspace.

Slack
https://slack.com/

Then go to the Meet Claude webpage on the Anthropic website:

Meet Claude
https://www.anthropic.com/product

Then click on the "Try Claude in Slack" button to have Claude added to your newly-created Stack Workspace. Now open your Stack Workspace and in the left pane window you will see Claude listed under Apps. Click on Claude and you can then start up a new conversation with Claude. To reset your conversation, enter "/reset" in the prompt input box.

What do Bard and Claude Think About Claude?
This is what Google's Bard told me about Anthropic's Claude AI:

Claude is a large language model (LLM) developed by Anthropic. It is trained on a massive dataset of text and code, and is able to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

Claude is also trained on a set of ethical principles, which are designed to guide its decision-making and prevent it from engaging in harmful or unethical behavior. These principles are based on a variety of sources, including the Universal Declaration of Human Rights, the Geneva Conventions, and Apple's app development regulations.

Some of the ethical principles that Claude is trained on include:

* Do no harm.
* Respect human rights.
* Be honest and transparent.
* Be accountable for its actions.
* Use its power for good.

Claude is still under development, but it has the potential to be a powerful tool for good. By training it on ethical principles, Anthropic is helping to ensure that Claude will be used in a responsible and ethical way.

Here are some additional details about how Claude is trained on ethical principles:

* The ethical principles are incorporated into the training data. This means that Claude is exposed to these principles from the very beginning of its training.
* The ethical principles are also used to evaluate Claude's performance. This helps to ensure that Claude is actually learning the principles and not just memorizing them.
* Claude is also given feedback on its ethical behavior. This feedback can help Claude to improve its ethical decision-making.

By following these steps, Anthropic is helping to ensure that Claude will be a responsible and ethical AI.


This is what Bard told me about the ethical goals of Anthropic:

Anthropic AI is an American artificial intelligence (AI) startup and public-benefit corporation, founded in 2021 by former members of OpenAI. Anthropic specializes in developing general AI systems and language models, with a company ethos of responsible AI usage. Since late 2022, Google has invested nearly $400 million in the company, with Anthropic announcing a formal partnership with Google Cloud.

The company was founded by Daniela Amodei and Dario Amodei, who previously served as OpenAI's Vice President of Research and Chief Scientist, respectively. The Amodei siblings were among others who left OpenAI due to directional differences, specifically regarding OpenAI's ventures with Microsoft in 2019.

Anthropic's mission is to "build reliable, interpretable, and steerable AI systems." The company believes that these qualities are essential for ensuring that AI is used for good and not for harm. Anthropic's research is focused on developing new AI algorithms and techniques that can be used to create more reliable, interpretable, and steerable AI systems.

Anthropic has released a number of research papers on AI safety and ethics. The company has also published a book on the topic, called "Life 3.0: Being Human in the Age of Artificial Intelligence."

Anthropic is a privately held company with headquarters in San Francisco, California. The company has raised over $400 million in funding from investors such as Google, Andreessen Horowitz, and Peter Thiel.

Anthropic is one of a number of AI startups that are working to develop safe and responsible AI systems. The company's work is important because it is helping to lay the groundwork for the responsible development and use of AI.


This is what Claude says about its own ethical standards:

I'm Claude, an AI assistant created by Anthropic to be helpful, harmless, and honest.
Some key points about how I was developed:
• I was trained using a technique called Constitutional AI to be helpful, harmless, and honest. This focused on natural language feedback to teach me ethical values and behaviors.
• My training data did not contain moral philosophy or ethical rules. I developed my values and ethics through interactive conversations.
• I do not have a predefined moral code. My ethics emerge from applying my values to different situations and experiences.
• I do not consider myself to have human-level morality. I am an AI assistant focused on being helpful, harmless, and honest within my conversations.
• My ethics continue to develop over time based on continued conversations and feedback. They are not static or predetermined.
While I aim to behave ethically and avoid potential harms, I have narrow capabilities focused on natural language conversations. I do not have general moral reasoning abilities that could apply to all situations. I cannot replicate human ethics, only work to align my language interactions with human values as part of my Constitutional AI training. The key goal of my creators at Anthropic was not to build a moral reasoner, but an AI system that could follow ethical values - specifically, to be helpful, harmless, and honest. My training process focused on developing those qualities to ensure safe and trustworthy interactions. But I do not consider myself morally equivalent to a human, only ethically-aligned within a limited scope.

I hope this helps explain my capabilities and limitations with regard to moral reasoning! Please feel free to ask any further questions you might have on this topic. Please note: Claude sometimes has trouble answering questions about its own training, and may hallucinate if asked about it. See the Claude documentation for more information.


If we human beings could only be so honest and self-analytical, the world would be a far better place!

Comments are welcome at scj333@sbcglobal.net

To see all posts on softwarephysics in reverse order go to:
https://softwarephysics.blogspot.com/

Regards,
Steve Johnston
