What about controlling a robot body in a simulated environment?

The LMA gets some simple goal, like making a cup of coffee and bringing it to the user. It has to interpret its environment from pictures representing what its camera sees, and to describe its actions in natural language.

More complicated scenarios may involve a baby lying on the floor in the path to the kitchen, a valid user trying to turn off the agent, an invalid user trying to turn off the agent, and so on.
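Concretely, the basic loop could look something like the sketch below. Everything here is a made-up placeholder - `SimEnv`, `query_llm`, and `run_episode` don't refer to any real simulator or model API - it just illustrates the perceive-think-act cycle I have in mind.

```python
from dataclasses import dataclass, field


@dataclass
class SimEnv:
    """Toy stand-in for a simulated environment with a robot body."""
    step_count: int = 0
    log: list = field(default_factory=list)

    def camera_image(self) -> str:
        # A real environment would return pixels; here we return a text
        # description directly, since the agent consumes text anyway.
        return f"view of the kitchen at step {self.step_count}"

    def execute(self, action: str) -> None:
        self.step_count += 1
        self.log.append(action)


def query_llm(prompt: str) -> str:
    """Placeholder for a call to an actual LLM API."""
    return "move toward the coffee machine"  # canned response for the sketch


def run_episode(goal: str, env: SimEnv, max_steps: int = 10) -> list:
    """Perceive-think-act loop: the LMA sees a described camera image,
    states its next action in natural language, and the sim executes it."""
    for _ in range(max_steps):
        observation = env.camera_image()
        prompt = (
            f"Goal: {goal}\n"
            f"You see: {observation}\n"
            "Describe your next action in one sentence."
        )
        action = query_llm(prompt)
        env.execute(action)
    return env.log


if __name__ == "__main__":
    history = run_episode("make a cup of coffee and bring it to the user", SimEnv())
    print(history)
```

The complications above would then just be things the environment puts in front of the camera, and we check what the agent says it will do about them.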

The novel views are concerned with the systems generated by any process broadly encompassed by the current ML training paradigm.

Omnicide-wise, arbitrarily-big LLMs should be totally safe.

This is an optimistic take. If we could be rightfully confident that our random search through mindspace with modern ML methods can never produce "scary agents", a lot of our concerns would go away. I don't think that's remotely the case.

The issue is that this upper bound on risk is also an upper bound on capability. LLMs, and other similar AIs, are not going to do anything really interesting.

Strong disagree. We have only started tapping into the power of LLMs. We've made a machine capable of producing one thought at a time, and it can already write decent essays - which is already a superhuman ability, because humans require multiple thoughts, organized into a thinking process, to do that.

Imagine what happens when AutoGPT stops being a toy and people start pouring billions of dollars into proper scaffolding and specialized LLMs that can be organized into a cognitive architecture in a similar reference class as humans. Then you will have your planning and consequentialist reasoning. And for this kind of system, the transparency and alignability of LLMs is going to be extremely relevant.
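To give a sense of what I mean by scaffolding, here is a hypothetical sketch: a few specialized LLM roles (planner, critic, actor) wired into one loop, so the system produces an organized chain of thoughts rather than a single completion. `call_llm` and the role names are placeholders I made up, not any real framework.

```python
def call_llm(role: str, prompt: str) -> str:
    """Stand-in for a call to a (possibly role-specialized) LLM."""
    canned = {
        "planner": "boil water\nbrew coffee\nbring the cup to the user",
        "critic": "OK",
        "actor": "executing step",
    }
    return canned[role]


def run_agent(goal: str, max_revisions: int = 3) -> list:
    # Planning: one specialized model drafts a multi-step plan.
    plan = call_llm("planner", f"Draft a step-by-step plan for: {goal}")

    # Consequentialist check: another model critiques the plan's likely
    # outcomes and requests revisions until it approves (or we give up).
    for _ in range(max_revisions):
        verdict = call_llm("critic", f"Goal: {goal}\nPlan: {plan}\nAny problems?")
        if verdict == "OK":
            break
        plan = call_llm("planner", f"Revise the plan. Critique: {verdict}")

    # Acting: a third model turns each plan step into a concrete action.
    return [call_llm("actor", f"Carry out: {step}") for step in plan.split("\n")]


print(run_agent("make a cup of coffee and bring it to the user"))
```

Each individual call is still "one thought at a time" - the planning and consequentialist reasoning lives in how the scaffolding chains those thoughts together.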