lemonhope

aimless ace analyzes active amateur: a micro-aaaaalignment proposal

This idea is so simple that I'm sure it's been had by someone somewhere. Suppose we have some method to make really smart honest AIs that do not have goals. Let's say it's a yes/no oracle. Our aimless ace. But we want to accomplish stuff! AIcorp wants the printmoneynow.py. I'm...

Jul 21, 2024 · 13

Speedrun ruiner research idea

Edit Apr 14: To be perfectly clear, this is another cheap thing you can add to your monitoring/control system; this is not a panacea or deep insight, folks. Just a Good Thing You Can Do™. * Central claim: If you can make a tool to prevent players from glitching games...

Apr 13, 2024 · 2

Series of absurd upgrades in nature's great search

So you want to find that special thing that replicates best and lasts longest? Just vibrate a bunch of molecules a long time! You might reasonably assume that molecules wouldn't practically ever randomly assemble themselves into anything worth looking at. It happens a bit differently instead: 1. Eventually, "solar systems"...

Sep 3, 2023 · 15

We can do better than DoWhatIMean (inextricably kind AI)

Edit February 2024: Now I think maybe we can't do better. Lately, "alignment" means "follow admin rules and do what users mean". Admins can put in rules like "don't give instructions for bombs, hacking, or bioweapons" and "don't take sides in politics". As AI gets more powerful, we can use...

Aug 19, 2023 · 26

More money with less risk: sell services instead of model access

OpenAI is currently charging 100,000 times less per line of code than professional US devs.[1] An LLM's code output is of course less reliable than a professional's. And it is hard to use a text-completion API effectively in large projects. What should you do if you've got a model on...

Mar 4, 2023 · 9

Inner alignment: what are we pointing at?

Proof that a model is an optimizer says very little about the model. I do not know what a research group studying outer alignment is studying. Inner alignment seems to cover the entire problem at the limit. Whether an optimizer is mesa or not depends on your point of...

Sep 18, 2022 · 14

AI-assisted list of ten concrete alignment things to do right now

Background: So I'm thinking that AI-assisted summarization, math, bug-finding in code, and logical-error finding in writing are at a point where they are quite useful, if we can improve the tooling/integration a little bit. In code I've found it helpful to comment out some lines and write // WRONG: above...

Sep 7, 2022 · 8

lemonhope

We can do better than DoWhatIMean (inextricably kind AI)

Do yourself a FAVAR: security mindset

Series of absurd upgrades in nature's great search

Inner alignment: what are we pointing at?