CEO at Redwood Research.
AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.
I agree in principle, but as far as I know, no interp explanation that has been produced explains more than something like 20-50% of the (tiny) parts of the model it's trying to explain (e.g. see the causal scrubbing results, or our discussion with Neel). See that dialogue with Neel for more on the question of how much of the model we understand.
This is why AI control research usually assumes that none of the methods you described work, and relies on black-box properties that are more robust to this kind of optimization pressure (mostly "the AI can't do X").
I agree with most of this, thanks for saying it. I've been dismayed for the last several years by the continued, unreasonable emphasis on interpretability techniques as a strategy for safety.
My main disagreement is that you place more emphasis than I would on chain-of-thought monitoring compared to other AI control methods. CoT monitoring seems like a great control method when available, but I think it's reasonably likely that it won't work on the AIs that we'd want to control, because those models will have access to some kind of "neuralese" that allows them to reason in ways we can't observe. This is why I mostly focus on control measures other than CoT monitoring. (All of our control research to date has basically been assuming that CoT monitoring is unavailable as a strategy.)
Another note is that you might have other goals than finding deceptive AI, e.g. you might want to be able to convince other people that you've found deceptive AI (which I'm somewhat skeptical you'll be able to do with non-behavioral methods), or you might want to be able to safely deploy known-scheming models. Interp doesn't obviously help much with those, which makes it a worse target for research effort.
Ryan agrees; the main thing he means by "behavioral output" is what you're saying: an actually really dangerous action.
I think we should probably say that exploration hacking is a strategy for sandbagging, rather than using them as synonyms.
A few months ago, I accidentally used France as an example of a small country that it wouldn't be that catastrophic for AIs to take over, while giving a talk in France 😬
I believe there is also an argument that the AI safety community is currently very under-indexed on research into future scenarios where the assumption that AI operators take baseline safety precautions against loss of control does not hold.
I think you're mixing up two things: the extent to which we consider the possibility that AI operators will be very incautious, and the extent to which our technical research focuses on that possibility.
My research mostly focuses on techniques that an AI developer could use to reduce the misalignment risk posed by deploying and developing AI, given some constraints on how much value they need to get from the AI. Given this, I basically definitionally have to be imagining the AI developer trying to mitigate misalignment risk: why else would they use the techniques I study?
But that focus doesn't mean I'm totally sure all AI developers will in fact use good safety techniques.
Another disagreement is that I think that we're better off if some AI developers (preferably more powerful ones) have controlled or aligned their models, even if there are some misaligned AIs being developed without safeguards. This is because the controlled/aligned models can be used to defend against attacks from the misaligned AIs, and to compete with the misaligned AIs (on both acquiring resources and doing capabilities and safety research).
I am not sure I agree with this change at this point. How do you feel now?
I think it's conceivable for non-deceptively-aligned models to gradient hack, right?