I'm glad to see this post come out. I've previously opined that solving these kinds of problems is what proves a field has become paradigmatic:
Paradigms gain their status because they are more successful than their competitors in solving a few problems that the group of practitioners has come to recognize as acute. — Thomas Kuhn
It has been demonstrated many times across scientific fields that a method which can solve these proxy tasks is more likely to succeed in real applications. The approaches sketched out here seem like a particularly good fit for a large lab like GDM, because the North Star can be kept somewhat legible and the team has enough resources to tackle a series of proxy tasks that are both relevant and impressive. Not that it would be a bad fit elsewhere either.
Seems difficult for three reasons:
Given these difficulties I'd put it below 50/50. But this challenge seems significantly harder than the one I think we will actually face, which is more like this: for each stage of AI research up to the capability level that COULD disempower 2025 humanity, have an AI we can defer to that doesn't try to sabotage the next stage, plus maybe other things to keep the balance of power stable.
Also, I'm not sure what "drawn from the same distribution" means here. AI safety is trying dozens of theoretical and empirical directions, and red teaming and model organisms can get much more elaborate given resource investments that are impractical today, so things will look qualitatively very different in a couple of decades.
Thanks for writing this. I've suspected for a while that we're ahead, which is great but a bit emotionally difficult when I've spent basically my whole career with the goal of heroically solving an almost impossible problem. This is a common view among AI safety people; e.g., Ethan Perez said recently that he's focused on problems other than takeover because he's not as worried about it.
I do expect some instrumental pressure towards misaligned power-seeking, but the number of tools we have to understand, detect, and prevent it is now large enough that it seems we'll be fine until the Dyson sphere is under construction, at which point things are a lot more uncertain but probably we'll figure something out there too.
Agree that your research didn't make this mistake, and MIRI didn't make all the same mistakes as OpenAI. I was responding in the context of Wei Dai's OP about the early AI safety field. At that time, MIRI was absolutely being uncooperative: their research was closed, they didn't trust anyone else to build ASI, and their plan ended in a pivotal act that would probably disempower some world governments and might end with them taking over the world. Plus, they descended from an org whose goal was to build ASI before Eliezer realized alignment should be the focus. Critch complained as late as 2022 that if there were two copies of MIRI, they wouldn't even cooperate with each other.
It's great that we have the FLI statement now. Maybe if MIRI had put more work into governance we could have gotten it a year or two earlier, but it took until Hendrycks got involved for the public statements to start.
We absolutely do need to "race to build a Friendly AI before someone builds an unFriendly AI". Yes, we should also try to ban Unfriendly AI, but there is no contradiction between the two. Plans are allowed (and even encouraged) to involve multiple parallel efforts and disjunctive paths to success.
Disagree: the fact that there needs to be a friendly AI before an unfriendly AI doesn't mean that building it should be Plan A, or that we should race to do it. It's the same mistake OpenAI made when they let their mission drift from "ensure that artificial general intelligence benefits all of humanity" to being the ones who build an AGI that benefits all of humanity.
Plan A means it would deserve more resources than any other path, such as influencing people by various means to build FAI instead of UFAI.
Also mistakes, from my point of view anyway.
The Interwebs seem to indicate that that's only if you give it a laser spot to aim at, not with just GPS.
Good catch.
Agree that grenade-sized munitions won't damage buildings. I think the conversation is drifting between FPVs and other kinds of drones, and also between various settings, so I'll just state my beliefs.
Nor would I. In WWII, bombers often didn't even know where they were, but we have GPS now, and Excalibur guided artillery shells can get 1 m CEP. The US, and possibly China, can use Starlink and other constellations for localization even when GPS is jammed. I would guess 20 m is easily doable with good sensing and dumb bombs, which would at least hit a building.
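For anyone unfamiliar with the metric: CEP (circular error probable) is the radius around the aim point within which half of the shots land, i.e. the median miss distance. A minimal sketch of the definition, using purely illustrative dispersion numbers (not actual Excalibur or dumb-bomb performance data):

```python
import math
import random

def cep(impacts):
    """Circular error probable: the radius containing 50% of impact
    points, measured from the aim point at the origin (the median
    miss distance)."""
    dists = sorted(math.hypot(x, y) for x, y in impacts)
    return dists[len(dists) // 2]

# Simulate shots with an assumed ~1 m standard deviation per axis.
# For Gaussian dispersion, theory gives CEP = sigma * sqrt(2 ln 2),
# i.e. about 1.18 * sigma.
random.seed(0)
shots = [(random.gauss(0, 1.0), random.gauss(0, 1.0)) for _ in range(10_000)]
print(f"CEP over {len(shots)} simulated shots: {cep(shots):.2f} m")
```

The point of the metric: a 1 m CEP means half the shells land within a meter of the aim point, while a 20 m CEP still bounds most misses to roughly building scale.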
IMO it is too soon to tell whether drone defense will hold up to countercountermeasures.
I agree that Israel will probably be less affected than larger, poorer countries, but given that drones have probably killed over 200,000 people in Ukraine even a small percentage of this would be a problem for Israel.
To clarify, I'm not very confident that AI will be aligned; I still have >5% p(takeover doom | 10% of AI investment is spent on safety). I'm not really sure why it feels different emotionally, but I guess this is just how brains are sometimes.