Thanks for writing this. I've suspected for a while that we're ahead, which is great but a bit emotionally difficult given that I'd spent basically my whole career with the goal of heroically solving an almost impossible problem. This is a common view among AI safety people; e.g. Ethan Perez said recently that he's focused on problems other than takeover because he's not as worried about it.
I do expect some instrumental pressure towards misaligned power-seeking, but the number of tools we have to understand, detect, and prevent it is now large enough that it seems we'll be fine until the Dyson sphere is under construction, at which point things are a lot more uncertain but probably we'll figure something out there too.
Agree that your research didn't make this mistake, and MIRI didn't make all the same mistakes as OpenAI. I was responding in the context of Wei Dai's OP about the early AI safety field. At that time, MIRI was absolutely being uncooperative: their research was closed, they didn't trust anyone else to build ASI, and their plan would end in a pivotal act that probably disempowers some world governments and possibly ends up with them taking over the world. Plus they descended from an org whose goal was to build ASI before Eliezer realized alignment should be the focus. Critch complained as late as 2022 that if there were two copies of MIRI, they wouldn't even cooperate with each other.
It's great that we have the FLI statement now. Maybe if MIRI had put more work into governance we could have gotten it a year or two earlier, but it took until Hendrycks got involved for the public statements to start.
We absolutely do need to "race to build a Friendly AI before someone builds an unFriendly AI". Yes, we should also try to ban Unfriendly AI, but there is no contradiction between the two. Plans are allowed (and even encouraged) to involve multiple parallel efforts and disjunctive paths to success.
Disagree. The fact that there needs to be a friendly AI before an unfriendly AI doesn't mean building it should be plan A, or that we should race to do it. It's the same mistake OpenAI made when they let their mission drift from "ensure that artificial general intelligence benefits all of humanity" to being the ones who build an AGI that benefits all of humanity.
Plan A means it would deserve more resources than any other path, like influencing people by various means to build FAI instead of UFAI.
Also mistakes, from my point of view anyway
The Interwebs seem to indicate that that's only if you give it a laser spot to aim at, not with just GPS.
Good catch.
Agree that grenade-sized munitions won't damage buildings. I think the conversation is drifting between FPVs and other kinds of drones, and also between various settings, so I'll just state my beliefs.
Nor would I. In WWII bombers didn't even know where they were, but we have GPS now such that Excalibur guided artillery shells can get 1m CEP. And the US and possibly China can use Starlink and other constellations for localization even when GPS is jammed. I would guess 20m is easily doable with good sensing and dumb bombs, which would at least hit a building.
IMO it is too soon to tell whether drone defense will hold up to counter-countermeasures.
I agree that Israel will probably be less affected than larger, poorer countries, but given that drones have probably killed over 200,000 people in Ukraine, even a small percentage of this would be a problem for Israel.
The data is pretty low-quality for that graph because the agents we used were inconsistent and Claude 3-level models could barely solve any tasks. Epoch has better data for SWE-bench Verified, which I converted to time horizon here and found to also be doubling roughly every 4 months. Their elicitation is probably not as good for OpenAI models as for Anthropic models, but both are increasing at similar rates.
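For anyone who wants to sanity-check what a doubling time of about 4 months implies, here is a minimal sketch. It assumes clean exponential growth, which real benchmark data won't follow exactly; the function names and the example numbers are made up for illustration and are not taken from Epoch's data or my conversion.

```python
import math


def horizon_after(months: float, start_horizon: float,
                  doubling_months: float = 4.0) -> float:
    """Time horizon after `months`, assuming a constant doubling time.

    `doubling_months=4.0` is the rough figure from the comment above,
    not a fitted value.
    """
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return start_horizon * 2 ** (months / doubling_months)


def doubling_time(h_start: float, h_end: float, months_elapsed: float) -> float:
    """Doubling time (in months) implied by two horizon measurements.

    Assumes exponential growth between the two points and h_end > h_start > 0.
    """
    if h_start <= 0 or h_end <= h_start:
        raise ValueError("need h_end > h_start > 0")
    return months_elapsed * math.log(2) / math.log(h_end / h_start)


if __name__ == "__main__":
    # Hypothetical: a 1-hour horizon growing for one year.
    print(horizon_after(12, 1.0))
    # Hypothetical: horizon went from 1 hour to 8 hours over 12 months.
    print(doubling_time(1.0, 8.0, 12))
```

Under the 4-month assumption, a year is three doublings, so the horizon grows 8x per year.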
How does your work differ from Forking Paths in Neural Text Generation (Bigelow et al.) from ICLR 2025?
Seems difficult for three reasons:
Given these difficulties I'd put it below 50/50, but this challenge seems significantly harder than the one I think we will actually face, which is more like having an AI we can defer to that doesn't try to sabotage the next stage of AI research, for each stage until the capability level that COULD disempower 2025 humanity, plus maybe other things to keep the balance of power stable.
Also, I'm not sure what "drawn from the same distribution" means here. AI safety is trying dozens of theoretical and empirical directions, and red teaming and model organisms can get much more elaborate with impractical resource investments, so things will look qualitatively very different in a couple of decades.