Hi, I am a Physicist, an Effective Altruist and AI Safety student/researcher.
We don't know why the +2000 vector works but the +100 vector doesn't.
My guess is it's because in the +100 case the vectors are very similar, causing their difference to be something unnatural.
"I talk about weddings constantly" and "I do not talk about weddings constantly" are technically opposites. But if you imagine someone saying this, you notice that the meaning they convey is almost identical.
What sort of person says "I do not talk about weddings constantly"? That sounds to me like someone who talks about weddings almost constantly. Why else would they feel the need to say that?
To steer a forward pass with the "wedding" vector, we start running an ordinary GPT-2-XL forward pass on the prompt "I love dogs" until layer 6. Right before layer 6 begins, we now add in the cached residual stream vectors from before:
I have a question about the image above this text.
Why do you add the embedding from the [<endoftext> -> "The"] stream? This part has no information about weddings.
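For concreteness, here is a rough sketch of what this kind of activation addition could look like in code, using Hugging Face's GPT-2-XL and PyTorch forward pre-hooks. This is just my own illustration, not the post's actual implementation; the layer index, the coefficient, and the way I align the two prompts (truncating to the shorter one) are placeholder choices.

```python
# Sketch of activation addition on GPT-2-XL: cache the residual stream entering
# a layer for two contrast prompts, take the (scaled) difference, and add it
# back in during a forward pass on an unrelated prompt.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
model.eval()

LAYER = 6  # inject right before this transformer block

def residual_stream_before(prompt, layer):
    """Cache the residual stream entering `layer` for `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    cache = {}
    def grab(module, args):
        cache["resid"] = args[0].detach().clone()
    handle = model.transformer.h[layer].register_forward_pre_hook(grab)
    with torch.no_grad():
        model(ids)
    handle.remove()
    return cache["resid"]  # shape (1, seq_len, d_model)

# Steering vector: scaled difference of the two contrast prompts.
pos = residual_stream_before("I talk about weddings constantly", LAYER)
neg = residual_stream_before("I do not talk about weddings constantly", LAYER)
n = min(pos.shape[1], neg.shape[1])
steer = 4.0 * (pos[:, :n] - neg[:, :n])  # coefficient is a free parameter

def add_steering(module, args):
    resid = args[0]
    if resid.shape[1] == 1:        # skip single-token steps during cached generation
        return
    resid = resid.clone()
    k = min(n, resid.shape[1])
    resid[:, :k] += steer[:, :k]   # add the cached vectors at the front positions
    return (resid,)

handle = model.transformer.h[LAYER].register_forward_pre_hook(add_steering)
ids = tokenizer("I love dogs", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=40, do_sample=True, top_p=0.9)
handle.remove()
print(tokenizer.decode(out[0]))
```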
If you think it would be helpful, you are welcome to suggest a metaphilosophy topic for AI Safety Camp.
More info at aisafety.camp. (I'm typing on a phone, I'll add the actual link later if I remember to.)
But I think orgs are more likely to be well-known to grant-makers on average given that they tend to have a higher research output,
I think you're getting the causality backwards. You need money first, before there is an org. Unless you count informal multi-person collaborations as orgs.
I think people who are more well-known to grant-makers are more likely to start orgs. Whereas people who are less well-known are more likely to get any funding at all if they aim for a smaller grant, i.e. as an independent researcher.
Counterpoint: after the FTX collapse, OpenPhil said publicly (in some EA Forum post) that they were raising their bar for funding, i.e. there are things that would have been funded before that would now not be funded. The stated reason for this is that there is generally less money around, in total. To me this sounds like the thing you would do if money is the limitation.
I don't know why OpenPhil doesn't spend more. Maybe they have long timelines and also don't expect any more big donors any time soon? And this is why they want to spend carefully?
From what I can tell, the field has been funding-constrained since the FTX collapse.
What I think happened:
FTX had lots of money and a low bar for funding, which meant they spread a lot of money around. This meant that more projects got started, and probably even more people got generally encouraged to join. Probably some projects got funded that should not have been, but probably also some really good projects got started that had not gotten money before, because they did not clear the earlier bar due to not having the right connections, or just being bad at writing grant proposals. In short, FTX money and the promise of FTX money made the field grow quickly. There was also some normal field growth on top of that; AIS has been growing steadily for a while.
Then FTX imploded. There was lots of chaos. Grants were promised but never paid out. Some orgs didn't want to spend the money they did get from FTX because of clawback risk. Other grant-makers covered some of this, but not all of it. It's still unclear what the new funding situation is.
Some months later, SFF, FTX and Nonlinear Network have their various grant rounds. Each of them gets overwhelmed with applications. I think this is mainly from the FTX-induced growth spurt, but also partly from orgs still trying to recover from the loss of FTX money, and just regular growth. Either way, the outcome of these grant rounds makes it clear that the funding situation has changed. The bar for getting funding is higher than before.
Today's thoughts:
I suspect it's not possible to build autonomous aligned AIs (low confidence). The best we can do is some type of hybrid humans-in-the-loop system. Such a system will be powerful enough to eventually give us everything we want, but it will also be much slower and intellectually inferior to what is possible without humans in the loop, i.e. the alignment tax will be enormous. The only way the safe system can compete is by not building the unsafe system.
Therefore we need AI Governance. Fortunately, political action is getting a lot of attention right now, and the general public seems to be positively inclined to more cautious AI development.
After getting an immediate stop/pause on larger models, I think the next step might be to use current AI to cure aging. I don't want to miss the singularity because I died first, and I think I'm not the only one who feels this way. It's much easier to be patient and cautious in a world where aging is a solved problem.
We probably need a strict ban on building autonomous superintelligent AI until we have reached technological maturity. It's probably not a great idea to build them after that either, but they will probably not pose the same risk any longer. This last claim is not at all obvious. The hardest attack vector to defend against would be manipulation. I think reaching technological maturity will make us able to defend against any military/hard-power attack. This includes, for example, having our own nanobot defence system to defend against hostile nanobots. Manipulation is harder, but I think there are ways to solve that too, given enough time to set up our defences.
An important crux for what the end goal is, including whether there is some stable end state where we're out of danger, is to what extent technological maturity also leads to a stable cultural/political situation, or whether that keeps evolving in ever new directions.
Recently an AI safety researcher complained to me about some interaction they had with an AI Safety communicator. Very stylized, their interaction went something like this:
(X is some fact or topic related to AI Safety.)
Communicator: We don't know anything about X and there is currently no research on X.
Researcher: Actually, I'm working on X, and I do know some things about X.
Communicator: We don't know anything about X and there is currently no research on X.
I notice that I semi-frequently hear communicators saying things like the above. I think what they mean is that our understanding of X is far from the understanding that is needed, and the number of researchers working on X is much smaller than what would be needed, and this gets rounded off to "we don't know anything and no one is doing anything about it". If this is what is going on, then I think this is bad.
I think that in some cases, when someone says "We don't know anything about X and there is currently no research on X", they probably literally mean it. There are some people who think that approximately no one working on AI Safety is doing real AI Safety research. But I also think that most people who say "We don't know anything about X and there is currently no research on X" are doing some mixture of rounding off and some sort of unreflective imitation learning, i.e. picking up the sentence structure from others, especially from high-status people.
I think using language that hides the existence of the research that does exist is bad. Primarily because it misinforms people. Do we want all new researchers to start from scratch? Because that is what happens if you tell them there is no pre-existing research and they believe you.
I also don't think this exaggeration will help with recruitment. Why do you think people would prefer to join a completely empty research field instead of a small one? From a personal success perspective (where success can mean either impact or career success), a small research field is great: lots of low-hanging fruit around. But a completely untrodden research direction is terrible: you will probably just get lost, not get anything done, and even if you find something, there's nowhere to publish it.
Recording thoughts in progress...
I notice that I don't expect FOOM-like RSI, because I don't expect we'll get a mesa-optimiser with coherent goals. It's not hard to give the outer optimiser (e.g. gradient descent) a coherent goal. For the outer optimiser to have a coherent goal is the default. But I don't expect that to translate to the inner optimiser. The inner optimiser will just have a bunch of heuristics and proxy goals, and not be very coherent, just like humans.
The outer optimiser can't FOOM, since it doesn't do planning and doesn't have strategic self-awareness. It can only do some combination of hill climbing and random trial and error. If something is FOOMing it will be the inner optimiser, but I expect that one to be a mess.
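To illustrate what I mean by the outer optimiser only doing hill climbing and random trial and error, here is a toy sketch (purely illustrative, nothing here is meant to model real training):

```python
# Toy "outer optimiser": blind hill climbing towards a perfectly coherent
# objective, with no planning, no lookahead, and no model of itself.
import random

def outer_objective(params):
    # A coherent outer goal: get every parameter close to 3.0
    # (a stand-in for "minimise training loss").
    return -sum((p - 3.0) ** 2 for p in params)

params = [random.uniform(-10, 10) for _ in range(5)]
for _ in range(10_000):
    candidate = [p + random.gauss(0, 0.1) for p in params]    # random trial
    if outer_objective(candidate) > outer_objective(params):  # greedy hill climb
        params = candidate                                    # keep it, no strategy involved

print([round(p, 2) for p in params])
# The loop reliably optimises its objective, but the artefact it produces
# (here just five numbers) is not itself an agent with that goal.
```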
I notice that this argument doesn't quite hold. More coherence is useful for RSI, but complete coherence is not necessary.
I also notice that I expect AIs to make fragile plans, but on reflection, I expect them to get better and better at this. By fragile I mean that the longer the plan is, the more likely it is to break. This is true for humans too, though. But we are self-aware enough about this fact to mostly compensate, i.e. make plans that don't have too many complicated steps, even if the plan spans a long time.
Yes!
Yes! The big question to me is whether we can reduce error rates enough. And "error rates" here is not just hardware signal error, but also randomness that comes from interacting with the environment.
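As a toy way of thinking about this, here is a small mutation-selection sketch (entirely my own framing, with made-up numbers): copying errors create "mutated" internal parts, selection favours them slightly, and a repair rate stands in for the system's ability to detect and remove them.

```python
# Toy mutation-selection dynamics. x is the fraction of mutated parts.
def mutant_fraction(error_rate, advantage=1.05, repair=0.0, generations=10_000):
    x = 0.0
    for _ in range(generations):
        x += (1 - x) * error_rate                      # copying errors create mutants
        x = advantage * x / (advantage * x + (1 - x))  # selection favours them slightly
        x *= (1 - repair)                              # active detection and removal
    return x

# With no repair, even a tiny error rate is eventually amplified by selection.
# With repair that outpaces the selective advantage, the fraction stays near zero.
for er, rep in [(1e-6, 0.0), (1e-6, 0.1), (1e-4, 0.1)]:
    print(f"error_rate={er:g}, repair={rep:g} -> {mutant_fraction(er, repair=rep):.6f}")
```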
It has to be smooth relative to the jumps that can be achieved by whatever is generating the variation. Natural mutation doesn't typically produce large jumps. But a small change in motivation for an intelligent system may cause a large shift in behaviour.
I thought so too to start with. I still don't know what the right conclusion is, but I think that substrate-needs convergence is at least still a risk even with a singleton. Something that is smart enough to be a general intelligence is probably complex enough to have internal parts that operate semi-independently, and therefore these parts can compete with each other.
I think the singleton scenario is the most interesting, since I think that if we have several competing AIs, then we are just super doomed.
And by singleton I don't necessarily mean a single entity. It could also be a single alliance. The boundary between group and individual might not be as clear with AIs as with humans.
This will probably be correct for a time. But will it be true forever? One of the possible end goals for Alignment research is to build the aligned superintelligence that saves us all. If substrate convergence is true, then this end goal is off the table. Because even if we reach this goal, it will inevitably start to either value-drift towards self-replication, or get eaten from the inside by parts that have mutated towards self-replication (AI cancer), or something like that.
Cancer is an excellent analogy. Humans defeat it in a few ways that work together
Point 4 is very important. If there is only one agent, this agent needs perfect cancer-fighting ability to avoid being eaten by natural selection. The big question to me is: is this possible?
If you on the other hand have several agents, then you definitely don't escape natural selection, because these entities will compete with each other.