Sorted by New

# Wiki Contributions

I'd be interested to know how you estimate the numbers here, they seem quite inflated to me.

If 4 big tech companies were to invest $50B each in 2023 then, assuming average salary as$300k and 2:1 capital to salary then investment would be hiring about 50B/900K = 55,000 people to work on this stuff. For reference the total headcount at these orgs is roughly 100-200K.

50B/yr is also around 25-50% of the size of the total income, and greater than profits for most which again seems high.

Perhaps my capital ratio is way too low but I would find it hard to believe that these companies can meaningfully put that level of capital into action so quickly. I would guess more on the order of \$50B between the major companies in 2023.

Agree with paul's comment above that timeline shifts are the most important variable.

Yeah I agree demonstrating it and examining it is very important, especially in chain-of-thought where the existence of hidden information isn't a certainty.

They're a bit out of date but the original proposal is ELK Sub - Note-taking in internal rollouts and the multiple model protocol is in Note Taking Without Hidden Messages.

Haven't done any proper testing but it's high up on my agenda, would be interested in working out what tests would be best - will elaborate later.

For the ELK contest I was working on something similar, but where the problem is more acute (you're trying to force a MuZero-style reinforcement learner to compress its hidden state into words).

The methods I proposed to deal with this are:

• Cut down the null space by training a language model to rewrite the written words in a different format which preserves the meaning as closely as possible, so that the CoT reasoner is forced to respond to similar sentences in similar ways.
• Use multiple models which are trained separately but have to respond to each others' thoughts, making it more difficult to have a shared steganographic protocol.
• Use something like dropout on the chains of thought, so that it can't hide information in the combination of multiple sentences.
• Have some non-linguistic bits which we don't apply optimization to, but instead focus our intepretability efforts on. The hope is that this soaks up the highest-order bits that aren't easily expressible in language,so there's less need to hide them within words. Applying dropout to these is particularly necessary.

I'm really interested to see this progress, it would feel very healthy if we could have a solid integrated definition of optimizer to work with.

I'm not sure I understand why you don't agree with the 'small' criterion for the target set. It seems that you should be able to say something about the likelihood of the target in the absence of any agent (or if the agent takes a max-ent distribution over actions or something), and that's the relevant notion of smallness, which then becomes large in the presence of the agent. Or is it that you expect it to be difficult to properly specify what it means to have no agent or random decisions?

On the relationships between the three ways of defining acts - is there a trivial way of connecting (1) and (3) by saying that the action that the agent takes in (1) is just baked into the trajectory as some true fact about the trajectory that doesn't have consequences until the agent acts on it? Or instead of the action itself, we could 'bake in' the mapping from some information about the trajectory to the action. Either way we could see this as being determined initially or at the point of decision without a difference in the resulting trajectories.

'all dependent variables in the system of equations' - I think this should be 'independent'.

Suggestion:

Eliezer has huge respect in the community; he has strong, well thought-out opinions (often negative) on a lot of the safety research being done (with exceptions, Chris Olah mentioned a few times); but he's not able to work full time on research directly (or so I understand, could be way off).

Perhaps he should institute some kind of prize for work done, trying to give extra prestige and funding to work going in his preferred direction? Does this exist in some form without my noticing? Is there a reason it'd be bad? Time/energy usage for Eliezer combined with difficulty of delegation?

The Research Engineer job for the Alignment team is no longer open - is this because it's reached some threshold of applications? In any case might not be helpful to advertise!

Thanks for doing this though, the context is very useful (I've applied as RE to both).

The synthesis of these options would be an AGI research group whose plan consists of:

• Develop safe AGI.
• Try to convince world governments to perform some such pivotal act (Idea A) - note that per current institutions this needs consensus and strong implementation across all major and medium tech powers.
• Have a back-up plan, if AGI research is proliferating without impending shutdown, to shut down world research unilaterally (Idea B).

What do you think of such a plan?

I think this would be reasonable, but if the plan is taken up then it becomes a cost-benefit analysis of when Idea B should be deployed, which plausibly could be very aggressive, so it could easily boil down to just Idea B.

It's also worth noting that a research group with an AGI who want world governments to perform a pivotal act would need to be incredibly effective and persuasive. Their options would run a spectrum from normal public-channel and lobbying efforts to AGI-takes-over-the-world-behind-the-scenes (depending on sufficient capability), with a variety of AGI-assisted persuasion techniques in between. At some degree of AI/research group control over government, it's not clear if this would be an improvement over the original act. Demonstrating the power of AGI in a way that would force governments to listen would need to at least threaten a transformative act (self-driving cars, solving protein folding, passing normal Turing tests clearly aren't enough) and so the necessary levels of influence and demonstrated capability would be large (and demonstrating capability has obvious potential drawbacks in sparking arms races).

I mentioned it in my standalone post but I'll register a question here:

In the counterexamples for 'Strategy: train a reporter that is useful for another AI', the main difficulty is the ability for agents to hide information in human language somehow, given the many available degrees of freedom.

I grant that this is a big risk but one advantage we have is that if we trained multiple agents, they would all be encoding hidden information, but most likely they would all encode this extra information in different ways.

The question is, given multiple agents encoding hidden messages in the reporter's answers, can we find a training strategy which penalizes or removes any kind of information in the answers which is not common to all agents?

My suggestion, based on my idea of internal note-taking rather than reporting per-se, would be to take the 'notes' on the internal state written by one agent and feeding these to another agent. This seems like it would eventually converge on them sharing the same hidden encoding though, albeit with more difficulty. Apologies if not clear more spelled out in the full post, but yeah, are there better proposals, or reasons why this kind of strategy wouldn't work?

I wonder if the discussion of the scientific capabilities of e.g. GPT-3 would be more productive if it were anchored to some model of the wider scientific feedback loop in which it's situated?

Consider three scenarios:

• A: A model trained to predict the shapes of proteins from their DNA sequences.
• B: A model with access to custom molecule synthesizer and high-volume, high-quality feedback about the scientific value of its text production, trained to write papers with scientific value about the nature of the molecules produced.
• C: A model given the resources of an psychology department and trained to advance the state of psychological understanding (without just hiring other humans).

As we go from A to C we see a decrease in the quality of the feedback loop, and with it an increasing need for general, rather than narrow intelligence. I would argue that even A should count as doing science, since it advances the state of the art knowledge about an important phenomena, and current models are clearly capable of doing so. C is clearly well beyond the capabilities of GPT-3 and also many well qualified, intelligent scientists, because the feedback loop is so poor. B is intermediate and I expect beyond GPT-3 but I'm not confident that current techniques couldn't provide value.

If you're interested in taking the point further, perhaps one of you could specify the loosest scientific feedback loop that you think the current paradigm of AI is capable of meaningfully participating in?

Turns out our methods are not actually very path-dependent in practice!

Yeah I get that's what Mingard et al are trying to show but the meaning of their empirical results isn't clear to me - but I'll try and properly read the actual paper rather than the blog post before saying any more in that direction.

"Flat minimum surrounded by areas of relatively good performance" is synonymous with compression. if we can vary the parameters in lots of ways without losing much performance, that implies that all the info needed for optimal performance has been compressed into whatever-we-can't-vary-without-losing-performance.

I get that a truly flat area is synonymous with compression - but I think being surrounded by areas of good performance is anti-correlated with compression because it indicates redundancy and less-than-maximal sensitivity.

I agree that viewing it as flat eigendimensions in parameter space is the right way to think about it, I still worry that the same concerns apply that maximal compression in this space is traded against ease of finding what would be a flat plain in many dimensions, but a maximally steep ravine in all of the other directions. I can imagine this could be investigated with some small experiments, or they may well already exist but I can't promise I'll follow up, if anyone is interested let me know.