orthonormal's Comments

Bottle Caps Aren't Optimisers

Okay, so another necessary condition for being downstream from an optimizer is being causally downstream. I'm sure there are other conditions, but the claim still feels like an important addition to the conversation.

Bottle Caps Aren't Optimisers

I'm surprised nobody has yet replied that the two examples are both products of significant optimizers with relevant optimization targets, and that the naive definition seems to work with one modification:

A system is downstream from an optimizer of some objective function to the extent that that objective function attains much higher values than would be attained if the system didn't exist, or were doing some other random thing.

Embedded Agents

Insofar as the AI Alignment Forum is part of the Best-of-2018 Review, this post deserves to be included. It's the friendliest explanation to MIRI's research agenda (as of 2018) that currently exists.

The Credit Assignment Problem
Removing things entirely seems extreme.

Dropout is a thing, though.

The Credit Assignment Problem

Shapley Values [thanks Zack for reminding me of the name] are akin to credit assignment: you have a bunch of agents coordinating to achieve something, and then you want to assign payouts fairly based on how much each contribution mattered to the final outcome.

And the way you do this is, for each agent you look at how good the outcome would have been if everybody except that agent had coordinated, and then you credit each agent proportionally to how much the overall performance would have fallen off without them.

So what about doing the same here- send rewards to each contributor proportional to how much they improved the actual group decision (assessed by rerunning it without them and seeing how performance declines)?

Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More

Good comment. I disagree with this bit:

I would, for instance, predict that if Superintelligence were published during the era of GOFAI, all else equal it would've made a bigger splash because AI researchers then were more receptive to abstract theorizing.

And then it would probably have been seen as outmoded and thrown away completely when AI capabilities research progressed into realms that vastly surpassed GOFAI. I don't know that there's an easy way to get capabilities researchers to think seriously about safety concerns that haven't manifested on a sufficient scale yet.

Proposal for an Implementable Toy Model of Informed Oversight

I like this suggestion of a more feasible form of steganography for NNs to figure out! But I think you'd need further advances in transparency to get useful informed oversight capabilities from (transformed or not) copies of the predictive network.

HCH as a measure of manipulation

I should have said "reliably estimate HCH"; I'd also want quite a lot of precision in addition to calibration before I trust it.

HCH as a measure of manipulation

Re #2, I think this is an important objection to low-impact-via-regularization-penalty in general.

HCH as a measure of manipulation

Re #1, an obvious set of questions to include in are questions of approval for various aspects of the AI's policy. (In particular, if we want the AI to later calculate a human's HCH and ask it for guidance, then we would like to be sure that HCH's answer to that question is not manipulated.)

Load More