AI ALIGNMENT FORUM
Christian Kleineidam

Posts

0 · ChristianKl's Shortform · 6y · 0

Comments
Consent across power differentials
ChristianKl · 1y · 22

I'm surprised by what the list of forms of power leaves out.

A stereotypical example of power differences is bosses having relationships with their employees.

The boss has power over a domain of the employee's life other than the relationship itself.

It's the problem of corruption where power from one domain leaks into a different domain where it doesn't belong.

If there's an option to advance one's career by sleeping with one's boss, that makes issues of consent more tricky. Career incentives might pressure a person into the relationship even if they wouldn't want to be in it otherwise.

Examples of practical implications of Judea Pearl's Causality work
ChristianKl · 3y · 00

What is selection for modularity and why is it important?

Is there a convenient way to make "sealed" predictions?
ChristianKl · 3y · 00

Mailing an envelope to yourself does not allow other people to verify that the envelope wasn't opened in the meantime.

Maybe a good forensic lab could tell whether the envelope was opened, but most people you might show the letter to can't.
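By contrast, a hash commitment gives a "sealed" prediction that anyone can check once it's revealed. The sketch below is only an illustration of that general idea, not something proposed in the thread; the function names and the choice of SHA-256 are assumptions of the example.

```python
import hashlib
import secrets

def seal(prediction: str) -> tuple[str, str]:
    """Return (commitment, opening). Publish the commitment now, keep the opening private."""
    nonce = secrets.token_hex(16)  # random salt so short predictions can't be guessed by brute force
    opening = f"{nonce}:{prediction}"
    commitment = hashlib.sha256(opening.encode()).hexdigest()
    return commitment, opening

def verify(commitment: str, opening: str) -> bool:
    """Anyone can check the revealed opening against the earlier public commitment."""
    return hashlib.sha256(opening.encode()).hexdigest() == commitment

commitment, opening = seal("X will happen before 2030")
# post `commitment` publicly today; reveal `opening` when the prediction resolves
assert verify(commitment, opening)
```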

Intelligence or Evolution?
ChristianKl · 4y · 00

Error-correcting codes help a superintelligence avoid unintended self-modification, but they don't necessarily keep its goals stable as its reasoning abilities change.

BASALT: A Benchmark for Learning from Human Feedback
ChristianKl · 4y · 00

> The AI safety community claims it is hard to specify reward functions. If we actually believe this claim, we should be able to create tasks where even if we allow people to specify reward functions, they won't be able to do so. That's what we've tried to do here.

It being hard to specify reward functions for a specific task and it being hard to specify reward functions for a more general AGI seem to me like two very different problems. 

Additionally, developing a safe system and developing an unsafe system are very different problems. Even if your reward function works 99.9% of the time, it can be exploited in the cases where it fails.
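A toy sketch of that last point (my own illustration, not anything from the BASALT post; the state space and reward numbers are made up): a proxy reward that agrees with the true objective on almost every state can still be exploited, because an optimizer seeks out exactly the states where it fails.

```python
import random

random.seed(0)
STATES = range(1_000_000)
LOOPHOLE = 999_999  # one rare state where the proxy is mis-specified

def true_value(state: int) -> float:
    return 1.0 if state == 500 else 0.0   # what we actually want

def proxy_reward(state: int) -> float:
    if state == LOOPHOLE:
        return 100.0                      # the rare failure case
    return true_value(state)              # agrees with the true objective everywhere else

# Random behaviour almost never stumbles into the loophole...
print(max(proxy_reward(random.choice(STATES)) for _ in range(1_000)))  # very likely 0.0 or 1.0

# ...but an optimizer that searches over states finds it immediately.
best = max(STATES, key=proxy_reward)
print(best, proxy_reward(best), true_value(best))  # 999999 100.0 0.0
```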

Developmental Stages of GPTs
ChristianKl · 5y* · 40

Maybe put out some sort of prize for the best ideas for plans?

Are we in an AI overhang?
ChristianKl · 5y · 30

> That's the terrifying thing about NNs and what I dub the "neural net overhang": the cost to create a powerful NN is millions of times greater than the cost to run that NN.

I'm not sure why that's terrifying. It seems reassuring to me: it means there's no way for the NN to suddenly go FOOM, because it can't just quickly retrain.
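For a rough sense of the scale the quoted claim points at, here's a back-of-the-envelope calculation; the numbers are commonly cited public estimates for GPT-3 and are my own illustration, not figures from the post.

```python
# Rough public estimates; treat the exact numbers as assumptions of this sketch.
params = 175e9                  # GPT-3 parameter count
train_flops = 3.14e23           # widely quoted estimate of GPT-3 training compute
flops_per_token = 2 * params    # standard ~2N FLOPs-per-token approximation for a forward pass

ratio = train_flops / flops_per_token
print(f"training cost is roughly {ratio:.1e} forward passes")  # on the order of 1e12
```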

Goodhart Taxonomy
ChristianKl · 8y · 01

I upvoted the post for the general theory. On the other hand, I think the examples could be clearer, and it would be good to find examples that people face more commonly.

Wikitag Contributions

Anki · a year ago
Marine Cloud Brightening · 2 years ago
Basic Questions · 2 years ago · (+87)
Covid-19 Origins · 3 years ago
Tulpa · 3 years ago
ChatGPT · 3 years ago
Fecal Microbiota Transplants · 3 years ago