AI ALIGNMENT FORUM

Nicky Pochinkov


https://nicky.pro

Comments
Ban development of unpredictable powerful models?
NickyP · 2y · 30

Maybe I'm not fully understanding, but one issue I see is that without requiring "perfect prediction", one could potentially Goodhart the proposal. I could imagine something like:

In training GPT-5, add a term that upweights very basic bigram statistics. In "evaluation", use your bigram-statistics table to "predict" most top-k outputs just well enough to pass.

This would probably have a negative impact on performance, but it could possibly be tuned to be just sufficient to pass. Alternatively, instead of exact bigram statistics, one could train an easy-to-understand toy model on the side and regularise the predictions toward that: just enough to pass the test, while still only understanding the toy model.
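The bigram table in this scheme could be as simple as the following sketch (the corpus, names, and "prediction" criterion are all illustrative assumptions, not anything specified in the post):

```python
from collections import Counter, defaultdict

# Hypothetical side model: raw bigram counts from a tiny corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_topk(prev_token, k=2):
    """Return up to k most frequent next tokens under the bigram table."""
    return [tok for tok, _ in bigram_counts[prev_token].most_common(k)]

print(predict_topk("the"))  # ['cat', 'mat']
```

A large model regularised toward these statistics would, by construction, have its top-k outputs "predicted" by this table often enough to clear a non-perfect threshold, even though nobody understands the large model itself.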

ChatGPT struggles to respond to the real world
NickyP · 3y · 10

I suspect that with a tuned initial prompt, ChatGPT would do much better. For example, something like:

Simulate an assistant on the other end of a phone call, who is helping me to cook a turmeric latte in my kitchen. I have never cooked before and need extremely specific instructions. Only speak one sentence at a time. Only explain one instruction at a time. Never say "and". Please ask clarifying questions if necessary. Only speak one sentence at a time, and await a response. Be explicit about:
- where I need to go.
- what I need to get
- where I need to bring things

Do you understand? Say "I Accept" and we can begin

I have not fully tested this, but I would guess a tuned prompt of this sort would make it possible, though ChatGPT is not tuned to answer this way by default. (ChatGPT can also simulate a virtual Linux shell.)

In addition, I have found it works much better when you go back and edit the prompt before an incorrect answer, as it starts to reference itself a lot. Though I also expect that in this situation, having a reference recipe at the top would be useful.

Wikitag Contributions

Machine Unlearning · 2y · (+1330)
Posts

4 · Literature Review of Text AutoEncoders · 6mo · 0
14 · AISC 2024 - Project Summaries · 2y · 0
13 · Machine Unlearning Evaluations as Interpretability Benchmarks · 2y · 0
7 · Ideation and Trajectory Modelling in Language Models · 2y · 0
38 · LLM Modularity: The Separability of Capabilities in Large Language Models · 2y · 0
23 · LLM Basics: Embedding Spaces - Transformer Token Vectors Are Not Points in Space · 3y · 0
6 · Speculation on Path-Dependance in Large Language Models. · 3y · 2
7 · How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It) · 3y · 1