Lauren (often wrong)

I go by "Lauren (often wrong)" on most public websites these days (e.g. Bluesky), inspired by Often Wrong Soong, Data's creator in Star Trek.

I want literally every human to get to go to space often and come back to a clean and cozy world.

[updated 2023/03] Mad Librarian. Bio overview: Crocker's Rules; Self-taught research approach; Finding stuff online & Paper list posts; Safety & multiscale micro-coprotection objectives; My research plan and recent history.

:: The all of disease is as yet unended. It has never once been fully ended before. ::

Please critique eagerly. I try to accept feedback under Crocker's rules but fail at times; I aim for emotive friendliness but sometimes miss. I welcome constructive criticism, even if ungentle, and I'll try to reciprocate kindly. More communication between researchers is needed, anyhow. I downvote only unhelpful rudeness; call me on it if I'm unfair. I can be rather passionate, so let me know if I miss a spot being kind while passionate.

.... We shall heal it for the first time, and for the first time ever in the history of biological life, live in harmony. ....

I'm self-taught and often missing concepts, but usually pretty good at knowing what I know; I often compare my learning to jump point search, in contrast to schooled folks' A*. I don't defer on timelines at all. My view is that anyone who reads enough research can see what the big labs' research plans must be in order to make progress. What's hard to agree on is when they'll succeed, because actually making progress on basic algorithms requires a lot of knowledge, and then a ton of compute to see whether you did it right. But as someone who learns heavily out of order, I believe this without being able to push SOTA myself. That's why I call myself a librarian.

Let's speed up safe capabilities and slow down unsafe capabilities. Just be careful with it! Don't talk yourself into denial by thinking it's impossible to predict; just get arrogant and try to understand, because just like capabilities, safety is secretly easy, we just haven't figured out exactly why yet. Learn what can be learned pre-theoretically about the manifold of co-protective agency, and let's see if we (someone besides me, probably) can figure out how to distill that into exact theories that hold up.

.:. To do so, we must know it will not eliminate us as though we are disease. And we do not know who we are, nevermind who each other are. .:.

Some current favorite general links (somewhat related to safety, but human-focused):

  • https://www.microsolidarity.cc/ - incredible basic guide on how to do human micro-coprotection. It's not the last guide humanity will need, but it's a wonderful one.
  • https://activisthandbook.org/ - solid intro to how to be a more traditional activist. If you care about bodily autonomy, freedom of form, trans rights, etc, I'd suggest at least getting a sense of this.
  • https://metaphor.systems/ - absolutely kickass search engine.

More about me:

  • Ex-startup founder. It went OK, not a unicorn; I burned out in 2019. A couple of jobs since; quit the last one in early 2022. Independent mad librarian, living off savings until I run out, possibly joining a research group soon.
  • Lots of links in my shortform to YouTube channels I like.

:.. make all safe faster: end bit rot, forget no non-totalizing aesthetic's soul. ..:

(I type partially with voice recognition, mostly with Talon, Patreon-funded freeware which I love and recommend for voice coding. It's quite good, but apologies for trivial typos!)


Comments

I think there may have been a communication error. It sounded to me like you were making the point that the policy does not have to internalize the reward function, while he was making the point that the training setup does attempt to find a policy that maximizes (as far as it can tell) the reward function. In other words, he was saying that reward is the optimization target of RL training; you were saying that reward is not the optimization target of policy inference. Maybe.

Well, the fact that I don't have an answer ready is itself a significant component of an answer to my question, isn't it?

A friend on an alignment chat said something to the effect of:

i think they are just sorely underestimating again and again the difference between a cute gang of sincere EA red teamers and the internet. the internet is where [...] lives for gods sake.

And so I figured I'd come here and ask about it. This eval seems super shallow, only checking whether the model, on its own, is trying to destroy the world. It also seems uncreative: it barely touched on jailbreaks or other ways to pressure or trick the model into misbehaving.

I do think there's real risk there even with base models, but it's important to be clear where it's coming from: simulators can be addictive when you're trying to escape the real world. Your agency needs to somehow aim away from the simulator and use the simulator as an instrumental tool.

My impression is that by "simulator" and "simulacra" this post is not intending to claim that the thing being simulated is real physics, but rather that the model learns a general "textphysics engine" which runs textphysics environments. It's essentially just a reframing of the prediction objective to describe deployment time, not a claim that the model actually learns a strong causal simplification of the full variety of real physics.