[updated 2023/03] Mad Librarian (better than your search engine, try me!). Bio overview: Crocker's Rules; Self-taught research approach; Finding stuff online & Paper list posts; Safety & multiscale micro-coprotection objectives; My research plan and recent history.
:: The all of disease is as yet unended. It has never once been fully ended before. ::
Please critique eagerly. I try to accept feedback per Crocker's rules, but fail at times; I aim for emotive friendliness but sometimes miss. I welcome constructive criticism, even ungentle criticism, and I'll try to reciprocate kindly; more communication between researchers is needed anyhow. I downvote only unhelpful rudeness - call me on it if I'm being unfair. I can be rather passionate; let me know if I missed a spot being kind while passionate.
I collect research news (hence "the gears to ascension" - admittedly dorky, but I like it). About 60% of the papers I share I've only read the abstract of, i.e. level 0; about 39% I've skimmed at level 1; about 1% I've deep-read at level 2+. If you can do better, use my shares to seed a better lit review.
.... We shall heal it for the first time, and for the first time ever in the history of biological life, live in harmony. ....
I'm self-taught and often missing concepts, but usually pretty good at knowing what I know; I often compare my learning to a visual metaphor of jump point search, in contrast to schooled folks' A*. I don't defer on timelines at all: my view is that anyone who reads enough research can see what the big labs' research plans must be in order to make progress; what's hard is agreeing on when they'll succeed, since actually making progress on the basic algorithms requires a lot of knowledge, and then a ton of compute to see if you got it right. As someone who learns heavily out of order, I believe this without being able to push SOTA myself. It's why I call myself a librarian.
Let's speed up safe capabilities and slow down unsafe capabilities. Just be careful with it! Don't retreat into denial by deciding it's impossible to predict; get arrogant and try to understand, because, just like capabilities, safety is secretly easy - we just haven't figured out exactly why yet. Learn what can be learned pre-theoretically about the manifold of co-protective agency, and let's see if we (someone besides me, probably) can figure out how to distill that into exact theories that hold up.
.:. To do so, we must know it will not eliminate us as though we are disease. And we do not know who we are, nevermind who each other are. .:.
More about me:
:.. make all safe faster: end bit rot, forget no non-totalizing aesthetic's soul. ..:
(I type partially with voice recognition, mostly with Talon, Patreon-funded freeware which I love and recommend for voice coding; while it's quite good, apologies for trivial typos!)
My impression is that, by "simulator" and "simulacra", this post is not claiming that the thing being simulated is real physics, but rather that the model learns a general "textphysics engine" which runs textphysics environments. It's essentially just a reframing of the prediction objective to describe deployment time - not a claim that the model actually learns a strong causal simplification of the full variety of real physics.
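To make that reframing concrete, here's a toy sketch (my own illustration, not anything from the post; the corpus and function names are invented): the prediction objective fits a conditional distribution over next tokens, and "simulation" at deployment time is nothing more than iterating samples from that same distribution.

```python
# Toy illustration (not from the post): the conditional distribution
# P(next token | context) that the prediction objective fits is, at
# deployment time, simply iterated - and that iteration is the
# "textphysics environment" being run.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the log ."

# "Training": fit bigram counts, a stand-in for the learned predictor.
counts = defaultdict(Counter)
tokens = corpus.split()
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    """What the prediction objective produces: P(next | previous)."""
    options = counts[prev]
    total = sum(options.values())
    return {tok: c / total for tok, c in options.items()}

def rollout(start, steps, rng=random.Random(0)):
    """Deployment: repeatedly sample from the same predictor."""
    state, trajectory = start, [start]
    for _ in range(steps):
        dist = predict(state)
        if not dist:  # dead end: no continuation seen in training
            break
        state = rng.choices(list(dist), weights=list(dist.values()))[0]
        trajectory.append(state)
    return " ".join(trajectory)

print(rollout("the", steps=10))
```

Nothing in the rollout knows about cats or mats; it only has text statistics, which is the point of the reframing: "simulation" names how the predictor gets used, not what it understands.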
Well, the fact that I don't have an answer ready is itself a significant component of an answer to my question, isn't it?
A friend on an alignment chat said something to the effect of:
And so I figured I'd come here and ask about it. This eval seems quite shallow: it only checks whether the model, on its own, tries to destroy the world. That strikes me as uncreative - it barely touches on jailbreaks or on ways to pressure or trick the model into misbehaving.