Logical counterfactuals are when you say something like "Suppose π=3 , what would that imply?" They play an important role in logical decision theory. Suppose you take a false proposition P and then take a logical counterfactual in which P is true. I am imagining this counterfactual as a function...
A response to this paper. https://asi-safety-lab.com/DL/Kill-Switch-For_ASI_EW_21_12_14.pdf A substantial fraction of my argument comes down to. It is plausible that a hypothetical hypercompetent civilization could coordinate to build such infrastructure. However we don't live in a hypothetical hypercompetent civilization. We are not remotely close, and any attempt to get us there...
Current LLM based AI systems are getting pretty good at maths by writing formal proofs in Lean or similar. https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/ So, how can we use these models to help align AI? The Simple Alignment Solution Assumption states that many problems in alignment, for example corrigibility or value learning, have simple...
Microsoft has released a paper called "who is Harry Potter" in which they claim to make a neural network forget who Harry Potter is. https://www.microsoft.com/en-us/research/project/physics-of-agi/articles/whos-harry-potter-making-llms-forget-2/ Here are some of my predictions on how I think this method might fail. I am predicting publicly without looking at the evidence first. This...
Suppose you have some positive utility functions A,B , mathematically considered to be random variables dependant on choice of policy. Let K be a random variable that is 1 if the button is pressed, 0 otherwise. Again dependant on choice of policy. I am considering that there is a particular...
So this is a bunch of related technical questions about logical induction. Firstly, do you need the formal theorem prover section? Can you just throw out the formal theorem prover, but give some programs in the market unbounded capital and get the same resultant behaviour? (For example, give the program...
Suppose you have a ML model trained to output formal proofs. Maybe you start with ZFC and then add extra tokens for a range of common concepts. (along with definitions. ). So a human mathematician needs to type in the definition of a gradient in terms of limits, and the...