All Posts


Wednesday, March 27th 2024

Quick Takes
2 · Linda Linsefors · 18h
Recently someone either suggested to me (or maybe told me that they or someone else were going to do this?) that we should train AI on legal texts to teach it human values. Ignoring the technical problem of how to do this, I'm pretty sure legal texts are not the right training data. But at the time, I could not clearly put into words why. Today's SMBC explains this for me: Saturday Morning Breakfast Cereal - Law (smbc-comics.com) Law is not a good representation or explanation of most of what we care about, because it's not trying to be. Law is mainly focused on the contentious edge cases. Training an AI on trolley problems and other ethical dilemmas is even worse, for the same reason.



Friday, March 22nd 2024

Quick Takes
17 · Fabien Roger · 7d
I just finished listening to The Hacker and the State by Ben Buchanan, a book about cyberattacks and the surrounding geopolitics. It's a great book to start learning about the big state-related cyberattacks of the last two decades.

Some big attacks/leaks he describes in detail:

* Wire-tapping/passive listening efforts from the NSA, the "Five Eyes", and other countries
* The multi-layer backdoors the NSA implanted and used to get around encryption, and that other attackers eventually also used (the insecure "secure random number" trick + some stuff on top of that)
* The Shadow Brokers (that's a *huge* leak that went completely under my radar at the time)
* Russia's attacks on Ukraine's infrastructure
* Attacks on the private sector for political reasons
* Stuxnet
* The North Korean attack on Sony when it released a documentary criticizing their leader, and misc North Korean cybercrime (e.g. WannaCry, some bank robberies, ...)
* The leak of Hillary's emails and Russian interference in US politics
* (and more)

Main takeaways (I'm not sure how much I buy these, I just read one book):

* Don't mess with states too much, and don't think anything is secret - even if you're the NSA.
* The US has a "nobody but us" strategy, which holds that it's fine for the US to use vulnerabilities as long as it is the only actor powerful enough to find and use them. This looks somewhat nuts and naive in hindsight. There don't seem to be strong incentives to protect the private sector.
* There are a ton of different attack vectors and vulnerabilities, more big attacks than I thought, and a lot more is publicly known than I would have expected. The author goes into great detail about ~10 big secret operations, often speaking as if he were an omniscient narrator.
* Even the biggest attacks didn't inflict that much (direct) damage (never >$10B in damage?). Unclear if it's because states are holding back, because they suck, or because it's hard. It seem

Wednesday, March 20th 2024

Quick Takes
30 · Lawrence Chan · 9d
I finally got around to reading the Mamba paper. H/t Ryan Greenblatt and Vivek Hebbar for helpful comments that got me unstuck.

TL;DR: the authors propose a new deep learning architecture for sequence modeling with scaling laws that match transformers while being much more efficient to sample from.

A brief historical digression

As of ~2017, the three primary ways people had for doing sequence modeling were RNNs, conv nets, and transformers, each with a unique “trick” for handling sequence data: recurrence, 1d convolutions, and self-attention.

* RNNs are easy to sample from — to compute the logit for x_t+1, you only need to store the most recent hidden state h_t and the last token x_t, which means it’s both fast and memory efficient: RNNs generate a sequence of length L with O(1) memory and O(L) time (see the sketch after this list). However, they’re super hard to train, because you need to sequentially generate all the hidden states and then sequentially (in reverse) calculate the gradients. The way you actually did this is called backpropagation through time — you basically unroll the RNN over time — which requires constructing a graph of depth equal to the sequence length. Not only was this slow, but the graph being so deep caused vanishing/exploding gradients without careful normalization. The strategy people used was to train on short sequences and finetune on longer ones. That being said, in practice, this meant you couldn’t train on long sequences (more than a few hundred tokens) at all. The best LSTMs for modeling raw audio could only handle being trained on ~5s of speech, if you chunk up the data into 25ms segments.
* Conv nets had a fixed receptive field size and pattern, so they weren’t that suited for long sequence modeling. Also, generating each token takes O(L) time, assuming the receptive field is about the same size as the sequence. But they had significantly more stability (the depth was small, and could be as low as O(log(L))), which meant you could train them a lot easier. (Also, you could use FFT
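Since the point above leans on the contrast between cheap recurrent sampling and expensive unrolled training, here is a minimal illustrative sketch in plain numpy. It is not code from the Mamba paper; the toy sizes, weight names, and the rnn_step/sample helpers are all assumptions made up for illustration.

```python
# Minimal sketch (not from the Mamba paper): a vanilla RNN illustrating why
# autoregressive sampling needs only O(1) memory per step, while training
# requires unrolling the whole sequence (backpropagation through time).
import numpy as np

rng = np.random.default_rng(0)
V, H = 16, 32                      # toy vocab size and hidden size (assumptions)
Wxh = rng.normal(0, 0.1, (H, V))   # input -> hidden
Whh = rng.normal(0, 0.1, (H, H))   # hidden -> hidden (the recurrence)
Why = rng.normal(0, 0.1, (V, H))   # hidden -> logits

def rnn_step(h, x_id):
    """One recurrence step: new hidden state and logits for the next token."""
    x = np.zeros(V)
    x[x_id] = 1.0
    h_new = np.tanh(Wxh @ x + Whh @ h)
    return h_new, Why @ h_new

def sample(first_id, length):
    """Autoregressive sampling: only h and the last token are kept around,
    so memory is O(1) in sequence length and time is O(length)."""
    h, x_id, out = np.zeros(H), first_id, [first_id]
    for _ in range(length - 1):
        h, logits = rnn_step(h, x_id)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        x_id = int(rng.choice(V, p=p))
        out.append(x_id)
    return out

# Training, by contrast, must keep every hidden state h_1..h_L alive to run
# the backward pass (backprop through time), so graph depth and activation
# memory both grow with the sequence length L.
print(sample(first_id=3, length=10))
```

Mamba-style state-space models aim to keep this cheap recurrent sampling while avoiding the sequential, depth-L training graph that makes plain RNNs hard to train.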

