When Hindsight Isn't 20/20: Incentive Design With Imperfect Credit Allocation

A crew of pirates all keep their gold in one very secure chest, with labelled sections for each pirate. Unfortunately, one day a storm hits the ship, tossing everything about. After the storm clears, the gold in the chest is all mixed up. The pirates each know how much gold they had - indeed, they’re rather obsessive about it - but they don’t trust each other to give honest numbers. How can they figure out how much gold each pirate had in the chest?

Here’s the trick: the captain has each crew member write down how much gold they had, in secret. Then, the captain adds it all up. If the final amount matches the amount of gold in the chest, then we’re done. But if the final amount does not match the amount of gold in the chest, then the captain throws the whole chest overboard, and nobody gets any of the gold.
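To make the captain’s rule concrete, here is a minimal sketch in Python. The function name `settle_chest`, the pirate names, and the gold amounts are all made up for illustration; the only thing taken from the story is the all-or-nothing rule itself.

```python
from typing import Dict


def settle_chest(claims: Dict[str, int], chest_total: int) -> Dict[str, int]:
    """Return each pirate's payout under the captain's all-or-nothing rule.

    `claims` maps a (hypothetical) pirate name to the amount they wrote down.
    """
    if sum(claims.values()) == chest_total:
        # The secret claims add up to the chest: everyone gets what they claimed.
        return dict(claims)
    # Any mismatch: the chest goes overboard and nobody gets anything.
    return {name: 0 for name in claims}


# Illustrative numbers only.
honest = {"Anne": 40, "Bart": 25, "Calico": 35}
print(settle_chest(honest, chest_total=100))

# One pirate inflates their claim; the captain cannot tell who, so all lose.
inflated = {**honest, "Bart": 30}
print(settle_chest(inflated, chest_total=100))
```

Note that the function never learns who lied. If everyone else reports honestly, any misreport by one pirate makes the sum mismatch and wipes out that pirate’s own share too, which is what makes honesty worthwhile.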

I want to emphasize two key features of this problem. First, depending on what happens, we may never know how much gold each pirate had in the chest or who lied, even in hindsight. Hindsight isn’t 20/20. Second, the solution to the problem requires outright destruction of wealth.

The point of this post is that these two features go hand-in-hand. There’s a wide range of real-life problems where we can’t tell what happened, even in hindsight; we’ll talk about three classes of examples. In these situations, it’s hard to design good incentives/mechanisms, because we don’t know where to allocate credit and blame. Outright wealth destruction provides a fairly general-purpose tool for such problems. It allows us to align incentives in otherwise-intractable problems, though often at considerable cost.

The Lemon Problem

Alice wants to sell her old car, and Bob is in the market for a decent quality used vehicle. One problem: while Alice knows that her car is in good condition (i.e. “not a lemon”), she has no cheap way to convince Bob of this fact. A full inspection by a neutral third party would be expensive, Bob doesn’t have the skills to inspect the car himself, and any words Alice speaks on the matter could just as easily be spoken by someone selling a lemon.

In order to convince Bob that the car is not a lemon, Alice needs to say or do something which a lemon-seller would not. What can she do?

One easy answer: offer to pay for any mechanical problems which come up after the sale. If Alice knew about expensive mechanical problems hiding under the car’s hood, then she wouldn’t offer Bob this sort of insurance (at least not for a low price). Conversely, if Alice is reasonably confident there are no mechanical problems, then offering to pay for the probably-non-existent problems costs her little.

There is one problem with this approach, however: if Alice is paying for mechanical problems, then Bob has no incentive to take good care of the car.

Ideally, if we could figure out in hindsight which problems were already present at the time of the sale, then Alice could offer to pay for only problems which were present beforehand. But in practice, if the car’s brakes fail 6 months or a year after the sale, we have no way to tell when the problem began. Were they already worn down, or has Bob been visiting the racetrack?

We can get a less-than-perfect solution using a proxy. For instance, if the car’s belt snaps a week after the sale, then it was probably frayed beforehand. If it snaps five years after the sale, then it probably wasn’t a noticeable issue beforehand. In this case, we can use time-at-which-a-problem-is-detected as a proxy for whether-a-problem-was-present-at-time-of-sale. This isn’t perfectly reliable, and there will be grey areas, but it gets us one step closer to figuring out in hindsight what happened.

Alternatively, we could try to align incentives without figuring out what happened in hindsight, using a trick similar to our pirate captain throwing the chest overboard. The trick is: if there’s a mechanical problem after the sale, then both Alice and Bob pay for it. I do not mean they split the bill; I mean they both pay the entire cost of the bill. One of them pays the mechanic, and the other takes the same amount of money in cash and burns it. (Or donates to a third party they don’t especially like, or ….) This aligns both their incentives: Alice is no longer incentivized to hide mechanical problems when showing off the car, and Bob is no longer incentivized to ignore maintenance or frequent the racetrack.

However, this solution also illustrates the downside of the technique: it’s expensive. Sometimes accidents happen - e.g. the air conditioner fails without Alice hiding it or Bob abusing the car. Our both-pay solution will make such accidents twice as expensive. If we can’t tell in hindsight whether a problem was Alice’s fault, Bob’s fault, or an accident, then both Alice and Bob need to pay the full cost of the problem in order to fully align their incentives. That means they’ll both need to pay for accidents, which reduces the overall surplus from the car sale. If the car is worth enough to Bob and little enough to Alice, there may still be room to make the deal work, but the (expected) cost of accidental problems will eat into both of their wallets.
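A rough way to see the tradeoff is to tabulate who bears what fraction of a repair bill under different arrangements. The sketch below is illustrative only: the scheme names, the function `expected_costs`, and the numbers are my own assumptions, and it treats the probability of a problem as fixed (i.e. it only prices the accidents that incentives cannot prevent).

```python
from typing import Dict, Tuple

# Fraction of each repair bill borne by (seller, buyer) under each
# hypothetical payment scheme.
SCHEMES: Dict[str, Tuple[float, float]] = {
    "seller_pays": (1.0, 0.0),  # Alice insures Bob; Bob has no reason to take care
    "buyer_pays": (0.0, 1.0),   # ordinary sale; Alice has no reason to disclose
    "split_bill": (0.5, 0.5),   # each side feels only half of the cost
    "both_pay": (1.0, 1.0),     # each side feels the full cost
}


def expected_costs(scheme: str, p_problem: float, repair_cost: float) -> Dict[str, float]:
    """Expected cost to each party, plus the expected amount of money burned."""
    seller_share, buyer_share = SCHEMES[scheme]
    expected_bill = p_problem * repair_cost
    return {
        "seller": seller_share * expected_bill,
        "buyer": buyer_share * expected_bill,
        # Anything paid beyond the mechanic's actual bill is destroyed.
        "burned": (seller_share + buyer_share - 1.0) * expected_bill,
    }


# Illustrative numbers: a 25% chance of an unavoidable $800 repair.
for name in SCHEMES:
    print(name, expected_costs(name, p_problem=0.25, repair_cost=800.0))
```

Only `both_pay` gives both parties a share of 1.0, and it is also the only scheme with a nonzero `burned` entry: the full alignment is bought with an expected loss equal to one extra copy of the expected repair bill.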

Similarly, if Alice and Bob have less-than-perfect trust in each other’s capabilities, that will eat into (expected) value. If Bob thinks that Alice just doesn’t know her own car very well, he may expect problems that Alice doesn’t know about. If Alice thinks that Bob is a careless driver regardless of incentives, then she’ll expect problems. These sorts of problems are effectively the same as accidents: they’re problems which won’t be avoided by good incentives, and therefore their overall cost will be doubled when both Alice and Bob need to pay for them.

O-Ring Production Functions

Suppose we have 100 workers, all working to produce a product. In order for the product to work, all 100 workers have to do their part correctly; if even just one of them messes up, then the whole product fails. This is an o-ring production function - named for the explosion of the space shuttle Challenger, where the failure of one o-ring led to the fatal failure of the whole shuttle. The model has some interesting economic implications - in particular, under o-ring-like production, adding a high-skill worker to a team of other high-skill workers generates more value than adding the same high-skill worker to a team of low-skill workers. Conversely, it offers theoretical support for common claims like “hiring one bad worker creates more damage than hiring ten good workers creates benefit”.
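The complementarity claim is easy to check numerically. The sketch below assumes the simplest version of the model: each worker independently gets their part right with some probability (their “skill”), and the product works only if every part works, so the success probability is the product of the skills. The function names and the skill numbers are illustrative assumptions, not part of the original model’s statement.

```python
from math import prod
from typing import Sequence


def success_probability(skills: Sequence[float]) -> float:
    """Probability the product works: every worker must succeed independently."""
    return prod(skills)


def gain_from_upgrade(teammates: Sequence[float], low: float, high: float) -> float:
    """Extra success probability from filling the last slot with a
    high-skill worker instead of a low-skill one."""
    return success_probability([*teammates, high]) - success_probability([*teammates, low])


# Illustrative teams of nine, each hiring a tenth worker.
strong_team = [0.99] * 9
weak_team = [0.80] * 9

gain_strong = gain_from_upgrade(strong_team, low=0.80, high=0.99)
gain_weak = gain_from_upgrade(weak_team, low=0.80, high=0.99)

print(f"gain on strong team: {gain_strong:.3f}")
print(f"gain on weak team:   {gain_weak:.3f}")
```

The gain equals the skill difference times the probability that everyone else succeeds, so the same hire is worth far more on the strong team.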

Here, I want to think about incentive design in an o-ring-like production model. If any worker fails to build their component well, then the whole product fails. How do we incentivize each worker to make their particular component work well? If we can figure out in hindsight which component(s) failed, then incentive design is easy: reward workers whose components succeeded, punish workers whose components failed. But what if we can’t tell in hindsight which components failed? What if we only know whether the product as a whole failed?

We can apply our value-destruction trick: if the product fails, then punish each worker as though their component had failed. Each worker is then fully incentivized to make their component work; if it fails, they’ll face the full cost of failure.

Just like the used car example, accidents are a problem. If there’s a non-negligible chance of accident, then workers will expect a non-negligible chance of failure outside of their control. In order to make up for that chance of punishment, the company will have to offer extra base pay to convince workers to work for them in the first place.

Also like the used car example, if the workers don’t trust each other’s capabilities, then that has the same effect as expecting accidents. Anything which makes the workers expect failure regardless of the incentives makes them expect punishment outside of their control, which makes them demand higher base pay in order to make it worthwhile to work for this company at all.
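To get a feel for how large that extra base pay might be, here is a back-of-the-envelope sketch. It assumes risk-neutral workers, independent accidents, and that every worker is already taking full care, so the only failures left are accidents. The function names and numbers are hypothetical.

```python
def p_failure_despite_effort(p_accident_per_worker: float, n_workers: int) -> float:
    """Chance the product fails even though every worker takes full care,
    assuming each component independently suffers an accident."""
    return 1.0 - (1.0 - p_accident_per_worker) ** n_workers


def compensating_premium(penalty: float, p_accident_per_worker: float, n_workers: int) -> float:
    """Extra base pay a risk-neutral worker needs to offset the expected
    punishment for failures outside their control."""
    return penalty * p_failure_despite_effort(p_accident_per_worker, n_workers)


# Illustrative: 100 workers, each with a 1% chance of an unavoidable accident,
# and a penalty of 1000 per worker whenever the product fails.
print(p_failure_despite_effort(0.01, 100))
print(compensating_premium(1000.0, 0.01, 100))
```

With these numbers the product fails about 63% of the time despite everyone’s best efforts, so each worker would want most of the penalty back as base pay. Risk-averse workers would demand more than this.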

Even worse: if the workers think there’s a high probability of failure regardless of incentives, that reduces their own incentive to avoid failure. If they expect the final product to fail regardless of whether their own component fails, then they have little incentive to make their own component work. In order for this whole strategy to work well, there has to be a high probability that the end product succeeds, assuming the incentives are aligned. Accidents and incompetence have to be rare. (Drawing the analogy back to the used-car problem: if Alice knows that the clutch is bad, but expects Bob to abuse the clutch enough that it would be ruined anyway regardless of incentives, then she has little reason to mention the bad clutch, even under the both-pay strategy.)
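Here is a small sketch of why expected failure elsewhere blunts each worker’s incentive. It assumes a risk-neutral worker who is punished whenever the product fails, and who chooses between being careful or careless with their own component. All names and numbers are illustrative assumptions.

```python
def incentive_to_take_care(
    penalty: float,
    p_rest_succeeds: float,
    p_own_careful: float,
    p_own_careless: float,
) -> float:
    """Reduction in expected punishment from being careful.

    The product succeeds only if the worker's own component AND everything
    else succeeds, so expected punishment is
        penalty * (1 - p_rest_succeeds * p_own).
    The difference between careless and careful is therefore
        penalty * p_rest_succeeds * (p_own_careful - p_own_careless).
    """
    return penalty * p_rest_succeeds * (p_own_careful - p_own_careless)


def takes_care(incentive: float, effort_cost: float) -> bool:
    """A risk-neutral worker bothers only if the saving exceeds the effort cost."""
    return incentive > effort_cost


# Illustrative numbers: same worker, same penalty, different beliefs about the rest.
confident = incentive_to_take_care(1000.0, p_rest_succeeds=0.95, p_own_careful=0.99, p_own_careless=0.80)
pessimistic = incentive_to_take_care(1000.0, p_rest_succeeds=0.10, p_own_careful=0.99, p_own_careless=0.80)

print(confident, takes_care(confident, effort_cost=50.0))
print(pessimistic, takes_care(pessimistic, effort_cost=50.0))
```

The incentive scales directly with the probability that everything else works. A worker who believes the rest of the product will probably fail anyway gets almost nothing from being careful, no matter how large the penalty is relative to their own component.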

Telephone

In the context of a modern business, one model I think about is the game of telephone. The players all sit in a line, and the first player receives a secret message. The first player whispers the message in the ear of the second, the second whispers it to the third, and so forth. When the message reaches the last player, we compare the message received to the message sent to see if they match. Inevitably, a starting message of “please buy milk and potatoes at the store” turns into “cheesy guys grow tomatoes on the shore”, or something equally ridiculous, one mistake at a time.

In a business context, the telephone chain might involve a customer research group collecting data from customers, then passing that data to product managers, who turn it into feature requests for designers, who then hand the design over to engineers, who build and release the product, often with several steps of information passing up and down management chains in the middle. This goes about as well as the game of telephone - thus, “jokes” like this:

Viewed as economic production, the game of telephone is itself an example of an o-ring production function. In order to get a successful final product - i.e. a final message which matches the original message - every person in the chain must successfully convey the message. If one person fails, the whole product fails. (Even if individual failures are only minor, a relatively small number of them still wipes out the contents of the message.) And, if there’s an end-to-end mismatch, it will often be expensive to figure out where communication failed, even in hindsight.
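As a crude illustration of how quickly a chain degrades, suppose each hop independently passes the message along intact with some fixed probability. That is a strong simplifying assumption (real miscommunication is partial and correlated), and the function names and numbers below are made up for the sketch.

```python
import math


def chain_fidelity(per_hop_accuracy: float, hops: int) -> float:
    """Probability the message survives every hop, assuming independent hops."""
    return per_hop_accuracy ** hops


def max_hops(per_hop_accuracy: float, target_fidelity: float) -> int:
    """Longest chain whose end-to-end fidelity stays at or above the target."""
    if not 0.0 < per_hop_accuracy < 1.0:
        raise ValueError("per_hop_accuracy must be strictly between 0 and 1")
    if not 0.0 < target_fidelity <= 1.0:
        raise ValueError("target_fidelity must be in (0, 1]")
    # Solve per_hop_accuracy ** n >= target_fidelity for the largest integer n.
    return math.floor(math.log(target_fidelity) / math.log(per_hop_accuracy))


# Illustrative: each handoff preserves the message 95% of the time.
print(chain_fidelity(0.95, 10))
print(max_hops(0.95, target_fidelity=0.5))
```

Even with handoffs that are individually quite reliable, a chain of a dozen or so steps is close to a coin flip end to end under this model.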

So, we have the preconditions for our technique: we can incentivize good message-passing by punishing everyone in the chain when the output message doesn’t match the input message.

Would this be a good idea? It depends on how much miscommunication can be removed by good incentives. If the limiting factor is poor communication skills, and the people involved can’t do any better even if they try, then we’re in the “expect accidents” regime: the incentives will be expensive and the system will often fail anyway. On the other hand, if incentivizing reliable communication produces reliable communication, then the strategy should work.

That said, we’re talking about punishing managers for miscommunicating, so presumably few managers would want to adopt such a rule regardless. Good incentive design doesn’t make much difference if the people who choose the incentives do not want to fix them.

AI ALIGNMENT FORUM