The Codex Skeptic FAQ

by Michaël Trazzi
24th Aug 2021
2 min read
13 comments
Vaniver

[Note: I use Copilot and like it. The 'aha' moment for me was when I needed to calculate the intersection of two lines, a thing that I would normally just copy/paste from Stack Overflow, and instead Copilot wrote the function for me. Of course I then wrote tests and it passed the tests, which seemed like an altogether better workflow.]

Language models are good enough at generating code to make the very engineers building such models slightly more productive

How much of this is 'quality of code' vs. 'quality of data'? I would naively expect that the sort of algorithmic improvements generated from OpenAI engineers using Copilot/Codex/etc. are relatively low-impact compared to the sort of benefits you get from adding your company's codebase to the corpus (or whatever is actually the appropriate version of that). I'm somewhat pessimistic about the benefits of adding Copilot-generated code to the corpus as a method of improving Copilot.

Michaël Trazzi

I buy that "generated code" will not add anything to the training set, and that Copilot doesn't help with getting good data or (directly) better algorithms. However, the feedback loop I am pointing at comes from accepting suggestions in Copilot: I think it is learning from human feedback about which solutions people select. If the model is "finetuned" to a specific dev's coding style, I would expect Codex to suggest even better code (because of the high quality of the finetuning data) to someone at OpenAI than to me or you.

How much of this is 'quality of code' vs. 'quality of data'?

I'm pointing at overall gains in developer productivity. This could be used for collecting more data, which, AFAIK, happens by automatically collecting data from the internet using code (although the business collaboration between OpenAI and GitHub possibly helped). Most of the dev work would then be iteratively cleaning that data, running trainings, changing the architecture, etc. before getting to the performance they want, and those cycles would be a tiny bit faster using such tools.

To be clear, I'm not saying that talented engineers are coding much faster today. They're probably doing creative work at the edge of what Codex has seen. However, we're using the first version of something that, down the line, might end up giving us decent speed increases (I've become increasingly productive the more I've learned how to use it). A company owning such a model would certainly have private access to better versions to use internally, and there are strategic reasons not to share the next version of its code-generating model, in order to win a race, while still collecting feedback from millions of developers.

Taran

-

Michaël Trazzi

Wait, did they outright forbid you to use it at all during work time, or did they forbid using its outputs because of IP issues? Surely using Codex for inspiration, given a natural-language prompt, and looking at what functions it calls does not infringe any copyright rules?

  • 1) If you start with your own variable names, it will auto-complete using those, maybe drawing on something it learned online. Would that count as plagiarism in your sense? How would that differ from copy-pasting from Stack Overflow and changing the variable names? (I'm not an expert in SO's copyright terms, but you should probably credit SO if doing so, and there might be rules about distributing it commercially.)
  • 2) Imagine you are using line-by-line auto-complete, and sometimes you re-arrange the ordering of the lines, add your own code, or even modify it a bit. At what point does it become your own code?
  • 3) In cases 1) and 2) above, even if some of the outputs were verbatim (which apparently happens a tiny fraction of the time) and had exactly the same (probably conventional) variable names, would "some line of my code, with the same conventional variable names, also exists on the internet" be enough to go to court over?
  • 4) Assuming that developers are, or will be, more productive using such tools, don't you think they would still use Copilot-like software to a) get inspiration and b) copy-paste code that they later modify to sidestep IP infringement if they are smart enough about it, even though their companies "forbid" them from using it?
Daniel Kokotajlo

I'm extremely keen to hear from people who have used Codex a decent amount (or tried to) and decided it isn't worth it. Specifically, people who wouldn't pay $15/mo for a subscription to it. Anyone?

Daniel Kokotajlo

For context, GitHub has 60,000,000 users. If 10% of them buy a $15/mo subscription, that's about a billion dollars in annual revenue. A billion dollars is about a thousand times more than the cost to create Codex. (The cost to train the model was negligible, since it's only the 12B-param version of GPT-3 fine-tuned. The main cost would be the salaries of the engineers involved, I imagine.)
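
(Spelling out the arithmetic: 60,000,000 users × 10% × $15/month × 12 months ≈ $1.08 billion per year.)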

adamShimi

Maybe I'm wrong, but my first reaction to your initial number is that "users" doesn't mean "active users". I would expect a difference of an order of magnitude, which preserves your conclusion, just with a hundred times more instead of a thousand times more.

Daniel Kokotajlo

That's reasonable. OTOH, if Codex is as useful as some people say it is, it won't just be 10% of active users buying subscriptions, and/or subscriptions might cost more than $15/mo, and/or people who aren't active on GitHub might also buy them.

adamShimi

Agreed. Part of the difficulty here is that you want to estimate who will buy a subscription and keep it. I expect a lot of people to try it and most of them to drop it (either because they don't like it or because it doesn't help them enough for their taste), but I have no idea how to Fermi-estimate that number.

interstice

Regarding your first point, I think when people say that language models "don't bring us closer to full code automation" they mean there's no way of improving/upgrading language models such that they implement full code automation. I think it would be better to argue against that claim directly instead of bringing up language models' productivity-boosting effects. There are many things that could potentially boost programmers' productivity -- better nootropics, say -- but it seems overly broad to say that they all "bring us closer to full code automation", even if it might be causally true that they reduce the time to automation in expectation.

Michaël Trazzi

The problem with arguing against that claim is that nobody knows whether transformers/scaling language models are sufficient for full code automation. To take your nootropics example, the analogy would be: nootropics are legal and have no negative side effects; a single company gives "beta access" (for now) to a new nootropic, in unlimited amounts and at no cost, to a market of tens of millions of users; the company collects usage data to improve the product; there are actually 100k peer-reviewed publications per year in the field of nootropics; and most of the innovation behind the tech comes from a >100B-parameter model trained on open-source nootropic chemistry instructions. Would such advances be evidence for something major we're not certain about (e.g. a high-bandwidth brain-computer interface), or just evidence of increased productivity that gets reinvested into more nootropics development?

Taran

-

Michaël Trazzi

I created a class initializing the attributes you mentioned, and when I added your docstring to your function signature, it gave me exactly the answer you were looking for. Note that this was all on the first try, and that I did not think at all about how to initialize components, marginalized, or observed; I simply auto-completed.

from typing import Set


class Distribution:
    def __init__(self):
        self.components = []
        self.marginalized = None
        self.observed = None

    def unobserved(self) -> Set[str]:
        """Returns a set of all unobserved random variable names inside this
        Distribution -- that is, those that are neither observed nor marginalized over.
        """
        return set(self.components) - set(self.observed) - set(self.marginalized)
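
As a quick sanity check, with illustrative values for the three attributes (my own, not part of the completion):

d = Distribution()
d.components = ["x", "y", "z"]
d.observed = ["y"]
d.marginalized = ["z"]
print(d.unobserved())  # {'x'}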
The Codex Skeptic FAQ

Most of my programmer friends believe that Language Models trained on code will not affect their day job anytime soon. In this post, I make the case that 1) code generation is already useful (assuming minimal prompt-engineering skills), and 2) even if you do not believe 1), code generation will increase programmers' throughput far sooner than it fully automates them.

Language Models trained on Code do not bring us closer to Full Code Automation

This misconception comes from thinking linearly instead of exponentially. Language models are good enough at generating code to make the very engineers building such models slightly more productive, for instance when dealing with a new API. In other words, the returns (i.e. the improvements in the algorithm) from investing more resources in code generation directly help (via better developer tools) to create a better code-generating algorithm.

Code generation does not automate the part of my workday where I think hard

  • It still accelerates “glue code” or “API work”—a substantial fraction of large codebases.
  • Besides, only a privileged set of engineers gets to think about the big picture every day.
  • Plus, hard thinking is mostly required at the start, when designing the architecture.
  • And thinking seldom happens in a silo; it requires many iterations through coding.

I asked a model to generate code but it doesn't seem able to solve my problem

More often than not, the issue is not with the model. Try another prompt. (Example)

The output is outdated code from average programmers

Code quality (length, variable naming, taste) depends on the prompt and on hyperparameters. Generally, language models reuse variable names from the prompt, and you can rename those yourself.

Only developers who repeat the same tasks will be automated, so it will not affect me

You might still see gains in productivity from learning how to use a more advanced version.

My job does not involve solving simple coding tests from docstrings

You should be capable of separating your code into smaller functions and writing docstrings.
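
For concreteness, here is a minimal sketch of that workflow against the Codex beta API; the engine name, the parameters, and the example function are illustrative assumptions, not a recommendation.

import openai  # assumes beta access and an OPENAI_API_KEY set in the environment

# A function signature plus docstring is the whole prompt; Codex fills in the body.
prompt = '''def intersection(a1, a2, b1, b2):
    """Return the (x, y) intersection of the line through a1 and a2
    and the line through b1 and b2, or None if the lines are parallel."""
'''

response = openai.Completion.create(
    engine="davinci-codex",       # Codex beta engine name (assumption)
    prompt=prompt,
    max_tokens=150,
    temperature=0,                # low temperature keeps the completion deterministic
    stop=["\ndef ", "\nclass "],  # stop before it starts another function or class
)

print(prompt + response.choices[0].text)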

Codex cannot solve my problem since it only has access to a limited training set

GitHub Copilot stores your data. Supposedly, the same applies to the Codex beta.

Current Language Models still make silly mistakes

If the mistake is silly, then fixing it is trivial.

Anyway, it is error-prone, so it cannot be used for critical software

It generates fewer errors than I do when writing code for the first time.


I would strongly suggest applying for GitHub Copilot or OpenAI Codex access to check for yourself, avoiding cherry-picked examples on the internet (good or bad). Indeed, if you search online, you might run into outdated reviews where the highlighted errors turn out to work now. If you cannot wait for beta access, I recommend asking a friend for a demo (I'm happy to showcase it to anyone), trying genji python, or reading this up-to-date review.

More generally, programmers should seriously consider learning prompt engineering to avoid being left behind, and I believe any future forecast of AI progress should account for this shorter loop between deep learning models and programmer productivity.