The idea that software can be developed by artificial intelligence without requiring a human developer opens up a world of possibilities – and questions. AI applications for software development primarily target developers, promising to act as “co-pilots” that make them more productive. Could it go even further, to the point where developers are no longer needed at all? What benefit could this have for business users? Having recently gained preview access to the OpenAI Codex application, Ravi Sawhney took it for a tour through the lens of a professional user.
In May 2020, OpenAI, an artificial intelligence research lab, released a new type of AI model called GPT-3. This large language model was trained on a corpus of hundreds of billions of words, with the aim of predicting what text comes next given the user’s prompt. The model quickly gained media attention for its ability to be applied to a wide variety of language tasks with minimal user prompting, known as “few-shot learning”. For example, the model was shown to translate from English to French to a good standard after being given just a few examples by the user. It also performed well in text summarisation, classification, and question-answering tasks.
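To make the idea of few-shot learning concrete, here is a sketch of the kind of prompt involved (my own hypothetical example, not one from OpenAI): a handful of demonstrations followed by a new query, from which the model infers that the task is English-to-French translation.

```python
# A few-shot prompt: no instructions, just examples. The model is asked to
# continue the text, and a good completion for the last line is " merci".
FEW_SHOT_PROMPT = """\
English: cheese
French: fromage

English: good morning
French: bonjour

English: thank you
French:"""

print(FEW_SHOT_PROMPT.count("English:"))  # three demonstrations-plus-query pairs
```

The point is that the examples themselves define the task; no retraining of the model is required.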
Moving on from the initial buzz, which was accompanied by growing concerns about the use of AI in decision-making, GPT-3 went quiet: it remained in private beta, and it was not clear whether the model was ready to be incorporated into production software, or what its use cases might be beyond general entertainment.
However, things now appear to be accelerating, and Microsoft has started to commercialise the technology, which isn’t too surprising given the company’s significant investment in OpenAI. Microsoft has quietly incorporated GPT-3 into its low-code platform, Power Apps, allowing users to type their intent in natural language; the app then returns the appropriate formula syntax.
More significant, however, was the preview by GitHub (a Microsoft-owned company) of its Copilot product. This app, aimed squarely at software developers, promises to act as a “co-pilot” by suggesting code to the developer based on the comments they write.
Copilot was built on what OpenAI calls a descendant of GPT-3, named Codex. Codex was trained on billions of lines of source code from publicly available sources, including, of course, GitHub.
The wider promise
Having recently gained preview access to OpenAI Codex, I took it for a tour through the lens of a professional user.
My goal was to understand whether this technology can be used in practice to make software developers more productive. Could it go even further, to the point where developers are no longer needed at all? What benefit could this have for business users? How capable is it of understanding human intent? That, really, is the ultimate promise of this technology.
Before diving into concrete examples of Codex, it is worth understanding the potential importance of what this technology offers. The terms no-code and low-code have only recently entered our vocabulary. The idea is that software applications can be developed without requiring a software developer; in other words, the actual end user can convert their intent into software with zero or minimal understanding of coding. If you consider that software applications exist in almost every aspect of our personal and professional lives, this capability offers a radically new method for building applications beyond hiring a full-time engineer or purchasing an off-the-shelf application.
Put into practice
Codex works by the user providing prompts. It then takes those prompts, together with some user-controlled parameters, and predicts what it thinks the user wants next. In simplistic terms, it can be thought of as a turbocharged autocomplete. In principle it is the same as GPT-3, but the model was trained on sample code. Two models are exposed here: “davinci” and “cushman”. The latter is designed to be a faster relative of davinci, at the expense of prediction accuracy. For this demo I stuck with davinci, as speed was not an issue, but it is interesting that OpenAI is already considering the performance/speed trade-off for real-world applications where low latency is a must.
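As a minimal sketch of what "a prompt plus some parameters" looks like in practice, the helper below assembles the fields a preview-era Codex completion request took. The engine names and parameter names reflect the API as documented at the time and may since have changed; the prompt itself is my own hypothetical example.

```python
# Sketch of a Codex-style completion request (illustrative values only).
# The prompt is natural language or partial code; parameters such as
# "temperature" control how creative (non-deterministic) the prediction is.

def build_codex_request(prompt, temperature=0.0, max_tokens=100):
    """Assemble the parameters for a Codex completion call."""
    return {
        "engine": "davinci-codex",   # "cushman-codex" trades accuracy for speed
        "prompt": prompt,
        "temperature": temperature,  # 0 = most deterministic output
        "max_tokens": max_tokens,    # cap on the length of the prediction
        "stop": ["\n\n"],            # stop generating at a blank line
    }

request = build_codex_request(
    '"""Return the nth Fibonacci number."""\ndef fib(n):'
)
print(request["engine"])  # davinci-codex
```

The request would then be sent to the API with a valid key; everything above the network call is just structured "autocomplete me from here" instructions.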
To demonstrate what it can do, I present a series of input prompts (“Inputs”) and document the response provided (“AI Output”).
I’ll start with examples that convert natural language into the widely adopted Structured Query Language (SQL); these start off simple, then get more complicated, and, as you’ll see, don’t always work. I also demonstrate Codex’s ability to convert English to Python.
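To give a flavour of the simple end of the spectrum, here is an illustrative input/output pair of my own (not one of the original demo prompts), run against a toy in-memory database to show that the generated-style SQL actually executes.

```python
import sqlite3

# Input (natural language): "Find the average salary of employees hired
# after 2020, grouped by department" -- and a plausible Codex-style output:
GENERATED_SQL = """
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_year > 2020
GROUP BY department
ORDER BY department;
"""

# A toy table to run the query against.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department TEXT, salary REAL, hire_year INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    ("Ada",   "Eng", 100.0, 2021),
    ("Grace", "Eng", 120.0, 2019),  # excluded: hired before 2021
    ("Alan",  "Ops",  80.0, 2022),
])

rows = list(conn.execute(GENERATED_SQL))
print(rows)  # [('Eng', 100.0), ('Ops', 80.0)]
```

Simple aggregations like this are exactly where the natural-language-to-SQL mapping tends to be cleanest.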
It’s hard not to be impressed with OpenAI Codex. Simply writing what you want and getting the code back in seconds is a product manager’s dream. The Python example illustrates that Codex knew how to call the CoinDesk API to get the price of bitcoin, although it didn’t capture the intent exactly, as it started the plot from early 2020 rather than from 2021. These little mistakes cropped up in the more complicated examples, but in many cases it took only a few minor edits to fix them.
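For readers unfamiliar with that API: a sketch of the parsing step such generated code would involve is below. This is my own reconstruction, not the code Codex produced; the CoinDesk historical-close endpoint URL in the comment is as it was documented at the time, and the sample payload is invented.

```python
import json

# The CoinDesk Bitcoin Price Index historical endpoint (as documented then):
# https://api.coindesk.com/v1/bpi/historical/close.json?start=2021-01-01&end=2021-06-30
# It returns a JSON object whose "bpi" field maps dates to closing prices.

def closing_prices(payload: str):
    """Return (date, price) pairs sorted by date from a CoinDesk 'bpi' payload."""
    data = json.loads(payload)
    return sorted(data["bpi"].items())

# Invented sample payload standing in for a live API response.
sample = '{"bpi": {"2021-01-02": 32127.27, "2021-01-01": 29374.15}}'
print(closing_prices(sample))
# [('2021-01-01', 29374.15), ('2021-01-02', 32127.27)]
```

The off-by-a-year plot in the demo would correspond to Codex filling in the wrong `start` date in that URL – precisely the kind of small intent mismatch that is trivial to fix by hand.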
It was also perhaps not surprising that SQL produced the best examples, given the syntax’s proximity to natural English. In fact, as I experimented, it became evident how useful the technology could be from an educational point of view for someone learning to code from scratch. Instead of using Google, the student can ask the AI for help, and more likely than not it will return something useful that moves their thinking forward.
It’s fair to add that the examples above were taken after spending some time learning how best to frame the input prompt. In the same way that mischaracterising your business needs to a human engineer may leave you with a poor-quality product, a vague prompt to Codex will result in output that is not executable or does not match your intent.
There are some interesting points to note about Codex which help frame its wider application in business. First, it should be pointed out that it is trained on existing code. This can result in code being quoted back verbatim without attribution to the original developer. Although very rare, the fact that this can happen could create a headache in trying to understand the legal ramifications of how such code could then be used.
Second, the model itself is non-deterministic. While the level of creativity can be controlled through exposed parameters, reproducibility of the model’s output for the same input cannot be guaranteed. While this might sound problematic, especially for producing code, I noticed that in some cases the model’s increased creativity led it to produce the desired results from ill-defined inputs, which was impressive.
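The mechanism behind that creativity knob is worth a short illustration. Language models pick each next token by sampling from scored candidates, and the "temperature" parameter rescales those scores before sampling. The toy function below (not Codex itself, just the standard sampling idea with invented token scores) shows why a temperature of zero is fully deterministic while higher values are not.

```python
import math
import random

# Toy illustration of temperature sampling over next-token candidates.
def sample_token(scores, temperature, rng):
    """Pick a token from {token: score}; temperature 0 means greedy (deterministic)."""
    if temperature == 0:
        return max(scores, key=scores.get)  # always the top-scoring token
    # Higher temperature flattens the distribution, making low-scoring
    # (more "creative") tokens more likely to be chosen.
    weights = {t: math.exp(s / temperature) for t, s in scores.items()}
    r = rng.random() * sum(weights.values())
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token

scores = {"SELECT": 2.0, "DELETE": 0.5, "DROP": 0.1}
rng = random.Random(0)
print(sample_token(scores, 0, rng))  # SELECT, every time
```

Repeating the call with `temperature=1.0` can yield different tokens on different runs, which is exactly the reproducibility caveat above.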
What does this mean for the future of software development?
While the above examples demonstrate that Codex can generate executable code to match users’ intent, I don’t see it replacing developers anytime soon. Deploying AI-generated code to production enterprise systems without at least a code review is just too risky right now.
The most relevant question for today is: can Codex help make software engineers more productive? As someone who works on the business side of software development, I find it hard to make a definitive call on this. From a quick survey of engineers in my network, it appears that the AI definitely has the potential to improve developer efficiency if used in the right way.
Many corporate code bases are large and complicated, and it is difficult to see how Codex could provide high-quality, secure suggestions to the developers working on them when it is trained on unverified public repositories such as GitHub. However, if OpenAI allowed Codex to be trained on private code bases, as it already does with GPT-3 through a process called fine-tuning, that could be a game-changer. Engineering teams would have certainty about the quality of the training data, and the model would become highly relevant to existing business applications. This could reduce the time it takes for a new engineer to become productive when learning a new code base.
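For a sense of what such fine-tuning involves, GPT-3 fine-tuning at the time took training data as JSONL files of prompt/completion pairs; a team would export examples from its own repositories in that shape. The snippet below sketches that format. Fine-tuning Codex itself remains hypothetical here (the article only raises it as a possibility), and the example pair is invented.

```python
import json

# Invented example pairs in the JSONL prompt/completion format used by
# OpenAI's GPT-3 fine-tuning API: one JSON object per line.
examples = [
    {
        "prompt": "# Fetch a trade by id from our internal OrderStore\n",
        "completion": (
            "def get_trade(order_store, trade_id):\n"
            "    return order_store.lookup(trade_id)\n"
        ),
    },
]

with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The appeal for an engineering team is that every pair comes from code it has already reviewed, addressing the trust problem with suggestions learned from unverified public sources.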
Codex was only released a few weeks ago as a private beta and is still in development. Still, I’m really impressed, because it gives a real insight into how software might be developed in the very near future – whether that is lowering the barrier to entry for novice programmers, making expert programmers more productive, or accelerating the low-code movement that is currently capturing the imagination of many business leaders. The economic value of AI in the software development industry should not be underestimated, and it warrants further research.
Author’s disclaimer: All opinions expressed are my own.
- The post represents the author’s point of view, not the position of LSE Business Review or the London School of Economics.
- Featured image by Markus Spiske on Unsplash