Using NLP (Natural Language Programming) for the future of coding
January 20, 2022

Could Natural Language Programming change the future of coding for the better?

Introduction

Over the last several decades, computing technology has evolved at an astonishing pace. From the earliest forms of assembly language developed in the first half of the 20th century to the functional and object-oriented languages of the modern era, communicating with our machines has become more powerful, efficient and reliable than ever before. Now, computer programming is taking its next step. With the explosion of artificial intelligence (AI) research and development, some companies have taken the first steps towards using natural language processing to allow computers to write our code for us. One such company is OpenAI, which recently launched an updated version of OpenAI Codex: a platform that uses AI to translate natural language into code.

The idea for OpenAI Codex was originally conceived when scientists at OpenAI attempted to use the GPT-3 AI model to write simple Python programs from Python docstrings. GPT-3 was surprisingly successful: although it was not explicitly trained for code generation, it was still capable of writing simple programs.
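
To make the docstring setup concrete, here is a minimal sketch of that kind of task. The function name, the docstring and the body are all illustrative assumptions, not taken from OpenAI's experiments: the model would receive the signature and docstring as its prompt and be asked to fill in the body.

```python
def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in text, ignoring case."""
    # Everything above this comment is the prompt (a hypothetical example).
    # The line below is the kind of body a model might generate.
    return sum(1 for ch in text.lower() if ch in "aeiou")


print(count_vowels("Natural Language Programming"))
```

The docstring acts as the specification here, so a precise docstring is effectively a precise prompt.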

I’d like to preface the rest of this article by saying that most of it is conjecture and speculation. I am not an expert in this field of artificial intelligence, nor am I capable of predicting the future potential of the technologies discussed. This is an educated guess at what might occur, supposing that AI models like Codex continue to develop to the point where we can realistically expect a non-coder to generate a working program from scratch using only the AI. (I’d love to hear your thoughts via our Twitter!)

So, how exactly does it work? To give you an example, if we were making a web-based game using JavaScript (JS), we could tell OpenAI Codex “Use this image <link>” and paste the URL of the image we wanted to use. The platform would then automatically write JS code to load the image from the source and insert it into the body of the web page. We could keep going by telling it to “Make the image bounce off the walls”, and a block of code would then be written to animate the image moving across the screen and ricocheting off the sides.
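
Whatever code the AI writes for “Make the image bounce off the walls”, its core is likely to be a position update that reverses the velocity at each wall. The sketch below shows that logic in Python rather than JS so it can run without a browser. The Sprite class, the step function and the screen size are assumptions made for illustration, not actual Codex output.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    x: float       # left edge of the image, in pixels
    y: float       # top edge of the image, in pixels
    dx: float      # horizontal velocity, in pixels per frame
    dy: float      # vertical velocity, in pixels per frame
    width: float
    height: float


def step(sprite: Sprite, screen_w: float, screen_h: float) -> None:
    """Advance one frame and reverse direction when a wall is hit."""
    sprite.x += sprite.dx
    sprite.y += sprite.dy
    # Crossing the left or right wall flips the horizontal velocity
    # and moves the image back inside the screen.
    if sprite.x < 0 or sprite.x + sprite.width > screen_w:
        sprite.dx = -sprite.dx
        sprite.x = max(0, min(sprite.x, screen_w - sprite.width))
    # Crossing the top or bottom wall does the same vertically.
    if sprite.y < 0 or sprite.y + sprite.height > screen_h:
        sprite.dy = -sprite.dy
        sprite.y = max(0, min(sprite.y, screen_h - sprite.height))


ball = Sprite(x=90, y=10, dx=8, dy=3, width=10, height=10)
for _ in range(3):
    step(ball, screen_w=100, screen_h=100)
    print(ball.x, ball.y, ball.dx, ball.dy)
```

In a browser, the same step would typically run once per animation frame and write the new position to the image element.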

Contrast this with the way we code now: a programmer must know the vocabulary and syntax of a language almost by heart, know how to abstract code into functions, and remember to apply coding standards to every line of code. The list of requirements goes on and on. There are some interesting consequences of this, which I’ll get into below.

The future of coding: Positives

Natural language programming has the potential to revolutionize computer science (CS) education and software development across all industries. Its simplicity could make programming accessible to a huge number of people who otherwise may not be able to learn to code, which is especially useful in a world with a growing shortage of skilled programmers.

The benefits of OpenAI’s approach for software developers are numerous. The act of programming would become simpler and more result-focused because programmers wouldn’t have to worry about the intricacies of coding best practices and could instead pay attention to their desired outcome. The time required to write a functioning program could be dramatically reduced and, in the ideal case, the code would be error-free, standardized and readable. However, would it really work that well?


The future of coding: Negatives

While Codex is able to create simple programs from natural language, the length and complexity of the language used to craft the instructions impact the likelihood of success. When the user writes long, imprecise sentences or requires a large volume of operations or variables, Codex struggles to generate code that satisfies the requirements. [1]

Another issue with Codex is that it’s trained on billions of lines of publicly available code. Given its obvious limitations, it’s uncertain whether Codex will ever reach a level of reliability that could be applied to industry. Additionally, while new code continues to be published in huge quantities, code that is now written with the help of OpenAI Codex could be polluting its own source of training data. That’s a problem for machine learning models because they could become overfitted, meaning they no longer generalize well.

How should it be used?

As with any technological development, the more we improve on a technology, the more it becomes like a black box - we can see what it does, but not exactly how. Programming already uses principles of “black box code”, which abstract implementation details away into neat functions with a clear set of expected inputs and outputs. The purpose of black box code is to simplify coding by removing the need to write out every single process required in a program. Similarly, the idea behind Codex is to give coders a set of tools that don’t require an understanding of their inner workings to be used.
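
As a small illustration of black box code in the sense used above, here is a sketch in Python. The function name passing_rate and the sample scores are made up for this example. The caller only needs the contract (numbers in, one number out), not the implementation.

```python
import statistics
from typing import Sequence


def passing_rate(scores: Sequence[float], threshold: float = 70) -> float:
    """Return the fraction of scores at or above threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


scores = [72, 88, 95, 61, 79]

# Both calls are used purely through their inputs and outputs.
# How the standard library finds the median is hidden from the caller.
print(statistics.median(scores))
print(passing_rate(scores))
```

The difference with AI-generated code is that a function like this can still be opened and read when it misbehaves, which assumes the reader understands the language it is written in.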

With OpenAI Codex, however, the level of abstraction reaches a critical point. If its application replaces a thorough understanding of coding concepts, then the people creating programs with it could face a lot of potential issues. If the code fails to do what its creator intended, they would struggle to debug the code that the AI has written for them.

This raises the question: should natural language programming replace or simply augment our current development practices? From a pedagogical standpoint, it may be wise to continue teaching students the fundamentals of programming before they engage with natural language programming. This would not only help students to understand and modify code generated with AI technology; it would also teach them best practices for code structure and efficiency. What’s more, understanding software architecture and having a deep understanding of the way programming languages work will continue to be a mandatory prerequisite for any successful software developer in the foreseeable future. A programmer must know how data is stored in computers at its most fundamental level and how languages are built on many thousands of binary operations.
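
For readers who have not looked at data at that level, a minimal Python sketch follows. The values are arbitrary examples chosen for illustration.

```python
number = 2022

# The same integer written in base 2, and as two raw bytes (big-endian).
print(bin(number))
print(number.to_bytes(2, "big"))

# Bitwise operations work directly on those binary digits.
print(number & 0xFF)   # keep only the lowest 8 bits
print(number >> 1)     # shift right by one bit, which halves the integer

# Text is stored as numbers too: each character maps to one or more bytes.
print("hi".encode("utf-8").hex())
```

Understanding what these lines print, and why, is the kind of fundamental knowledge that remains useful regardless of who or what wrote the code.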

As the use of natural language coding platforms such as Codex becomes more commonplace, their adoption might unintentionally create an entirely new profession: “prompt engineering”. Crafting prompts that deliver the best results with Codex will require a specialized understanding of the way prompts must be structured. It will essentially require its own vocabulary and syntax, which may be less complex than writing out the actual code but will still require a rather scientific approach. [2]

Conclusions

That being said, OpenAI Codex and all the other natural language programming platforms that will inevitably emerge over the coming years will massively change the CS education landscape. Teachers will be able to introduce programming concepts to much younger students, scientists will be able to build powerful toolkits without needing to take an entire course in CS, and businesses will be able to build software products at lightning speed. How the technology unfolds in the next few years will be exciting to behold!

Bibliography

  1. https://arxiv.org/pdf/2107.03374.pdf
  2. https://spectrum.ieee.org/openai-wont-replace-coders
Samuel Natarajan


Teacher Success Manager
Samuel is Teacher Success Manager at CodeGrade and works hand-in-hand with Teachers and Professors in CS education. He’s trained in Cognitive Neuroscience but has a broad view on education, software development, and tech that sees him fit in comfortably with the IT crowd. In his free time he boulders, throws frisbees for fun and makes a mean stir-fry.
