ChatGPT in CS education: how can teachers design ChatGPT-proof coding assignments?

March 7, 2023

How ChatGPT impacts Computer Science Education

In 30 seconds...

OpenAI's ChatGPT lets students easily generate code and solve coding problems;
Code generated by ChatGPT is often correct, but still far from perfect, and the tool has many limitations;
Banning ChatGPT is both impractical and undesirable, so CS instructors should consider ChatGPT the “next calculator”;
ChatGPT detectors, classifiers and plagiarism scanners are not reliable, and it is unlikely they will ever be reliable for generated code;
We discuss several pointers for designing more ChatGPT-proof coding assessments, but teaching AI Literacy is equally important;
CodeGrade will share more example code assignments and tips on dealing with ChatGPT in upcoming blog posts, so stay tuned!

Since its release in late 2022, ChatGPT has been the talk of the town. ChatGPT lets users chat with an AI language model (“GPT-3.5”, to be exact) in an easy-to-use environment, for free. With just simple prompts, users can ask ChatGPT to answer questions, explain complicated material or write full essays, papers, poems, songs, code or anything else they can think of.

Computer Science teachers have reacted with concern to the sudden rise of ChatGPT. And understandably so: ChatGPT can easily pass the 2022 AP Computer Science A exam, and early surveys already show that 30% of students in the US have used ChatGPT in their assignments. In this article, we discuss the challenges that ChatGPT poses to current Computer Science education. We also explore first steps you can take to make your coding course more resilient against ChatGPT and, most importantly, discuss whether that is needed in the first place.

Power and limitations of ChatGPT for code

Before we dive into the consequences of ChatGPT for Computer Science education, let’s take a step back and discuss what exactly it is. ChatGPT is a large language model based on the GPT family of models. In short, it is trained to generate plausible text, as if written by a human.

This means that even though the text or code ChatGPT produces is often surprisingly correct, it is not trained to generate true statements. This is why Princeton Computer Science professor Arvind Narayanan has described it as a ‘bullshit generator’. The output of ChatGPT may sound very convincing, but many users have reported it inventing non-existent references, and StackOverflow quickly banned ChatGPT-generated answers because so many of them were incorrect.

Still, even though its output is not necessarily true, our own tests indicate that for (easier) Computer Science assignments, ChatGPT’s output is more often correct than not. In our tests, ChatGPT can successfully generate code if we feed it:

The entire assignment description, including context and background information;
Just the example (terminal) output, asking it to reverse-engineer the program;
Partial code, asking it to complete the rest.

The code provided by ChatGPT is often correct, but far from perfect. It often lacks documentation and comments, misses edge cases and basic error handling, or is overly complicated. Many users also report that ChatGPT fails at basic logic. Yet, as mentioned earlier, even this flawed code is already good enough to pass common CS assessments.

Finally, another important limitation of ChatGPT (though one that matters less for generating code) is its “knowledge cutoff”: its training data does not extend beyond 2021.

Banning ChatGPT in coding classrooms?

If we know that ChatGPT is good at solving coding problems, students know it too. This raises new questions about academic integrity and the place of ChatGPT in coding classrooms. Here at CodeGrade, we are already actively working on new assignment designs to better equip teachers “to deal with” ChatGPT. But why not ban it?

OpenAI, the company behind ChatGPT, states the following regarding AI ethics in education:

Ultimately, we believe it will be necessary for students to learn how to navigate a world where tools like ChatGPT are commonplace. This includes potentially learning new kinds of skills, like how to effectively use a language model, as well as about the general limitations and failure modes that these models exhibit.

Some of this is STEM education, but much of it also draws on students’ understanding of ethics, media literacy, ability to verify information from different sources, and other skills from the arts, social sciences, and humanities.

From our research, this “don’t ban ChatGPT, teach with it” approach seems to be widespread among CS teachers too, with some professors seeing ChatGPT as just the next tool. Education has already adapted to calculators, Wikipedia and StackOverflow by changing the goals and means of assessment. What matters here is informing students about the use, power and limitations of these tools. AI Literacy will become more important, especially in K-12 education. A great open source K-12 AI literacy course is the DAILy curriculum designed by MIT educators. It is interesting to see how closely the concerns about Wikipedia in the early 2000s resemble the current concerns about ChatGPT.

But even if a ban were the better option, it would be impossible to enforce. At the time of writing, there are multiple ChatGPT detectors or classifiers available. All of these tools acknowledge that they are still very limited and should not be used as a primary decision-making tool, even for written text. Detecting generated code is even less plausible, as the limitations OpenAI lists for its own classifier show: OpenAI explicitly states that the classifier is unreliable on code and, more importantly, that very predictable text cannot be reliably identified. This seems to be a fundamental limitation, as (simple) code assignments inherently yield predictable and homogeneous answers.

ChatGPT-proof code assessments

If we want to go forward with the “ChatGPT as new calculator” approach, how can we make our code assessments ChatGPT-proof? In other words, how can we design assignments that allow (maybe even expect) our students to use ChatGPT?

First, let’s review how ChatGPT differs from resources students already had access to, such as StackOverflow or W3Schools. In theory, students could already find all the code they needed to complete (introductory) code assignments, right? True, but ChatGPT changes how students search for answers. Using Google to find code snippets that fit a problem still required some basic skills: understanding the problem, splitting it into searchable sub-tasks and then writing a good search query. With ChatGPT, these problem-solving skills are no longer needed: as discussed earlier, students can simply copy and paste the entire assignment description or example output.

We believe the answer lies not just in assignment design, but also in the AI Literacy we mentioned earlier: informing students about the limitations of ChatGPT and “proper” ways to use it. This can start with defining how students should cite their ChatGPT use. Is a simple citation enough, or do you want students to export the entire chat that led to the correct answer as part of their assignment? ChatGPT does not offer such an export yet, but third-party browser extensions can already do this.

There’s no one-size-fits-all solution for assignment design. Depending on your coding course’s level, goals and technologies, you may consider the following:

Split your assessment into multiple parts and monitor your students’ progress by having them hand in intermediate (autograded) sub-tasks;
Leave concrete assignment goals and edge cases out of the assignment description, and instead have your students discover them as they progress and unlock new automatic tests;
Require more creativity from students in final projects and assess their design choices;
Add written-out logic puzzles (which ChatGPT struggles with) or recent data (sets) from after its knowledge cutoff to the assignment.
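
The second pointer, unlocking tests as students progress, can be pictured as a staged test suite in which each group of tests only counts once the previous group passes. Below is a minimal Python sketch under our own assumptions: a hypothetical median(values) assignment and a hand-rolled stages_passed runner. It is not how CodeGrade or any specific autograder implements this; it only shows the idea that edge cases missing from the description surface later as new tests.

```python
# Minimal sketch of "unlockable" test stages for a hypothetical assignment:
# implement median(values) for a non-empty list of numbers.
# Stage 1 matches the written description; later stages cover edge cases
# deliberately left out of it. Assignment and stage layout are assumptions.
from typing import Callable, List, Sequence, Tuple

Case = Tuple[List[float], float]

STAGES: List[List[Case]] = [
    [([3, 1, 2], 2), ([5], 5)],                 # stage 1: odd-length lists
    [([4, 1, 3, 2], 2.5), ([1, 1], 1)],         # stage 2: even-length lists
    [([-2.5, 0, 7], 0), ([10, -10, 0, 0], 0)],  # stage 3: negatives, duplicates
]


def stages_passed(solution: Callable[[Sequence[float]], float]) -> int:
    """Run stages in order; a stage only counts if every earlier one passed."""
    passed = 0
    for stage in STAGES:
        for values, expected in stage:
            try:
                if solution(list(values)) != expected:
                    return passed
            except Exception:
                # A crash counts as failing the current stage.
                return passed
        passed += 1
    return passed


def naive_median(values: Sequence[float]) -> float:
    # Satisfies the described (odd-length) cases but ignores even lengths.
    return sorted(values)[len(values) // 2]


def median(values: Sequence[float]) -> float:
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    print("naive_median:", stages_passed(naive_median), "of", len(STAGES))
    print("median:", stages_passed(median), "of", len(STAGES))
```

Here naive_median clears only the first stage, while median clears all three. In a real course, students would see the stage 2 tests (and learn about even-length lists) only after passing stage 1, which makes pasting the description into ChatGPT less useful on its own.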

In line with teaching students about the limitations of ChatGPT, a great new assignment type could be the “debug ChatGPT” assignment, as discussed by Orit Hazzan in a piece for ACM. Students are tasked with fixing flawed ChatGPT code, improving its style and consistency, and adding documentation. Debugging and rewriting code is a valuable skill in itself, and this assignment also shows students very clearly the flaws in ChatGPT output.
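
As an illustration, here is a minimal Python sketch of what a “debug ChatGPT” handout and one possible student answer could look like. The flawed leap() function is a hypothetical stand-in we wrote ourselves to mimic the flaws described earlier (vague naming, no documentation, a missed edge case); it is not captured ChatGPT output, and the leap-year task is only an example.

```python
# Handout for a hypothetical "debug ChatGPT" exercise. This function was
# written by us to imitate typical flaws (vague name, no docs, a missed
# edge case); it is NOT real ChatGPT output.
def leap(y):
    return y % 4 == 0


# One possible student submission: corrected logic, clear naming,
# documentation and basic input validation.
def is_leap_year(year: int) -> bool:
    """Return True if `year` is a leap year in the Gregorian calendar.

    Years divisible by 4 are leap years, except years divisible by 100,
    which are leap years only when they are also divisible by 400.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(year, int) or isinstance(year, bool):
        raise TypeError("year must be an integer")
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


if __name__ == "__main__":
    # 1900 exposes the original flaw: divisible by 4 and 100, but not by 400.
    for year in (1996, 1900, 2000, 2023):
        print(year, leap(year), is_leap_year(year))
```

An autograder could then check the submission against years such as 1900 and 2000: the handed-out leap() wrongly returns True for 1900, while the fixed version rejects it and still accepts 2000. Style and documentation would typically be assessed separately, for example by a linter or manual review.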

The above are just some preliminary thoughts and pointers towards ChatGPT-proof code assessments. As ChatGPT evolves and our work continues, you can expect more example assignments and tips on dealing with ChatGPT from CodeGrade.

How we can help teachers deal with ChatGPT

At CodeGrade, we have been keeping a close eye on all ChatGPT developments and will continue to do so. We have also opened up discussions within our team and with the many CS professors who use CodeGrade about how best to deal with ChatGPT.

In the upcoming months, we expect to publish more resources on ChatGPT in the coding classroom, on how CS instructors can benefit from ChatGPT themselves, and on example ChatGPT-proof coding assignments. Our next article, with 5 assignment design ideas for dealing with ChatGPT in the coding classroom, is already out.