Automatically grading Jupyter (IPython) Notebooks for university courses

April 2, 2021

Grading Jupyter Notebooks, manually and automatically

In 30 seconds...

Jupyter Notebook is a commonly used tool in computer science and programming teaching. Students can create and share documents that are a combination of Python code and text;
But it is still awkward to grade! Most teachers download the notebook, open it in Jupyter, manually review it then submit it back to the LMS;
Manually grading does make sense for Notebooks, as many include written text and graphics. In CodeGrade we render and run the code portion of the notebook directly within our interface! Just click on a line and leave feedback;
It is also very easy to set up automatic testing: unit tests or simple I/O tests are a great way to assess students’ IPython Notebooks automatically.

Jupyter Notebook, formerly known as IPython Notebook, is a fantastic and very commonly used tool in computer science and programming education. Jupyter Notebooks allow a student to create and share documents that are a combination of Python code and text, which can include equations, visualizations and narrative text. This combination makes it very powerful for the more applied computer science courses like data science, machine learning and computer graphics.

Teaching with Jupyter Notebooks has become a common practice because of all these advantages. However, grading Jupyter Notebooks (files ending with the `.ipynb` extension) is still very cumbersome. Most teachers simply download the submitted notebook, open it using the Jupyter application, manually review it, and then submit a grade back to the learning management system, which is neither practical nor scalable. In this guide, we will explain how you can manually grade Jupyter Notebooks more effectively using CodeGrade and how you can set up automatic tests for them too.

Prefer to watch a webinar? All of the techniques mentioned in this blog can also be found in our webinar on Python and Jupyter Notebook autograding.

Manual grading

CodeGrade makes it very easy to automatically run and grade a Jupyter Notebook in our interface, but we will start with a short section on grading them manually. The main reason is that Jupyter Notebooks are particularly well suited to at least some manual grading.

Jupyter Notebooks are most often chosen by instructors because they very intuitively combine written reports and code, which can interact with and add to each other in a notebook. Moreover, the types of courses Jupyter Notebooks are most commonly used for produce predominantly graphs, visualizations and other graphics. The code part of a notebook can be graded automatically very effectively, but written text and graphics are usually graded manually.

CodeGrade makes this intuitive and highly efficient for you, as it renders and runs the Jupyter Notebook directly within our interface in your browser. This makes grading it manually extremely easy: just as you have come to expect from CodeGrade, you can simply click on any line in the notebook to leave feedback to the students. All of CodeGrade’s efficiency- and feedback-enhancing tools, think feedback snippets, rubrics and grading management, are also available for Jupyter Notebooks. This is all available as a plugin to your learning management system (LMS) too, like Canvas, Blackboard and Open edX. You can read an article here where we discuss Jupyter Notebooks in Brightspace.

Automatically running a Jupyter Notebook

When grading a Jupyter Notebook manually, it is a good practice to first run the notebook. CodeGrade can do this automatically.

To understand why this is good practice, it helps to understand the inner workings of a Jupyter Notebook first. In essence, a notebook is a simple JSON file that stores its cells. Besides the cells themselves, the results of running them are also saved in this JSON, meaning that students hand in a notebook in a certain state. That state is not necessarily the latest one, and the output may even have been altered manually. With automatic grading, tests check the correctness of the actual code, but with manual grading the visual results are often leading. To make sure you are grading the latest state, you can automatically run the notebook using CodeGrade AutoTest.
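To make the point about saved state concrete, here is a minimal sketch that builds a tiny hand-written notebook in the JSON layout and reads back the output stored next to each code cell. The notebook contents and the `saved_outputs` helper are made up for illustration; real notebooks carry more metadata and more output types.

```python
import json

# A tiny hand-written notebook in the notebook JSON layout (illustrative only).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Report"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["print(1 + 1)"],
            # Saved output: it claims 3, although the code would print 2.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["3\n"]}
            ],
        },
    ],
})


def saved_outputs(raw):
    """Return (source, saved stdout text) for every code cell."""
    notebook = json.loads(raw)
    pairs = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        # "source" and "text" may be a single string or a list of lines.
        source = "".join(cell["source"])
        text = "".join(
            "".join(out.get("text", []))
            for out in cell.get("outputs", [])
            if out.get("output_type") == "stream"
        )
        pairs.append((source, text))
    return pairs


for source, text in saved_outputs(notebook_json):
    print(f"code: {source!r} -> saved output: {text!r}")
```

The stored output here does not match what the code would print, which is exactly the situation that re-running the notebook before grading guards against.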

For this, we will use CodeGrade’s AutoTest Output functionality, which lets us generate output using AutoTest that can be displayed in our Code Viewer! CodeGrade has custom scripts that make this very easy, but you can also do it yourself with the pre-installed `jupyter` package:

```shell
jupyter nbconvert --execute --to notebook \
    --output $AT_OUTPUT/jupyter.ipynb $STUDENT/jupyter.ipynb \
    --allow-errors
```

By adding this command to a Run Program step in your AutoTest, you can generate the executed Jupyter Notebook in the AutoTest output folder and review it manually!

The automatically run Jupyter Notebook in the Code Viewer

Automatic grading using I/O Tests

Jupyter Notebooks are essentially a wrapper around Python code, and therefore very suitable for automatic grading using CodeGrade! By converting the notebook to a regular Python script, we can use all the easy autograding tools and options we normally use for Python (see this webinar on autograding Python code) on our Jupyter Notebooks too!

Luckily, the `jupyter` package has a function that allows us to convert a Jupyter Notebook to a valid Python script. This Python script is simply constructed by appending all code cells in your Jupyter Notebook. You can use the following line of code in a Run Program step to achieve this:

```shell
jupyter nbconvert --to script YOURFILE.ipynb
```
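As a rough illustration of what "appending all code cells" means, the sketch below joins the code cells of a small hand-written notebook into one script. It is not how `nbconvert` is implemented, and its real output differs in detail; `notebook_to_script` and the example cells are hypothetical.

```python
import json


def notebook_to_script(raw_notebook):
    """Join the source of all code cells, skipping markdown cells."""
    notebook = json.loads(raw_notebook)
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            # "source" may be a single string or a list of lines.
            chunks.append("".join(cell["source"]))
    return "\n\n".join(chunks) + "\n"


# Hypothetical notebook: one markdown cell followed by two code cells.
example = json.dumps({
    "nbformat": 4, "nbformat_minor": 4, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["Explain the idea."]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["def add(a, b):\n", "    return a + b"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["total = add(1, 5)"]},
    ],
})

script = notebook_to_script(example)
print(script)
```

Because the cells end up in a single file in notebook order, a function defined in an early cell is available to later cells and to any test that imports the script.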

In the same AutoTest category as this line, you can now interact with the generated script like you normally would. An easy way to do this is with Input/Output Tests (I/O Tests). Before interacting with the script, you will have to import it. We will do this by opening the Python interpreter with the following command: `python3 -ic "import your_script"`. By making this the “Program to test” in your I/O Test, you will be able to interact with the script through stdin and stdout. For example, give `print(your_script.function(1, 5))` as the input and write the value you expect it to print as the expected output.

Autograding variables in a Jupyter Notebook

Autograding functions in a Jupyter Notebook

As the input is regular Python code that you feed to the Python interpreter, you can call functions, do arithmetic operations and print variables, as the examples above show.
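To try the idea outside of AutoTest, the following sketch simulates a single I/O test locally: it writes a stand-in `your_script.py`, starts the interpreter with `-ic "import your_script"`, sends one line on stdin and compares stdout with the expected value. The student code, the `run_io_test` helper and the expected value are assumptions for illustration; CodeGrade's own I/O Tests handle the comparison for you.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical converted notebook; `function` is a made-up student function.
STUDENT_SCRIPT = "def function(a, b):\n    return a + b\n"


def run_io_test(stdin_text):
    """Run `python -ic "import your_script"` and feed stdin_text to it."""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "your_script.py").write_text(STUDENT_SCRIPT)
        result = subprocess.run(
            [sys.executable, "-ic", "import your_script"],
            input=stdin_text,
            capture_output=True,
            text=True,
            cwd=workdir,  # so that `import your_script` finds the file
            timeout=30,
        )
    return result.stdout


actual = run_io_test("print(your_script.function(1, 5))\n")
expected = "6"
print("pass" if actual.strip() == expected else "fail")
```

Only stdout is compared here, which mirrors the point that the test checks what the script prints rather than how the code is written.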

Importing Python code without printing

One thing to be aware of is that we are checking the stdout of the script, which is run in its entirety when it is imported. As a result, students can clutter the output with additional print statements outside of functions. There are two ways to prevent this:

Importing the script with the stdout redirected. This can be done using this little code snippet, which you run via `python3 -i import_without_print.py`:

```python
from contextlib import redirect_stdout
from os import devnull

# Send everything printed during the import to the null device.
with open(devnull, 'w') as null, redirect_stdout(null):
    import jupyter  # <-- The name of your script
```

If you are providing Jupyter Notebook skeleton code to your students, you can add an `if __name__ == "__main__":` statement around every place where solutions are printed. This way, the students can see their solutions while interacting with the notebook, but these solutions will not be printed when importing the code (using the script above or with a regular import).
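The effect of the `__name__` guard can be checked with a small sketch like the one below. It imports a hypothetical piece of skeleton code from a temporary file and records anything printed during the import; the `mean` function and the `import_and_capture` helper are made up for this example.

```python
import contextlib
import importlib.util
import io
import tempfile
from pathlib import Path

# Hypothetical skeleton code after conversion to a script.
SKELETON = '''
def mean(values):
    return sum(values) / len(values)

if __name__ == "__main__":
    # Runs inside the notebook, but not when the script is imported.
    print("mean:", mean([1, 2, 3]))
'''


def import_and_capture(source, module_name="skeleton"):
    """Import source as a module; return the module and anything it printed."""
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir, f"{module_name}.py")
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            spec.loader.exec_module(module)
    return module, buffer.getvalue()


module, printed = import_and_capture(SKELETON)
print(repr(printed))           # text printed while importing
print(module.mean([1, 2, 3]))  # the function itself is still usable
```

The guarded print is skipped on import because `__name__` is then the module name rather than `"__main__"`, so the test output stays clean while the function remains available.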

Automatically grading using unit testing

Just like with any programming language, unit testing is a great way to assess students’ Jupyter Notebooks easily and automatically. The I/O tests already allow you to check individual functions and variables, but they have their limitations. Unit testing Jupyter Notebooks is especially useful if you want to assess and interpret more advanced data types (although NumPy arrays and pandas DataFrames work well with I/O tests too!), if you need unit testing features such as testing for exceptions or writing complex tests, or if you want to test with random input and output.
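As a sketch of the kind of checks that are awkward in I/O tests, the example below uses Python's standard `unittest` module (not `cg-jupyter-unit`) to test for an exception and to test with seeded random input. The `normalize` function is a hypothetical stand-in for a function you would import from the converted notebook.

```python
import random
import unittest


# Stand-in for a function imported from the converted notebook (hypothetical).
def normalize(values):
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_raises_on_zero_sum(self):
        # The function should reject input it cannot normalize.
        with self.assertRaises(ValueError):
            normalize([0, 0, 0])

    def test_random_inputs_sum_to_one(self):
        rng = random.Random(42)  # fixed seed keeps grading reproducible
        for _ in range(100):
            values = [rng.uniform(1, 10) for _ in range(rng.randint(1, 20))]
            self.assertAlmostEqual(sum(normalize(values)), 1.0)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalize)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Seeding the random generator means every student is graded against the same inputs, while the test still covers many more cases than a handful of hand-written I/O pairs.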

A very common way to unit test Jupyter Notebooks is using a tool called nbgrader, a great yet complex tool that is often used in education. CodeGrade helps you bring the functionality of nbgrader to the cloud with our own unit testing framework `cg-jupyter-unit`, which is in open beta right now. Would you like to try out `cg-jupyter-unit` yourself? Please send an email to support@codegrade.com and we’d be happy to get you started with it!

Jupyter Notebooks in CodeGrade

With more and more instructors using Jupyter Notebooks in education and academia, it is becoming increasingly important to streamline how they are graded. Their unique format and characteristics make it more challenging to grade them manually in an efficient way, or to autograde them at all. This guide provides tools to do this in CodeGrade. It is by no means exhaustive, but it provides the information required to start autograding Jupyter Notebooks for almost all programming assignments. Would you like to learn more about grading Jupyter Notebooks or do you have any questions regarding this guide? I’d be more than happy to help you out via support@codegrade.com.