Automatically grading Jupyter (IPython) Notebooks for university courses
April 2, 2021

Grading Jupyter Notebooks, manually and automatically

In 30 seconds...

  • Jupyter Notebook is a commonly used tool in computer science and programming teaching. Students can create and share documents that are a combination of Python code and text;
  • But it is still awkward to grade! Most teachers download the notebook, open it in Jupyter, manually review it and then submit a grade back to the LMS;
  • Manual grading does make sense for notebooks, as many include written text and graphics. In CodeGrade, we render the notebook and run its code directly within our interface! Just click on a line to leave feedback;
  • It is also very easy to set up automatic testing: unit tests or simple I/O tests are a great way to assess students’ IPython Notebooks automatically.

Jupyter Notebook, formerly known as IPython Notebook, is a fantastic and very commonly used tool in computer science and programming education. Jupyter Notebooks allow a student to create and share documents that are a combination of Python code and text, which can include equations, visualizations and narrative text. This combination makes it very powerful for the more applied computer science courses like data science, machine learning and computer graphics. 

Teaching with Jupyter Notebooks has become a common practice because of all these advantages. However, grading Jupyter Notebooks (files ending with the `.ipynb` extension) is still very cumbersome. Most teachers simply download the submitted notebook, open it using the Jupyter application, review it manually, and then submit a grade back to the learning management system. This is far from a practical or scalable solution. In this guide, we explain how you can grade Jupyter Notebooks manually more effectively using CodeGrade, and how you can even set up automatic tests for them.

Prefer to watch a webinar? All of the techniques mentioned in this blog post are also covered in our webinar on Python and Jupyter Notebook autograding.

Manual grading

CodeGrade makes it very easy to run and automatically grade a Jupyter Notebook in our interface, but we will start with a short section on grading notebooks manually. The main reason is that Jupyter Notebooks are pre-eminently suited to at least some form of manual grading.

Jupyter Notebooks are most often chosen by instructors because they combine written reports and code very intuitively, letting the two interact with and add to each other within a single notebook. Moreover, the courses in which Jupyter Notebooks are most commonly used tend to produce mostly graphs, visualizations and other graphics. The code in a notebook can be graded very effectively with automatic tests, but written text and graphics are often graded manually.

CodeGrade makes this intuitive and highly efficient, as it renders and runs the Jupyter Notebook directly within our interface in your browser. This makes manual grading extremely easy: just as you have come to expect from CodeGrade, you can simply click on any line in the notebook to leave feedback for the students. All of CodeGrade’s efficiency- and feedback-enhancing tools, such as feedback snippets, rubrics and grading management, are also available for Jupyter Notebooks. All of this is also available as a plugin for your learning management system (LMS), such as Canvas, Blackboard and Open edX. We also discuss Jupyter Notebooks in Brightspace in a separate article.

Automatically running a Jupyter Notebook

When grading a Jupyter Notebook manually, it is a good practice to first run the notebook. CodeGrade can do this automatically. 

To understand why this is a good practice, it is important to first understand the inner workings of a Jupyter Notebook. In essence, notebooks are simple JSON files that store their cells. Besides the cells themselves, the outputs of these cells are also saved in this JSON, meaning that students hand in a notebook in a certain state. That state is not necessarily the latest one: cells may have been edited or run out of order after their output was produced, and output can even be altered by hand. With automatic grading, tests check the correctness of the actual code, but with manual grading the visual results usually take the lead. To make sure you are grading the latest version of the code and its output, you can automatically run the notebook using CodeGrade AutoTest.

For this, we will use CodeGrade’s AutoTest Output functionality. This allows us to generate output using AutoTest that can be displayed in our Code Viewer! CodeGrade has custom scripts that make this very easy, but you can also do it yourself with the pre-installed `jupyter` package:

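-!- CODE language-shell -!-# Execute every cell (continuing past errors) and save the result to the AutoTest output folder
jupyter nbconvert --execute --to notebook \
   --output "$AT_OUTPUT/jupyter.ipynb" "$STUDENT/jupyter.ipynb" \
   --allow-errors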

By adding this command to a Run Program step in your AutoTest, you can generate the executed Jupyter Notebook in the AutoTest output folder and review it manually!


The automatically run Jupyter Notebook in the Code Viewer
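To see the stored state described above for yourself, the sketch below (an illustration, not one of CodeGrade's scripts) reads a notebook's JSON with Python's standard `json` module and flags code cells whose `execution_count` is missing or out of order, a hint that the saved output may not match the code. The names `load_notebook`, `stale_cells` and the example notebook are hypothetical; the keys `cells`, `cell_type`, `execution_count` and `outputs` come from the notebook (nbformat 4) JSON format.

```python
import json


def load_notebook(path: str) -> dict:
    """Read an .ipynb file; notebooks are plain JSON documents."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def stale_cells(notebook: dict) -> list:
    """Return indices of code cells whose saved state looks out of date.

    A code cell is flagged when it has no execution_count (never run, or
    outputs cleared) or when its count is not higher than that of the
    previous executed code cell (cells were run out of order).
    """
    flagged = []
    last_count = 0
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None or count <= last_count:
            flagged.append(index)
        else:
            last_count = count
    return flagged


# Hypothetical submission: the saved output "42" no longer matches the code,
# which would now print 20.
example = {
    "cells": [
        {"cell_type": "markdown", "source": "# Assignment 1"},
        {"cell_type": "code", "execution_count": 3, "source": "x = 10", "outputs": []},
        {
            "cell_type": "code",
            "execution_count": 1,
            "source": "print(x * 2)",
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "42\n"}],
        },
        {"cell_type": "code", "execution_count": None, "source": "y = x + 1", "outputs": []},
    ]
}

print(stale_cells(example))  # [2, 3]
# For a real submission: stale_cells(load_notebook("jupyter.ipynb"))
```

A check like this only hints at a stale notebook; re-executing it with `nbconvert --execute` is what actually guarantees that the output you grade matches the code.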



Automatic grading using I/O Tests

Jupyter Notebooks are essentially a wrapper around Python code, and therefore very suitable for automatic grading using CodeGrade! By converting the notebook to a regular Python script, we can use all the easy autograding tools and options we normally use for Python (see this webinar on autograding Python code) on our Jupyter Notebooks too!

Luckily, the `jupyter` package provides a command that converts a Jupyter Notebook into a valid Python script. This script is simply constructed by concatenating all code cells in your Jupyter Notebook. You can use the following command in a Run Program step to achieve this:

-!- CODE language-shell -!-jupyter nbconvert --to script YOURFILE.ipynb
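To illustrate what this conversion does, here is a simplified, standard-library-only approximation that concatenates the code cells of a notebook into one script. It is a sketch under assumptions, not a replacement for `nbconvert`: it skips markdown cells entirely and does not translate IPython magics such as `%matplotlib inline` into regular Python, which `nbconvert` does. The names `notebook_to_script` and `convert` are hypothetical.

```python
import json
from pathlib import Path


def notebook_to_script(notebook: dict) -> str:
    """Concatenate the source of every code cell, in notebook order.

    Simplified sketch: markdown cells are dropped and IPython magics are
    left untouched (nbconvert handles both for you).
    """
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


def convert(ipynb_path: str) -> Path:
    """Write YOURFILE.py next to YOURFILE.ipynb and return its path."""
    path = Path(ipynb_path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    script_path = path.with_suffix(".py")
    script_path.write_text(notebook_to_script(notebook), encoding="utf-8")
    return script_path


# Hypothetical notebook with one markdown cell and two code cells.
example = {
    "cells": [
        {"cell_type": "markdown", "source": "# Exercise 1"},
        {"cell_type": "code", "source": ["def add(a, b):\n", "    return a + b\n"]},
        {"cell_type": "code", "source": "total = add(1, 5)"},
    ]
}

print(notebook_to_script(example))
```

Because the result is an ordinary Python file, everything that follows (importing it, calling its functions, unit testing it) works exactly as it would for a regular Python assignment.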

In the same AutoTest category as this `nbconvert` command, you can now interact with the generated script as you normally would. One easy way to do this is with Input/Output Tests (I/O Tests). Before interacting with the script, you have to import it. We do this by opening the Python interpreter in interactive mode with the following command: `python3 -ic "import your_script"`. By making this the “Program to test” in your I/O Test, you can interact with the script through stdin and stdout. For example, use a statement that prints the result of a function as the input, such as `print(your_script.function(1, 5))`, and write the expected result as the output.

Autograding variables in a Jupyter Notebook
Autograding functions in a Jupyter Notebook

Because the input is regular Python code passed to the Python interpreter, you can call functions, do arithmetic and print variables, as the examples above show.
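If you want to try out inputs and expected outputs before putting them in an I/O Test, the sketch below mimics that setup locally. It writes a hypothetical student script to a temporary folder as `your_script.py`, feeds the test input on stdin to `python -ic "import your_script"`, and returns what the interpreter wrote to stdout. It uses `sys.executable` in place of `python3`, and the whitespace-insensitive `check` helper is an assumption for illustration; it does not model how CodeGrade itself compares output.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical student script, standing in for the output of
# `jupyter nbconvert --to script`.
STUDENT_SCRIPT = """\
def add(a, b):
    return a + b

total = add(1, 5)
"""


def run_io_test(script_source: str, stdin_text: str) -> str:
    """Feed stdin_text to `python -ic "import your_script"`; return stdout."""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "your_script.py").write_text(script_source, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, "-ic", "import your_script"],
            input=stdin_text,
            capture_output=True,
            text=True,
            cwd=workdir,  # with -c, the current directory is on sys.path
            timeout=30,
        )
    return result.stdout


def check(actual: str, expected: str) -> bool:
    """Compare outputs token by token, ignoring whitespace differences."""
    return actual.split() == expected.split()


if __name__ == "__main__":
    # Input: one function call and one arithmetic expression on a variable.
    output = run_io_test(
        STUDENT_SCRIPT,
        "print(your_script.add(1, 5))\nprint(your_script.total * 2)\n",
    )
    print("passed" if check(output, "6\n12\n") else "failed")
```

Each input line is a complete statement, mirroring what you would type into the I/O Test's input field.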

Importing Python code without printing

One thing to be aware of is that we are checking the stdout of the script, which runs completely when it is imported. As a result, students can clutter the output with additional print statements outside of functions. There are two ways to prevent this:

  1. Import the script with stdout redirected. You can do this with this little code snippet (saved as `import_without_print.py`), which you run via `python3 -i import_without_print.py`:

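-!- CODE language-python -!-from contextlib import redirect_stdout
from os import devnull

# Discard everything the script prints while it is being imported;
# the file handle is closed and stdout restored when the block ends.
with open(devnull, "w") as null, redirect_stdout(null):
    import jupyter  # <-- The name of your script (here: jupyter.py)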


  2. If you provide Jupyter Notebook skeleton code to your students, you can wrap every statement that prints a solution in an `if __name__ == "__main__":` block. This way, students still see their results while working in the notebook, but these results are not printed when the code is imported (using the script above or with a regular import).
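The effect of this guard can be checked with a small, self-contained sketch. It writes a hypothetical skeleton cell to a temporary `skeleton.py`, imports it (where `__name__` is the module name, so the guarded `print` is skipped) and then runs it with `runpy.run_path(..., run_name="__main__")`, which mimics a notebook kernel, where `__name__` is `"__main__"`. The file name and the `square` function are illustrative assumptions.

```python
import importlib
import io
import runpy
import sys
import tempfile
from contextlib import redirect_stdout
from pathlib import Path

# Hypothetical skeleton code, as it appears after converting the notebook.
SKELETON = '''\
def square(x):
    # Students implement this function.
    return x * x

if __name__ == "__main__":
    # Visible while working in the notebook, skipped on import.
    print(square(4))
'''


def captured_stdout(action) -> str:
    """Call a zero-argument function and return everything it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        action()
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as workdir:
    path = Path(workdir, "skeleton.py")
    path.write_text(SKELETON, encoding="utf-8")

    # Import: __name__ is "skeleton", so the guarded print does not run.
    sys.path.insert(0, workdir)
    importlib.invalidate_caches()
    try:
        import_output = captured_stdout(lambda: importlib.import_module("skeleton"))
    finally:
        sys.path.remove(workdir)
        sys.modules.pop("skeleton", None)

    # Run as the main program, like a notebook kernel: the print runs.
    main_output = captured_stdout(
        lambda: runpy.run_path(str(path), run_name="__main__")
    )

print(repr(import_output))  # ''
print(repr(main_output))  # '16\n'
```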

Automatically grading using unit testing

Just like with any programming language, unit testing is a great way to assess students’ Jupyter Notebooks easily and automatically. I/O tests already allow you to check individual functions and variables, but they have their limitations. Unit testing Jupyter Notebooks is especially useful if you want to assess and interpret more advanced data types (although NumPy arrays and pandas DataFrames also work well with I/O tests!), if you need unit testing functionality such as testing for exceptions or writing complex tests, or if you want to use random input/output testing.
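As an illustration of those three use cases (a known value, an expected exception and random inputs), the sketch below uses Python's built-in `unittest` module; it is not nbgrader or `cg-jupyter-unit`, whose APIs are not shown here. The function `average` is a hypothetical stand-in for a student's solution; in practice you would import it from the script generated by `jupyter nbconvert --to script`.

```python
import random
import statistics
import unittest


# Stand-in for the student's function. In an AutoTest you would instead
# import it from the converted notebook, e.g. `from your_script import average`
# (`your_script` and `average` are hypothetical names).
def average(values):
    if not values:
        raise ValueError("average() of an empty list")
    return sum(values) / len(values)


class TestAverage(unittest.TestCase):
    def test_known_value(self):
        self.assertEqual(average([1, 2, 3, 4]), 2.5)

    def test_empty_list_raises(self):
        # Testing for exceptions is awkward with plain I/O tests.
        with self.assertRaises(ValueError):
            average([])

    def test_random_inputs(self):
        # Random input testing: compare against a trusted reference.
        rng = random.Random(2021)  # fixed seed keeps runs reproducible
        for _ in range(100):
            values = [rng.uniform(-1000, 1000) for _ in range(rng.randint(1, 50))]
            self.assertAlmostEqual(average(values), statistics.fmean(values), places=6)


def run_tests() -> unittest.TestResult:
    """Run the TestAverage suite and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAverage)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

Comparing against a reference implementation (here `statistics.fmean`) is what makes random inputs practical: you do not need to write down an expected output for every generated case.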

A very common way to unit test Jupyter Notebooks is using a tool called nbgrader, a great yet complex tool that is often used in education. CodeGrade helps you bring the functionality of nbgrader to the cloud with our own unit testing framework `cg-jupyter-unit`, which is in open beta right now. Would you like to try out `cg-jupyter-unit` yourself? Please send an email to support@codegrade.com and we’d be happy to get you started with it!

Jupyter Notebooks in CodeGrade

With more and more instructors using Jupyter Notebooks in education and academia, it is becoming increasingly important to streamline how they are graded. Their unique format and characteristics make it more challenging to grade them manually in an efficient way, or to autograde them at all. This guide provides tools to do both in CodeGrade. The list is by no means exhaustive, but it provides all the information you need to start autograding Jupyter Notebooks for almost all programming assignments. Would you like to learn more about grading Jupyter Notebooks, or do you have any questions regarding this guide? I’d be more than happy to help you out via support@codegrade.com

Devin Hillenius


Co-founder, Product Expert
Devin is co-founder and Product Expert at CodeGrade. During his Computer Science studies and his work as a TA at the University of Amsterdam, he developed CodeGrade together with his co-founders to make their lives easier. Devin supports instructors with their programming courses, focusing on both their pedagogical needs and innovative technical possibilities. He also hosts CodeGrade's monthly webinar.
