Testing unit tests for Python code assignments in CodeGrade
February 15, 2021

Testing the Tests: Autograding student unit tests in Python assignments (using code coverage and mutation testing)

All computer scientists have to learn how to write effective unit tests at some point in their (academic) career. Because of that, most computer science degrees nowadays teach their students how to do this in some form or another. Sometimes this is already done in introductory courses, to lay a good foundation, but often we see it done in more advanced Software Development courses.

For this guide, I have researched the best ways to autograde student unit tests for Python assignments and will explain step by step how you can automatically assess pytest unit tests in CodeGrade. All theory and principles covered here are also useful for other programming languages, and articles on other languages (e.g. Java's JUnit) will follow soon. This blog explains how to grade the unit tests themselves; we have a separate guide on how to grade Python code by running automated (unit) tests in CodeGrade.

Unit Test assessment metrics

There are multiple ways to assess unit tests; two industry-standard metrics will be covered in this guide: code coverage and mutation testing. Both serve different purposes and differ in complexity.

  • Code Coverage is the most common metric to assess unit test quality. It simply measures what percentage of the source code is covered by the unit tests you have written. Different metrics can be used to calculate this, for instance the percentage of subroutines or the percentage of instructions that the tests cover. 
  • Mutation Testing is a more advanced metric to assess unit test quality that goes beyond simple code coverage. During mutation testing, a tool makes minor changes to your code (known as mutations) that should break it. It then checks whether each mutation makes your unit tests fail (what we want if our unit tests are correct and complete) or not (meaning our unit tests are incorrect or incomplete).

You may now be wondering why we would go as advanced as mutation testing to assess a simple unit test assignment. The answer is that even though the code coverage metric measures how many lines or instructions our unit tests cover, it does not tell us how well those lines are actually tested. Unit tests are written code too: they can contain bugs or typos, or fail to test certain edge cases. Mutation testing is very good at finding such bugs and gaps, making it a very useful metric for assessing the quality of unit tests for educational purposes. Of course, a combination of both metrics is even more effective and useful to assess your students!
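To make this concrete, here is a small hypothetical example (the function and test are made up for illustration): a test suite with 100% line coverage that a simple mutant still slips past.

```python
def discount(price):
    # 10% off for orders of 100 or more
    if price >= 100:
        return price - price // 10
    return price

def test_discount():
    # These two asserts execute every line of discount(),
    # so line coverage reports 100%.
    assert discount(200) == 180
    assert discount(50) == 50

# A mutation tool might change ">=" into ">". The mutant below
# behaves differently only at the boundary, price == 100, which
# test_discount() never checks -- so the mutant "survives" and
# reveals the missing edge case.
def discount_mutant(price):
    if price > 100:
        return price - price // 10
    return price
```

Adding `assert discount(100) == 90` to the test would kill this mutant; mutation testing rewards exactly this kind of boundary check.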

Code Coverage Assessment for Python using pytest-cov

As it is by far the easiest to set up, we will start by assessing the student unit tests on code coverage in AutoTest. This can be autograded very easily, as the resulting percentage can simply be converted to the points we give our students.

Pytest, one of the most common unit testing frameworks for Python and supported out of the box by CodeGrade, has a really nifty package to calculate code coverage called pytest-cov. We can install it in the Global Setup script together with cg-pytest, using `cg-pytest install pytest-cov`. It is now very easy to run our student unit tests while also reporting on their test coverage: in the unit test step of one of your AutoTest categories, just add `-- --cov=. --cov-report xml:cov.xml` to your usual command. This will generate a coverage report in XML format called `cov.xml`. The final command for your unit test step becomes:

cg-pytest run -- --cov=. --cov-report xml:cov.xml $FIXTURES/test_sudoku.py

Now, all we have to do is parse the coverage report and use the percentage to give our students points. Luckily, CodeGrade's Capture Points step is exactly what we need for this. I wrote a small script called `print_cov.py` and uploaded it as a fixture. Running it in the Capture Points step prints the coverage rate, which is then used to calculate the number of points awarded for this test.
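The script itself is not shown in this guide, but a minimal sketch of what `print_cov.py` could look like is below. It assumes the Cobertura-style XML that pytest-cov writes, where the root `<coverage>` element carries the overall line coverage in its `line-rate` attribute:

```python
# print_cov.py -- sketch of a Capture Points helper (assumed contents).
# Reads the report written by --cov-report xml:cov.xml and prints the
# overall line coverage as a number between 0 and 1, which CodeGrade's
# Capture Points step converts into points.
import xml.etree.ElementTree as ET

def coverage_rate(path="cov.xml"):
    root = ET.parse(path).getroot()
    # The root <coverage> element stores the overall line coverage
    # in its "line-rate" attribute, e.g. line-rate="0.85".
    return float(root.get("line-rate"))

if __name__ == "__main__":
    print(coverage_rate())
```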

Capturing the Code Coverage in CodeGrade

Mutation Testing for Python using mutmut

For most basic assignments, code coverage will be a good metric to easily and quickly assess the completeness of unit tests that students write. For more advanced assignments however, adding mutation testing can greatly improve the quality of grading pytests by also grading the quality and correctness of the tests.

After researching possible Python packages that do mutation testing, I found that mutmut is by far the easiest and best supported one. Additionally, I found a great article by Opensource.com that shares some very simple example code, which you can use to try out mutmut and which I also use for this example assignment.

We can install mutmut in the same way as we installed pytest-cov above, in the Global Setup Script while installing cg-pytest: `cg-pytest install pytest-cov mutmut`. Afterwards, in the Per-student setup script, we move the unit tests submitted by the students to a tests folder (you can also enforce this using Hand In Requirements), using the command `mkdir tests; mv test_angle.py tests/test_angle.py;`. Note that pytest test files must be named starting with `test_` in order to be collected; you can enforce this too using Hand In Requirements.

We are now ready to run mutmut. This can be done in a simple Run Program step using the command: `mutmut run --paths-to-mutate angle.py --tests-dir tests/ --runner "python3.7 -m pytest -x --assert=plain" || true`. In this command, we specify `angle.py` as the script to mutate (uploaded by the student) and point to the tests directory we set up in the Per-student setup script. We also select a runner to make sure Python version 3.7, for which we have installed pytest, is used. We finish the command with `|| true`, which makes sure the exit code is always successful, so the Run Program step passes and we get its point.

After running mutmut, we can use a Capture Points step again to capture the percentage of mutations killed. I use the following command to generate the `junitxml` report, write it to `mutmut.xml`, and then parse and print the score with a simple Python script:

mutmut junitxml --suspicious-policy=ignore --untested-policy=ignore > mutmut.xml && python3 $FIXTURES/print_mutmut_score.py

import xml.etree.ElementTree as ET

# Parse the junitxml report written by mutmut: every mutant is reported
# as a "test", and a "failure" is a mutant that survived the unit tests.
tree = ET.parse('mutmut.xml')
root = tree.getroot()
failures = int(root.get('failures'))
tests = int(root.get('tests'))

# Print the fraction of mutants killed, between 0 and 1.
print((tests - failures) / tests)

CodeGrade Mutation Test autograder category

Teaching unit testing for Python effectively

The tools explained in this guide will help you set up effective automatic grading for your Python assignments in which students have to create unit tests themselves. With CodeGrade's continuous feedback, students can submit their code and unit tests many times and improve after getting instant feedback.

You can make this process even more effective by adding custom descriptions to your tests, as I did in the last screenshot in this guide. Many of the steps we covered use relatively large and hard-to-understand commands, which can be rather confusing for students. Recently, we added the possibility to customize test descriptions in CodeGrade, which allows you to rename your tests to something that helps your students understand them better, for example:

Mutation Test Score: The percentage of mutations that got caught by your unit tests. 0% means none were caught and 100% means all were caught. (see chapter 3)

I hope this guide has helped you set up assignments in which you grade your students' unit tests. As I mentioned, these metrics are applicable to many other programming languages, which will be covered in new guides very soon. Would you like to learn more about code coverage or mutation testing in CodeGrade? Or would you like to apply this to another language? Feel free to email me at support@codegrade.com and I'd be more than happy to help you out!

Devin Hillenius


