CodeGrade: best autograder for data science education with AutoTest Caching

June 10, 2021

Better Data Science assignments with AutoTest Caching in CodeGrade

In 30 seconds...

In the past year, we have seen a big rise in Data Science-oriented courses. These courses are popular not only among Computer Science students, but also among students from Business, Econometrics and STEM degrees outside Computer Science. With our recent release, CodeGrade Orchid, we have added AutoTest caching to CodeGrade, which solves two common challenges of Data Science assignments:

In Data Science, as the name suggests, you often work with vast quantities of data. Downloading or uploading these large data sets can slow down the setup of an autograder, which delays feedback to students.
Many R and Python Data Science courses rely on additional, Data Science-specific packages that have to be installed in the autograder. A good example is the Python TensorFlow package, which may take more than 10 minutes to install. If it has to be installed again for every student, feedback is delayed considerably.

While autograding Data Science poses some challenges, it is a field that benefits especially from autograding. Data Science is often students' first encounter with coding or scripting, and instant feedback helps motivate them and accelerate their learning. Furthermore, Data Science assignments often consist of several subtasks, which are perfectly suited for autograding: students see visible progress as they solve each subtask and get feedback very quickly. We are proud to announce that, with the addition of AutoTest caching to CodeGrade, we have solved these challenges, so you can benefit even more from autograding in your Data Science course. In the rest of this article, we explain how to turn on AutoTest caching and how exactly it works.

AutoTest caching stores the state of your AutoTest environment once your configuration has finished (e.g. after downloading data, installing additional software or packages, or extracting archives). This means you can configure your environment in any way you want, download your data set or install the required packages, and still give students instant feedback. When a student hands in a new submission, AutoTest starts from this cached state instead of repeating the setup, so that student gets feedback very quickly. Caching is also useful while you develop your AutoTest iteratively, as changing test steps does not require a cache refresh.
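CodeGrade does not describe how its cache is keyed internally, so the following is only a minimal, hypothetical Python sketch of the behaviour described above. It assumes the cache key is derived from the Global Setup Script and the fixtures alone (the inputs that, according to this article, reset the cache), while test steps are left out of the key. The names `cache_key` and `HypotheticalRunner` are illustrative and are not part of any CodeGrade API.

```python
import hashlib


def cache_key(setup_script: str, fixtures: dict[str, bytes]) -> str:
    """Hash the global setup script and fixtures into a single cache key.

    Test steps are deliberately NOT part of the key, so editing them
    leaves the cached environment valid.
    """
    h = hashlib.sha256()

    def feed(part: bytes) -> None:
        # Length-prefix every field so different inputs can never
        # concatenate to the same byte stream.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)

    feed(setup_script.encode("utf-8"))
    for name in sorted(fixtures):  # sorted: key ignores upload order
        feed(name.encode("utf-8"))
        feed(fixtures[name])
    return h.hexdigest()


class HypotheticalRunner:
    """Toy model of a cached setup; not CodeGrade's implementation."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}  # cache key -> environment snapshot
        self.global_setup_runs = 0
        self.per_student_runs = 0

    def _global_setup(self, setup_script: str) -> str:
        # Stands in for slow work, e.g. installing packages or downloading data.
        self.global_setup_runs += 1
        return f"snapshot after {setup_script!r}"

    def grade(self, setup_script: str, fixtures: dict[str, bytes],
              test_steps: list[str], submission: str) -> list[str]:
        key = cache_key(setup_script, fixtures)
        if key not in self.cache:  # miss: first run, or setup/fixtures changed
            self.cache[key] = self._global_setup(setup_script)
        snapshot = self.cache[key]  # hit: start from the cached state
        self.per_student_runs += 1  # per-student setup runs for every submission
        return [f"{step} on {submission} from {snapshot}" for step in test_steps]


runner = HypotheticalRunner()
setup = "pip install tensorflow"
data = {"train.csv": b"x,y\n1,2\n"}

for student in ["alice", "bob", "carol"]:
    runner.grade(setup, data, ["test_q1"], student)
print(runner.global_setup_runs, runner.per_student_runs)  # 1 3

# Adding a test step does not invalidate the cache.
runner.grade(setup, data, ["test_q1", "test_q2"], "dave")
print(runner.global_setup_runs)  # 1

# Changing a fixture does.
runner.grade(setup, {"train.csv": b"x,y\n3,4\n"}, ["test_q1"], "erin")
print(runner.global_setup_runs)  # 2
```

Only inputs that change the prepared environment belong in the key, so tests can be edited freely while the expensive setup is reused. The sketch also illustrates why caching pins software versions: the snapshot keeps being reused until the setup script or fixtures change, even if newer package versions are released in the meantime.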

Turning on AutoTest Caching for your assignment.

When should you use AutoTest Caching, and when not?

For most use cases, you will probably want to keep AutoTest Caching turned on. When enabled, it gives your students faster feedback and lets you test your AutoTest steps iteratively more quickly. You should turn on AutoTest caching if:

You install new software or packages in the AutoTest Global Setup Script. With caching, they are installed only the first time AutoTest runs and are cached after that.
You download or upload large fixtures or data sets in your AutoTest setup. With caching, these are downloaded only once and then cached for all future submissions.
You are creating new tests for your AutoTest and testing them iteratively. Please note: the AutoTest cache is reset every time you alter your Global Setup Script or fixtures.

There are, however, a few uncommon use cases in which you should turn off AutoTest Caching, for instance if:

You always need the latest version of the software you install. With caching, you are locked to the version of the software or package that was installed when the cache was created. Turn caching off if it is important to always have the latest version installed.
You always need the latest version of the data set you download, for instance because the data is real-time or because you will change it during the assignment.
Your configuration depends on live data in any other way. However, we recommend running any code that depends on live data in the "Per student setup script to run", so that you can be sure it runs every time a student hands in, exactly at that moment.

Want to learn more about Data Science in CodeGrade?

We recently published a webinar on Data Science in CodeGrade, in which we explain how best to use AutoTest Caching and how to set up a basic autograded R Data Science assignment as well as autograded Python and Jupyter Notebook assignments. Watch it here: