A case for using Google Colab notebooks as an alternative to web servers for scientific software
2022-10-18 Typo correction and included a list of links to learn more about Google Colab.
2022-10-20 The title and description changed. A PDF version of the article is uploaded to Zenodo at https://doi.org/10.5281/zenodo.7232109
I recently came across ColabFold (Mirdita et al. 2022), a slimmer and faster implementation of AlphaFold2 (Jumper et al. 2021) (the famous protein structure prediction software from DeepMind) implemented on Google Colab in the form of a Jupyter notebook, giving it an easy-to-use web server-like interface. I found this idea intriguing as it removes the overhead of maintaining a webserver while providing a web-based graphical user interface.
Google Colab is a free (with options for pro subscriptions) Jupyter notebook environment for Python (R indirectly) provided by Google that runs on unoccupied Google servers. This free resource also includes access to GPU and TPU making it attractive to various machine learning and data science tasks. For the most part, Google Colab is utilized in machine learning and data science education. However, following the example of ColabFold and my implementation of ColabHDStIM, I want to make a case that it can also be used for providing an easy-to-use interface or live demo for scientific software without maintaining the complex infrastructure of a web server.
Coming from a bioinformatics/computational biology background, I know there is a craze for developing web servers worldwide. However, although many web servers are created yearly, many groups, especially in developing countries, lack the resources to build one. On the flip side, many of these initially well-funded web servers are either of low quality, are not kept updated, or go offline soon after the publication, thus squandering the resources (Veretnik, Fink, and Bourne 2008; Schultheiss et al. 2011; Kern, Fehlmann, and Keller 2020). Therefore, there is a need for an alternative where scientists can distribute their software in an easy-to-use interface like interactive notebooks. Even if the notebook environments are limited in executing production-scale software, they can still be utilized to provide a live demo on a minimal dataset. In my opinion, it is better than the vignettes accompanying software.
Below are some pros and cons of using Google Colab.
Pros
- Easy to implement
- Free hardware resources from Google, including GPU and TPU
- Option to buy more resources from Google as per need
- While hosted on Google’s server, the same notebook can be executed using a local runtime to take advantage of local hardware resources.
- Forkable and hackable if the original maintainer stops the development.
Cons
- Free hardware resources can be limiting
- Uploading and downloading data to a Colab is slow and require a workaround
- All the instances are transient; therefore, on every restart, all the required software is re-installed, which takes time.
- Colab notebooks are meant to run interactively; therefore, maintaining a long background session is hard or impossible.
- Colab primarily supports Python and requires workarounds to support other languages.
Learn more about Google Colab
- Google Colab frequently asked questions
- Welcome to Colab!
- Practical introduction to Google Colab for data science (YouTube video)
References
Citation
@misc{farmer2022,
author = {Farmer, Rohit},
publisher = {Zenodo},
title = {A Case for Using {Google} {Colab} Notebooks as an Alternative
to Web Servers for Scientific Software},
date = {2022-10-17},
url = {https://doi.org/10.5281/zenodo.7232109},
doi = {10.5281/zenodo.7232109},
langid = {en}
}