British geneticist interested in splicing, RNA decay, and synthetic biology. This is my blog focusing on my adventures in computational biology. 

Compbio 011: The fantastic Jupyter (IPython) notebook

The Jupyter (IPython) notebook changed my working habits for the better. You can write code, execute it one block at a time, and have the output printed directly below each block. So you can generate a beautiful graph with matplotlib and have it appear just below the code that generated it. It works as a wonderful notebook, recording what you did and keeping the inputs, analysis and output all in one place. This presents a great new opportunity to share your work by sharing the notebook itself, whether to get feedback from a colleague when you run into a problem or to make your science more reproducible.

Keeping the code and the plot together is also great record keeping: you can return to the work months or years later and still understand how you made the figure. Just like a well-kept notebook in the wet lab.

In this post I will walk through how to install the Jupyter notebook using pip (on Ubuntu on Windows, see my previous post, though this should work on any Linux system). Then I will give an example of how the Jupyter notebook works by using numpy and matplotlib to generate a plot.

To install the Jupyter (IPython) notebook on Ubuntu with pip, ensure that you are using the latest version of pip first: 

$ pip3 install --upgrade pip

Then install Jupyter:

$ pip3 install --user jupyter

or if you are still running Python 2.7: 

$ pip install --user jupyter
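One common snag on Ubuntu: the --user option makes pip put the jupyter command in a per-user scripts folder (usually ~/.local/bin), which may not be on your PATH until you log out and back in. Below is a minimal sketch for checking this, using only the Python standard library. It assumes the Linux layout where user scripts live in the bin/ subfolder of the user base, and it is only a rough check: it compares PATH entries as exact strings.

```python
import os
import shutil
import site

# Base folder for pip --user installs; on Linux, scripts go in its bin/
# subfolder (typically ~/.local/bin)
user_bin = os.path.join(site.getuserbase(), "bin")

# Full path of the `jupyter` command the shell would run, or None if not found
jupyter_path = shutil.which("jupyter")

# Whether the --user scripts folder is listed on PATH at all
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
on_path = user_bin in path_dirs

print("jupyter found at:", jupyter_path)
print(user_bin, "on PATH:", on_path)
```

If jupyter is not found and that folder is not on your PATH, logging out and back in, or adding the folder to PATH in your shell's startup file, should sort it out.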

To run the notebook, enter this command: 

$ jupyter notebook

If you are running this on macOS or Linux, a new tab should open in your browser with the Jupyter notebook. However, if you are using Ubuntu on Windows, a URL should appear a few moments after you enter this command. Simply copy and paste it into your browser and the Jupyter notebook should load. If it doesn't, stop the server with Ctrl+C and try again.


The notebook opens on a list of the folders and files in the directory you launched it from, which is usually your home directory. From the "New" drop-down menu, you can choose whether to start a new notebook in Python 2 or Python 3. Choose your poison and then we can get started.
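If you ever lose track of which Python a notebook is running, a small block like the one below reports the interpreter behind its kernel. It is a sketch using only the standard library; the Python 3 guard at the end is optional and only makes sense if your analysis assumes Python 3.

```python
import platform
import sys

# Version of the Python interpreter running this notebook's kernel
print("Python", platform.python_version())

# Full path to that interpreter, handy if you have several Pythons installed
print(sys.executable)

# Optional guard for analyses that rely on Python 3 behaviour
assert sys.version_info >= (3,), "This notebook expects a Python 3 kernel"
```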

The first code block of the new notebook should contain the following (the $ signs here just mark the start of each line; do not type them into the notebook):

$ %matplotlib inline
$ import matplotlib.pyplot as plt
$ import numpy as np

The %matplotlib inline line means that when you plot a figure, it will appear within the notebook, so you can immediately see what you have produced. Let's give it a try. With numpy, we can generate some random numbers and then plot them as a histogram. To generate the random numbers, add this to a new block:

$ np.random.seed(42)
$ x = np.random.randn(10000)
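To see for yourself what fixing the seed does, you could run a quick check like this in a spare block. It reseeds NumPy's global generator and compares the numbers it produces. The last part uses np.random.default_rng, the newer Generator interface available in NumPy 1.17 and later; it keeps its own state instead of changing the global one, but note that it produces a different sequence from np.random.seed(42) followed by randn, even with the same seed.

```python
import numpy as np

# Same seed -> exactly the same "random" numbers
np.random.seed(42)
first = np.random.randn(5)
np.random.seed(42)
second = np.random.randn(5)
print(np.array_equal(first, second))  # True

# A different seed gives a different sequence
np.random.seed(7)
third = np.random.randn(5)
print(np.array_equal(first, third))  # False

# Newer NumPy: a seeded Generator object, independent of the global state
rng = np.random.default_rng(42)
y = rng.standard_normal(10000)
print(y.shape)  # (10000,)
```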

By setting the seed to 42 with np.random.seed(), we can replicate the same "random" numbers every time, as long as we always set the seed to 42 first. To get a different set of "random" numbers, choose a different seed. Now make the histogram by calling matplotlib's hist() function (matplotlib.pyplot was imported as plt in the first block):

$ plt.hist(x, 50, density=True, alpha=0.5)
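Two things are worth knowing about this call. First, plt.hist() returns the bar heights and bin edges as well as drawing the bars, so you can check what density=True does: it scales the heights so the total area of the bars is 1. Second, %matplotlib inline only works inside the notebook (or IPython), so a plain .py script needs a different approach. Below is a sketch of the same figure as a standalone script, assuming matplotlib 3.x (where density has replaced the removed normed argument) and using the non-interactive Agg backend to write the plot to a file; the filename histogram.png is just an example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no window, just image files
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(42)
x = np.random.randn(10000)

# plt.hist returns the bar heights, the bin edges and the drawn bar objects
counts, bin_edges, patches = plt.hist(x, 50, density=True, alpha=0.5)

# With density=True the heights are scaled so the total bar area is 1
area = np.sum(counts * np.diff(bin_edges))
print(len(counts), len(bin_edges))  # 50 51
print(round(area, 6))  # 1.0

plt.xlabel("value")
plt.ylabel("density")
plt.savefig("histogram.png")
```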

Now we have the plot below the code we used to generate it. Each notebook runs separately, so variables in one notebook are not visible in another. Within a notebook, however, it is important to keep track of the variables you assign in Python. Although you run each block of code on its own, all the blocks share the same variables: once you have run a block that assigns a variable, that variable is available to every other block in the notebook. Hence, when we assigned the array of random numbers to x (x = np.random.randn(10000)) in one block, we could pass x to plt.hist() in the next block to make the plot. This also means that you can (accidentally) overwrite x, and when you try to remake the plot you could end up plotting the wrong values. What counts is the order in which you ran the blocks, not their order on the page. So be careful.
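A rough way to picture this shared state is to run each "block" with Python's exec() against one shared dictionary. This is a simplified model for illustration, not how the IPython kernel is actually implemented, and the variable names below are made up for the example.

```python
# A rough model of notebook blocks sharing state: every block's source is
# executed against the same namespace dictionary, one after another.
shared_namespace = {}

blocks = [
    "import numpy as np\nnp.random.seed(42)\nx = np.random.randn(10000)",
    "n_points = len(x)  # reads x defined in the previous block",
    "x = [1, 2, 3]  # a later block accidentally reuses the name x",
    "n_points_now = len(x)",
]

for source in blocks:
    exec(source, shared_namespace)

print(shared_namespace["n_points"])      # 10000
print(shared_namespace["n_points_now"])  # 3
```

In a real notebook, the IPython %whos magic lists the variables currently defined, and restarting the kernel and running all blocks from the top (Kernel > Restart & Run All in the classic interface) is a simple way to check that the notebook still works in page order.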

Finally, good code needs good comments, but a notebook also needs good notes. So the Jupyter notebook lets you have not only blocks of code but also blocks of text, specifically Markdown blocks. Select the block type from the drop-down menu in the toolbar at the top, then write notes to help future you, or someone else, understand and reproduce your work.


As well as Markdown, you can select Heading from the drop-down, which turns a block into a title and helps keep your blocks of code and Markdown organized. (In newer versions of the notebook, headings are simply Markdown blocks whose lines start with #.)
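Under the hood, the .ipynb file you share is plain JSON: a list of cells, each tagged as "markdown" or "code", plus some metadata. The sketch below builds a tiny notebook with a heading, a note and one code block using only Python's json module. It assumes the nbformat version 4 layout; the official nbformat library is the more robust way to create notebooks, and the filename example.ipynb is hypothetical.

```python
import json

# A minimal notebook in the nbformat 4 layout: metadata plus a list of cells
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Random numbers histogram",  # a heading is Markdown starting with #
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Seed fixed at 42 so the figure can be reproduced.",
        },
        {
            "cell_type": "code",
            "execution_count": None,  # not run yet
            "metadata": {},
            "outputs": [],
            "source": "import numpy as np\nnp.random.seed(42)\nx = np.random.randn(10000)",
        },
    ],
}

with open("example.ipynb", "w") as handle:
    json.dump(notebook, handle, indent=1)

# Reading the file back shows the block types in order
with open("example.ipynb") as handle:
    loaded = json.load(handle)
print([cell["cell_type"] for cell in loaded["cells"]])
# ['markdown', 'markdown', 'code']
```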

I have only scratched the surface of what the notebook can do, but it is a wonderful tool for keeping your data/code organized and reproducible. Here are a few resources to get you started: 

Nice and simple introduction to how to use Jupyter (IPython) notebook

Jupyter notebook tutorial

Video Introduction to Jupyter Notebook

A short demo on how to use Jupyter (IPython) Notebook as a research notebook (old but useful)
