Compbio 005: Why should a biologist learn R?

If you are already learning another programming language (like Python), then you might be asking "why should I also learn R?" It is certainly true that pretty much everything you do in R, you can do in Python. There are in fact many Python modules that replicate features of R. If you are wondering what to learn as your first language, R is a powerful way to analyse data. I find it much more direct for dealing with data, but there is less focus on some of the basic mechanics of programming (like "for loops") when learning and using R.

I resisted learning R for a long time; I was happy to use Python to do everything. But I finally changed my mind and I am happy to have R at my disposal. So here are some reasons to learn a little R, even if you are already learning another language for computational biology:

1. R is centred around the dataframe object: dataframes are essentially tables. Much of what we do in computational biology is centred around manipulating tables of text and numbers. R is a great language for this central part of computational biology, with many functions and libraries that can manipulate dataframes in powerful ways - saving time compared to writing your own loops and functions in Python.

2. Plotting is beautiful and (fairly) straightforward with ggplot2. I love matplotlib as a Python module for making plots, but I have often run into situations where what I wanted to do was far easier in R with ggplot2, despite my having less experience with it. So I switch between the two, using the best tool for the task at hand. See this beautiful univariate scatter plot made in ggplot2 compared to a bar plot made in Excel (post on how to make univariate plots in R coming soon).

3. A lot of bioinformatics software is distributed as R packages. This means that if you want to use a certain program, you cannot just run it on the command line from a binary, like you can with Cufflinks or Salmon. Instead, you have to load up R and either enter the code line-by-line or run an R script. Learning to use the R package Sleuth to analyse RNA-seq data forced me to learn more R. I could only get so far by copy-pasting the commands in the tutorial before I had to customize the commands I was entering, and knowing a little R went a long way.

It is true that you can use the Pandas module in Python to replicate dataframes. That is great: many people never learn R and are happy to do everything in Python, either without dataframes or by using Pandas. I am not a fan of Pandas, but I think that has more to do with how my head works - I just find the syntax for handling dataframes difficult in Pandas, despite normally liking Python's syntax more than R's.

If you are in the middle of learning your first (non-R) language, then I'm not sure jumping into R right away is the best approach. Get good with what you are doing. But when the day comes that you want to do some stats, modify a table in a strange way or use an analysis library which is distributed as an R package, be willing to learn R. It will not be easy, but it can save a lot of time. I have spent hours writing Python code to re-arrange a table, code which then took ages to run, only to discover much later that R has a function to re-arrange the table exactly as I wanted in a single line of code, which runs almost instantly. I felt rather stupid for not realizing sooner.
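To make the "hours of Python loops" point concrete, here is a minimal sketch of what re-arranging a table by hand can look like. It assumes a wide-to-long rearrangement, which is only one typical case; the post does not say which rearrangement or which R function was actually involved. The table contents, the column names and the wide_to_long helper are all made up for illustration.

```python
# Hypothetical wide table: one row per gene, one column per sample.
# Gene names, sample names and values are invented for this sketch.
wide_rows = [
    {"gene": "geneA", "sample1": 10.0, "sample2": 12.5},
    {"gene": "geneB", "sample1": 0.0, "sample2": 3.2},
]


def wide_to_long(rows, id_column, var_name="sample", value_name="expression"):
    """Turn one row per gene into one row per (gene, sample) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is copied onto every output row
            long_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, id_column="gene")
for row in long_rows:
    print(row)
```

Dataframe libraries wrap this kind of nested loop in a single call. In R, pivot_longer() from the tidyr package is one such function, and in Pandas DataFrame.melt() does a similar job.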

There are many ways to learn R. Here are two Software Carpentry lessons to check out:
http://swcarpentry.github.io/r-novice-inflammation/
http://swcarpentry.github.io/r-novice-gapminder

As with most things, the easiest way to learn is by doing, ideally with your own data and a problem you actually need to solve. Next time you run into a problem you are really struggling with, do some Googling to see if it has an easy solution in R.

British geneticist interested in splicing, RNA decay, and synthetic biology. This is my blog focusing on my adventures in computational biology.

A geneticist interested in splicing, RNA decay, DNA methylation and synthetic biology. This is my blog focusing on my adventures in computational biology.


Oct 13 Compbio 005: Why should a biologist learn R?

