British geneticist interested in splicing, RNA decay, and synthetic biology. This is my blog focusing on my adventures in computational biology. 

Compbio 001: A "what is programming" guide for a biologist

Throughout my PhD, I never programmed anything. I used Excel to filter large tables and that was all. Some people are certainly pros with Excel, but Excel has its limits: some data sets are just too large to deal with, and some tasks are too complicated.

But I know from personal experience just how daunting it is for someone with no programming experience to pick some up. When I was first setting out, I imagined having to think in binary, having to worry about the hardware and account for the RAM. I was very naive. That picture was closer to the truth in the very early days of programming. Today, higher-level programming languages mean you do not write lines and lines of code to instruct a specific piece of hardware to do something, and you do not think in binary (or Matrix code - but you can have green text on a black background so that's cool). Instead, programming languages like Python, very popular in compbio at the moment, make programming feel more like you are writing in English. You can build sentences:

$ if x == 1 and y == 2:
$     print("Total of both would be three")

This is valid code in Python. I use the $ sign to indicate lines of inputted code (it is not part of the code itself). Basically, you can enter code like this into the Python interpreter, the program that lets your computer read Python, and it is then turned into something complicated that the machine can use. You can enter Python code at the command line (link to previous blog: link) or you can enter it into an IDE (integrated development environment; basically a program that lets you enter code, run it, and spot some errors before you run it).
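On its own, the sentence above needs x and y to already exist before Python can check them. Here is a minimal sketch of a complete version that can be saved in a file and run; the starting values for x and y and the variable name message are my own choices for illustration, and the $ markers are left out so the lines can be copied as they are.

```python
# Give x and y values first, otherwise Python does not know what they are.
x = 1
y = 2

# Start with an empty message, then fill it in only if both checks pass.
message = ""
if x == 1 and y == 2:
    message = "Total of both would be three"

print(message)
```

If you change y to any other number and run it again, the check fails and an empty line is printed instead.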

I will not go into why one might need to learn how to program; if you are reading this, you probably already know that you need to learn how to code. Nor will I teach you how to code (links to the online resources that helped me get started are at the bottom). Instead, I just want to get you into the mindset needed before tackling this endeavor, something I wish I had had when I set out.

So coding/programming in Python is essentially telling the computer that you have a set of variables and what their values are. These values can be numbers or words - the variable is the name the computer uses to link your value to a space in its memory. Variable names are just text and can be almost anything. Not quite anything, because the programming language reserves some words for its own use (like "if" and "def"), and others (like "print") are the names of built-in functions that you should avoid reusing. You link a value to a variable with the equals sign (=). In programming this does not mean "same as"; instead it is a linker or binding command. So I can set the variable ryan to mean the number 22:

$ ryan = 22
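As a side note on naming and on what = means, here is a small sketch that can be saved in a file and run. It uses the keyword module that comes with Python to ask whether a word is reserved; every variable name other than ryan is made up for illustration.

```python
import keyword

# "if" and "def" are reserved by the language, so they cannot be variable names.
if_is_reserved = keyword.iskeyword("if")
def_is_reserved = keyword.iskeyword("def")

# "ryan" is not reserved, so it is free to use as a variable name.
ryan_is_reserved = keyword.iskeyword("ryan")

# A single = binds a value to a name.
ryan = 22

# A double == asks a question instead: are these two values the same?
ryan_equals_22 = (ryan == 22)

print(if_is_reserved, def_is_reserved, ryan_is_reserved, ryan_equals_22)
# prints: True True False True
```

The single = changes what ryan means, while the double == only checks it and leaves ryan untouched.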

Now if I want to check what value the variable ryan holds, I can ask Python to print the value of the variable to the terminal/console like this:

$ print(ryan)
22

And you can see, ryan is 22. Let's say that you wanted the variable ryan to be assigned to text rather than a number. You can do it like so:

$ ryan = "twenty two"
$ print(ryan)
twenty two

You need quotation marks around text. They signal to Python that the material between them is text rather than a number or another variable. Python calls text a string, but as a biologist, you should be used to jargon.
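If you are ever unsure whether Python is treating a value as a number or as a string, you can ask it with the built-in type function. A minimal sketch, with variable names other than ryan chosen for illustration:

```python
# Without quotation marks, 22 is a whole number (Python calls this an int).
ryan = 22
kind_of_number = type(ryan)

# With quotation marks, the value is text (Python calls this a str, short for string).
ryan = "twenty two"
kind_of_text = type(ryan)

print(kind_of_number)  # prints: <class 'int'>
print(kind_of_text)    # prints: <class 'str'>

# Quotation marks turn digits into text too, so "22" and 22 are not the same value.
digits_as_text = "22"
print(digits_as_text == 22)  # prints: False
```

The last line is a common early stumbling block: numbers read in from a file usually arrive as text and need converting before you can do arithmetic with them.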

So with just these few lines of code, I have demonstrated how to bind a value to a variable and how to use a function (print). With just a little bit more knowledge, you can get the computer to do all sorts of wonderful things for you. So please go and brave the resources below, and IRL classes, to learn how to code. Then I am sure you can deal with most problems in genomics and compbio that come your way.

Python for Biologists - A really great website for biologists to use to learn how to code in a very practical way that can quickly be applied to problems you might want to solve. Reading this after taking a few Python classes is how I first applied programming to solve a scientific problem.

Codecademy - An interactive website to get practical experience coding to solve the problems they offer. A really useful website to get used to coding in Python.

A wonderful tutorial that will walk you through each stage of getting Python on your computer and how to start writing your own programs.

The MIT Introduction to Computer Science and Programming from 2008 lectures are freely available on YouTube (link to the first lecture below). Watching the first few of these was a great way to understand what is going on with the computer while you type in words and numbers into the terminal (or Codecademy). It really made me feel more comfortable with what I was doing, even if I still do not have a deep appreciation of computer science.


Footnote: Better people than I have discussed at length what programming language you should choose. I mention Python here because it is what I first learnt. The general consensus seems to be 1) It does not matter what language you learn, just learn one, 2) Learn something that others are using (hence why I learnt Python, very common in compbio and in the group I was in) and 3) Whatever language you learn, someone will argue on the internet how their favourite language is better, so resign yourself to that now. 
