Here is a basic method for organizing your ecological data analysis projects in R. Why do this? Reproducing analyses is critical for good science. There is nothing worse than re-running a script when you finally get comments back from your reviewers, only to find that your results are a bit different from before. What?! Speaking from personal experience, it has taken me days of blood, sweat, and tears to figure out what was different in the data, what code I was running in the wrong order, or that I was running the wrong code altogether! Start now and get in the habit of sticking to a system for organizing your R projects.
While there are many methods and variations on how to do this (see the links at the end of the post), this post offers a short and simple overview of my own method so that you can get started ASAP. Those who follow me know that I am a big fan of getting right into the code and data, because that is the best way to learn. So let's get to it.
1) Use RStudio for all your analyses. Some of you 1% hardcore coders might prefer the minimalist terminal-type interface included in the basic R download, but for everyone else, use RStudio. It’s a no-brainer. Click here to download it.
2) Create a new project (File > New Project). The directory you set here will be the folder where you store your data, scripts, and other files related to your analysis.
3) Create the folder structure inside your project folder so that it looks like this:
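As a rough sketch, the folders can also be created from the R console instead of by hand. The `data/processed`, `output`, and `temp` folders are the ones used later in this post; `data/raw` and `scripts` are assumed names added for illustration, and `create_project_dirs()` is a hypothetical helper, not part of any package.

```r
# Folder names: "data/processed", "output", and "temp" come from this post;
# "data/raw" and "scripts" are assumptions made for this sketch.
project_dirs <- c("data/raw", "data/processed", "scripts", "output", "temp")

# Hypothetical helper: create each folder under `root` (the project folder).
create_project_dirs <- function(root = ".", dirs = project_dirs) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    # recursive = TRUE also creates parent folders such as "data";
    # showWarnings = FALSE keeps re-runs quiet if a folder already exists
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}
```

With the project open in RStudio, running `create_project_dirs()` once in the console should be enough, since the default `root = "."` points at the project folder.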
4) Create your R scripts. Unless your analysis is very simple and direct, you should be using multiple scripts (pretty much always the case when your project is large enough for an entire publication). Ideally, each script should be a set of code that you can run in one go. This is not always possible, but strive for that and use a separate script for each component of the analysis. I recommend you create the following scripts right away:
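A minimal sketch of how a set of scripts might be set up and run in a fixed order is shown here. Only the packages and dataclean scripts are named in this post; `analysis.R`, the `.R` file names, and both helper functions are placeholders made up for illustration.

```r
# Script names in the order they should be run.
# "packages" and "dataclean" are named in this post; "analysis.R" is a
# hypothetical stand-in for whatever analysis scripts the project needs.
script_names <- c("packages.R", "dataclean.R", "analysis.R")

# Hypothetical helper: create empty script files that do not exist yet.
create_script_stubs <- function(dir = ".", scripts = script_names) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  paths <- file.path(dir, scripts)
  # file.create() truncates existing files, so only touch the missing ones
  new_paths <- paths[!file.exists(paths)]
  if (length(new_paths) > 0) {
    file.create(new_paths)
  }
  invisible(paths)
}

# Hypothetical helper: run the scripts one after another, in order.
run_scripts <- function(dir = ".", scripts = script_names) {
  paths <- file.path(dir, scripts)
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Missing script(s): ", paste(not_found, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    # source() evaluates in the global environment by default, so objects
    # created by earlier scripts are available to later ones
    source(p)
  }
  invisible(paths)
}
```

Keeping the names in one vector means the run order is written down in exactly one place, which is the part that is easy to get wrong months later.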
5) Start off each R script with a good description of the entire project and particular scope of the script. The more comments the better, but more on script commenting in another post. Here's an example:
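A header along these lines is one way to do it. Everything in angle brackets is a placeholder to fill in, and the field names are only a suggestion, not a fixed standard.

```r
# ------------------------------------------------------------------
# Project:   <project title>
# Script:    <script file name>
# Purpose:   <one or two sentences on what this script does and how it
#            fits into the project as a whole>
# Author:    <your name>
# Created:   <YYYY-MM-DD>
# Updated:   <YYYY-MM-DD>
# Inputs:    <files this script reads>
# Outputs:   <files this script writes>
# Run after: <scripts that must be run before this one>
# ------------------------------------------------------------------
```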
That’s pretty much it! Each time you open the project in RStudio, the scripts you had open last time will reopen. Just make sure to run the packages and dataclean scripts before the others. Because an RStudio Project sets the working directory to the project folder, there is no need to include a setwd() line. Just add "data/processed/" before your filename whenever you read in data, or add "output/" or "temp/" whenever you export something.
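To make the relative paths concrete, here is a small sketch. The file names and the three-row data set are invented for illustration, and the snippet writes two tiny example files into the current project so that it runs from start to finish.

```r
# Paths are relative to the project folder, which is the working directory
# when the project is open in RStudio. File names and data are made up.
dir.create("data/processed", recursive = TRUE, showWarnings = FALSE)
dir.create("output", showWarnings = FALSE)

# Write a tiny example data set so there is something to read back in
example_data <- data.frame(plot = c("A", "B", "C"),
                           richness = c(12, 8, 15))
write.csv(example_data, "data/processed/plots_example.csv", row.names = FALSE)

# Read data: prefix the file name with "data/processed/"
plots <- read.csv("data/processed/plots_example.csv")

# Export results: prefix the file name with "output/"
plot_summary <- data.frame(mean_richness = mean(plots$richness))
write.csv(plot_summary, "output/richness_summary.csv", row.names = FALSE)
```

Because no path starts from a drive letter or a home folder, the same lines should work unchanged if the whole project folder is moved or shared with a collaborator.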
If you want some longer in-depth explanations on code management in R, check out these other excellent blog posts:
Luka Negoita, PhD
I received my BA in Human Ecology from College of the Atlantic in 2011 and my PhD in Biology with a focus in theoretical plant ecology in May 2018 with Dr. Jason Fridley at Syracuse University. I love teaching and working with ecology students on everything from mental health to data analysis, research design, and study techniques.