Using RMarkdown
R Markdown Basics
Purpose of This Tutorial
The purpose of this tutorial is to show why R Markdown is worth using whenever you program in R. We will walk through why you should use R Markdown, the basics of using it, how to customize it to fit your needs, and other aspects of R Markdown that may be useful to you in the future.
Why Use RMarkdown?
R Markdown provides a way to tie together both code and prose into a single document. Within an R Markdown file you can:
- run code
- create results/reports that can be publication ready in many different formats.
R Markdown can be used to create useful reports you can send to a stakeholder/coworker, etc., but it can also be a useful programming tool to use daily to document and organize your code.
Getting Started with RMarkdown
Although RStudio generally comes with R Markdown preinstalled, if necessary you can install R Markdown and its functionality by running:
install.packages("rmarkdown")
To get started, you can use the basic layout given to you by RStudio.
This can be found by going to File -> New File -> R Markdown... -> Click OK (we are going to leave the defaults for now).
If you look at a basic R Markdown file, you have three key sections:
- The YAML (which is a header section used to set certain properties)
- The prose section
- Code chunks
I will initially cover the final two sections and come back around to how you can customize your markdown documents using the YAML.
All About the Prose
The prose portion of a markdown file (.Rmd) is a place where you can type as if you were in a Word, Notepad, or HTML document. In this portion you can add small details about your analysis or you can create a fully featured, customized report. The prose portion, by default, is the area of the markdown file that has a white background. This portion is written in Markdown and, when you knit to HTML, it also accepts raw HTML, so it can be modified much as you would modify a website.
Prose Basics
While typing out your report in the prose section you can modify the text in many ways to organize your report in a meaningful way. The first thing that you can do to organize your report is to create headers. Headers provide a way to create sections in your file. The headers themselves will generally appear larger and will stand out from the normal text. Headers are created within the prose section by using the # sign followed by a blank space and then the name of the header. Make sure to include the blank space or the header will not work properly.
While HTML uses h1, h2, h3, etc. as a way of designating the level of the header, markdown uses the number of #s. One # indicates the first or highest level. This will be the largest and boldest header. This is usually reserved for the main sections of a paper (Results, Abstract, etc.). To create a subsection you can then use ##, which will be a second level that falls underneath the first. This can be followed by even more subsections down to the third, fourth, etc. levels. A top/first level header can be seen in this document with the R Markdown Basics section. A second level section/subsection can be seen below with the Writing Code section. And this current section falls under a third level header. Creating sections will make it easier for you to organize your report and for a reader of the document to follow the logic of the report.
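As a rough illustration of the header levels described above, the fragment below shows how three levels might be typed in the prose section. The section names are made up for this example and are not taken from this tutorial's source file.

```markdown
# Results

Text for the main section goes here.

## Model Fit

Text for the subsection goes here.

### Residual Checks

Text for the sub-subsection goes here.
```

Each additional # moves the header one level down, and the space after the last # is what lets the line be recognized as a header.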
Writing Code
Within an R Markdown file, you create code that can be run and interpreted by adding an R code chunk (another way exists that I will touch on in the next section). These R code chunks essentially behave as an R script that can be run alongside the prose. You can tell a code chunk from the prose because it will have a grey background (by default), it will have three backticks (`) followed by {r} in the top row, and it will have three more backticks at the bottom of the chunk.
Everything between the top and bottom lines of backticks is the code that you run. This can be anything from a single line of code to a whole analysis. Generally, I tend to break up my code chunks by the tasks that need to be done.
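A minimal sketch of what such a chunk looks like in the .Rmd source is shown here. The contents are only an illustration; the built-in mtcars data set is used as a stand-in for your own data.

````markdown
```{r}
# Summarise one column of a built-in data set
summary(mtcars$mpg)
```
````

The first and last lines mark where the chunk starts and stops; only the lines between them are treated as R code.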
Running Code in Code Chunks
To run the code within your code chunk you have a few options. You can run a single line as you would in a normal R script by putting the cursor on the line of code you want to run and using the keyboard shortcut Ctrl+Enter (Cmd+Return on a Mac). You can also run the entire chunk at once by pressing the green arrow button in the upper right hand corner of the code chunk or by using Ctrl+Shift+Enter (Cmd+Shift+Enter on a Mac). Finally, you can also use the Run drop down menu in the upper right hand corner of the scripting window:
This drop down menu lets you run the lines you have highlighted, run the entire chunk, or run all of the chunks in the entire file, among other options.
Embedding code into your prose
One very useful tool within a markdown file is embedding code within the prose. This is useful if you would like to call results directly into your report. You can create inline code by typing a backtick (`) followed by the letter r, a space, the code you want to use, and then a closing backtick. This can be seen below.
Once the report is created, the result will look like this:
Notice how the r syntax was replaced with the actual value calculated in the code chunk above. This is highly useful as you can rerun analysis that may have some changes, which will automatically propagate throughout the report. You will not have to manually change every value you place in your report because of a single change in the initial data you collected.
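To make the idea concrete, here is a small sketch of a chunk followed by a sentence that uses inline code. The object name avg_mpg and the use of the built-in mtcars data are assumptions made for this example, not part of the original tutorial.

````markdown
```{r}
# Store a value that the prose can refer to later
avg_mpg <- round(mean(mtcars$mpg), 1)
```

The average fuel economy in the sample was `r avg_mpg` miles per gallon.
````

When the document is knit, the inline expression is replaced by the value stored in avg_mpg, so the sentence updates itself whenever the underlying data change.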
Putting it all Together
Once you have created a document with prose and code, how do you create a finished report that you can share? To do this, you need to knit your document. The next section talks about what knitting is, what you need in order to knit, and some of the options you have in regard to output.
Knitting Files
Taken from https://rmarkdown.rstudio.com/lesson-2.html
“When you run render, R Markdown feeds the .Rmd file to knitr, which executes all of the code chunks and creates a new markdown (.md) document which includes the code and its output. The markdown file generated by knitr is then processed by pandoc which is responsible for creating the finished format. This may sound complicated, but R Markdown makes it extremely simple by encapsulating all of the above processing into a single render function.”
In other words, when you knit a file, you are turning your prose, code, YAML, and the options you set into a usable document. To create an HTML document, you should only need the rmarkdown package and the knitr package installed. We discussed the rmarkdown package earlier, but the knitr package is new. This package is often downloaded alongside other packages, so you may already have it installed on your machine. If not, just run:
install.packages("knitr")
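If you are not sure whether these packages are already on your machine, a small check like the one below can help. This is only a sketch: the helper name missing_packages is made up for this example, and the install line is left commented out so that nothing is downloaded unless you choose to run it.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
# requireNamespace() returns FALSE (instead of raising an error) when a
# package cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("rmarkdown", "knitr")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result means both packages were found and nothing more needs to be installed.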
Knitting to other types of output (PDF, Word, PowerPoint, etc.) can be a bit more tricky. To do this for a PDF, you will need to install a package or two following the directions from this website: https://yihui.org/tinytex/
This site will show you that you need to install the tinytex package and then run the install_tinytex() function:
install.packages("tinytex")
tinytex::install_tinytex()
Finally, to knit your document you can either do so through a function (which I will cover in a later tutorial and link here) or through the knit button in the toolbar.
By simply clicking the knit button (needle and yarn), the document will be knit to the default output format (usually HTML). By default, the output document will then be placed in the same folder as the .Rmd file (hopefully inside a project folder you created using the steps from the creating project folder tutorial). To change the output, you can simply choose "Knit to PDF" or "Knit to Word", etc. I will mention below how you can change the default output format within the YAML.
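For readers curious about the function route mentioned above, the render function named in the quoted passage lives in the rmarkdown package. The sketch below is an illustration under several assumptions rather than the approach a later tutorial will take: it writes a tiny throwaway .Rmd file (the file name and contents are made up) to a temporary folder and knits it to HTML, and it only attempts the knit when rmarkdown and pandoc can be found.

```r
# Three backticks, built with strrep() so this listing stays easy to read.
fence <- strrep("`", 3)

# Build a tiny throwaway R Markdown file in a temporary folder.
rmd_path <- file.path(tempdir(), "example-report.Rmd")
writeLines(c(
  "---",
  "title: \"Example Report\"",
  "output: html_document",
  "---",
  "",
  "# Results",
  "",
  paste0(fence, "{r}"),
  "summary(mtcars$mpg)",
  fence
), rmd_path)

# Knit only if rmarkdown and pandoc are available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path,
                                output_format = "html_document",
                                quiet = TRUE)
  print(out_file)  # path of the knitted HTML file
}
```

Clicking the knit button does much the same thing for the file you currently have open.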
Taking it Further
So far we have talked about the prose section and code chunks, but the third portion of a markdown file that we have left untouched is the YAML. The YAML can be used to add titles, tables of contents, and dates, change the output format, pass parameters, and alter many other options. I will discuss some of the basic options that you can provide in a markdown YAML, while future tutorials will cover more sophisticated options.
Setting Options in the YAML
The YAML is placed at the top of your markdown document and is denoted by three hyphens (---) on the top line and bottom line. In this tutorial I am going to discuss how to alter the title of the report, name the author, change the date, and alter the default output. More options do exist, but are outside of the scope of this tutorial.
Altering the title and author of your markdown only requires you to type title: or author: followed by the title or author that you want to use.
For the date option, you can input it exactly as you do the title and author by using date: followed by the date that you want to put in. Another option is to supply inline R code that returns the current date, for example date: "`r Sys.Date()`". This sets the date to today's date every time that you knit the document, which avoids having to manually change the date each time that you knit the report.
The final option we are going to talk about is the output option, which sets the default output format that will be created when knitting. The three main options here are output: html_document, output: pdf_document, or output: word_document. This will output the document, by default, as an HTML, PDF, or Word file respectively. You can also leave the option open, which will allow you to choose which output format you want to knit to. Finally, you can use other types of output, but they do get a bit more complicated. This tutorial itself was knit using the readthedown HTML markdown template. Future tutorials will cover how to use templates and options like this one.
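Putting these options together, a header might look something like the sketch below. The title and author are placeholders invented for this example, and the date line assumes you want today's date filled in through an inline call to Sys.Date() each time you knit.

```yaml
---
title: "Quarterly Sales Report"
author: "Jane Doe"
date: "`r Sys.Date()`"
output: html_document
---
```

Swapping html_document for pdf_document or word_document changes the default format that the knit button produces.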
Setting Options in a Code Chunk
Another way to change the way your markdown file behaves when being knit is to use code chunk options. These code chunk options allow you to hide code from being seen in the report, suppress results, and modify the way a graph displays, among many other options. A code chunk can be modified by following the r at the top of the code chunk with a comma and then the option(s) that you want to add to the code chunk. For example, a chunk that starts with {r, echo=FALSE} runs its code but hides that code in the report.
There are many options that you can provide to a code chunk. Below is a list of the different options and the choices you can make regarding these options:
- echo: (TRUE/FALSE) This option indicates whether or not the code should be displayed in the markdown document.
- eval: (TRUE/FALSE) This option indicates whether or not the code should be evaluated in the markdown document. This may be useful if you want to turn off certain chunks of code that you are testing or are unsure will run properly.
- results: (hold, hide) By default, as each result is created within a code chunk (printed to console, graph created, table created, etc.), it is output into the markdown file before the next line of code is run. The hold option allows all of the code in a chunk to be run before the results are shown. The hide option allows the code to be run without printing out the results. This is useful for setup code chunks that do not need the output to be shown.
- include: (TRUE/FALSE) This option indicates whether or not the code and results should be shown in the report. The code is still run when include is set to FALSE.
- message/warning: (TRUE/FALSE) These options indicate whether or not messages/warnings should be shown. They are useful to keep on when troubleshooting your markdown file, but should generally be set to FALSE when completing your final report.
- fig.*: (many options) The fig. set of options (such as fig.width, fig.height, and fig.align) allows you to set the location, size, etc. of the different images you print out in a report.
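As an illustration of how these options are written, the two chunks below sketch a common pattern. The code inside them is an assumption chosen for this example (a display setting and a histogram of the built-in mtcars data), not something taken from this tutorial.

````markdown
```{r, include=FALSE}
# Runs, but neither the code nor its output appears in the report
options(digits = 3)
```

```{r, echo=FALSE, fig.width=6, fig.height=4}
# Only the plot appears in the report; the code itself is hidden
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
```
````

Several options can be combined in one chunk by separating them with commas, as in the second chunk.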
Setting Global Code Chunk Options
If you find yourself using the same code chunk options over and over again, you can decide to set the options as the default globally. This means that each code chunk will follow whatever option you set at the top of the markdown file, unless specified differently within the individual code chunks. This can be seen in the image below which illustrates the markdown options I set globally for this tutorial:
The syntax for setting global options is a bit complicated, but you only have to use it once at the top of the markdown file and can generally copy it over between your markdown files. The global options are set here using the knitr package and the opts_chunk$set function. Within the parentheses you then need to enter the options as you did within the code chunks.
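A minimal sketch of what such a setup chunk might contain is given below. The particular options chosen are assumptions for illustration and are not necessarily the ones used for this tutorial. In an .Rmd file these lines would sit inside a code chunk near the top of the document, often one marked include=FALSE so that the setup itself stays out of the report.

```r
# Set defaults that every later chunk inherits unless it overrides them.
knitr::opts_chunk$set(
  echo = FALSE,     # hide code in the report
  message = FALSE,  # suppress messages
  warning = FALSE   # suppress warnings
)

# Inspect one of the defaults that was just set.
knitr::opts_chunk$get("echo")
```

An individual chunk can still override a global default, for example by starting with {r, echo=TRUE} when you do want its code shown.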