R nanocourse 4: Dynamic Documents

Introduction

After the first three parts of the nanocourse, you have learned the basics of working with R and the Markdown language in the context of International Business. This session consolidates all previous steps in order to create dynamic documents from a single RMarkdown document.

Goals

At the end of the lab, you should be able to:

  1. explore the YAML options;
  2. understand the details of the R chunk settings;
  3. {import | transform | visualize} your data in a reproducible process;
  4. export your RMarkdown document to a {PDF | HTML | Word} file;
  5. transform your RMarkdown document into an ioslides presentation (powerpoint-like).

The goal of this session is to create an RMarkdown document analyzing a specific industrial sector in Russia. This RMarkdown document will produce both a PDF file and an ioslides presentation.

Keywords: RMarkdown; ioslides; beamer; RStudio; reproducible research

List of all commands studied so far

Throughout the last three Nanocourses, we have seen a total of 28 different commands. With these commands, you will be able to perform your analysis.

| Command | Detail | R nanocourse |
|---|---|---|
| ![caption](path) | Embedding an image | Nanocourse 1 |
| **text** | Text in bold | Nanocourse 1 |
| *text* | Text in italic | Nanocourse 1 |
| [text](url) | URL link | Nanocourse 1 |
| # Title | First level title | Nanocourse 1 |
| ## Title | Second level title | Nanocourse 1 |
| ### Title | Third level title | Nanocourse 1 |
| $$ equation $$ | Equation in LaTeX format | Nanocourse 1 |
| library() | Loading library in the environment | Nanocourse 2 |
| read.csv() | Reading .csv file | Nanocourse 2 |
| readGoogleSheet() | Reading Google Sheet document | Nanocourse 2 |
| cleanGoogleTable() | Cleaning Google Sheet document | Nanocourse 2 |
| head() | Show first lines of a dataframe | Nanocourse 2 |
| summary() | Description of the dataframe | Nanocourse 2 |
| as.numeric() | Treat data as numeric | Nanocourse 2 |
| as.factor() | Treat data as factor | Nanocourse 2 |
| geom_bar() | Bar chart | Nanocourse 2 |
| geom_line() | Line chart | Nanocourse 2 |
| geom_point() | Point chart | Nanocourse 2 |
| gsheet2tbl() | Load Google Sheet document | Nanocourse 3 |
| dataframe$newColumn | Create new column in dataframe | Nanocourse 3 |
| dataframe$newColumn <- NULL | Erase column | Nanocourse 3 |
| dim() | Size of the dataframe | Nanocourse 3 |
| filter() | Subset of a dataframe | Nanocourse 3 |
| arrange() | Sort by ascending value | Nanocourse 3 |
| arrange(,desc()) | Sort by descending value | Nanocourse 3 |
| dcast() | Long to wide format | Nanocourse 3 |
| melt() | Wide to long format | Nanocourse 3 |
| full_join() | Merge 2 dataframes based on common columns | Nanocourse 3 |

Syntax options

YAML

The YAML header is the block of settings that tells RMarkdown how your document will be rendered. It sits at the very top of your document, between two lines of three dashes (---).

---
title: "R nanocourse 4: Dynamic Documents"
author: "Thierry Warin"
date: "12/02/2020"
output:
  html_document:
    toc: yes
    toc_depth: 3
  pdf_document:
    toc: yes
    toc_depth: 3
---

In the output section, you have several options:

  • toc: yes/no, which enables/disables the table of contents
  • toc_depth: number, which sets the depth of the table of contents (the lowest heading level included)

You can set a fixed date for your document, or have it show the date on which the document is rendered. For example, if you write a document today that will be compiled in three months, the date shown will be the compilation date. To do so, enter inline R code in the date field: Sys.time() returns the current date and time, and format() converts it to text. The whole value must be wrapped in quotes so that the YAML stays valid.

date: "`r format(Sys.time(), '%d %B, %Y')`"
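To preview what this inline expression produces, the same call can be run in the R console. The sketch below also formats a fixed date, chosen only for illustration, so that the format codes are easier to read; month names depend on the locale of your R session.

```r
# Current date, formatted as in the YAML example
# (the result depends on the day and on the session locale)
format(Sys.time(), "%d %B, %Y")

# A fixed, illustrative date makes the format codes easier to read:
# %d = day of the month, %B = full month name, %Y = four-digit year
fixed_date <- as.Date("2020-02-12")
formatted <- format(fixed_date, "%d %B, %Y")
print(formatted)  # e.g. "12 February, 2020" in an English locale
```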

Task:

  • Create a new document (.Rmd): File > New File > R Markdown…
  • Set the date in your YAML header so that it shows the date of knitting

R chunk settings

Every piece of R code has to be enclosed in an R chunk so that it is evaluated when the document is knitted. A chunk opens with a line of three backticks followed by {r} and closes with a line of three backticks.

It is possible to set different parameters in order to render different outputs for each R chunk. These settings are specified in the first line of the R chunk, inside the braces (for example {r, echo = FALSE}). Common settings include:

  • echo = FALSE/TRUE: if FALSE, the code will not be shown
  • warning = FALSE/TRUE: if FALSE, warnings will not be shown
  • message = FALSE/TRUE: if FALSE, messages generated by the code will not be shown
  • fig.align = 'center'/'left'/'right': aligns the generated figure accordingly
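The same options can also be set once for the whole document rather than chunk by chunk. This is a minimal sketch, assuming the knitr package (which RMarkdown relies on) is installed; the values chosen are only an illustration. Options given in an individual chunk header still override these defaults.

```r
# Typically placed in a first "setup" chunk of the .Rmd file
library(knitr)

opts_chunk$set(
  echo = FALSE,         # hide the code
  warning = FALSE,      # hide warnings
  message = FALSE,      # hide messages generated by the code
  fig.align = "center"  # center the figures
)
```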

Analysis of an industrial sector

With the previous nanocourses, you have in hand all the commands needed for your analysis (see the first section of this nanocourse: List of all commands studied so far). Let’s take the UNIDO database and focus on a particular industrial sector (sugar industry, ISIC 1542) to reveal interesting insights. For the details of each line of code, please refer to the Laboratory Nanocourse 3.

Task:

  • Load the UNIDO database covering all industrial sectors (gs15x)
  • Subset the dataframe in order to keep only the relevant data (IsicCode = 1542)

# Loading packages
library(gsheet)
library(dplyr)

# URL of the UNIDO dataset
gs15x <- "https://docs.google.com/spreadsheets/d/1aTJFKmkH2oxYcg0aiWeMAttGWKdM1u2KS5OyIlUkI6Q/edit?usp=sharing"

# Using the gsheet2tbl function to import the UNIDO dataset into the RStudio console
dataUnido <- gsheet2tbl(gs15x)

# Transform variables into numeric values
dataUnido$Value <- as.numeric(dataUnido$Value)
dataUnido$Tablecode <- as.numeric(dataUnido$Tablecode)
dataUnido$CountryCode <- as.numeric(dataUnido$CountryCode)
dataUnido$Year <- as.numeric(dataUnido$Year)
dataUnido$IsicCode <- as.numeric(dataUnido$IsicCode)
dataUnido$Unit <- NULL

# Subset concerning only data for the IsicCode = 1542
dataUnidoSubset <- filter(dataUnido, IsicCode == 1542)
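The conversion and subsetting steps above can be tried without an internet connection on a small stand-in dataset. In this sketch the data frame toyUnido and its values are hypothetical, and across() assumes dplyr 1.0.0 or later; it is an alternative to converting the columns one by one, not the method used in the course.

```r
library(dplyr)

# Hypothetical toy data mimicking some UNIDO columns (values are made up)
toyUnido <- data.frame(
  CountryCode = c("156", "643", "484", "156"),
  IsicCode    = c("1542", "1542", "1520", "1520"),
  Value       = c("100", "80", "60", "40"),
  stringsAsFactors = FALSE
)

# Convert several text columns to numeric in a single step
toyUnido <- mutate(toyUnido, across(c(CountryCode, IsicCode, Value), as.numeric))

# Keep only the rows of the sugar industry (IsicCode 1542)
toySubset <- filter(toyUnido, IsicCode == 1542)
print(toySubset)
```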

Number of employees

Task:

  • Select the subset of the dataset containing only the number of employees (Tablecode == 04)
  • For 2010, rank the countries by number of employees
  • Visualize and compare the top 7 countries in terms of employees

# Data regarding the number of employees
dataEmployees <- filter(dataUnidoSubset, Tablecode == 4)

# Data regarding 2010
dataEmployees2010 <- filter(dataEmployees, Year == 2010)

# Sort the countries by descending number of employees in 2010
ranking <- arrange(dataEmployees2010, desc(Value))

The top 7 countries in terms of employees in the sugar industry in 2010 are:

head(ranking, n=7)
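The filtering, sorting and truncating steps above can also be chained with the pipe operator %>% made available by dplyr. The sketch below uses a hypothetical data frame (toyEmployees, with invented values) so that it runs on its own.

```r
library(dplyr)

# Hypothetical employee counts (values are invented)
toyEmployees <- data.frame(
  CountryCode = c(156, 643, 484, 364),
  Year        = c(2010, 2010, 2010, 2009),
  Value       = c(500, 300, 400, 900)
)

# Keep 2010, sort by descending value, keep the two largest
toyRanking <- toyEmployees %>%
  filter(Year == 2010) %>%
  arrange(desc(Value)) %>%
  head(n = 2)
print(toyRanking)
```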

Now, let’s visualize these data in a bar chart.

library(ggplot2)
library(ggthemes)
library(reshape2)

# Convert the column 'CountryCode' into a factor
ranking$CountryCode <- as.factor(ranking$CountryCode)

# Produce a bar chart of the top 7 countries
ggplot(data = ranking[1:7, ], aes(x = CountryCode, y = Value, fill = CountryCode)) +
  geom_bar(stat = "identity", width = 0.5, position = "dodge") +
  ylab("Number of employees") +
  xlab("") +
  # Display the legend of the fill colors on a single row
  guides(fill = guide_legend(nrow = 1)) +
  theme_hc() +
  scale_fill_brewer(direction = -1)
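By default the bars follow the order of the factor levels, not the ranking. One possible refinement, sketched here on a hypothetical data frame with invented values, is to reorder the factor by descending value with reorder() so that the largest bar comes first.

```r
library(ggplot2)

# Hypothetical ranking data (values are invented)
toyRanking <- data.frame(
  CountryCode = factor(c(156, 643, 484)),
  Value       = c(500, 300, 400)
)

# reorder() sorts the factor levels by -Value, i.e. from largest to smallest
p <- ggplot(toyRanking, aes(x = reorder(CountryCode, -Value), y = Value, fill = CountryCode)) +
  geom_bar(stat = "identity", width = 0.5) +
  xlab("") +
  ylab("Number of employees")
p
```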

The seven countries with the most employees in the sugar industry in 2010 are therefore:

  • China
  • Russia
  • Mexico
  • Iran
  • Vietnam
  • Ukraine
  • Colombia

Number of establishments

Based on previous results, we would like to observe the evolution of the number of establishments in the top 3 countries as of 2010 in terms of employees (i.e. China, Russia, Mexico).

Task:

  • From the sugar industry dataset, select data corresponding to these 3 countries for all available years
  • Visualize the evolution of the number of establishments through time

# Subset of the dataEmployees dataframe concerning only the three selected countries
# Note: dataEmployees holds the number of employees (Tablecode == 4). To plot the number
# of establishments, filter dataUnidoSubset on the establishments table code instead
# (check the UNIDO table codes).
dataEmployeesCountries <- filter(dataEmployees, CountryCode == 156 | CountryCode == 643 | CountryCode == 484)

# Convert the column 'CountryCode' into a factor
dataEmployeesCountries$CountryCode <- as.factor(dataEmployeesCountries$CountryCode)

# Produce a line chart
ggplot(data = dataEmployeesCountries, aes(x = Year, y = Value, color = CountryCode)) +
  geom_line() +
  ylab("") +
  xlab("") +
  geom_smooth(span = 0.8) +
  ggtitle("") +
  theme_hc() +
  scale_color_brewer(direction = -1) +
  guides(fill = "none") +
  geom_point(colour = "blue", size = 2, shape = 22)
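The legend of this chart shows numeric country codes. A small sketch of how the codes could be replaced by country names when creating the factor: the three code/name pairs are those used in the filter above, while the vector codes is a hypothetical stand-in for dataEmployeesCountries$CountryCode.

```r
# Country codes used in the filter above, with their country names
countryCodes <- c(156, 643, 484)
countryNames <- c("China", "Russia", "Mexico")

# Hypothetical vector of codes, standing in for dataEmployeesCountries$CountryCode
codes <- c(156, 484, 643, 156)

# levels = the codes as stored, labels = the names to display
labelled <- factor(codes, levels = countryCodes, labels = countryNames)
print(labelled)
```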

File format

PDF / HTML / doc

Now that your analysis has been completed, you can export your document in different formats from the same RMarkdown file. To do so, click on the arrow to the right of the “Knit HTML” button and select the appropriate format (HTML, PDF, Word).
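Knitting can also be triggered from the R console with the render() function of the rmarkdown package. This is a sketch only: the file name analysis.Rmd is hypothetical, and producing a PDF assumes that a LaTeX distribution is installed.

```r
library(rmarkdown)

# "analysis.Rmd" is a hypothetical file name; replace it with your own document
rmd_file <- "analysis.Rmd"

if (file.exists(rmd_file)) {
  # Render every format listed under `output:` in the YAML header
  render(rmd_file, output_format = "all")

  # Or render one format explicitly
  render(rmd_file, output_format = "pdf_document")
}
```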

Task:

  • Render your RMarkdown document into a PDF file
  • Render your RMarkdown document into a HTML file

ioslides / Beamer presentation

From the same RMarkdown file, it is possible to generate a “powerpoint-like” slide presentation. First, change the YAML options. In the output field, insert one of:

  • output: beamer_presentation (PDF slides, produced with LaTeX)
  • output: ioslides_presentation (interactive HTML slides)

Second, the headings must follow the slide structure:

  • For a first-level slide (section title), insert “#” before the title
  • For a second-level slide (regular slide), insert “##” before the title
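The two heading levels can be tried on a very small presentation. The sketch below writes a hypothetical two-slide .Rmd file to a temporary folder and renders it with ioslides; the slide titles and contents are placeholders, and rendering assumes that pandoc is available (it ships with RStudio).

```r
library(rmarkdown)

# Hypothetical minimal presentation: one section slide and one content slide
slides <- c(
  "---",
  "title: \"Sugar industry\"",
  "output: ioslides_presentation",
  "---",
  "",
  "# Number of employees",
  "",
  "## Top countries in 2010",
  "",
  "- China",
  "- Russia",
  "- Mexico"
)

# Write the document to a temporary location
rmd_path <- file.path(tempdir(), "slides.Rmd")
writeLines(slides, rmd_path)

# Render only if pandoc can be found
if (pandoc_available()) {
  render(rmd_path, quiet = TRUE)
}
```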

Task:

  • Open a new RMarkdown document
  • Select the code corresponding to the industrial analysis
  • Create an ioslides/beamer document
  • Showcase your results in the two different formats

References

Resources

For more on the RMarkdown syntax, please refer to:

Packages

Acknowledgments

To cite this course:

Warin, Thierry. 2020. “SKEMA Quantum Studio: R Nanocourses.” doi:10.6084/m9.figshare.11842416.v1.