class: left, top
background-image: url("img/uc3m.jpg")
background-position: 90% 90%
background-size: 60%

### <img src="img/UC3M_logo_cc.png" width="50", hspace="20", align="left"> Coding Club UC3M

<br/>

<img src="img/ggplot2.png" width="150">

**Elegant and Informative Data-Graphics**

### Javier Nogales

---

# First: install ggplot2

```r
install.packages("ggplot2")
library("ggplot2")
```

<br/>

# Even better: install the tidyverse collection of packages

```r
install.packages("tidyverse")
library("tidyverse")
```

Includes: ggplot2, dplyr, tidyr, ...

---

# Let's work

Open a dataset:

```r
install.packages("gapminder")
library("gapminder")
```

Socio-economic dataset:

```r
head(gapminder)
```

```
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
```

---

# A histogram in R Base

```r
hist(gapminder$lifeExp, col="lightblue", main="", xlab="Life Exp")
```

<img src="index_files/figure-html/unnamed-chunk-7-1.png" width="60%" />

---

# A histogram in ggplot2

```r
ggplot(gapminder, aes(lifeExp)) + geom_histogram(fill="lightblue") + labs(title="")
```

<img src="index_files/figure-html/unnamed-chunk-8-1.png" width="60%" />

---

# A scatter plot in R Base

```r
plot(gapminder$gdpPercap, gapminder$lifeExp, main="", xlab="GDP", ylab="Life Exp")
```

<img src="index_files/figure-html/unnamed-chunk-9-1.png" width="60%" />

---

# A scatter plot in ggplot2

```r
ggplot(gapminder, aes(gdpPercap, lifeExp)) + geom_point() + labs(title="")
```

<img src="index_files/figure-html/unnamed-chunk-10-1.png" width="60%" />

---

# So, why ggplot2?
### Based on The Grammar of Graphics

A high-level approach

It breaks graphs up into modular, logical pieces (semantic components)

Leland Wilkinson, 2005 (statistician and computer scientist at H2O.ai)

### The ggplot2 package

Implementation of the Grammar of Graphics for R

Very flexible, with nice, informative and intuitive plots

Hadley Wickham, 2007 (statistician and chief scientist at RStudio)

---

# Syntax

```
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>,
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
```

<img src="img/omg.jpg" width="100"/>

<br/>

7 components, but mainly 3...

And very useful in practice: colors, legends, faceting, rendering, ...

---

# What exactly is ggplot2?

<br/>

> "It is a mapping of data variables to aesthetic attributes of geometric objects"

<br/>
<br/>

Three essential components:

- data: the *data frame* containing the variables we map
- aes: aesthetic attributes (x/y position, color, shape, size, ...)
- geom: the geometric object (points, lines, bars, ...) we want to plot

---

# Let's understand how ggplot2 works

```r
plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))
```

Nothing is drawn yet: `plot` stores the data and the aesthetic mappings, but it has no geometric object (geom) to display.
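
---

# Plots are objects

A small sketch of what is going on here, assuming ggplot2 and gapminder are installed (the names `p` and `p2` are only illustrative): `ggplot()` returns a plot *object*, `+` returns a new object with one more layer, and the figure is drawn only when the object is printed.

```r
library(ggplot2)
library(gapminder)

# Only data + aesthetic mappings: no geom layer, nothing drawn yet
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))
inherits(p, "ggplot")   # it is a ggplot object

# "+" returns a new object with a point layer; p itself is unchanged
p2 <- p + geom_point()

# layer_data() builds the plot and returns the data of its first layer:
# one row per point, i.e. one per row of gapminder
nrow(layer_data(p2)) == nrow(gapminder)

# Drawing happens when the object is printed
# (typing `p2` at the console auto-prints it)
print(p2)
```

This is why the same base object can be reused and extended with different layers.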
---

# Let's understand how ggplot2 works

```r
plot + geom_point()
```

<img src="index_files/figure-html/unnamed-chunk-12-1.png" width="60%" />

---

# Let's understand how ggplot2 works

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()
```

<img src="index_files/figure-html/unnamed-chunk-13-1.png" width="60%" />

---

# Let's understand how ggplot2 works

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
* scale_x_log10()
```

<img src="index_files/figure-html/unnamed-chunk-14-1.png" width="60%" />

---

# Let's understand how ggplot2 works

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
* scale_x_log10() +
  theme_minimal() +
  theme(legend.position="bottom")
```

<img src="index_files/figure-html/unnamed-chunk-15-1.png" width="60%" />

---

# Let's understand how ggplot2 works

Combine layers:

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point(alpha=0.5) +
  theme_minimal() +
  theme(legend.position="bottom") +
  scale_x_log10(breaks = c(300, 1e3, 3e3, 10e3, 30e3)) +
  labs(title = "Gapminder and ggplot2",
       x = "Gross Domestic Product (log scale)",
       y = "Life Expectancy at birth (years)",
*      color = "Continent", size = "Population")
```

This way, we show 4D information (GDP per capita, life expectancy, continent and population) in a 2D plot...
---

# Let's understand how ggplot2 works

<img src="index_files/figure-html/unnamed-chunk-17-1.png" width="80%" />

---

# Some simple statistical models

```r
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth()
```

<img src="index_files/figure-html/unnamed-chunk-18-1.png" width="70%" />

---

# Let's try bar plots

```r
ggplot(gapminder, aes(x=reorder(continent, continent, length))) +
  geom_bar(aes(fill=continent))
```

<img src="index_files/figure-html/unnamed-chunk-19-1.png" width="70%" />

---

# Let's try bar plots with polar coordinates

```r
ggplot(gapminder, aes(x=reorder(continent, continent, length))) +
  geom_bar(aes(fill=continent)) +
* coord_polar()
```

<img src="index_files/figure-html/unnamed-chunk-20-1.png" width="70%" />

---

## The syntax is the same!

### Just change the geom function or add more pieces (layers)...

# Let's try boxplots

```r
ggplot(gapminder, aes(x = continent, y = lifeExp)) +
* geom_boxplot(fill="lightblue", outlier.colour = "hotpink") +
  geom_jitter(position = position_jitter(width = 0.1, height = 0), alpha = 1/4)
```

---

# Boxplots

<img src="index_files/figure-html/unnamed-chunk-22-1.png" width="80%" />

---

# Densities (and formulas)

```r
ggplot(gapminder, aes(lifeExp)) +
  geom_density(aes(group=continent, colour=continent, fill=continent), alpha=0.1) +
  annotate("text", x = 38, y = 0.09, parse = TRUE, size = 8,
           label = "y==frac(1, sqrt(2*pi)) * e^{-x^2/2}")
```

<img src="index_files/figure-html/unnamed-chunk-23-1.png" width="80%" />

---

# Time Series

```r
gapminder %>%
  mutate(gdp=gdpPercap*pop) %>%
  group_by(continent,year) %>%
  summarize(MeanLifeExp=mean(lifeExp), MeanGDP=mean(gdp)) %>%
  ggplot(aes(year,MeanGDP,color=continent)) +
* geom_line()
```

---

# Time Series

<img src="index_files/figure-html/unnamed-chunk-25-1.png" width="80%" />

---

# Facets

```r
ggplot(gapminder, aes(gdpPercap, lifeExp, group=continent, color=year, size=pop)) +
  geom_point() +
* facet_wrap(~ continent) +
  scale_color_gradient(low="red", high="green") +
  theme_minimal() +
  theme(legend.position="bottom")
```

---

# Facets

<img src="index_files/figure-html/unnamed-chunk-27-1.png" width="70%" />

---

# Themes and text

```r
gapminder %>%
  filter(year==2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop, label = country)) +
* geom_text() +
  # log scale with fixed limits, then clean axis names
  scale_x_log10(limits = c(200, 60000)) +
  labs(title = "GDP versus life expectancy in 2007",
       x = "GDP per capita (log scale)", y = "Life expectancy") +
  # add a nicer theme
* theme_classic() +
  theme(legend.position="none")
```

---

# Themes and text

<img src="index_files/figure-html/unnamed-chunk-29-1.png" width="80%" />

---

# Correlations

```r
library(GGally)
# columns 3:6 are the numeric variables: year, lifeExp, pop, gdpPercap
ggcorr(gapminder[,3:6], label = TRUE)
```

<img src="index_files/figure-html/unnamed-chunk-30-1.png" width="80%" />

---

# Animations

```r
install.packages(c("gganimate", "gifski"))  # gifski renders the animation as a GIF
library(gganimate)

gapminder %>%
  ggplot(aes(gdpPercap, lifeExp, size = pop, colour = continent)) +
  geom_point(alpha = 0.4) +
  geom_text(aes(x = gdpPercap, y = lifeExp + 2, label = country), size=4,
            data = filter(gapminder, country %in% c("Spain"))) +
  scale_x_log10(limits = c(200, 60000)) +
  theme_light() +
  theme(legend.position = 'bottom') +
  labs(title = 'Year: {frame_time}', x = 'GDP per capita (log)', y = 'Life expectancy') +
* transition_time(year) +
* ease_aes('linear')
```

We are going to plot 5D information (GDP per capita, life expectancy, population, continent and time) in a 2D plot...

---
class: center, middle

<img src="img/gganimate.gif" width="800"/>

---

# A little practice and you'll get...

<img src="img/kll4.jpg" width="650"/>

---

# A little practice and you'll get...

Edward Tufte's book: The Visual Display of Quantitative Information

<img src="img/Dayton.png" width="1000"/>

---

# A little practice and you'll get...

Publication-ready plots

<img src="img/heatmap.png" width="800"/>

---

# A little practice and you'll get...
Ask the Question, Visualize the Answer (Flowing Data)

<img src="img/male-female.gif" width="650"/>

---

# Final comments

- Input for ggplot2 must be a data frame
- But there are shortcuts to avoid creating data frames, like `qplot()`
- Easy to save: `ggsave("myplot.png")` saves the last plot displayed
- Many themes to change the look of your plots
- Many packages based on ggplot2: factoextra, GGally, gganimate, ...
- It can be integrated into interactive graphics: shiny, ggvis, etc.

<br/>

No excuses: use ggplot2 in your teaching, talks, publications, ...

<br/>
<br/>

> "The purpose of visualization is insight, not pictures" --- Ben Shneiderman

---

# More resources

The main website: http://ggplot2.tidyverse.org/

The book: ggplot2: Elegant Graphics for Data Analysis, by Hadley Wickham
https://www.springer.com/us/book/9780387981413

The cheat sheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

The R Graph Gallery: https://r-graph-gallery.com/

Top 50 ggplot2 Visualizations: http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html

Getting help (RStudio community): https://community.rstudio.com

Getting help (Stack Overflow): https://stackoverflow.com/questions/tagged/ggplot2

---
background-image: url("img/uc3m.jpg")
background-position: 90% 90%
background-size: 60%

# Thanks

<br/>

### <img src="img/UC3M_logo_cc.png" width="60", hspace="20", align="left"> [**Coding Club**](https://codingclubuc3m.github.io) UC3M

### <img src="img/ggplot2.png" width="60", hspace="20", align="left"> ggplot2 is part of the [**tidyverse**](https://ggplot2.tidyverse.org) ecosystem

### <img src="img/xaringan.png" width="60", hspace="20", align="left"> Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan)