(A few remarks and tips before the practical session)

sample <- c("Loschbour", "UstIshim", "Saqqaq", "AltaiNeandertal")
coverage <- c(18.2, 35.2, 13.4, 44.8)
archaic <- c(FALSE, FALSE, FALSE, TRUE)
age <- c(8050, 45020, 3885, 125000)
df <- data.frame(sample, coverage, age)
df
           sample coverage    age
1       Loschbour     18.2   8050
2        UstIshim     35.2  45020
3          Saqqaq     13.4   3885
4 AltaiNeandertal     44.8 125000
If df is our data frame, base R indexes it as df[rows, cols]:
Indexing by columns (“selecting columns”): fill the cols slot, e.g. df[, c("sample", "age")]
Indexing by rows (“filtering rows”): fill the rows slot, e.g. df[df$coverage > 15, ]
A single column can also be extracted with $, e.g. df$coverage
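A minimal sketch of the df[rows, cols] pattern on the four-sample table. The table is rebuilt from the printed values so the block runs on its own; the particular filters (coverage above 15, non-archaic samples) are illustrative choices, not part of the original exercise.

```r
# Rebuild the example table from the values printed above
sample <- c("Loschbour", "UstIshim", "Saqqaq", "AltaiNeandertal")
coverage <- c(18.2, 35.2, 13.4, 44.8)
age <- c(8050, 45020, 3885, 125000)
archaic <- c(FALSE, FALSE, FALSE, TRUE)
df <- data.frame(sample, coverage, age)

# Selecting columns: leave the rows slot empty, name the columns
df[, c("sample", "age")]

# Filtering rows: put a logical vector in the rows slot
high_cov <- df[df$coverage > 15, ]
high_cov

# Both at once: non-archaic samples, keeping sample and coverage only
modern <- df[!archaic, c("sample", "coverage")]
modern

# $ extracts a single column as a plain vector
mean(df$coverage)
```

Note how every condition has to repeat df$ and how the empty slot (df[, cols] or df[rows, ]) carries meaning; this bookkeeping is what the tidyverse verbs remove.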
The tidyverse makes everything we had to do the hard way much easier.
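For comparison, a sketch of the same operations with dplyr, one of the core tidyverse packages (assuming it is installed). filter() takes row conditions and select() takes column names, and both refer to columns directly without df$. Here archaic is stored as a column so that filter() can see it, and the native pipe |> assumes R 4.1 or later.

```r
library(dplyr)

# Same toy table; archaic is added as a column (assumption for this sketch)
df <- data.frame(
  sample = c("Loschbour", "UstIshim", "Saqqaq", "AltaiNeandertal"),
  coverage = c(18.2, 35.2, 13.4, 44.8),
  age = c(8050, 45020, 3885, 125000),
  archaic = c(FALSE, FALSE, FALSE, TRUE)
)

# Filtering rows and selecting columns, chained with the pipe
modern <- df |>
  filter(!archaic) |>
  select(sample, coverage)

# Sorting by age (oldest first) and adding a derived column in kyr
df_sorted <- df |>
  arrange(desc(age)) |>
  mutate(age_kyr = age / 1000)

# Summarising per group: mean coverage of archaic vs non-archaic samples
coverage_by_group <- df |>
  group_by(archaic) |>
  summarise(mean_coverage = mean(coverage))
```

Each verb does one thing and returns a new data frame, so longer analyses read top to bottom as a sequence of steps.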
The tidyverse is a language for solving data science challenges with R code. Its primary goal is to facilitate a conversation between a human and a computer about data. Less abstractly, the tidyverse is a collection of R packages that share a high-level design philosophy […] so that learning one package makes it easier to learn the next.
The tidyverse encompasses the repeated tasks at the heart of every data science project: data import, tidying, manipulation, visualisation, and programming.
“Western Eurasia witnessed several large-scale human migrations during the Holocene. Here, to investigate the cross-continental effects of these migrations, we shotgun-sequenced 317 genomes—mainly from the Mesolithic and Neolithic periods—from across northern and western Eurasia. These were imputed alongside published data to obtain diploid genotypes from more than 1,600 ancient humans [and about 2,500 present-day humans].”
Our exercises will focus on two MesoNeo data sets:
A great example of how to approach totally unfamiliar data!