Computational Statistics for Genome Biology (CSAMA)
Brixen-Bressanone, Italy
2014-06-22 ~ 2014-06-28
Instructors
- Martin Morgan, Fred Hutchinson Cancer Research Center (USA)
- Robert Gentleman, Genentech (USA)
- Vincent J. Carey, Channing Laboratory, Harvard Medical School (USA)
- Wolfgang Huber, European Molecular Biology Laboratory (DE)
- Simon Anders, European Molecular Biology Laboratory (DE)
- Laurent Gatto, University of Cambridge (UK)
- Michael Lawrence, Genentech (USA)
Description
This one-week intensive course teaches current approaches in the statistical and computational analysis of large-scale experiments in biology. The course focuses on methods for downstream analysis of high-throughput sequencing experiments, including RNA sequencing (differential expression), DNA sequencing (variant calling) and ChIP-Seq. Lectures also cover essentials including statistical testing, linear models, machine learning, visualisation and bioinformatic annotation. Emphasis is given to practical problem-solving skills using open-source software from the Bioconductor, CRAN and other projects. The course is intended for researchers who have basic familiarity with the experimental technologies and the biology of the genome, and who are interested in developing their own advanced data analyses in a scripting environment. The four practical sessions of the course require a basic understanding of scripts written in R. A tutorial on the more advanced features of R that are required will be provided; students are advised to familiarize themselves with the very basics of R beforehand. (Consider one of the many online resources or books, e.g. R-Intro from the R Project, Germán Rodríguez, R-Studio.)
Materials
Monday, June 22
Morning talks
- pdf Introduction to R and Bioconductor
- pdf Basics of high-throughput sequencing technologies and short read aligners
- pdf Elements of statistics 1: t-test and linear model
- pdf Elements of statistics 2: multiple testing, false discovery rates, independent filtering
Afternoon labs
- zip R introduction/refresher: data types, reading and writing files and spreadsheets, plotting, programming, functions and packages.
- pdf R Exploratory data analysis and visualization (pdf solutions)
- html R Intermediate R 1: accessing resources - packages, classes, methods, and efficient code. Download IntermediateR1_1.0.0.tar.gz and install as:
    source("http://bioconductor.org/biocLite.R")
    biocLite(c("IRanges", "GenomicRanges", "microbenchmark"))
    install.packages("IntermediateR1_1.0.0.tar.gz", repos=NULL, type="source")
- pdf R Intermediate R 2: scalable / performant computing. (Large files needed for some of this lab are NOT available for download.) Download CSAMA2014ScalableComputingLab_0.0.1.tar.gz and install as:
    source("http://bioconductor.org/biocLite.R")
    biocLite(c("IRanges", "GenomicRanges", "Rsamtools", "ShortRead",
               "rtracklayer", "GenomicAlignments", "GEOquery",
               "microbenchmark", "BiocParallel", "ggbio", "Biobase",
               "GenomicFiles"))
    install.packages("CSAMA2014ScalableComputingLab_0.0.1.tar.gz", repos=NULL, type="source")
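After running the installation commands for either lab, it may help to confirm that the dependencies can actually be found before starting the exercises. The following is only a minimal sketch and is not part of the course materials: `missing_packages` is a hypothetical helper name, and the package list shown is the one from the Intermediate R 1 instructions.

```r
# Hypothetical helper (an assumption, not from the course materials):
# return the names of packages that cannot be loaded from the current
# library paths.
missing_packages <- function(pkgs) {
    # requireNamespace() returns TRUE or FALSE instead of raising an error
    found <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!found]
}

# Dependencies named in the Intermediate R 1 installation instructions
lab_pkgs <- c("IRanges", "GenomicRanges", "microbenchmark")

# An empty character vector means every listed package was found
missing_packages(lab_pkgs)
```

Any names returned can be passed back to the installation commands for the corresponding lab.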
Tuesday
Morning talks
- pdf RNA-Seq 1: differential expression analysis - GLMs and testing
- RNA-Seq 2: shrinkage, empirical Bayes, FC estimation
- pdf Visualisation
- pdf Computing with genomic ranges, sequences and alignments
Afternoon labs
- pdf R (Rnw, bib) A complete RNA-Seq differential expression workflow. Data files: DESeq2_result_table.RData, Homo_sapiens.GRCh37.75.subset.gtf.gz
Wednesday
Morning talks
- pdf DNA-Seq 1: Variant calling
- pdf DNA-Seq 2: visualisation and quality assessment of variant calls
- pdf Gene set enrichment analysis
- pdf R Working with gene and genome annotations
Afternoon labs
- pdf R Variant tallies, visualisation, HDF5. Data files: ExampleData.zip, NRAS.tally.hfs5
Thursday
Morning talks
- pdf RNA-Seq 3: alternative exon usage
- html Elements of statistics 3: Classification and clustering - basic concepts
- pdf Elements of statistics 4: regularisation & kernels
- pdf R ChIP-Seq
Afternoon labs
Friday
Morning talks
- pdf Elements of statistics 5: experimental design
- pdf eQTL / molecular-QTL analyses
- pdf Proteomics
- Emerging topic – pdf image analysis
Afternoon labs
- Reporting your analysis: authoring with knitr/Rmarkdown, ReportingTools, shiny. Package: rauthoring_0.2.2.tar.gz