Linux Command Line Reference
Friday 2 March
First class: introductory slides and a summary of exercises on logging into AWS and using Linux shell commands.
Guides to setting up PuTTY for SSH on Windows, and starting a Cloudbiolinux instance on Amazon Web Services Elastic Compute Cloud (AWS-EC2).
Monday 12 March
Data formats, quality scores, quality control, error correction, and file manipulation.
Materials for class: Slides with information on shell commands and regular expressions, and exercises part 1 and part 2.
An overview of characters used in regular expressions.
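As a small illustration of a few common regular-expression characters, here is a sketch using `grep`. The sample text and variable names are made up for this example and are not taken from the class materials.

```shell
# Made-up FASTA-style sample text, for illustration only.
sample='>seq1 sample A
ACGTTTGACA
>seq2 sample B
GGGNNNACGT'

# ^ anchors the match to the start of a line: count the header lines.
header_count=$(printf '%s\n' "$sample" | grep -c '^>')

# [ACGT]+ between ^ and $ matches lines made up only of unambiguous bases.
clean_lines=$(printf '%s\n' "$sample" | grep -E '^[ACGT]+$')

# N{3,} matches a run of three or more N characters; -o prints only the match.
n_runs=$(printf '%s\n' "$sample" | grep -E -o 'N{3,}')

echo "headers: $header_count"
echo "clean:   $clean_lines"
echo "N runs:  $n_runs"
```

The `-E` option selects extended regular expressions, so `+` and `{3,}` work without backslashes.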
A shell script to set up programs for data QC on the Cloudbiolinux instance.
A one-line command for random sampling of sequence data from FASTQ files of paired-end reads.
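One possible way to sample read pairs at random while keeping the two files in sync is sketched below. This is not the command linked above; it is a minimal sketch that assumes every FASTQ record spans exactly four lines, that both files list the pairs in the same order, and that GNU `shuf` is available. The function name `sample_pairs` and the file names are hypothetical.

```shell
# sample_pairs R1.fastq R2.fastq N PREFIX
# Writes N randomly chosen read pairs to PREFIX_1.fastq and PREFIX_2.fastq.
# Assumes 4-line FASTQ records and identical pair order in both inputs.
sample_pairs() {
    r1=$1; r2=$2; n=$3; prefix=$4
    # paste puts line i of R1 and line i of R2 side by side (tab-separated).
    paste "$r1" "$r2" |
    # Join every 4 pasted lines so that each read pair sits on one line.
    awk '{ printf("%s", $0); if (NR % 4 == 0) printf("\n"); else printf("\t") }' |
    # Pick N pairs at random (shuf is part of GNU coreutils).
    shuf -n "$n" |
    # Odd-numbered fields belong to R1, even-numbered fields to R2.
    awk -F'\t' -v p="$prefix" '{
        print $1 "\n" $3 "\n" $5 "\n" $7 > (p "_1.fastq")
        print $2 "\n" $4 "\n" $6 "\n" $8 > (p "_2.fastq")
    }'
}

# Example call with hypothetical file names:
# sample_pairs reads_1.fastq reads_2.fastq 10000 subset
```

Putting both mates on one line before shuffling is what keeps the pairing intact: each randomly selected line carries the forward and the reverse read together.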
Friday 16 March
Quality-filtering exercise with FastQC (details) and fastx-toolkit (details) programs.
Discussion of Quake (details) and Hammer (details) error-correction programs.
Monday 19 March
Experimental design, read simulation with dwgsim, read mapping with bwa, and read assembly with Velvet.
A lecture outline and exercises are here.
Friday 23 March
Comparison of requirements for assembly with ABySS and Velvet, and comparison of assembly to whole-genome alignment with MUMmer v.3.
Exercises are here.
Monday 26 March
Assembly quality metrics and Assemblathon-1: Outline and notes
SAM format definition and Tablet assembly viewer.
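N50 is one widely reported assembly quality metric: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. Whether the class notes use exactly this definition is an assumption. The sketch below computes it from a list of contig lengths, one per line; the function name `n50` is hypothetical.

```shell
# n50: read one contig length per line on stdin and print the N50.
n50() {
    # Sort lengths from longest to shortest, then accumulate until the
    # running sum reaches half of the total assembly length.
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += len[i]
                if (run >= total / 2) { print len[i]; exit }
            }
        }'
}

# Example: contigs of length 2, 3, 4, 5 and 6 (total 20).
printf '2\n3\n4\n5\n6\n' | n50
```

In the example, the two longest contigs (6 and 5) sum to 11, which is at least half of 20, so the function prints 5.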
Friday 30 March
Lecture: Transcriptome Analysis
Here is an overview of the sample RNA-seq dataset, and a shell script to download the data and run an analysis using RSEM software.
Monday 2 April
RNA-Seq Analysis in R using Biomart
Friday 6 April
Spring holiday; no class.
Monday 9 April
Shell and R scripts for RNA-Seq analysis with DESeq: shell script and R script
Friday 13 April
A tutorial on ChIP-seq analysis using Galaxy.
Monday 16 April
Variant discovery using genotyping by sequencing and the TASSEL pipeline.
Analysis of a GBS dataset from the Oregon Wolfe Barley lines using a shell script to download data and run the analysis.
New stuff, not used in class:
- Documentation for UNEAK, the Universal Network-Enabled Analysis Kit for identifying allelic sequence tags in the absence of a reference genome sequence.
- Slides describing the UNEAK network filter approach to identifying SNPs
Friday 20 April
Stacks software: an overview for analysis of RAD-seq data, and a shell script to download sample data from spotted gar and run an analysis.
Monday 23 April
Analysis strategy for GBS data.
Friday 27 April
Analysis of GBS data with Stacks software using a shell script.