BIT 815, Deep Sequencing Data Analysis

Course Materials, Spring 2012

Linux Command Line Reference

Friday 2 March
First class: introductory slides and a summary of exercises on logging into AWS and using Linux shell commands.
Guides to setting up PuTTY for SSH on Windows and to starting a Cloudbiolinux instance on Amazon Web Services Elastic Compute Cloud (AWS-EC2).

Monday 12 March
Data formats, quality scores, quality control, error correction, and file manipulation.
Materials for class: slides on shell commands and regular expressions, and exercises (part 1 and part 2).
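Each character in the quality line of a FASTQ record encodes a Phred score Q, defined by Q = -10 log10(P), where P is the probability that the base call is wrong. The short sketch below decodes a quality string and converts a score back to an error probability. It assumes the Phred+33 encoding (Sanger, Illumina 1.8 and later); older Illumina pipelines used an offset of 64, which the hypothetical `offset` parameter lets you switch to.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of integer Phred scores.

    offset=33 assumes Sanger / Illumina 1.8+ encoding; use 64 for older
    Illumina data.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Invert Q = -10 * log10(P) to get the base-call error probability."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    scores = phred_scores("II5+#")
    print(scores)  # [40, 40, 20, 10, 2]
    print([error_probability(q) for q in scores])
```

A score of 20 therefore means a 1 in 100 chance of an incorrect call, and 30 means 1 in 1000.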
An overview of characters used in regular expressions.
A shell script to set up programs for data QC on the Cloudbiolinux instance.
A one-line command for random sampling of sequence data from FASTQ files of paired-end reads.
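The key point when subsampling paired-end data is that both mates of a pair must be kept or dropped together, so the two output files stay in the same order. The sketch below is not the course's one-line command; it is a Python illustration of the same idea, assuming unwrapped 4-line FASTQ records and two input files with the same number of reads in the same order. The function names are hypothetical.

```python
import random
from itertools import islice


def fastq_records(handle):
    """Yield 4-line FASTQ records (header, sequence, '+', quality) as tuples."""
    while True:
        record = tuple(line.rstrip("\n") for line in islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def sample_pairs(handle1, handle2, fraction, seed=None):
    """Keep each read pair with probability `fraction`.

    One random draw decides the fate of both mates, so the sampled
    read-1 and read-2 lists remain in sync.
    """
    rng = random.Random(seed)
    kept1, kept2 = [], []
    for rec1, rec2 in zip(fastq_records(handle1), fastq_records(handle2)):
        if rng.random() < fraction:
            kept1.append(rec1)
            kept2.append(rec2)
    return kept1, kept2


if __name__ == "__main__":
    with open("reads_1.fastq") as r1, open("reads_2.fastq") as r2:
        kept1, kept2 = sample_pairs(r1, r2, 0.1, seed=42)
    print(len(kept1), "pairs kept")
```

The file names in the usage block are placeholders. Each kept record can be written back out with `"\n".join(record)`. Fixing the seed makes the sample reproducible.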

Friday 16 March
Quality-assessment and quality-filtering exercise with the FastQC (details) and fastx-toolkit (details) programs.
Discussion of Quake (details) and Hammer (details) error-correction programs.
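fastx-toolkit's `fastq_quality_filter` keeps a read only if at least a given percentage of its bases (`-p`) reach a minimum quality score (`-q`). The function below is a rough Python approximation of that criterion for illustration, not the toolkit's own code; the boundary handling and the default thresholds are assumptions, and Phred+33 encoding is assumed.

```python
def passes_quality_filter(quality_string, min_quality=20, min_percent=90, offset=33):
    """Return True if at least `min_percent` percent of bases have a
    Phred score >= `min_quality` (Phred+33 encoding assumed)."""
    if not quality_string:
        return False
    scores = [ord(ch) - offset for ch in quality_string]
    good = sum(1 for q in scores if q >= min_quality)
    # Integer comparison avoids floating-point rounding at the threshold.
    return 100 * good >= min_percent * len(scores)


if __name__ == "__main__":
    print(passes_quality_filter("IIII#"))                  # False: 80% of bases pass
    print(passes_quality_filter("IIII#", min_percent=80))  # True
```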

Monday 19 March
Experimental design, read simulation with dwgsim, read mapping with bwa, and read assembly with Velvet.
A lecture outline and exercises are here.

Friday 23 March
Comparison of requirements for assembly with ABySS and Velvet, and comparison of assembly to whole-genome alignment with MUMmer v.3.
Exercises are here.

Monday 26 March
Assembly quality metrics and Assemblathon-1: Outline and notes
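N50 is one of the most commonly reported assembly quality metrics: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of all assembled bases. The sketch below computes it from a list of contig lengths; extracting those lengths from an assembly FASTA file is left out, and the function name is hypothetical.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    print(n50([100, 200, 300, 400]))  # 300
```

Here the total is 1000 bases; 400 alone is below 500, but 400 + 300 reaches it, so the N50 is 300.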
SAM format definition and Tablet assembly viewer.
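Each SAM alignment line has 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), optionally followed by TAG:TYPE:VALUE fields. FLAG is a bit field; for example, bit 0x4 marks an unmapped read and bit 0x10 a read aligned to the reverse strand. The sketch below parses one alignment line; header lines beginning with `@` are assumed to be skipped by the caller, and the function names are hypothetical.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_sam_line(line):
    """Split one SAM alignment line into its 11 mandatory fields plus a
    list of optional TAG:TYPE:VALUE fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 11:
        raise ValueError("SAM alignment lines need 11 mandatory fields")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = parts[11:]
    return record


def is_unmapped(record):
    """FLAG bit 0x4: the read did not map."""
    return bool(record["FLAG"] & 0x4)


def is_reverse(record):
    """FLAG bit 0x10: SEQ is reverse-complemented (reverse strand)."""
    return bool(record["FLAG"] & 0x10)


if __name__ == "__main__":
    rec = parse_sam_line("r1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0")
    print(rec["RNAME"], rec["POS"], is_reverse(rec), is_unmapped(rec))
```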

Friday 30 March
Lecture: Transcriptome Analysis
An overview of the sample RNA-seq dataset, and a shell script to download the data and run an analysis with the RSEM software.

Monday 2 April
RNA-Seq Analysis in R using Biomart

Background materials for the R statistical environment and the Bioconductor genomic analysis toolbox:
      From Fred Hutchinson Cancer Research Center: Introduction to R: slides and lab exercises; an introduction to Bioconductor and to the ShortRead package.
      From UC-Riverside: A manual for R and Bioconductor

Friday 6 April
Spring holiday; no class

Monday 9 April
Shell and R scripts for RNA-Seq analysis with DESeq: shell script and R script

Friday 13 April
ChIP-seq analysis using Galaxy: a tutorial

Monday 16 April
Variant discovery using genotyping-by-sequencing (GBS) and the TASSEL pipeline.
Analysis of a GBS dataset from the Oregon Wolfe Barley lines using a shell script to download data and run the analysis.

New stuff, not used in class:
- Documentation for UNEAK, the Universal Network-Enabled Analysis Kit for identifying allelic sequence tags in the absence of a reference genome sequence.
- Slides describing the UNEAK network filter approach to identifying SNPs.

Friday 20 April
Stacks software: an overview of its use for RAD-seq data analysis, and a shell script to download sample data from spotted gar and run an analysis.

Monday 23 April
Analysis strategy for GBS data

Friday 27 April
Analysis of GBS data with Stacks software using a shell script