R Script: sequenceDifferentialExperssion (RNA-seq)
Approx. CPU requirement: minutes
Description:
| sequenceDifferentialExpression | R Documentation |
RNA-Seq Differential Expression
Description
This script performs a two-sample test for RNA-seq differential expression. It is based on the work flow and assumptions of the DESeq package, using specific methods for between-sample normalization, variance stabilization, and differential expression. Details are available in the DESeq package vignette, http://bioconductor.org/packages/release/bioc/html/DESeq.html.
Usage
sequenceDifferentialExpression((dataset, condition1, condition2)
Arguments
dataset |
A tab delimited text file of sample and count data.
The first row contains information describing the condition to which each sample is classified. The first entry (corresponding to the column described below containing the region of interest) is empty. There must be exactly two unique conditions across all samples. The second row contains information about sample names. The first cell of the second row applies to the feature name column, and is ignored. The remaining columns contain sample names, and must be distinct. The third and subsequent rows contain a name identifiying the region to which counts apply (e.g., gene names), followed by counts of reads found in each sample. There can be no missing values. All region names must be distinct. An example of this file is the ‘SequenceDifferentialExpressionInput.tab’ file provided. |
condition1, condition2 |
These parameters determine the direction of the comparsion – condition 2 versus condition 1. For the sample dataset provided, set these values to "Normal" and "Treatment". |
Value
A topTable summarizing count differences, and ordered according to adjusted P value. The columns of the top table are:
Id |
The region of interest identifier from the input data. |
NormalizedMean |
The mean count across all samples, after normalizing so that each sample is approximately equally represented. |
NormalizedMean.Condition1 |
The mean of the counts divided by the size factors for the counts for condition 1 |
NormalizedMean.Condition2 |
The mean of the counts divided by the size factors for the counts for condition 2 |
log2FoldChange |
The log2 fold change, condition 2 versus condition 1 |
PValue |
The P value for rejecting the null hypothesis ‘mean of condition 1 is equal to mean of condition 2’ |
AdjustedPValue |
The P values adjusted for false discovery using the Benjamini-Hochberg algorithm. |
Author(s)
Martin Morgan <mtmorgan@fhcrc.org>
Examples
topTable <-
sequenceDifferentialExpression(dataset, condition1, condition2)
volcano <-
xyplot(-log10(AdjustedPValue)~Log2FoldChange, topTable,
type=c("g", "p"),
xlab="Log2 Fold Change, Condition2 vs Condition1",
ylab="-Log10 Adjusted (Benjamini-Hochberg) P value")
volcano
xtab <- xtabs(~., topTable[,3:4]!=0)
xtab
Source code:
1 version <- '0.0.6' 2 #<crdata_text>Script version: </crdata_text> 3 #<crdata_object>version</crdata_object> 4 CRDATA_SOURCE_R_FILES <- TRUE 5 if (interactive() || 6 (exists("CRDATA_SOURCE_R_FILES") && get("CRDATA_SOURCE_R_FILES"))) 7 { 8 library(DESeq) 9 library(GenomicFeatures) 10 library(GenomicRanges) 11 library(IRanges) 12 } 13 14 sequenceDifferentialExpression <- 15 function(dataset, condition1, condition2) 16 { 17 conn <- file(dataset, "r") 18 conditions <- scan(conn, "character", nlines=1) 19 counts <- read.delim(conn, header=TRUE, stringsAsFactors=TRUE, 20 row.names=1) 21 close(conn) 22 23 cds <- newCountDataSet( counts, conditions ) 24 cds <- estimateSizeFactors( cds ) 25 cds <- estimateVarianceFunctions( cds ) 26 topTable <- nbinomTest( cds, condition1, condition2)[,c(1:4, 6:8)] 27 names(topTable) <- 28 c("Id", "NormalizedMean", 29 "NormalizedMean.Condition1", "NormalizedMean.Condition2", 30 "Log2FoldChange", "PValue", "AdjustedPValue") 31 data.frame(topTable[order(topTable$AdjustedPValue),], row.names=NULL) 32 } 33 34 sequenceCountsPerTranscript <- 35 function(bamFiles, genome, tableName, 36 transcriptSequenceNames, 37 bamSequenceNames) 38 { 39 if (missing(genome) || 40 (length(genome) == 1 && !nzchar(genome))) { 41 require(rtracklayer) 42 return(list(genome=ucscGenomes(), 43 supportedTables=supportedUCSCtables())) 44 } 45 nameMap <- transcriptSequenceNames 46 names(nameMap) <- bamSequenceNames 47 txdb <- 48 makeTranscriptDbFromUCSC(genome=genome, tablename=tableName) 49 tx <- transcripts(txdb) 50 txLevels <- levels(seqnames(tx)) 51 if (!all(idx <- nameMap %in% txLevels)) 52 stop("'transcriptSequenceNames' do not exist in transcript list: '", 53 paste(nameMap[!idx], collapse="' '"), "'") 54 for (bamFile in bamFiles) { 55 bam <- readGappedAlignments(bamFile) 56 idx <- match(rname(bam), names(nameMap), nomatch=0L) >= 0L 57 bam <- bam[idx,] 58 rname(bam) <- factor(nameMap[as.character(rname(bam))], 59 levels=nameMap) 60 values(tx)[[basename(bamFile)]] <- countOverlaps(tx, bam) 61 } 62 df <- as.data.frame(tx) 63 df$Total <- rowSums(df[,8:ncol(df), drop=FALSE]) 64 df 65 } 66 #<crdata_title>Sequence-based Differential Expression</crdata_title> 67 68 #<crdata_text>Supplied parameters:</crdata_text> 69 70 #<crdata_text>dataset: </crdata_text> 71 #<crdata_object>dataset</crdata_object> 72 73 #<crdata_text>condition1: </crdata_text> 74 #<crdata_object>condition1</crdata_object> 75 76 #<crdata_text>condition2: </crdata_text> 77 #<crdata_object>condition2</crdata_object> 78 79 topTable <- 80 sequenceDifferentialExpression(dataset, condition1, condition2) 81 82 #<crdata_text>The figure below shows the relationship between log fold change (on the X axis) and Benjamini-Hochberg adjusted (i.e., for false discovery rate) P values </crdata_text> 83 84 #volcano 85 library(lattice) 86 #<crdata_image> 87 print(xyplot(-log10(AdjustedPValue)~Log2FoldChange, topTable, type=c("g", "p"), 88 xlab=sprintf("Log2 Fold Change, '%s' vs '%s'", condition2, condition1), 89 ylab="-Log10 Adjusted (Benjamini-Hochberg) P value")) 90 #</crdata_image> 91 92 xtab <- xtabs(~., topTable[,3:4]!=0) 93 94 #<crdata_text>The following table summarizes the number of tags unique to each sample, e.g., the cell corresponding to the row labeled 'TRUE' and the column labelled 'FALSE' represent the regions present only in condition1</crdata_text> 95 96 #<crdata_object>xtab</crdata_object> 97 98 #<crdata_text>Here we save the table to a file 'job-JOBID-SequenceDifferentialExpression-TopTable.txt'</crdata_text> 99 write.table(topTable, file="SequenceDifferentialExpression-TopTable.txt")
Back