Materials for CLASS03

Workshop 3 - SIGNIFICANCE ANALYSIS OF MICROARRAYS (SAM)

Practical

  • Download related R script here.
  • Download example gene expression data (AC.txt) here.
  • Download class label files CLASS1 and CLASS2.

    1. Open an R session by clicking the R icon.

    2. Install Bioconductor in R. (The classroom laptops already have Bioconductor installed.)

      #install the Bioconductor installer (BiocManager) from R
      #note: the older source("http://bioconductor.org/biocLite.R") route no longer works on R 3.5 or later
      if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

    3. Install the affy package in R via Bioconductor. (The classroom laptops already have Bioconductor installed.)

      #install affy package
      BiocManager::install("affy")

    4. Install the samr package in R via Bioconductor. (The classroom laptops already have Bioconductor installed.)

      #install samr package
      BiocManager::install("samr")

    5. Load the affy package.

      #loading affy library
      library(affy)

    6. Significance Analysis of Microarrays (SAM) is implemented in R as the samr package. Load the samr package.

      #loading samr library
      library(samr)

    7. Create a temporary file name exprsFile that will hold the gene expression data.

      #create a temporary file name (the file itself is written in step 16)
      exprsFile <- tempfile()

    8. Read in class 1 samples.

      #reading class 1 samples from file "CLASS1"
      file1 <- read.table("CLASS1", sep = "\t")

    9. Check file1.

      #check what is being read in as file1
      head(file1)

    10. Read in class 2 samples.

      #reading class 2 samples from file "CLASS2"
      file2 <- read.table("CLASS2", sep = "\t")

    11. Check file2.

      #check what is being read in as file2
      head(file2)

    12. Assign class 1 samples as "1" and class 2 samples as "2" and store these values in Response.

      #Assign class labels in sequential order according to the number of rows in file1 and file2.
      #samples in file1 get the label "1", samples in file2 get the label "2"
      #this assumes the columns of AC.txt list all class 1 samples first, then all class 2 samples
      Response <- c(rep(1,nrow(file1)), rep(2,nrow(file2)))

    13. Check Response.

      #check what is being assigned as Response
      Response

    14. Read in gene expression file AC.txt and assign it as a matrix in file.

      #Read in gene expression profiles from file known as "AC.txt" and assign it to file
      file <- as.matrix(read.table("AC.txt", header = TRUE, row.names = 1))

    15. Check the file.

      #check what is being read in as file
      head(file)

    16. Write the matrix to exprsFile.

      #write "file" to the temporary file "exprsFile" as tab-delimited text
      write.table(file, exprsFile, quote=FALSE, sep="\t", row.names=TRUE, col.names=TRUE)

    17. Create an ExpressionSet object to be used in samr.

      #create an ExpressionSet object to be used in samr
      #readExpressionSet() comes from Biobase, which is attached when affy is loaded
      collapsed <- readExpressionSet(exprsFile, sep="\t", annotation = "hgu133plus2")

    18. Check what is in the R object collapsed.

      #check the "collapsed" object
      collapsed

    19. Create the samr data file sam.data as a list.

      #create the samr data file
      #x = gene expression profiles obtained from exprs(collapsed)
      #y = class label obtained from "Response"
      #geneid = row names from exprs(collapsed), used here as gene identifiers
      #genenames = row names from exprs(collapsed)
      #logged2 = is the gene expression in log2 format? TRUE or FALSE
      sam.data <- list(x=exprs(collapsed), y=Response, geneid = row.names(exprs(collapsed)), genenames = row.names(exprs(collapsed)), logged2=TRUE)

    20. Check the list of data stored in sam.data.

      #check what is being read in as sam.data
      sam.data

      #you will see five list elements:
      #x, y, geneid, genenames, logged2

    21. Run samr using data in the sam.data.

      #Run samr using data in sam.data
      #resp.type = "Two class unpaired"
      #nperms = number of permutations
      #random.seed = seed for the random number generator, so the permutations are reproducible
      samr.obj <- samr(sam.data, resp.type = "Two class unpaired", nperms = 100, random.seed = 12345)

    22. Compute the delta values and store in delta.table.

      #Compute the delta values and store in delta.table
      delta.table <- samr.compute.delta.table(samr.obj)

    23. Check the delta.table.

      #Print out delta.table
      delta.table

    24. Select an FDR cut-off and assign the corresponding delta value to delta.

      #select an FDR cut-off in delta.table and assign the corresponding delta value to delta
      #(0.05 here is only an example value of delta; it is not the FDR itself)
      delta <- 0.05

    25. Plot the SAM graph at the chosen delta value using samr.plot.

      #Plot the graph using the cut-off with the delta value
      samr.plot(samr.obj,delta)

      You can plot the SAM graph with different delta values and see how the dashed lines move.

    26. List the significant genes at the chosen delta value using samr.compute.siggenes.table and store them in siggenes.table.

      #List out the significant genes at the cut-off delta in siggenes.table
      siggenes.table <- samr.compute.siggenes.table(samr.obj, delta, sam.data, delta.table)

    27. Examine the significant genes in siggenes.table.

      #examine the significant genes in siggenes.table
      siggenes.table

    28. Assign the list of up-regulated genes as myupgenes.

      #assign up genes to myupgenes
      myupgenes <- siggenes.table$genes.up

    29. Check what is in myupgenes.

      #check what is in myupgenes
      myupgenes

    30. Print the list of myupgenes into a tab delimited file myupgenes.txt.

      #print myupgenes into a file
      write.table(myupgenes, file="myupgenes.txt", sep="\t")

    31. Assign the list of down-regulated genes as mydowngenes.

      #assign down genes to mydowngenes
      mydowngenes <- siggenes.table$genes.lo

    32. Check what is in mydowngenes.

      #check what is in mydowngenes
      mydowngenes

    33. Print the list of mydowngenes into a tab delimited file mydowngenes.txt.

      #print mydowngenes into a file
      write.table(mydowngenes, file="mydowngenes.txt", sep="\t")
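
Step 24 sets delta by hand after reading delta.table. The sketch below shows one way that lookup could be automated. It is only an illustration: pick_delta is a hypothetical helper (not part of samr), it assumes the table has columns named "delta" and "median FDR", and the toy table holds made-up numbers so that the example runs without any microarray data.

```r
# Hypothetical helper (not part of samr): return the smallest delta whose
# median FDR is at or below target_fdr, or NA if no row qualifies.
# Assumption: the table has columns named "delta" and "median FDR".
pick_delta <- function(delta_table, target_fdr = 0.05) {
  fdr <- delta_table[, "median FDR"]
  ok <- which(!is.na(fdr) & fdr <= target_fdr)
  if (length(ok) == 0) return(NA_real_)
  min(delta_table[ok, "delta"])
}

# Toy table with made-up numbers, only to show the mechanics
toy <- matrix(
  c(0.1, 900, 0.40,
    0.5, 300, 0.12,
    1.0,  80, 0.04,
    1.5,  10, 0.00),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("delta", "# called", "median FDR"))
)

chosen <- pick_delta(toy, target_fdr = 0.05)
chosen
```

With the toy table, the rows at delta 1.0 and 1.5 meet the 0.05 target, so the helper returns 1. If your delta.table uses the same column names, pick_delta(delta.table, 0.05) could replace the hand-picked value in step 24; check colnames(delta.table) first.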

    Assignment #3

    Now use SAM to identify differentially expressed genes in the KRAS-dependency microarray gene expression data. The gene expression data is available here. This is the same experiment as in Assignment #2, in which eight colorectal cancer cell lines were profiled: four KRAS-dependent lines (SK-CO-1, SW620, SW1116 and RCM-1) and four KRAS-independent lines (LS-174T, SW837, SW1463 and SW948). You can use the same sample files, with CLASS2 representing the KRAS-dependent lines and CLASS1 representing the KRAS-independent lines.

    Your tasks are:

    1. Modify the R script to read in these files.
    2. List out the significant genes at FDR = 0.05, find out the delta value.
    3. Plot the SAM graph at this delta value.
    4. How many up-regulated genes? Print out the up-regulated genes.
    5. How many down-regulated genes? Print out the down-regulated genes.
    6. Compare your gene list with the published gene list. How many genes are overlap?