Assignment for CLASS08

Assignment #8

Using the Cancer Cell Line Encyclopedia (CCLE) data set, identify the gene signature (differentially expressed genes) and pathways (using GSEA of KEGG gene sets) associated with response to erlotinib (Tarceva, an EGFR inhibitor) in lung cancer.

  • Use this erlotinib sensitivity data: [Erlotinib sensitivity data] (green = sensitive, red = resistant)

    Your tasks are:

    1. Download the raw microarray gene expression (.CEL) files for the lung cancer cell lines from the CCLE website (or NCBI GEO, GSE36133).
    2. Alternatively, download all CCLE lung cancer cell line .CEL files here: [DOWNLOAD]
    3. Run APT with RMA to normalize the cell lines' microarray profiles.
    4. Run SAM to identify differentially expressed genes (use FDR = 10%).
    5. Run GSEA to identify pathways that correlate with drug sensitivity and resistance.

  • Lung lines: [Here]
  • Perl code: [PERL CODE]. This script maps the files listed in the input to a list of output files.
    To run the script:

    perl PERL_SCRIPT < lung_cel_list.txt > movefile

    (PERL_SCRIPT is a placeholder for the Perl file downloaded above.)

  • Make "movefile" executable so that you can run it from the command line:

    chmod 777 movefile


  • Next, create a lung folder and move all the lung CEL files into it:

    #create a lung folder

    mkdir LUNG_FOLDER

    #move all the *_LUNG.CEL into LUNG_FOLDER
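
The move command itself is not shown above. A minimal sketch of one way to do it, assuming the lung files all end in _LUNG.CEL as the comment states; the function name move_lung_cels is hypothetical and not part of the course material:

```shell
# Sketch: move every *_LUNG.CEL file from a source directory into a
# destination directory. move_lung_cels is a hypothetical helper name.
move_lung_cels() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"                  # harmless if the folder already exists
    for f in "$src_dir"/*_LUNG.CEL; do
        [ -e "$f" ] || continue           # skip when the pattern matches nothing
        mv -- "$f" "$dest_dir"/
    done
}

# Example: move_lung_cels . LUNG_FOLDER
```

For a one-off run in the current directory, a plain "mv *_LUNG.CEL LUNG_FOLDER/" does the same thing; the loop only adds a guard for the case where no file matches.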


  • Run RMA in LUNG_FOLDER (see WORKSHOP 05 and Assignment 5):

    apt-probeset-summarize -a rma -d CDF_FILE -o OUTPUT_DIR *.CEL
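
Before the expression values go into SAM or GSEA, the APT output usually needs a little cleanup. The sketch below assumes the summary file is OUTPUT_DIR/rma.summary.txt and that its metadata lines start with "#", which is how APT normally writes it; check your own output first. The function name strip_apt_header is hypothetical.

```shell
# Sketch: turn an APT summary file into a plain tab-delimited matrix.
# Assumptions: metadata lines start with "#", and the first remaining
# line is the column header (probeset_id followed by CEL file names).
strip_apt_header() {
    # $1 = APT summary file, e.g. OUTPUT_DIR/rma.summary.txt (assumed name)
    grep -v '^#' "$1" | sed '1s/\.CEL//g'   # drop metadata, trim ".CEL" from sample names
}

# Example: strip_apt_header OUTPUT_DIR/rma.summary.txt > lung_rma_matrix.txt
```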

  • Define the sensitive and resistant lines (use the erlotinib sensitivity file) and perform the SAM analysis (see WORKSHOP 03 or Assignments 3 and 5).
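
As a rough illustration of defining the two groups, the sketch below assigns a class label to each sample. It assumes the sensitivity data has been saved as a tab-delimited text file with the cell line name in column 1 and the word "sensitive" or "resistant" in column 2; the actual file is color-coded, so that conversion is up to you. The 1/2 coding follows the two-class labeling commonly used with SAM; confirm it against the workshop notes. The function name make_sam_labels is hypothetical.

```shell
# Sketch: map each sample to a two-class label (1 = sensitive, 2 = resistant).
# Assumed inputs (neither format comes from the course files):
#   $1 = tab-delimited table: cell line <TAB> sensitive|resistant
#   $2 = one sample name per line, in expression-matrix column order
make_sam_labels() {
    awk -F'\t' '
        NR == FNR { status[$1] = tolower($2); next }   # first file: load the statuses
        {
            s = status[$1]
            if (s == "sensitive")      print $1 "\t" 1
            else if (s == "resistant") print $1 "\t" 2
            # samples without a sensitivity call are left out
        }
    ' "$1" "$2"
}

# Example: make_sam_labels sensitivity.txt samples.txt > sam_labels.txt
```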

  • Run GSEA to identify pathways enriched in the erlotinib-sensitive and erlotinib-resistant lines (see WORKSHOP 04 or Assignments 4 and 6).

    Send me the SAM results (up- and down-regulated genes) and the GSEA results (up- and down-regulated pathways).
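
GSEA needs a phenotype file (.cls) that describes the sensitive and resistant groups. A minimal sketch of writing one, assuming a tab-delimited table with the sample name in column 1 and "sensitive" or "resistant" in column 2, listed in the same order as the expression columns. The function name make_cls and the file names are hypothetical.

```shell
# Sketch: write a categorical GSEA .cls file from a two-column status table.
# Assumed input ($1): sample <TAB> sensitive|resistant, in expression-column order.
# Class names on line 2 are listed in order of first appearance, so they
# always line up with the labels on line 3.
make_cls() {
    awk -F'\t' '
        {
            label = toupper($2)
            if (!(label in seen)) { seen[label] = 1; order[++k] = label }
            labels[++n] = label
        }
        END {
            printf "%d %d 1\n", n, k            # line 1: samples, classes, constant 1
            line = "#"
            for (i = 1; i <= k; i++) line = line " " order[i]
            print line                          # line 2: class names
            line = labels[1]
            for (i = 2; i <= n; i++) line = line " " labels[i]
            print line                          # line 3: one label per sample
        }
    ' "$1"
}

# Example: make_cls status.txt > erlotinib.cls
```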