Assignment #8
From the Cancer Cell Line Encyclopedia (CCLE) data set, identify the gene signature (differentially expressed genes) and pathways (using GSEA of KEGG gene sets) associated with sensitivity to Erlotinib (Tarceva, an EGFR inhibitor) in lung cancer.
Use this erlotinib sensitivity data: [Erlotinib sensitivity data] (green = sensitive, red = resistant).
Your tasks are:
- Download the raw microarray gene expression data (.CEL files) for the lung cancer cell lines from the CCLE website (or from NCBI GEO, accession GSE36133).
- OR download the .CEL files for all CCLE lung cancer cell lines here: [DOWNLOAD]
- Run APT (Affymetrix Power Tools) with RMA to normalize the microarray profiles of the cell lines.
- Run SAM to identify genes that are differentially expressed between sensitive and resistant lines (use FDR = 10%).
- Run GSEA to identify pathways that correlate with drug sensitivity and resistance.
TIPS:
Lung lines: [Here]
Perl code: [PERL CODE]. This script reads the list of input CEL files and writes out a command for each one that renames it to its output file name.
To run the code:
perl replace.cel.pl < lung_cel_list.txt > movefile
Make "movefile" executable so that you can run it from the command line:
chmod 755 movefile
./movefile
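The actual replace.cel.pl is not reproduced here. As a rough Python illustration of the same idea, the sketch below turns a list of file-name pairs into the `mv` commands that make up "movefile". The two-column input layout and the script name make_movefile.py are assumptions, so check them against the real lung_cel_list.txt before relying on this.

```python
import shlex
import sys


def make_move_commands(lines):
    """Turn "old_name new_name" lines into "mv" shell commands.

    ASSUMPTION: each useful line holds exactly two whitespace-separated
    fields, the original CEL file name and its new name (for example one
    ending in _LUNG.CEL). The real replace.cel.pl may expect another layout.
    """
    commands = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        old_name, new_name = fields
        # quote names in case they contain shell metacharacters such as parentheses
        commands.append(f"mv {shlex.quote(old_name)} {shlex.quote(new_name)}")
    return commands


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python make_movefile.py lung_cel_list.txt movefile
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for command in make_move_commands(src):
            dst.write(command + "\n")
```

The resulting file can then be made executable and run exactly as described above.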
Next, create a lung folder and move all the lung CEL files into it:
#create a lung folder
mkdir LUNG_FOLDER
#move all the *_LUNG.CEL into LUNG_FOLDER
mv *_LUNG.CEL LUNG_FOLDER
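If you prefer to do this step from Python, the sketch below mirrors the two shell commands above using only the standard library. The function name collect_lung_cels is hypothetical.

```python
import shutil
from pathlib import Path


def collect_lung_cels(src_dir=".", dest_name="LUNG_FOLDER"):
    """Create dest_name inside src_dir and move every *_LUNG.CEL file into it.

    Mirrors: mkdir LUNG_FOLDER; mv *_LUNG.CEL LUNG_FOLDER
    Returns the sorted list of moved file names.
    """
    src = Path(src_dir)
    dest = src / dest_name
    dest.mkdir(exist_ok=True)  # unlike plain mkdir, no error if it already exists
    moved = []
    for cel in sorted(src.glob("*_LUNG.CEL")):
        shutil.move(str(cel), str(dest / cel.name))
        moved.append(cel.name)
    return moved
```

For example, `collect_lung_cels("/path/to/cel_files")` leaves non-lung files (such as *_BREAST.CEL) where they are.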
Run RMA on the CEL files in LUNG_FOLDER (see WORKSHOP 05 and Assignment 5):
apt-probeset-summarize -a rma -d CDF_FILE -o OUTPUT_DIR *.CEL
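apt-probeset-summarize writes the normalized values to a tab-separated summary file in OUTPUT_DIR (for `-a rma` this is typically named rma.summary.txt), with `#`-prefixed metadata lines followed by a header row that starts with probeset_id. A minimal reader for that layout is sketched below; the file name and header layout are assumptions, so confirm them against your own output.

```python
import csv


def read_apt_summary(path):
    """Read an APT summary table into (sample_names, {probeset_id: [values]}).

    ASSUMPTION: the file is tab-separated, metadata lines start with '#', and
    the first non-comment line is a header whose first column is probeset_id
    followed by one column per CEL file.
    """
    with open(path, newline="") as handle:
        rows = (line for line in handle if not line.startswith("#"))
        reader = csv.reader(rows, delimiter="\t")
        header = next(reader)
        samples = header[1:]  # one column per CEL file
        data = {}
        for row in reader:
            if not row:
                continue  # ignore empty lines
            data[row[0]] = [float(v) for v in row[1:]]  # RMA values are log2-scale
    return samples, data
```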
Define the sensitive and resistant lines (using the erlotinib sensitivity file) and perform the SAM analysis (see WORKSHOP 03 or Assignments 3 and 5).
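To make the SAM step concrete, here is a simplified, standard-library-only sketch; it is not the SAM software used in the workshops. It assumes the expression matrix is a dict of probeset to a list of values (one per CEL column) plus a list of CEL column names. For each gene it computes the SAM relative difference d = (mean_S - mean_R) / (s + s0), where s is the pooled standard error, and then estimates the FDR of a symmetric |d| cutoff by shuffling the sensitive/resistant labels and counting how many permuted scores pass it (using the mean count rather than the median that SAM reports). The fudge factor s0 (here the median s), the function names, and the hand-built label dictionary are assumptions. Pure Python is slow for ~20,000 probesets with many permutations, so use the workshop SAM tool for the actual assignment.

```python
import bisect
import math
import random
import statistics


def align_labels(samples, data, label_of):
    """Keep only CEL columns labelled 'S' (sensitive) or 'R' (resistant).

    label_of is a HYPOTHETICAL dict built by hand from the erlotinib
    sensitivity file, e.g. {"LINE1_LUNG.CEL": "S", "LINE2_LUNG.CEL": "R"}.
    Unlabelled (intermediate) lines are dropped.
    """
    keep = [i for i, name in enumerate(samples) if label_of.get(name) in ("S", "R")]
    labels = [label_of[samples[i]] for i in keep]
    subset = {gene: [values[i] for i in keep] for gene, values in data.items()}
    return labels, subset


def gene_stats(values, labels):
    """Return (mean_S - mean_R, s) for one gene, s being the pooled SAM standard error."""
    sens = [v for v, lab in zip(values, labels) if lab == "S"]
    res = [v for v, lab in zip(values, labels) if lab == "R"]
    n1, n2 = len(sens), len(res)
    m1, m2 = statistics.fmean(sens), statistics.fmean(res)
    ss = sum((v - m1) ** 2 for v in sens) + sum((v - m2) ** 2 for v in res)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    return m1 - m2, math.sqrt(a * ss)


def d_scores(data, labels, s0):
    """SAM relative difference d = (mean_S - mean_R) / (s + s0) for every gene."""
    scores = {}
    for gene, values in data.items():
        diff, s = gene_stats(values, labels)
        scores[gene] = diff / (s + s0)
    return scores


def sam_like(data, labels, fdr=0.10, n_perm=100, seed=0):
    """Simplified SAM with a symmetric |d| cutoff and permutation-estimated FDR.

    Returns (up, down): genes higher in sensitive lines (d > 0) and genes
    higher in resistant lines (d < 0) at the lowest cutoff with FDR <= fdr.
    """
    # ASSUMPTION: s0 = median gene-wise s, a simple stand-in for SAM's own choice
    s0 = statistics.median([gene_stats(v, labels)[1] for v in data.values()])
    observed = d_scores(data, labels, s0)
    rng = random.Random(seed)
    perm_abs = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break the link between labels and expression
        perm_abs.extend(abs(d) for d in d_scores(data, shuffled, s0).values())
    perm_abs.sort()
    obs_abs = sorted(abs(d) for d in observed.values())
    for cutoff in obs_abs:  # try cutoffs from the most to the least permissive
        called = len(obs_abs) - bisect.bisect_left(obs_abs, cutoff)
        false_calls = (len(perm_abs) - bisect.bisect_left(perm_abs, cutoff)) / n_perm
        if false_calls / called <= fdr:
            up = sorted(g for g, d in observed.items() if d >= cutoff)
            down = sorted(g for g, d in observed.items() if d <= -cutoff)
            return up, down
    return [], []
```

For example, `labels, subset = align_labels(samples, data, label_of)` followed by `up, down = sam_like(subset, labels, fdr=0.10)` would give candidate up and down gene lists at an estimated FDR of 10%.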
Perform GSEA to identify pathways enriched in erlotinib-sensitive and erlotinib-resistant lines (see WORKSHOP 04 or Assignments 4 and 6).
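The GSEA desktop application reads expression data in GCT format and phenotype labels in CLS format. The sketch below writes both from an expression matrix (a dict of probeset to values), the matching list of CEL column names, and a list of per-sample 'S'/'R' codes; the class names SENSITIVE/RESISTANT and the file names are hypothetical. Keep the sample order identical in both files. Because the rows are Affymetrix probesets, you will probably need GSEA's option to collapse the dataset to gene symbols (with the matching chip annotation) before testing the KEGG gene sets.

```python
def write_gct(path, samples, data):
    """Write {probeset: [values]} as a GSEA GCT (#1.2) expression file."""
    with open(path, "w") as out:
        out.write("#1.2\n")
        out.write(f"{len(data)}\t{len(samples)}\n")  # number of rows, number of samples
        out.write("\t".join(["NAME", "Description"] + list(samples)) + "\n")
        for probe, values in data.items():
            cells = [probe, "na"] + [f"{v:.4f}" for v in values]
            out.write("\t".join(cells) + "\n")


def write_cls(path, labels, names=("SENSITIVE", "RESISTANT"), codes=("S", "R")):
    """Write a two-class categorical CLS file from per-sample 'S'/'R' codes."""
    name_of = dict(zip(codes, names))
    with open(path, "w") as out:
        out.write(f"{len(labels)} 2 1\n")  # samples, classes, always 1
        out.write("# " + " ".join(names) + "\n")
        out.write(" ".join(name_of[code] for code in labels) + "\n")
```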
Send me the SAM results (lists of up- and down-regulated genes) and the GSEA results (up- and down-regulated pathways).