Materials for CLASS04


Workshop 4 - GENE SET ENRICHMENT ANALYSIS (GSEA)

GSEA User Guide Link

GSEA Homepage Link

Practical

  • Download the Leukemia example here

    1. Open a GSEA GUI session by clicking the GSEA icon.

    2. Click Load data on the Left Panel (Steps in GSEA Analysis).

    3. Load data using Method 1: Browse for files ...

    4. Locate Leukemia_collapsed_symbols.gct, Leukemia.cls and c2.v1.symbols.gmt on your computer.

    5. Click "Choose".

    6. Click Run GSEA on the Left Panel (Steps in GSEA Analysis).

    7. Under Required fields, select Leukemia_collapsed_symbols.gct as the Expression dataset.

    8. Select Gene sets database.

    9. In the pop-up window, select c2.v1.symbols.gmt under Gene matrix (local gmx/gmt), then click OK.

    10. Select 500 in the Number of permutations.

    11. Select Leukemia.cls in Phenotype labels.

    12. Select ALL_versus_AML, then click OK.

    13. Select false for Collapse dataset to gene symbols, because the dataset has already been collapsed to gene symbols.

    14. Select phenotype in the Permutation type.

    15. Leave the Chip platform(s) blank as the dataset is already collapsed.

    16. Click to expand Basic fields.

    17. Type in the Analysis name.

    18. Leave the other parameters at their defaults, apart from the set-size limits in steps 19 and 20.

    19. Select 500 in Max size: exclude larger sets.

    20. Select 10 in Min size: exclude smaller sets.

    21. In Save results in this folder, select the directory where the results will be saved.

    22. Click to expand Advanced fields.

    23. Leave the other parameters at their defaults, apart from the setting in step 24.

    24. Select 100 in Plot graphs for the top sets of each phenotype.

    25. At the bottom, select Normal (cpu usage).

    26. Click Run.

    27. Check the status in the bottom-left panel under GSEA reports.

    28. When GSEA is done, click the entry in the Status column.

    29. Browse the GSEA results as HTML.

    30. Browse the GSEA results in the output folder.

    31. Click Leading edge analysis on the Left Panel (Steps in GSEA analysis).

    32. Load the GSEA results by selecting a GSEA result in the application cache section.

    33. Click Load GSEA Results.

    34. Sort the gene sets by nominal p-value. This can be done by clicking the NOM p-value column header.

    35. Select gene sets with NOM p-value < 0.05.

    36. At the bottom right, click Run leading edge analysis.

    37. Browse the Leading edge analysis results.
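Steps 34 and 35 can also be reproduced outside the GUI once the results folder exists. The following Python sketch is only an illustration under assumptions: it takes the text of one GSEA report file and assumes it is tab-delimited with NAME and NOM p-val columns (the exact file name and extension depend on the GSEA version, so check the headers in your own output first). The function name significant_sets is hypothetical.

```python
import csv
import io
from typing import List, Tuple


def significant_sets(report_text: str, alpha: float = 0.05) -> List[Tuple[str, float]]:
    """Return (gene set name, nominal p-value) pairs with NOM p-val < alpha,
    sorted from smallest to largest p-value.

    Assumption: report_text is a tab-delimited GSEA report whose header
    includes the columns 'NAME' and 'NOM p-val'.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    hits = []
    for row in reader:
        raw = (row.get("NOM p-val") or "").strip()
        try:
            p_value = float(raw)
        except ValueError:
            continue  # skip rows whose p-value is missing or not numeric
        if p_value < alpha:
            hits.append((row["NAME"], p_value))
    hits.sort(key=lambda item: item[1])  # ascending, as when sorting in step 34
    return hits


if __name__ == "__main__":
    # Toy rows for illustration only, not real GSEA output.
    example = (
        "NAME\tSIZE\tES\tNES\tNOM p-val\n"
        "SET_A\t25\t0.61\t1.92\t0.002\n"
        "SET_B\t40\t0.30\t1.05\t0.310\n"
        "SET_C\t15\t0.52\t1.63\t0.030\n"
    )
    print(significant_sets(example))
```

On the toy rows this prints [('SET_A', 0.002), ('SET_C', 0.03)]. Running it once on the report for each phenotype gives the per-class lists that task 3 of the assignment asks for.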

    Assignment #4

    Now, use GSEA to identify differentially expressed gene sets in the KRAS-dependency microarray gene expression data. The gene expression data is available here. This is the same experiment as in Assignments #2 and #3, in which eight colorectal cancer cell lines were profiled: four KRAS-dependent lines (SK-CO-1, SW620, SW1116 and RCM-1) and four KRAS-independent lines (LS-174T, SW837, SW1463 and SW948). Use the C2 gene sets in the analysis.
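The two classes described above are what the .cls file in task 1 has to encode. A categorical .cls file has three lines: the number of samples, the number of classes and the constant 1; then "#" followed by the class names; then one label per sample. The Python sketch below builds that text. The function name make_cls is hypothetical, and the sample order in the example is an assumption: the labels must follow the actual column order of your .gct file.

```python
from typing import List


def make_cls(labels: List[str]) -> str:
    """Return the text of a categorical GSEA .cls file for the given labels.

    Line 1: <number of samples> <number of classes> 1
    Line 2: '#' followed by the class names
    Line 3: one class label per sample, in the .gct column order
    """
    if any(" " in label for label in labels):
        raise ValueError("class labels must not contain spaces")
    # List class names in order of first appearance so that the first label
    # used on line 3 matches the first class named on line 2.
    classes = list(dict.fromkeys(labels))
    lines = [
        f"{len(labels)} {len(classes)} 1",
        "# " + " ".join(classes),
        " ".join(labels),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical column order: the four KRAS-dependent lines first,
    # then the four KRAS-independent lines. Match your own .gct header.
    print(make_cls(["KRAS-DEP"] * 4 + ["KRAS-IND"] * 4), end="")
```

With that assumed order the output is:

```
8 2 1
# KRAS-DEP KRAS-IND
KRAS-DEP KRAS-DEP KRAS-DEP KRAS-DEP KRAS-IND KRAS-IND KRAS-IND KRAS-IND
```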

    Download the Assignment #4 Data here.

    Use HT-HGU133A.chip to collapse the dataset to gene symbols. Download the chip file here.
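GSEA performs this collapse itself when Collapse dataset to gene symbols is set to true and the chip file is selected under Chip platform(s); the sketch below only illustrates what collapsing means. It assumes the Max_probe mode (for each sample, keep the maximum value among the probes mapped to a symbol), a CHIP file with Probe Set ID and Gene Symbol columns, and that probes with an empty or "---" symbol are dropped. collapse_max_probe is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List


def collapse_max_probe(expr: Dict[str, List[float]], chip_text: str) -> Dict[str, List[float]]:
    """Collapse probe-level rows to gene symbols (Max_probe-style).

    expr: probe set ID -> expression values, one per sample
    chip_text: tab-delimited CHIP file with 'Probe Set ID' and 'Gene Symbol' columns
    For each symbol, every sample keeps the maximum value among its probes.
    """
    reader = csv.DictReader(io.StringIO(chip_text), delimiter="\t")
    probe_to_symbol = {
        row["Probe Set ID"]: (row["Gene Symbol"] or "").strip() for row in reader
    }

    collapsed: Dict[str, List[float]] = {}
    for probe, values in expr.items():
        symbol = probe_to_symbol.get(probe, "")
        if not symbol or symbol == "---":
            continue  # assumption: probes without a gene symbol are dropped
        if symbol in collapsed:
            # element-wise maximum across probes, sample by sample
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed
```

Because the maximum is taken per sample, a collapsed row can combine values from different probes, which is one reason to prefer letting GSEA do the collapse in the real analysis.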

    Use the C2 gene set file from MSigDB.
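The C2 file from MSigDB is in GMT format: one gene set per line, with the set name, a description and then the member genes, all tab-separated. The Python sketch below parses that format and applies size limits like those in steps 19 and 20 of the practical (10 and 500), counting only genes that are present in the expression dataset, which is our understanding of how GSEA applies these limits. load_gmt and filter_by_size are hypothetical names.

```python
from typing import Dict, List, Set


def load_gmt(gmt_text: str) -> Dict[str, List[str]]:
    """Parse GMT text: name<TAB>description<TAB>gene1<TAB>gene2 ... per line."""
    gene_sets: Dict[str, List[str]] = {}
    for line in gmt_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]  # fields[1] is the description
        gene_sets[name] = [g for g in genes if g]  # ignore empty trailing fields
    return gene_sets


def filter_by_size(
    gene_sets: Dict[str, List[str]],
    dataset_genes: Set[str],
    min_size: int = 10,
    max_size: int = 500,
) -> Dict[str, List[str]]:
    """Keep only genes present in the dataset, then keep the sets whose
    remaining size lies within [min_size, max_size]."""
    kept: Dict[str, List[str]] = {}
    for name, genes in gene_sets.items():
        # dict.fromkeys removes duplicate genes while preserving order
        present = [g for g in dict.fromkeys(genes) if g in dataset_genes]
        if min_size <= len(present) <= max_size:
            kept[name] = present
    return kept
```

Filtering against the dataset before checking the size means a C2 set can be excluded even when its full membership is within the limits, simply because too few of its genes were measured on the array.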

    Your tasks are:

    1. Create the .gct and .cls files for TRAIN.probesets.txt.
      [You can create them in Excel] or
      [You can use the cut and paste commands]

    2. Run GSEA to identify enriched gene sets in KRAS-DEP and KRAS-IND using C2 gene sets.
    3. List the gene sets enriched in each class with a nominal p-value < 0.05.
    4. Run Leading Edge Analysis and plot the heatmap of the leading-edge genes.
    5. Find candidate drugs for KRAS-DEP and KRAS-IND using the DSigDB D2 gene sets.
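For task 1, the .gct file can be scripted as an alternative to Excel. This Python sketch writes GCT v1.2 text: a "#1.2" line, a line with the numbers of rows and samples, a NAME/Description header, then one row per probe set. It assumes, hypothetically, that TRAIN.probesets.txt is tab-delimited with a header row whose first column is the probe ID and whose other columns are sample names; if the file carries extra columns (for example detection calls), remove them first. The Description column is filled with "na" as a placeholder, and probesets_to_gct is a hypothetical name.

```python
import csv
import io


def probesets_to_gct(matrix_text: str) -> str:
    """Convert a tab-delimited probe x sample matrix into GCT v1.2 text.

    Assumed input layout (hypothetical): header row 'ID<TAB>sample1<TAB>...',
    then one row per probe set: probe ID followed by one value per sample.
    """
    rows = [r for r in csv.reader(io.StringIO(matrix_text), delimiter="\t") if r]
    header, data = rows[0], rows[1:]
    samples = header[1:]
    out = [
        "#1.2",
        f"{len(data)}\t{len(samples)}",
        "\t".join(["NAME", "Description"] + samples),
    ]
    for row in data:
        probe, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"row {probe!r} has {len(values)} values, expected {len(samples)}")
        # GCT requires a Description column; 'na' is a placeholder.
        out.append("\t".join([probe, "na"] + values))
    return "\n".join(out) + "\n"
```

The sample names written to the header are the ones the .cls labels must line up with, so it is worth printing them before writing the class file.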