Jessica Jin is a rising senior (Class of 2026) at Interlake High School in Bellevue, Washington. She is passionate about advancing patient well-being, learning from diverse perspectives, and developing practical treatment solutions, particularly for underserved communities. Her current work includes providing citizenship tutoring for immigrants and offering assistive care for seniors and hospital patients. Through the DREAM-High program, Jessica aims to broaden her impact on these communities by applying the resources and knowledge she gains from its members and mentors. During the program, she has learned to use R code notebooks to analyze breast cancer data and has connected with inspiring individuals from the Institute for Systems Biology and across the country. Outside of DREAM-High, Jessica enjoys staying active through sports, nurturing her creativity through journaling, and relaxing with music.
Through hands-on programming, DREAM-High Scholars visualize and analyze genomics, clinical, and physical data from breast cancer cells. DREAM-High is a partnership between the Columbia Center for Cancer Systems Therapeutics, the Palazzo Strozzi Foundation USA, the Stanford Center for Cancer Systems Biology, and the Institute for Systems Biology.
In the DREAM-High program, Scholars learn to program in R, a language for statistical computing and graphics. They manipulate and write code in a cloud-based RStudio environment to analyze a wide range of data on breast cancer patients and cancer cell lines.
I created heat maps as colorized representations of data matrices. I reordered features and observations so that similar entities are close to each other in the graph. Heat maps make it easy to visualize and understand complex data.
I loaded and examined a data frame of clinical information from 1,082 breast cancer patients from The Cancer Genome Atlas (TCGA). I summarized clinical measurements on both the patients, such as gender and age, and the patients’ tumors, such as estrogen receptor status and histology.
I performed an integrative analysis of clinical measurements and gene expression data for 1,082 patients in the TCGA Breast Cancer cohort. By generating heat maps and annotating them with clinical information, I detected patterns in the patients' expression profiles across 18,351 genes that correspond to luminal and triple-negative breast cancers.
I discovered biological processes that distinguish cancer cell lines based on the aggressiveness of the cancers they model. For both breast cancer and colon cancer cell lines, I calculated, visualized, and functionally annotated differential gene expression profiles with data from the Physical Sciences in Oncology Cell Line Characterization Study.
I applied Principal Component Analysis (PCA) to the NCI-60 dataset of gene expression profiles for 60 different cell lines representing a range of cancer types. PCA is a dimensionality reduction technique that simplifies a complex dataset by transforming its features into a new set of uncorrelated variables called principal components. The components are ordered by the variance they explain, so the first few often capture most of the variance in the data.
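To make the idea concrete, here is a minimal PCA sketch. It is not the program's notebook code: DREAM-High work is done in R, while this illustration uses Python with NumPy, and the data are synthetic stand-ins (60 samples, 200 features driven by two hidden factors), not the NCI-60 measurements. The function name `pca` and all variable names are hypothetical.

```python
import numpy as np


def pca(data, n_components=2):
    """Project a samples-by-features matrix onto its top principal components.

    Returns the component scores and the fraction of total variance
    explained by each retained component.
    """
    # Center each feature so the components describe variation around the mean.
    centered = data - data.mean(axis=0)
    # The rows of vt are the principal directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2 / (data.shape[0] - 1)
    explained_ratio = variance / variance.sum()
    scores = centered @ vt[:n_components].T
    return scores, explained_ratio[:n_components]


# Synthetic data (an assumption for illustration only): two hidden factors
# plus a small amount of noise generate 200 correlated features.
rng = np.random.default_rng(0)
factors = rng.normal(size=(60, 2))
loadings = rng.normal(size=(2, 200))
expression = factors @ loadings + 0.1 * rng.normal(size=(60, 200))

scores, explained_ratio = pca(expression, n_components=2)
print(scores.shape)
print(explained_ratio)
```

Because the synthetic features are built from only two factors, two components account for nearly all of the variance here. Real expression data is rarely this clean, so the share explained by the first few components should be checked rather than assumed.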
I built linear regression models that are predictive of breast cancer survival from the METABRIC breast cancer dataset. I found that gene expression profiles of certain cancer genes are predictive of prognosis. Inclusion of additional features in my model increased its explanatory power.
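The effect of adding features can be sketched with ordinary least squares on synthetic data. This is an illustration in Python with NumPy, not the R code or the METABRIC data used in the program; the variables `gene_a`, `gene_b`, and `survival`, the coefficients that generate them, and the helper `r_squared` are all hypothetical.

```python
import numpy as np


def r_squared(features, outcome):
    """Fit ordinary least squares with an intercept and return in-sample R^2."""
    design = np.column_stack([np.ones(len(outcome)), features])
    coefficients, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coefficients
    ss_residual = residuals @ residuals
    ss_total = ((outcome - outcome.mean()) ** 2).sum()
    return 1.0 - ss_residual / ss_total


# Synthetic data (an assumption for illustration only): the outcome depends
# on two expression-like features plus random noise.
rng = np.random.default_rng(1)
n_patients = 200
gene_a = rng.normal(size=n_patients)
gene_b = rng.normal(size=n_patients)
survival = 5.0 - 1.5 * gene_a + 0.8 * gene_b + rng.normal(scale=0.5, size=n_patients)

r2_one_feature = r_squared(gene_a[:, None], survival)
r2_two_features = r_squared(np.column_stack([gene_a, gene_b]), survival)
print(r2_one_feature, r2_two_features)
```

In-sample R² cannot decrease when a feature is added to a least-squares model, so a higher value shows more explained variance in the fitted data but does not by itself show better prediction on new patients. Comparing models on held-out data or with adjusted R² is the usual check.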