At EGA, we stand for efficient and secure sharing of human genomic data. Data upcycling can create virtuous paths with remarkable potential to benefit science and to translate into medical advances.


In 2012, a collaborative effort headed by the teams of Carlos Caldas in Cambridge, UK, and Samuel Aparicio in Canada produced a pivotal study aimed at defining the mutational landscape of breast cancer. They gathered a collection of over 2000 breast cancer samples (METABRIC) associated with long-term clinical follow-up. They delineated an array of inherited and somatically acquired genetic variants and associated them with gene expression. This led to the identification of putative cancer-driving events and improved stratification of breast cancer patients. The study attracted huge attention and was hailed as a herculean effort by fellow researchers. Christina Curtis and her colleagues deposited the data in the EGA archive (EGAS00000000083). What they probably could not foresee is that their data would be reused more than 100 times, becoming one of the most downloaded datasets at EGA.

In 2015, Frederik S. Varn and colleagues accessed the above-mentioned data from New Hampshire, USA, to correlate it with the transcriptional programs of hematopoietic cells infiltrating breast tumours, and found that these programs can influence patients’ prognosis. Their findings, published in Nature Communications, improved the understanding of the interplay between the immune system and cancer.

Again, in 2017, the dataset was downloaded by an Italian/Polish collaboration and used to corroborate their mechanistic study, which demonstrates how a long non-coding RNA (lncRNA) is aberrantly localized to chromatin, where it modulates the expression of oncogenic isoforms, thus contributing to breast cancer.

To date, the METABRIC dataset has been reused in 147 publications. Its impact on breast cancer research is incalculable, and it has even spread beyond that field, with recent contributions to the understanding of pan-cancer mechanisms and features. Some of these publications are translating into improvements in the diagnosis and treatment of different types of breast cancer.

This is just one example of how data upcycling can amplify the potential of any dataset, well past the scope of its creation, the imagination of its owners and any geographical border. At EGA, we are proud to empower such fruitful worldwide cycles of knowledge by providing a platform that enables safe sharing of sensitive genetic data.