U.S. Department of Energy

Pacific Northwest National Laboratory

Aiding diagnosis of rare disease: applications of mass spectrometry-based metabolomics in the Undiagnosed Diseases Network

Introduction 

In the U.S., 6% of the general population suffers from a rare disorder, defined by U.S. law as one that affects <200,000 individuals, that has evaded diagnosis. The goals of the NIH Undiagnosed Diseases Network (UDN) include improving the level of diagnosis and care for patients and facilitating research into the etiology of undiagnosed diseases. As the UDN Metabolomics Core, we are performing MS-based metabolomics and lipidomics analyses of plasma, cerebrospinal fluid (CSF), and urine from patients and first-degree relatives, as well as from disease models in organisms such as Drosophila and zebrafish. These data are being compared against corresponding metabolic profiles from healthy individuals and used in integrative analyses together with results from patient gene sequencing.

Methods 

Study design is a major challenge in omics analyses of undiagnosed diseases, each of which may affect only one or a few patients. To allow for proper statistical analyses of UDN patient data, we generated reference datasets from analyses of plasma, CSF, and urine from >391 individuals with no known metabolic disease who are representative of the demographics of the UDN (predominantly <18 years old, ~50% female, and majority Caucasian), a design driven by power analyses of historical data from our previous work. Metabolomics analyses were performed using GC-MS, with metabolite identification via an in-house modified version of FiehnLib. Lipidomics analyses were performed using LC-MS/MS, with lipid identification via the in-house tool LIQUID. Metabolomics data were integrated with gene sequencing data using STITCH.
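As a minimal sketch of how a single patient sample might be compared against such a reference dataset, the Python below scores each metabolite as a z-score on log2-transformed abundances. The abstract does not state which statistic was used, so the z-score, the log2 transform, the threshold of 2.0, and the function names `reference_z_scores` and `flag_outliers` are all assumptions made for illustration.

```python
import math
import statistics


def reference_z_scores(reference, patient):
    """Score one patient against a reference cohort, metabolite by metabolite.

    reference: dict of metabolite name -> list of abundances in reference samples
    patient:   dict of metabolite name -> abundance in the patient sample
    Returns a dict of metabolite name -> z-score on the log2 scale.
    (Hypothetical helper; the statistic is an assumption, not the authors' method.)
    """
    scores = {}
    for name, value in patient.items():
        # Keep only positive reference values so that log2 is defined.
        ref_values = [v for v in reference.get(name, []) if v > 0]
        if len(ref_values) < 2 or value <= 0:
            continue  # not enough reference data to estimate spread
        logs = [math.log2(v) for v in ref_values]
        spread = statistics.stdev(logs)  # sample standard deviation
        if spread == 0:
            continue  # no variation in the reference; z-score undefined
        scores[name] = (math.log2(value) - statistics.mean(logs)) / spread
    return scores


def flag_outliers(scores, threshold=2.0):
    """Return metabolite names whose |z| meets the (assumed) threshold."""
    return sorted(name for name, z in scores.items() if abs(z) >= threshold)
```

Scoring each patient individually against a large reference cohort is one way to obtain a per-metabolite statistic even when a disease affects only a single patient.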

 

Preliminary data

To date, we have performed >1500 untargeted metabolomics and lipidomics analyses of plasma, CSF, and urine samples from individuals with no known metabolic disease to create reference databases against which data from UDN patients can be compared. These analyses have resulted in reference datasets containing >300 identified and unidentified metabolites and >500 identified lipids as a community resource. Using data from metabolomics analyses of 139 quality control (QC) samples distributed among 201 samples (340 total analyses), the median coefficient of variation (CV) in the measurement of 180 metabolites was 27%. Similarly, using data from lipidomics analyses of 100 QC samples distributed among 197 samples (297 total analyses), the median CV in the measurement of 361 lipids was 10%. A total of 145 samples (83 plasma, 57 urine, and 5 CSF) from 83 UDN patients and first-degree relatives have been analyzed, and the resulting data compared to the appropriate reference datasets to identify outlier metabolites and lipids. These data have subsequently been used to identify the metabolic pathways affected by the underlying disease processes. In a proof-of-principle analysis, metabolomics data and gene sequencing results from 18 UDN cases were integrated using STITCH (Search Tool for Interactions of Chemicals; stitch.embl.de). From a list of several candidate gene variants, STITCH was able to identify a single gene/protein as a key interactor with metabolic outliers and their partners, based on a protein-chemical database and random decision forests (e.g., p-values for protein-protein interactions). The biological relevance of genes of clinical interest will subsequently be evaluated by functional assays.
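The median CV figures quoted above can be illustrated with a short Python sketch: compute a per-feature CV across repeated QC injections, then take the median across features. The data layout (a dict of feature name to QC abundances), the use of the sample standard deviation, and the function names are assumptions for illustration; the abstract does not describe the exact calculation.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation over the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def median_cv(qc_table):
    """Return the median per-feature CV (percent) across QC injections.

    qc_table: dict of feature name -> list of abundances, one per QC sample.
    Features with fewer than two values or a non-positive mean are skipped.
    (Hypothetical helper; the layout and filters are assumptions.)
    """
    cvs = [
        coefficient_of_variation(values)
        for values in qc_table.values()
        if len(values) >= 2 and statistics.mean(values) > 0
    ]
    return statistics.median(cvs)
```

For example, three features with CVs of 0%, 10%, and 50% would give a median CV of 10%; taking the median rather than the mean keeps a few poorly measured features from dominating the summary.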

Novel aspect 

The first integrated application of comprehensive MS-based metabolomics and lipidomics analyses with gene sequencing for the evaluation of undiagnosed disease.
