U.S. Department of Energy

Pacific Northwest National Laboratory

Algorithm Development

In pushing the boundaries of mass spectrometry instrumentation and biological inquiry, we are routinely confronted with the need for new algorithms. These algorithms cover topics such as signal processing and instrument control, mass spectrometry interpretation, and biological data integration. Below is a sample of current efforts.

Active Data

Active Data is a visualization platform that helps people browse and explore Big Data. I am creating Active Data Biology (ADBio), which is adapted to the specific needs of biological data analysis. ADBio has three visual interfaces to project data: an interactive heatmap, a pathway browser, and the Canvas. As users explore their data, their analysis is stored and versioned on GitHub so that progress on the project is not lost. Using GitHub also allows for easy collaboration on projects. More information can be found at adbio.pnnl.gov.

Top-Down Software

Top-down proteomics analyzes intact proteins directly, in their endogenous form and without proteolysis. This preserves valuable information about combinatorial post-translational modifications and endogenous proteolytic fragments. With rapid advancements in mass spectrometry instruments and experimental protocols, there is high demand for computational tools that process top-down proteomics data. We are working on a new open source software package for top-down proteomics analysis consisting of an LC-MS feature-finding algorithm, a database search algorithm, and an interactive results viewer. In a benchmark test, we applied our software to an analysis of human-in-mouse xenograft breast cancer samples generated by the Clinical Proteomic Tumor Analysis Consortium and discovered many more differentially expressed LC-MS features and proteoforms than other available tools did.

LIQUID

LIQUID (Lipid Informed Quantitation and Identification) is a software program developed to enable users to conduct both informed and high-throughput global liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based lipidomics analysis. This newly designed desktop application can quickly identify and quantify lipids from LC-MS/MS datasets while providing a friendly graphical user interface for users to fully explore the data (Figure 1). Informed data analysis simply involves the user specifying an electrospray ionization mode, a lipid common name (e.g., PC(16:0/18:1)), and the associated charge carrier. The primary evidence shown is a stem plot of the MS/MS spectrum, with colors and labels for peaks that match fragments of the identified lipid. A stem plot of the isotopic profile and a line plot of the extracted ion chromatogram are also provided to show the MS-level evidence for the identified lipid. In addition to the plots, other information such as intensity, mass measurement error, and elution time is also provided. Typically, a global analysis for 15,000 lipid targets is executed in less than 5 seconds, and the evidence for each lipid is immediately displayed to the user.
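
To make two of the terms above concrete, here is a minimal Python sketch of how a lipid common name such as PC(16:0/18:1) could be parsed and how a mass measurement error is conventionally expressed in parts per million (ppm). It is not LIQUID's code: the function names, the name pattern, and the 10 ppm default tolerance are illustrative assumptions, and the m/z values in the usage note are made-up numbers rather than measured data.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for names such as "PC(16:0/18:1)": a lipid class
# followed by slash-separated acyl chains written as carbons:double_bonds.
# Real lipid nomenclature has more variants than this sketch accepts.
_LIPID_NAME = re.compile(r"^([A-Za-z]+)\((\d+:\d+(?:/\d+:\d+)*)\)$")


@dataclass(frozen=True)
class LipidName:
    lipid_class: str
    chains: tuple  # one (carbons, double_bonds) pair per acyl chain


def parse_lipid_name(name: str) -> LipidName:
    """Split a common name into its class and acyl chain descriptions."""
    match = _LIPID_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized lipid name: {name!r}")
    chains = tuple(
        tuple(int(part) for part in chain.split(":"))
        for chain in match.group(2).split("/")
    )
    return LipidName(match.group(1), chains)


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass measurement error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tolerance_ppm: float = 10.0) -> bool:
    """True if the observed m/z lies within the assumed ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm
```

For example, `parse_lipid_name("PC(16:0/18:1)")` yields the class `PC` with chains `(16, 0)` and `(18, 1)`, and an observed m/z of 500.001 against a theoretical m/z of 500.000 corresponds to an error of about 2 ppm, which passes the assumed 10 ppm tolerance.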
