U.S. Department of Energy

Pacific Northwest National Laboratory


All of our open source software is cross-posted at our group's GitHub Repository.

Software Category: Featured Tools

  • Used to de-isotope mass spectra and to detect features from mass spectrometry data using observed isotopic signatures.

  • Reduces mass measurement errors for parent ions of tandem MS/MS data by modeling systematic errors based on putative peptide identifications. This information is used to subtract out errors from parent ion protonated masses.

  • InfernoRDN can perform various downstream data analysis, data reduction, and data comparison tasks including normalization, hypothesis testing, clustering, and heatmap generation.

  • Aligns multiple LC-MS datasets to one another, after which LC-MS features can be matched to a database of peptides (typically an AMT tag database).

  • VIPER (Visual Inspection of Peak/Elution Relationships) can be used to visualize and characterize the features detected during LC-MS analyses.

Software Category: Data Analysis and Data Presentation Tools

  • Active Data Canvas is a web-based visual analytics tool for visualizing a data matrix (expression matrix) and for interactively identifying the structured domain knowledge (e.g., pathways and other gene sets) linked to a cluster.

  • The Residue Frequency Summarizer is a VB.NET command-line utility that reads a text file or FASTA file containing peptide or protein sequences and prepares statistics on the occurrence of each amino acid residue throughout the file. Statistics include the number of sequences containing each amino acid and the occurrence percentage across all residues for each amino acid.

  • DanteR is an entirely R-based program that provides a graphical front-end for common data analysis tasks in "omics", with an emphasis on proteomics. It is the successor to DAnTE, providing all of the previous features plus new functionality, including the imputation algorithm described in "A statistical framework for protein quantitation in bottom-up MS-based proteomics." by Karpievitch and Dabney (DOI 10.1093/bioinformatics/btp362).

    IMPORTANT: Development of this program is frozen since it has been superseded by InfernoRDN, available on the InfernoRDN page or on GitHub.

    This version is still made available because it implements the imputation algorithm described above (see "Model Based Filter/Impute/ANOVA" under the Statistics menu).

  • Windows graphical user interface tool for viewing LC-MS data and identifications.

  • The ListPOR program (List Parser for Outlier Removal) can be used to read a file containing columns of grouped values and remove outlier values using Grubbs' test.

  • The Protein Coverage Summarizer can be used to determine the percent of the residues in each protein sequence that have been identified.

  • A high-performance multiprocessor implementation of the NCBI BLAST library.

  • Draws correctly proportioned and positioned two- and three-circle Venn diagrams (aka Euler diagrams). Colors can be customized, and the diagrams can be copied to the clipboard or saved to disk.

  • The Visual Integration for Bayesian Evaluation (VIBE) software is a visualization tool that allows the user to observe classification accuracies at the class level and evaluate classification accuracies on any subset of available data types based on the posterior probability models defined for the individual and integrated data.
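The two statistics reported by the Residue Frequency Summarizer (the number of sequences containing each amino acid, and each amino acid's percentage of all residues) can be illustrated with a short Python sketch. This is not the tool's VB.NET code; the function name `residue_statistics` and the sample sequences are hypothetical, and file parsing is omitted.

```python
from collections import Counter

def residue_statistics(sequences):
    """Return {residue: (sequences_containing, percent_of_all_residues)}.

    Hypothetical helper for illustration only; not the tool's implementation.
    """
    total_counts = Counter()   # occurrences of each residue across all sequences
    containing = Counter()     # number of sequences with at least one occurrence
    for seq in sequences:
        seq = seq.strip().upper()
        total_counts.update(seq)
        containing.update(set(seq))
    total = sum(total_counts.values())
    return {
        res: (containing[res], 100.0 * total_counts[res] / total)
        for res in sorted(total_counts)
    }

# Two made-up peptide sequences
stats = residue_statistics(["ACDA", "CCK"])
for residue, (n_sequences, percent) in stats.items():
    print(f"{residue}\t{n_sequences}\t{percent:.2f}")
```

In this example, C occurs in both sequences and accounts for 3 of the 7 residues.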

Software Category: Fasta File, Protein Sequence, or Protein Database Related tools

  • Console application that reads a protein FASTA file and splits it apart into a number of sections. Although the splitting is random, each section will have a nearly identical number of residues.

  • A Population Variation plug-in for the Skyline software program that can assist researchers in determining whether their target peptides have known mutations in the general human population.

  • The Protein Digestion Simulator can be used to read a text file containing protein or peptide sequences (FASTA format or delimited text) then output the data to a tab-delimited file.

  • The Protein Sequence Motif Extractor reads a FASTA file or tab-delimited file containing protein sequences, then looks for the specified motif in each protein sequence.

  • The Uniprot DAT File Parser can read a UniProt .Dat file and parse out the information for each entry, creating either a series of tab-delimited text files or a FASTA file.
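One way to obtain a random split in which each section holds a nearly identical number of residues, as described for the FASTA splitting console application, is to shuffle the proteins and then assign each one to whichever section currently has the fewest residues. The sketch below assumes that greedy strategy; the algorithm the tool actually uses is not stated here, and `split_proteins` and the sample data are hypothetical.

```python
import random

def split_proteins(proteins, n_sections, seed=None):
    """Randomly assign (name, sequence) pairs to n_sections with balanced residue totals.

    Assumed strategy for illustration: shuffle, then greedy assignment.
    """
    rng = random.Random(seed)
    shuffled = list(proteins)
    rng.shuffle(shuffled)                    # random order makes the split random
    sections = [[] for _ in range(n_sections)]
    totals = [0] * n_sections                # residues placed in each section so far
    for name, seq in shuffled:
        target = totals.index(min(totals))   # section with the fewest residues
        sections[target].append((name, seq))
        totals[target] += len(seq)
    return sections, totals

# Made-up proteins with lengths 120, 300, 80, 450, 210, and 95 residues
proteins = [("P%d" % i, "A" * n) for i, n in enumerate([120, 300, 80, 450, 210, 95])]
sections, totals = split_proteins(proteins, 3, seed=42)
print(totals)
```

With this strategy the largest and smallest section totals never differ by more than the length of the longest protein.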

Software Category: MS Analysis Tools

  • Formularity is software for assigning low-molecular-weight molecular formulas from high-resolution mass spectra.

  • Finds peaks in raw mass spectra. Capable of full waveform generation, automated mass spectrum interpretation, and integrated database searching of FASTA or GenBank files.

  • This software can be used to generate an Accurate Mass and Time tag database (Microsoft Access format) from local MS/MS search engine results from either SEQUEST or X!Tandem.

Software Category: MS/MS Analysis Tools

  • DeconMSn creates spectrum files for tandem mass spectrometry data.

  • GlyQ-IQ is software that performs a targeted, chromatographic centric search of mass spectral data for glycans. The software uses a list of glycan targets to search for expected features in MS1 spectra. Features are characterized by monoisotopic mass, elution time, and isotopic fit score.  Features are annotated by glycan family relationships and in-source fragmentation patterns.

    Note: GlyQ-IQ is provided on an as-is basis and is no longer supported.  Source code is on GitHub at https://github.com/PNNL-Comp-Mass-Spec/GlyQ-IQ

  • MASIC (MS/MS Automated Selected Ion Chromatogram generator) generates selected ion chromatograms (SICs) for all of the parent ions chosen for fragmentation in an LC-MS/MS analysis.

  • Reads the contents of a tab-delimited peptide hit results file (e.g., from SEQUEST, X!Tandem, Inspect, or MSGF+) and merges that information with the corresponding MASIC results files, appending the relevant MASIC stats for each peptide hit result.

  • MS-GF+ (aka MSGF+ or MSGFPlus) performs peptide identification by scoring MS/MS spectra against peptides derived from a protein sequence database. It supports the HUPO PSI standard input file (mzML) and saves results in the mzIdentML format, though results can easily be transformed to TSV. ProteomeXchange supports Complete data submissions using MS-GF+ search results.

  • Data processing toolbox for running MSGF+ and MASIC, then merging the results. Uses Windows batch files to automate the process for a folder of Thermo .Raw files.

  • MSPathFinder is a database search engine for top-down proteomics, part of the Informed Proteomics package.

  • mzRefinery is a software tool for correcting systematic mass error biases in mass spectrometry data files. The software uses confident peptide spectrum matches from MSGF+ to evaluate three different calibration methods, then chooses the optimal transform function to remove systematic bias, typically resulting in a mass measurement error histogram centered at 0 ppm. mzRefinery is part of the ProteoWizard package (in the msconvert.exe tool), so it can read and write a wide variety of file formats.

    Download ProteoWizard from http://proteowizard.sourceforge.net/downloads.shtml

    See below for a command line utility for generating plots of the mass measurement errors before and after correction.

    For more information on the algorithms employed by mzRefinery, see also http://www.ncbi.nlm.nih.gov/pubmed/26243018

  • PE-MMR can be used to create an MGF file with refined parent ion masses and charges, which can lead to more accurate search results from MS/MS spectra.

  • Converts an MSGF+ TSV file, X!Tandem results file (XML format), or a SEQUEST Synopsis/First Hits file to a series of tab-delimited text files summarizing the results.
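Several of the tools above deal with systematic mass measurement error expressed in parts per million (ppm). As a rough illustration of the idea, the Python sketch below computes the ppm error of each observed mass against its theoretical mass and removes a constant bias estimated as the median error. This is a deliberately simple stand-in: it is not one of the calibration methods mzRefinery evaluates, and the function names and mass values are made up for the example.

```python
from statistics import median

def ppm_error(observed_mass, theoretical_mass):
    """Mass measurement error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

def remove_median_bias(observed, theoretical):
    """Estimate a constant ppm bias as the median error and divide it out.

    Simplified assumption: the systematic error is one global ppm shift.
    """
    errors = [ppm_error(o, t) for o, t in zip(observed, theoretical)]
    bias = median(errors)
    corrected = [o / (1 + bias * 1e-6) for o in observed]
    return bias, corrected

# Made-up theoretical masses, each observed with a +5 ppm systematic shift
theoretical = [1000.0, 1500.0, 2000.0]
observed = [t * (1 + 5e-6) for t in theoretical]
bias, corrected = remove_median_bias(observed, theoretical)
print(f"estimated bias: {bias:.3f} ppm")
```

After the correction, the errors of the corrected masses are centered at 0 ppm for this synthetic input.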

Software Category: MS Data File Utilities

  • Command-line utility that reads in a _Dta.txt file and creates the equivalent Mascot Generic Format (MGF) file. _Dta.txt files are large text files that contain numerous .Dta files, all concatenated together.

  • The Concatenated Text File Splitter can be used to split apart the concatenated file to re-create the individual text files (creating one file per spectrum). This is necessary if you wish to re-search the data with SEQUEST (which reads individual .Dta files).

  • The Flexible File Sort Utility is a command line application that sorts a text file alphabetically (forward or reverse).
    It supports both in-memory sorts for smaller files and use of temporary swap files for large files.
    It can alternatively sort on a column in a tab-delimited or comma-separated file.
    The column sort mode also supports numeric sorting.

  • The MS File Info Scanner can be used to scan a series of MS data files (or data folders) and extract the acquisition start and end times, number of spectra, and the total size of the data.

  • Utility for converting ontology OBO files to a tab-delimited text file.

  • Ion mobility mass spectrometry is an emerging method of molecular characterization. This software tool applies multidimensional smoothing and filtering to raw data from ion mobility-mass spectrometry analyses. Functionality to repair saturated peaks is coming soon.

  • The Thermo Raw File Reader is a .NET DLL that demonstrates how to read Thermo-Finnigan .Raw files using Thermo's MS File Reader.
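The difference between the alphabetical and numeric column sort modes described for the Flexible File Sort Utility can be shown with a small in-memory Python sketch. It does not reproduce the utility's command-line options or its swap-file handling for large files; the function `sort_delimited` and its parameters are hypothetical.

```python
import csv
import io

def sort_delimited(text, column, delimiter="\t", numeric=False, reverse=False):
    """Sort delimited text on one zero-based column, entirely in memory.

    Hypothetical helper; assumes no header row and no missing values.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if numeric:
        key = lambda row: float(row[column])   # compare as numbers
    else:
        key = lambda row: row[column]          # compare as text
    rows.sort(key=key, reverse=reverse)
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()

data = "b\t10\na\t9\nc\t100\n"
alphabetical = sort_delimited(data, column=1)
numerical = sort_delimited(data, column=1, numeric=True)
print(alphabetical)
print(numerical)
```

Sorted as text, the second column is ordered 10, 100, 9; sorted numerically, it is ordered 9, 10, 100.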

Software Category: Mass Spectrometry Auxiliary Tools

Software Category: Tutorials

  • This topic provides a basic introduction to using software tools that do not have a graphical user interface (GUI) and instead can only be used at the Windows Command Prompt.

