Protein Digestion Simulator
The Protein Digestion Simulator reads a text file containing protein or peptide sequences (FASTA format or delimited text) and writes the data to a tab-delimited file. It can optionally digest the input sequences using trypsin, partial trypsin rules, or various other enzymes. Predicted normalized elution time (NET) values are also computed for the digested peptides.
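To illustrate what an in-silico digestion does, here is a minimal Python sketch of fully tryptic cleavage. It is not the program's own code. It assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P), and the function names and the `missed_cleavages` and `min_length` parameters are hypothetical. The partial trypsin rules and the other enzymes the tool supports are not covered.

```python
def cleavage_fragments(sequence):
    """Split a protein sequence at tryptic cleavage sites.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing residues after the last cleavage site
        fragments.append(sequence[start:])
    return fragments


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides built from up to `missed_cleavages` + 1 adjacent fragments."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    # Write one peptide per line, tab-delimited with its length
    for peptide in tryptic_digest("AAKRPCCRDD", missed_cleavages=1):
        print(peptide + "\t" + str(len(peptide)))
```

Allowing missed cleavages increases the number of candidate peptides, which matters later when counting how many of them are uniquely identifiable.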
The FASTA File Validation module validates a FASTA file by testing it against a set of rules that identify common formatting errors. You can optionally create a fixed FASTA file in which various protein naming issues are corrected. By default, long protein names are shortened and invalid residues are removed. The processing also looks for proteins with duplicate sequences or duplicate names, and can optionally remove the duplicate proteins when creating the fixed FASTA file. See also the Validate Fasta File program, a utility that lets you validate FASTA files from the command line.
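The kinds of checks described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the module's actual rule set: the allowed residue alphabet, the `MAX_NAME_LENGTH` threshold, and the function names are all assumptions made for the example.

```python
# Assumed residue alphabet and name-length limit; the real module's rules may differ.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXUO*")
MAX_NAME_LENGTH = 34


def read_fasta(text):
    """Parse FASTA text into a list of (protein_name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def validate(records, max_name_length=MAX_NAME_LENGTH):
    """Return a list of (protein_name, issue) tuples."""
    issues = []
    seen_names = set()
    seen_sequences = {}
    for name, sequence in records:
        if len(name) > max_name_length:
            issues.append((name, "name too long"))
        if name in seen_names:
            issues.append((name, "duplicate name"))
        seen_names.add(name)
        bad = sorted(set(sequence.upper()) - VALID_RESIDUES)
        if bad:
            issues.append((name, "invalid residues: " + "".join(bad)))
        if sequence in seen_sequences:
            issues.append((name, "duplicate sequence of " + seen_sequences[sequence]))
        else:
            seen_sequences[sequence] = name
    return issues


if __name__ == "__main__":
    example = ">P1 first protein\nMKV\nLLA\n>P2\nMKVLLA\n>P1\nMK1\n"
    for name, issue in validate(read_fasta(example)):
        print(name + "\t" + issue)
```

A fixing step could reuse the same pass, for example by truncating long names and dropping records flagged as duplicates, but how the module resolves each issue is not specified here.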
An advanced feature of the Protein Digestion Simulator is the ability to calculate the number of uniquely identifiable peptides within an input file using only mass, or both mass and NET, given user-defined tolerances (see Peptide Uniqueness Options below).
The methods embodied in this software to derive the Kangas/Petritis retention time prediction values are covered by U.S. patent 7,136,759 and pending patent 2005-0267688A1. The software is made available solely for non-commercial research purposes on an "as is" basis by Battelle Memorial Institute. If rights to deploy and distribute the code for commercial purposes are of interest, please e-mail Bruce Harrer.
|Version||v2.2.5053||Requirements||Microsoft .NET Framework 4.0|
|Date Updated||November 1, 2013||File Size (Software Tool)||567 KB (ZIP)|
|Registration Required||No||File size (Source Code)||3.1 MB (ZIP)|
|Comments||See the complete Revision History for a list of changes|
Protein Digestion Simulator Feature Tour
File Format Options
|Can read a FASTA file or delimited text file containing protein or peptide sequences and output the data to a tab-delimited file. FASTA files can also be validated against a set of rules that identify common formatting errors.|
Parse Digest File Options
|Can read in a FASTA file and create a new FASTA file with all of the protein sequences reversed or even randomized. This new file can be the equivalent length of the original file, or can include just a subset of the original file.|
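As a rough illustration of the reversed and randomized output described above, the following Python sketch builds new records from (name, sequence) pairs. The function names, the `mode`, `fraction`, and `seed` parameters, and the name prefix are hypothetical, and taking the first records as the subset is an assumption; the program's own options may behave differently.

```python
import random


def make_decoys(records, mode="reverse", fraction=1.0, seed=0):
    """Return reversed or scrambled copies of (name, sequence) records.

    `fraction` selects how much of the input to include; this sketch
    simply takes the first records, which is an assumption.
    """
    if mode not in ("reverse", "scramble"):
        raise ValueError("mode must be 'reverse' or 'scramble'")
    rng = random.Random(seed)  # seeded so that scrambling is repeatable
    count = int(len(records) * fraction)
    decoys = []
    for name, sequence in records[:count]:
        if mode == "reverse":
            new_sequence = sequence[::-1]
        else:
            residues = list(sequence)
            rng.shuffle(residues)
            new_sequence = "".join(residues)
        decoys.append((mode + "_" + name, new_sequence))
    return decoys


def to_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequences at `width` residues."""
    lines = []
    for name, sequence in records:
        lines.append(">" + name)
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    proteins = [("P1", "MKVLLA"), ("P2", "GGSTR")]
    print(to_fasta(make_decoys(proteins, mode="reverse")), end="")
```

Reversing preserves both length and amino acid composition, while scrambling preserves composition but removes any remaining sequence order.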
FASTA File Validation
Peptide Uniqueness Options
Calculate the number of uniquely identifiable peptides within the input file (digested or undigested), using only mass, or both mass and NET, with appropriate tolerances. The predicted NET values are computed using the NET Prediction DLL included with the NET Prediction Utility.
Reference: A.D. Norbeck, M.E. Monroe, J.N. Adkins, K.K. Anderson, D.S. Daly, and R.D. Smith, "The utility of accurate mass and LC elution time information in the analysis of complex proteomes," Journal of the American Society for Mass Spectrometry 16 (2005), 1239-1249.
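The idea of counting uniquely identifiable peptides can be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the program's algorithm: a peptide is treated as unique when no other peptide lies within the mass tolerance (and, if given, the NET tolerance). The function name, the ppm-based mass tolerance, and the example mass and NET values are all hypothetical, and the NET values are supplied as inputs because the NET prediction itself is not reproduced here.

```python
def count_unique(peptides, mass_tol_ppm=5.0, net_tol=None):
    """Count peptides with no neighbor inside the tolerance window.

    peptides: list of (sequence, monoisotopic_mass, net) tuples.
    net_tol:  if None, only mass is compared; otherwise both mass and
              NET must fall within tolerance for two peptides to conflict.
    """
    unique = 0
    for i, (_, mass_i, net_i) in enumerate(peptides):
        mass_window = mass_i * mass_tol_ppm / 1e6  # ppm converted to Da
        conflict = False
        for j, (_, mass_j, net_j) in enumerate(peptides):
            if i == j:
                continue
            if abs(mass_i - mass_j) > mass_window:
                continue
            if net_tol is not None and abs(net_i - net_j) > net_tol:
                continue
            conflict = True
            break
        if not conflict:
            unique += 1
    return unique


if __name__ == "__main__":
    # Made-up values for illustration only
    example = [
        ("PEPA", 1000.0000, 0.20),
        ("PEPB", 1000.0020, 0.60),
        ("PEPC", 1500.0000, 0.40),
    ]
    print("mass only:", count_unique(example, mass_tol_ppm=5.0))
    print("mass + NET:", count_unique(example, mass_tol_ppm=5.0, net_tol=0.05))
```

In this made-up example the first two peptides cannot be told apart by mass alone at 5 ppm, but their elution times differ enough that adding the NET dimension separates them. The pairwise comparison is quadratic in the number of peptides; a real implementation for whole proteomes would more likely sort or bin the peptides by mass first.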
All publications that utilize this software should provide appropriate acknowledgement to PNNL and the OMICS.PNL.GOV website. However, if the software is extended or modified, then any subsequent publications should include a more extensive statement, using this text or a similar variant:
Portions of this research were supported by the NIH National Center for Research Resources (Grant RR018522), the W.R. Wiley Environmental Molecular Science Laboratory (a national scientific user facility sponsored by the U.S. Department of Energy's Office of Biological and Environmental Research and located at PNNL), and the National Institute of Allergy and Infectious Diseases (NIH/DHHS through interagency agreement Y1-AI-4894-01). PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under contract DE-AC05-76RL0 1830.