Validate Fasta File
The Validate Fasta File utility is a Windows command-line application that parses a FASTA file and reports the number of proteins and the number of residues it contains. It also checks the validity of the file by looking for common, known problems.
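To make the protein and residue counts concrete, here is a minimal Python sketch of how such counts can be computed from a FASTA file. It is not the utility's own code; the function names `read_fasta` and `count_proteins_and_residues` are hypothetical, and the sketch assumes that every protein starts with a `>` header line and that blank lines can be skipped.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header closes the previous protein, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_proteins_and_residues(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of proteins, total number of residues)."""
    protein_count = 0
    residue_count = 0
    for _header, sequence in read_fasta(lines):
        protein_count += 1
        residue_count += len(sequence)
    return protein_count, residue_count


example = """>P1 first protein
MKTAYIAK
QRQISFVK
>P2 second protein
MKLV
""".splitlines()

print(count_proteins_and_residues(example))  # (2, 20)
```

To run the same count on a file on disk, pass an open file handle in place of `example`, since iterating over a text file yields its lines.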
Use the /F switch to generate a new, fixed FASTA file in which various protein naming issues are corrected. By default, long protein names are shortened and invalid residues are removed. Processing also looks for proteins with duplicate sequences or duplicate names.
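The two default fixes described above can be illustrated with a short Python sketch. The actual rules the utility applies are not stated here, so both the length limit (`MAX_NAME_LENGTH`) and the definition of an invalid residue (anything other than a letter or `*`) are assumptions made only for illustration, and `fix_entry` is a hypothetical name.

```python
import re

# Assumption: the real program's length limit is not given in this document.
MAX_NAME_LENGTH = 25

# Assumption: treat anything that is not a letter or '*' as an invalid residue.
INVALID_RESIDUES = re.compile(r"[^A-Za-z*]")


def fix_entry(name, sequence, max_name_length=MAX_NAME_LENGTH):
    """Return (name, sequence) with the name truncated and invalid residues removed."""
    fixed_name = name[:max_name_length]
    fixed_sequence = INVALID_RESIDUES.sub("", sequence)
    return fixed_name, fixed_sequence


print(fix_entry("VeryLongProteinNameThatExceedsTheLimit", "MK1L-V "))
```

Note that simple truncation can turn two distinct long names into the same short name, which is one reason a check for duplicate names is useful after fixing.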
If you use the /R switch, proteins with duplicate names (but differing sequences) are renamed so that each protein has a unique name. If you also use the /D switch, proteins with duplicate sequences are consolidated to keep just one copy of each protein. When looking for duplicate proteins, you can use the /L switch to ignore I/L (isoleucine/leucine) differences in protein sequences.
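The duplicate checks described above can be sketched in Python as follows. This is an illustration of the idea, not the program's implementation: `find_duplicates` and its `ignore_il` parameter are hypothetical names, and mapping every I to L before comparison is one assumed way of ignoring I/L differences.

```python
from collections import defaultdict


def find_duplicates(proteins, ignore_il=False):
    """Find duplicate names and duplicate sequences.

    proteins: iterable of (name, sequence) pairs.
    Returns (names used with differing sequences, groups of names sharing a sequence).
    """
    names_by_sequence = defaultdict(list)
    sequences_by_name = defaultdict(set)
    for name, sequence in proteins:
        key = sequence.upper()
        if ignore_il:
            # Assumption: treat isoleucine and leucine as the same residue.
            key = key.replace("I", "L")
        names_by_sequence[key].append(name)
        sequences_by_name[name].add(key)

    # Names that appear with more than one distinct sequence.
    duplicate_names = sorted(
        name for name, seqs in sequences_by_name.items() if len(seqs) > 1
    )
    # Groups of names that share one sequence.
    duplicate_sequences = [
        names for names in names_by_sequence.values() if len(names) > 1
    ]
    return duplicate_names, duplicate_sequences


proteins = [("A", "MKLIV"), ("B", "MKLLV"), ("C", "MKLIV"), ("A", "MPPQ")]

print(find_duplicates(proteins))                  # (['A'], [['A', 'C']])
print(find_duplicates(proteins, ignore_il=True))  # (['A'], [['A', 'B', 'C']])
```

In the example, protein B matches A and C only when I/L differences are ignored, while the name A is flagged in both cases because it is attached to two different sequences.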
You can provide an XML parameter file to the program using the /P switch. To generate a model (default) XML parameter file, run the program with the /X switch.
A GUI version of the Validate Fasta File program can be found in the Protein Digestion Simulator software.
Please see the Command Line Application Help page for additional information on running this program at the Windows command prompt.
|Download Software Tool||Download Source Code|
|Version||v2.1.5053||Requirements||Microsoft .NET Framework 4.0|
|Date Updated||November 1, 2013||File Size (Software Tool)||212 KB (ZIP)|
|Registration Required||No||File Size (Source Code)||184 KB (ZIP)|
|Comments||See the complete Revision History for a history of changes|
All publications that utilize this software should provide appropriate acknowledgement to PNNL and the OMICS.PNL.GOV website. However, if the software is extended or modified, then any subsequent publications should include a more extensive statement, using this text or a similar variant:
Portions of this research were supported by the NIH National Center for Research Resources (Grant RR018522), the W.R. Wiley Environmental Molecular Science Laboratory (a national scientific user facility sponsored by the U.S. Department of Energy's Office of Biological and Environmental Research and located at PNNL), and the National Institute of Allergy and Infectious Diseases (NIH/DHHS through interagency agreement Y1-AI-4894-01). PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under contract DE-AC05-76RL01830.