Synopsis and First Hits Files

The PNNL automated data processing pipeline produces Synopsis and First Hits files after each SEQUEST, X!Tandem, or Inspect analysis job finishes. These tab-delimited files are created by the Peptide File Extractor application. This software parses out important information that is subsequently used in downstream processing.

The _syn.txt (synopsis) files and the _fht.txt (first hits) files contain similar information, but are optimized for different purposes. The First Hits file contains only the highest scoring peptide for each spectrum; the entries are not limited by an XCorr threshold (all scans will be represented).

The Synopsis file contains every peptide match in each spectrum that meets or exceeds a score threshold (nominally XCorr 1.5 for SEQUEST). The synopsis file also contains multiple records for peptides that occur in multiple proteins (aka ORFs).
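The two selection rules described above can be sketched as follows. This is only an illustration, not the Peptide File Extractor's actual code: the `Hit` record, the function names, and the sample values are hypothetical, and a spectrum is identified here by scan number alone.

```python
from collections import namedtuple

# Hypothetical minimal record for one peptide-to-spectrum match.
Hit = namedtuple("Hit", ["scan", "peptide", "xcorr"])


def synopsis_hits(hits, threshold=1.5):
    """Keep every match whose XCorr meets or exceeds the threshold."""
    return [h for h in hits if h.xcorr >= threshold]


def first_hits(hits):
    """Keep only the highest-scoring match per scan; no threshold is applied."""
    best = {}
    for h in hits:
        if h.scan not in best or h.xcorr > best[h.scan].xcorr:
            best[h.scan] = h
    return [best[scan] for scan in sorted(best)]


# Made-up candidate matches for two scans.
hits = [
    Hit(100, "ACDEFGR", 2.8),
    Hit(100, "ACDEFGK", 1.7),
    Hit(100, "LMNPQR", 1.1),
    Hit(101, "STVWYK", 0.9),
]

syn = synopsis_hits(hits)  # two matches from scan 100 pass the threshold
fht = first_hits(hits)     # one match per scan, including low-scoring scan 101
```

In this sketch scan 101 appears only in the first-hits result, because its single match scores below the threshold but is still the top hit for its spectrum.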

Column Details

1. Row index (aka HitNum)
This value is the row number of the peptide entry, usually sorted according to descending XCorr value.
2. Scan Number (aka ScanNum)
This is the scan number of the MS/MS spectrum that resulted in the peptide identification of interest. The scan number is closely related to retention time.
3. Number of Scans (aka ScanCount or ScanRange)
In some instances, the Thermofinnigan extract_msn (aka lcq_dta) program will group (combine via summing) multiple spectra into one spectrum file in order to reduce duplication and increase signal-to-noise. In normal proteomic samples processed by extract_msn, ~2% of the spectra are grouped. A value of 1 indicates that only one scan was used. If more than one scan is grouped, the Number of Scans column gives the range from the first grouped spectrum to the last, so it is not actually a scan count. For example, if the program groups spectra 8163, 8165, 8168, 8171, 8174, and 8177, then Number of Scans = 8177-8163+1 = 15. Unfortunately, when using extract_msn to convert .Raw files to .Dta files, we do not know exactly which scans are grouped together, just the range of scans indicated by this column.
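The range arithmetic in the example above can be written as a small sketch. The function name is hypothetical, and the code only reproduces the formula given in the text, not extract_msn itself.

```python
def number_of_scans(grouped_scans):
    """Return the scan range (last - first + 1) covered by the grouped spectra.

    This is the value reported in the Number of Scans column; it is a range,
    not a count of the scans that were actually combined.
    """
    if not grouped_scans:
        raise ValueError("at least one scan number is required")
    return max(grouped_scans) - min(grouped_scans) + 1


grouped = [8163, 8165, 8168, 8171, 8174, 8177]
scan_range = number_of_scans(grouped)  # 15, although only 6 scans were grouped
single = number_of_scans([8163])       # 1 when a single scan was used
```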
4. Charge State
This is the charge state of the parent ion. This value is often used in filtering criteria, since peptides with higher charge states can produce a larger number of potential fragment ions. Filtering criteria therefore often use different cutoffs for different charge states.
5. MH
MH, or (M + H)+, is the theoretical mass of the peptide matched to the current spectrum, calculated from amino acid masses. Largely by convention, this value is reported as the mass of the peptide plus the mass of a proton. The default mass calculation uses average amino acid mass values (i.e. it does not use monoisotopic masses).
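Because MH is the peptide mass plus one proton, it can be converted to a neutral mass or to the m/z expected at a given charge state with simple arithmetic. The sketch below uses the standard relationship m/z = (MH + (z - 1) × proton mass) ÷ z; the function names are hypothetical and the page itself does not describe this conversion.

```python
PROTON_MASS = 1.00727647  # mass of a proton in Da (physical constant)


def neutral_mass(mh):
    """Neutral peptide mass M, obtained by removing the single proton from MH."""
    return mh - PROTON_MASS


def mz_from_mh(mh, charge):
    """Expected m/z of the parent ion at the given charge state."""
    if charge < 1:
        raise ValueError("charge state must be 1 or greater")
    # MH already carries one proton, so add (charge - 1) more, then divide.
    return (mh + (charge - 1) * PROTON_MASS) / charge


mz_1 = mz_from_mh(1000.0, 1)  # equals MH itself for a 1+ ion
mz_2 = mz_from_mh(1000.0, 2)  # roughly half of MH for a 2+ ion
```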
6. Xcorr (aka cross correlation score)
This is SEQUEST's main scoring value related to peptide confidence.
7. DeltaCn
SEQUEST calculates the deltaCn from the Xcorr of the top hit and the Xcorr of the nth ranked hit:
(Xcorr(top hit) - Xcorr(n)) ÷ Xcorr(top hit). Thus, the deltaCn for the top hit is
(Xcorr(top hit) - Xcorr(top hit)) ÷ Xcorr(top hit) = 0.
  • Note: This score is often of little value as calculated here. Instead, the DeltaCn2 value better reflects peptide confidence. This column is retained for compatibility with previous versions of the syn/fht files.
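As a sketch of the DeltaCn formula above, the following computes the value for every ranked hit of one spectrum. The function name and the sample Xcorr values are hypothetical; the code only applies the stated formula and is not SEQUEST's implementation.

```python
def delta_cn(xcorrs):
    """Return DeltaCn for each hit of one spectrum, ordered by decreasing Xcorr.

    DeltaCn(n) = (Xcorr(top hit) - Xcorr(n)) / Xcorr(top hit)
    """
    ranked = sorted(xcorrs, reverse=True)
    if not ranked or ranked[0] <= 0:
        raise ValueError("need at least one hit with a positive Xcorr")
    top = ranked[0]
    return [(top - x) / top for x in ranked]


values = delta_cn([4.0, 3.0, 2.0])
# The top hit always gets 0; lower-ranked hits get progressively larger values.
```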
8. Sp
Preliminary score. SEQUEST uses this score for an initial scoring of all the peptide candidates (these can easily number 100,000 or more). It is quicker to compute than Xcorr, but it is a less robust indicator of peptide confidence.
9. Ref.
ORF reference. This is the ORF name of the protein in which the current peptide sequence was detected. In the _fht.txt file, when the peptide is contained in many ORFs, only the first ORF is listed. In the _syn.txt file, all ORFs (up to 50 of them) are output for the current peptide.
10. MO
Number of multiple ORFs. If a peptide sequence can be found in x ORFs, then this value is +(x-1) indicating the number of additional ORFs. Users mostly use this field in filtering out "degenerate" or "multiorf" peptide hits in tabulated lists of peptide IDs.
11. Peptide
This is the peptide sequence that matches the spectrum.
12. DeltaCn2
For the nth ranked hit, deltaCn2 is
(Xcorr(n) - Xcorr(n+1)) ÷ Xcorr(n). This value is calculated by the syn-fht summary generator rather than by SEQUEST, and it is the DeltaCn that should be used for filtering purposes (it is also what other people in the scientific community use). Note: "rank" here refers to rank by Xcorr.
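The DeltaCn2 formula can be sketched in the same way. The function name is hypothetical, and the handling of the last-ranked hit (which has no n+1 neighbor) is an assumption made for this illustration; the page does not say what the summary generator reports in that case.

```python
def delta_cn2(xcorrs):
    """Return DeltaCn2 for each hit of one spectrum, ordered by decreasing Xcorr.

    DeltaCn2(n) = (Xcorr(n) - Xcorr(n+1)) / Xcorr(n)
    """
    ranked = sorted(xcorrs, reverse=True)
    values = []
    for n, xcorr in enumerate(ranked):
        if n + 1 < len(ranked) and xcorr > 0:
            values.append((xcorr - ranked[n + 1]) / xcorr)
        else:
            # Assumption: no next-ranked hit (or non-positive Xcorr) -> undefined.
            values.append(None)
    return values


values2 = delta_cn2([5.0, 4.0, 2.0])
# Each value measures the relative gap between a hit and the next-ranked hit.
```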
13. RankSp
Preliminary score rank. SEQUEST determines this value by sorting the list of candidate peptides by decreasing Sp score and assigning a rank to each peptide sequence (e.g. the topmost entry has RankSp=1).
14. RankXc
RankXc is the Xcorr rank. The explanation is similar to that given for RankSp, except that the list of peptides is sorted by decreasing Xcorr. This value should be 1 for all peptide hits in the fht file, but is usually 1-10 (or more) in the syn file.
15. DelM
This is the mass error in the parent ion mass. It is calculated primarily from the SEQUEST output, as 'Theoretical Parent Mass' minus 'Parent Mass Observed'.
16. XcRatio
XCorr ratio, calculated using Xcorr(n) ÷ Xcorr(top hit).
17. PassFilt
This score is not calculated by SEQUEST; it is calculated by the syn-fht summary generator using Xcorr, DelCN, RankXc and the number of tryptic termini. This is a legacy score and can typically be ignored.
18. MScore
This score measures whether the peptide under consideration has a fragmentation pattern that is consistent with that observed for high-confidence peptide assignments. It is an empirical measure and is not derived from SEQUEST scores. As a rough guide, an MScore of 10 or higher qualifies as a good score. This is also a legacy score and can typically be ignored.
19. Number of tryptic termini (number of tryptic cleavage sites)
This is the number of termini that conform to the expected cleavage behavior of trypsin (i.e. C-terminal to R and K). Note that K-P and R-P do not qualify as tryptic cleavages because of the proline rule. However, the protein N-terminus and protein C-terminus do count as tryptic cleavage sites. Values can be 0, 1, or 2:
  • 2 = Fully tryptic, for example: K.ACDEFGR.S or -.ACDEFGR.S or R.ACDEFGH.-
  • 1 = Partially tryptic, for example: L.ACDEFGR.S or K.ACDEFGR.P
  • 0 = Non tryptic, for example: L.ACDEFGH.S or K.PCDEFGR.P
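The counting rules above can be sketched as a short function that reproduces the listed examples. It is an illustration only, not the summary generator's code: the function name is hypothetical, and it assumes peptides written as prefix.SEQUENCE.suffix with plain one-letter residues (no modification symbols) and "-" marking a protein terminus.

```python
def count_tryptic_termini(peptide):
    """Count the tryptic termini (0, 1, or 2) of a peptide such as 'K.ACDEFGR.S'."""
    parts = peptide.split(".")
    if len(parts) != 3 or not parts[1]:
        raise ValueError("expected the form prefix.SEQUENCE.suffix")
    prefix, seq, suffix = parts

    # N-terminal side: protein N-terminus, or cleavage after K/R not followed by P.
    n_term = prefix == "-" or (prefix in ("K", "R") and seq[0] != "P")

    # C-terminal side: protein C-terminus, or peptide ends in K/R and next residue is not P.
    c_term = suffix == "-" or (seq[-1] in ("K", "R") and suffix != "P")

    return int(n_term) + int(c_term)


fully = count_tryptic_termini("K.ACDEFGR.S")      # fully tryptic
partially = count_tryptic_termini("L.ACDEFGR.S")  # partially tryptic
non = count_tryptic_termini("K.PCDEFGR.P")        # proline rule blocks both ends
```

Real synopsis files may carry modification symbols inside the sequence, which this sketch does not handle; those would need to be stripped before the first and last residues are inspected.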

 

Acknowledgment

All publications that utilize this software should provide appropriate acknowledgement to PNNL and the OMICS.PNL.GOV website. However, if the software is extended or modified, then any subsequent publications should include a more extensive statement, using this text or a similar variant:

Portions of this research were supported by the NIH National Center for Research Resources (Grant RR018522), the W.R. Wiley Environmental Molecular Science Laboratory (a national scientific user facility sponsored by the U.S. Department of Energy's Office of Biological and Environmental Research and located at PNNL), and the National Institute of Allergy and Infectious Diseases (NIH/DHHS through interagency agreement Y1-AI-4894-01). PNNL is operated by Battelle Memorial Institute for the U.S. Department of Energy under contract DE-AC05-76RL01830.
