Identifying Proteins from ESI Tandem MS Data: Sequest

The first program to identify proteins by matching MS-MS data to database sequences was Sequest, which was introduced by John Yates and Jimmy Eng in 1994. Several similar software tools have since been introduced, and these will be discussed below. However, Sequest will be described in greatest detail as representative of this class of tools. The value of programs such as Sequest is that they provide a relatively rapid assignment of MS-MS spectra to specific peptide sequences in databases, which allows fast reduction of large volumes of LC-MS-MS data in proteomics analyses. It is important to emphasize, however, that Sequest and similar programs do not perform de novo interpretation of the spectra. Consequently, the output of these programs depends on the quality of the MS-MS data obtained and on the completeness and accuracy of the database used.

Here's how Sequest works. When the MS instrument obtains an MS-MS scan, it records not only the MS-MS scan itself but also the m/z value of the precursor ion. This information is stored together with the scan data. After the analysis is complete, the user opens the Sequest program and selects the datafile containing the MS-MS scans to be analyzed. The user can tell Sequest which enzyme (e.g., trypsin) was used to digest the protein sample and can also specify whether singly or doubly charged ions were subjected to MS-MS. Finally, the user selects a database against which the MS-MS data are to be compared.

Once the program starts, all of the proteins in the database are subjected to a virtual digestion with the enzyme specified by the user (e.g., trypsin). This generates a master list of possible peptides for comparison to the MS-MS scans. Then each MS-MS scan is analyzed as follows (Fig. 1):

• The precursor m/z for each MS-MS scan is used to select peptides from the database with the same mass (within a defined mass tolerance). If no digestion enzyme was specified, the program simply selects all possible peptide sequences that correspond to the mass of the peptide ion analyzed in that MS-MS scan.

• Theoretical MS-MS spectra are generated from each of the selected peptides.

• The MS-MS spectrum being analyzed is compared with each of the theoretical MS-MS spectra generated from the database.

• A correlation score is calculated for each match between the MS-MS scan and the theoretical MS-MS spectra.
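The first steps above (virtual digestion, then selection of candidate peptides by precursor mass) can be sketched in a few lines of Python. This is only an illustrative sketch, not Sequest's actual code: the function names, the toy protein, the "no cleavage before proline" trypsin rule, and the 0.5 Da tolerance are all assumptions made for the example.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # mass of the charge-carrying proton


def tryptic_digest(protein):
    """Virtual trypsin digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_peptide_index(database):
    """Master list of (protein_id, peptide, mass) for every digested protein."""
    return [(pid, pep, peptide_mass(pep))
            for pid, seq in database.items()
            for pep in tryptic_digest(seq)]


def select_candidates(index, precursor_mz, charge, tol=0.5):
    """Keep peptides whose mass matches the precursor within a tolerance (Da)."""
    neutral = precursor_mz * charge - charge * PROTON
    return [(pid, pep) for pid, pep, mass in index if abs(mass - neutral) <= tol]


# Hypothetical one-protein database and a doubly charged precursor.
index = build_peptide_index({"protA": "GASPVKTCLLR"})
print(select_candidates(index, 279.666, charge=2))
```

In this toy case the precursor at m/z 279.666 (2+) corresponds to a neutral mass of about 557.32, so only the peptide GASPVK survives the mass filter and would go on to the spectrum comparison steps.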

Fig. 1. Schematic representation of operation of Sequest algorithm for correlation of MS-MS spectra with peptide sequences from databases.
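The comparison and scoring steps can be illustrated with a deliberately simplified correlation score. The text here says only that a correlation score is calculated, so everything in the sketch below is an assumption: spectra are binned to unit m/z, the score is the overlap at zero offset minus the average overlap at shifted offsets, and the function names and the 75-bin offset window are chosen for illustration. It should not be read as Sequest's exact scoring function.

```python
def bin_spectrum(peaks, max_mz=2000):
    """Turn (m/z, intensity) pairs into a vector of unit-width m/z bins."""
    bins = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx <= max_mz:
            bins[idx] = max(bins[idx], intensity)
    return bins


def dot_at_offset(observed, theoretical, offset):
    """Overlap of the two binned spectra with the theoretical one shifted."""
    total = 0.0
    for i, t in enumerate(theoretical):
        j = i + offset
        if t and 0 <= j < len(observed):
            total += observed[j] * t
    return total


def correlation_score(observed_peaks, theoretical_mzs, max_offset=75):
    """Overlap at zero offset minus the mean overlap at nonzero offsets."""
    observed = bin_spectrum(observed_peaks)
    theoretical = bin_spectrum([(mz, 1.0) for mz in theoretical_mzs])
    at_zero = dot_at_offset(observed, theoretical, 0)
    shifted = [dot_at_offset(observed, theoretical, k)
               for k in range(-max_offset, max_offset + 1) if k != 0]
    return at_zero - sum(shifted) / len(shifted)


def best_match(observed_peaks, candidates):
    """Return the candidate name with the highest score, plus all scores."""
    scores = {name: correlation_score(observed_peaks, mzs)
              for name, mzs in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical observed scan and two hypothetical theoretical spectra.
observed = [(100.0, 50.0), (200.1, 80.0), (300.2, 30.0)]
candidates = {
    "peptide_A": [100.0, 200.1, 300.2],   # fragment m/z values line up
    "peptide_B": [150.0, 250.0, 350.0],   # fragment m/z values do not
}
print(best_match(observed, candidates))
```

Subtracting the shifted overlaps penalizes candidates that match only by chance alignment. Note that `best_match` always returns some candidate, even when every score is poor.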

The best match or matches for each MS-MS scan analyzed are then reported. The results for the analyses of all the MS-MS scans in a datafile (e.g., an LC-MS-MS run) are presented in a web-browser-based window. A summary of the peptide sequences matched to MS-MS spectra for any particular protein is also displayed (Fig. 2). The quality of the matches of individual MS-MS scans to database entries can be evaluated on the basis of the correlation scores reported or by visual inspection of the actual MS-MS spectra overlaid with the predicted b- and/or y-ions from the "best match" peptide. This makes it relatively easy to distinguish reliable matches from unreliable ones. For example, an MS-MS spectrum in which over half of the predicted b- and y-ions in a peptide match the major signals in the spectrum is often a correct match (Fig. 3). On the other hand, a spectrum in which most of the prominent fragment ions do not match the predicted b- and y-ions for the putative peptide is usually an incorrect match (Fig. 4).
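The visual check described above can be approximated numerically. The sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide, then reports what fraction of predicted ions appear in the spectrum and what fraction of the most intense peaks are explained. The function names, the 0.5 m/z tolerance, the `top_n` cutoff, and the example peaks are assumptions for illustration; Sequest's browser output is not claimed to compute these exact quantities.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    b, y = [], []
    for i in range(1, len(peptide)):
        # b-ion: N-terminal fragment of i residues plus a proton.
        b.append(sum(RESIDUE_MASS[a] for a in peptide[:i]) + PROTON)
        # y-ion: complementary C-terminal fragment plus water and a proton.
        y.append(sum(RESIDUE_MASS[a] for a in peptide[i:]) + WATER + PROTON)
    return b, y


def fraction_of_predicted_matched(peptide, observed_mzs, tol=0.5):
    """Share of predicted b/y ions found among the observed m/z values."""
    b, y = fragment_ions(peptide)
    predicted = b + y
    hits = sum(1 for p in predicted
               if any(abs(p - o) <= tol for o in observed_mzs))
    return hits / len(predicted)


def fraction_of_major_peaks_explained(peptide, peaks, top_n=10, tol=0.5):
    """Share of the top_n most intense peaks that match a predicted b/y ion."""
    b, y = fragment_ions(peptide)
    predicted = b + y
    major = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    explained = sum(1 for mz, _ in major
                    if any(abs(mz - p) <= tol for p in predicted))
    return explained / len(major)


# Hypothetical peptide and peak list of (m/z, intensity) pairs.
peaks = [(147.11, 100.0), (234.14, 80.0), (500.0, 90.0), (600.0, 5.0)]
print(fraction_of_predicted_matched("GASK", [mz for mz, _ in peaks]))
print(fraction_of_major_peaks_explained("GASK", peaks, top_n=3))
```

A match in which the first fraction is above one half and the second is high would correspond to the "good match" case described in the text; low values on both would flag a match for rejection.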

However, it is important to realize that Sequest does not make judgments about the quality of the matches assigned. The algorithm will identify the best peptide sequence match in the database to each MS-MS scan analyzed, even if the match is of very poor quality. Thus, the user must use some combination of knowledge and intuition to decide which matches to accept and which to reject. One aid to decision-making is the summary of database proteins matched to MS-MS scans that is presented in the browser window. This summary lists the proteins in order of decreasing numbers of hits (i.e., MS-MS scan matches). A protein with several high-quality hits on different peptide sequences is likely to be correctly identified. On the other hand, a protein with one or two weak matches to MS-MS spectra may not be correctly identified. The most reliable protein identifications are those in which several different sequences within the identified protein provide high-quality matches to MS-MS spectra in the datafile.
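The protein summary described above amounts to grouping scan matches by protein and sorting by hit count. A minimal sketch follows; the tuple layout, the `min_score` threshold of 2.0, and the example matches are hypothetical, and the choice of what counts as a "high-quality" score remains the user's judgment.

```python
from collections import defaultdict


def summarize_proteins(matches, min_score=2.0):
    """Rank proteins by number of hits.

    matches: iterable of (scan_id, protein_id, peptide, score) tuples.
    Returns rows of (protein_id, hits, distinct_good_peptides), where
    distinct_good_peptides counts different peptide sequences whose
    score is at least min_score.
    """
    hits = defaultdict(int)
    good_peptides = defaultdict(set)
    for scan_id, protein_id, peptide, score in matches:
        hits[protein_id] += 1
        if score >= min_score:
            good_peptides[protein_id].add(peptide)
    rows = [(pid, n, len(good_peptides[pid])) for pid, n in hits.items()]
    # Decreasing hits first, then decreasing distinct good peptides.
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


# Hypothetical matches from one LC-MS-MS run.
matches = [
    (1, "ALBU", "LVNELTEFAK", 3.1),
    (2, "ALBU", "YLYEIAR", 2.8),
    (3, "ALBU", "LVNELTEFAK", 2.9),
    (4, "KERA", "AAAK", 1.1),
]
for row in summarize_proteins(matches):
    print(row)
```

Counting distinct peptide sequences separately from raw hits reflects the point made in the text: repeated matches to the same peptide are weaker evidence than matches to several different sequences within the protein.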

There are a number of complications that can make Sequest analyses more time-consuming or less accurate and complete. First, many peptides bear covalent modifications, which modify the m/z values of the peptides actually analyzed. Thus, Sequest would use a mass that did not correspond to the unmodified peptide mass in the database. In this case, no correct match between the MS-MS scan of the modified peptide and the database sequence would be possible because of this

Fig. 2. Sequest browser output window showing correspondence of actual MS-MS spectrum product ions with predicted b- and y-ions from matched peptide sequence. The actual spectrum provides a good match to predicted b- and y-ions from the matched peptide sequence.
Fig. 3. Sequest browser output window showing correspondence of actual MS-MS spectrum product ions with predicted b- and y-ions from matched peptide sequence. The actual spectrum provides a poor match to predicted b- and y-ions from the matched peptide sequence.

[Figure: Sequest protein summary output for bovine serum albumin precursor. Identifier: gi|418694. Database: C:/Xcalibur/database/bovine.fasta. Mass (average): 69270.4. Protein coverage: 232/607 = 38.2% by amino acid count, 26178.1/69270.4 = 37.8% by mass.]
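The coverage by amino acid count reported in this summary is the number of residues covered by at least one matched peptide divided by the protein length. A minimal sketch of that calculation is shown below; the function names and the toy sequence are assumptions for illustration, and the last line simply rechecks the 232/607 ratio quoted in the summary.

```python
def coverage_by_count(protein, matched_peptides):
    """Return (covered_residues, protein_length) for a set of matched peptides."""
    covered = [False] * len(protein)
    for pep in matched_peptides:
        start = protein.find(pep)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered), len(protein)


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Toy protein with two matched peptides (one of them matched twice).
n_covered, length = coverage_by_count("AAKGGGRCCCK", ["AAK", "CCCK", "AAK"])
print(n_covered, length, percent(n_covered, length))

# Recheck of the ratio quoted in the summary: 232 of 607 residues.
print(percent(232, 607))
```

Marking residues in a boolean list, rather than summing peptide lengths, keeps overlapping or repeated peptide matches from being counted twice.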
