A significant challenge in dereplication is extracting useful information from the large, complex data sets generated by hyphenated techniques such as LC/UV/MS. To accommodate high-throughput sample processing, this extraction should be automated, with data files searched for known components without user intervention. Most modern LC-MS systems have some capability to perform this operation. For compounds that have been analyzed previously on a similar system, data files can generally be searched within known retention-time (RT) windows for the presence of specific mass-spectral or UV peaks. Matches are then compared (often manually) with the MS or UV entries in in-house libraries. One major limitation of this approach is the need to generate RT, UV, and MS data for each standard of interest. This restricts the utility of these comparative searches to compounds that are available as standards, or that were previously isolated in-house, and that have already been run on a similar system. Such compounds represent only a small fraction of those encountered in nature.
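The retention-time window search described above can be illustrated with a minimal Python sketch. It is not the search routine of any particular vendor's software: the `Peak` and `LibraryEntry` types, the function name, and the tolerance values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LibraryEntry:
    """One in-house standard (hypothetical structure)."""
    name: str
    mz: float       # expected m/z of the diagnostic ion
    rt_min: float   # expected retention time, in minutes


@dataclass
class Peak:
    """One detected peak from an LC-MS run (hypothetical structure)."""
    mz: float
    rt_min: float
    intensity: float


def find_known_components(peaks, library, rt_tol=0.2, mz_tol=0.5):
    """Return (entry name, peak) pairs where a peak falls inside the
    retention-time window AND the m/z window of a library entry.

    rt_tol (minutes) and mz_tol (m/z units) are assumed values; real
    tolerances depend on the instrument and the method.
    """
    hits = []
    for entry in library:
        for peak in peaks:
            rt_ok = abs(peak.rt_min - entry.rt_min) <= rt_tol
            mz_ok = abs(peak.mz - entry.mz) <= mz_tol
            if rt_ok and mz_ok:
                hits.append((entry.name, peak))
    return hits
```

A hit from a search of this kind is only a candidate. As noted above, it would still be compared with the library MS or UV spectrum before being accepted.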
Rather than rely solely on comparisons with standards, significant research has been performed to develop tools for the automated extraction of peaks from chromatographic data. Many of these rely on peak detection with the component-detection algorithm (CODA), or on the Automated Mass Spectral Deconvolution and Identification System (AMDIS) software developed by the National Institute of Standards and Technology. Application of the AMDIS software for component detection and searching was recently described by Zink et al. In this work, a single quadrupole LC-MS was operated in the pulsed positive and negative ESI mode with concurrent UV detection. AMDIS was used to extract mass spectral peaks from both the positive and negative ESI mass spectra. These mass spectra were combined with UV spectra and retention-time data and searched against an internal compound library to produce a composite matching score. Wang et al. described the development of an automated deconvolution system based on an ion trap that used fast positive/negative switching routines in combination with the acquisition of data-dependent MS/MS and MSn spectra. This system was used to generate a "spectral fingerprint" of each unknown, which included its retention time, MS polarity, and MS and MS/MS spectra. These data were then searched against a database of several thousand known natural products.
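The text does not say how the composite matching score was calculated. One plausible way to combine the four kinds of evidence is a weighted sum of spectral cosine similarities and a retention-time term, as in the sketch below. The weights, the RT window, the dictionary keys, and the function names are assumptions made for illustration, not the published scoring scheme.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rt_score(rt_obs, rt_lib, window=0.5):
    """1.0 for identical retention times, falling linearly to 0.0 at
    `window` minutes apart (assumed window width)."""
    return max(0.0, 1.0 - abs(rt_obs - rt_lib) / window)


# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"ms_pos": 0.35, "ms_neg": 0.35, "uv": 0.20, "rt": 0.10}


def composite_score(unknown, entry, weights=WEIGHTS):
    """Combine positive-ion MS, negative-ion MS, UV, and RT agreement.

    `unknown` and `entry` are dicts with keys "ms_pos", "ms_neg", "uv"
    (spectra binned onto a common axis) and "rt" (minutes).
    """
    return (
        weights["ms_pos"] * cosine(unknown["ms_pos"], entry["ms_pos"])
        + weights["ms_neg"] * cosine(unknown["ms_neg"], entry["ms_neg"])
        + weights["uv"] * cosine(unknown["uv"], entry["uv"])
        + weights["rt"] * rt_score(unknown["rt"], entry["rt"])
    )
```

In this sketch, an unknown that agrees with a library entry only in its UV spectrum receives a low score, which reflects the purpose of a composite score: no single detector decides the match.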
Some of the most elegant work in data reduction for natural products screening has been reported in the area of diversity assessment, as discussed earlier. Several of these examples involve LC/UV/MS analysis of natural product extracts on an extremely large scale. The resulting data were converted to the netCDF format and evaluated with customized software. Three-dimensional images were generated for the visualization of sample component differences, and a measure of similarity was calculated to allow for diversity assessment. This approach was also applied to the automatic deconvolution of the mass spectra of secondary metabolites from LC-MS data with factor analysis.
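The similarity measure used in that work is not specified here. As a stand-in, the sketch below compares two extracts with the Tanimoto (Jaccard) coefficient computed over binned (retention time, m/z) features. The choice of coefficient, the bin widths, and the function names are assumptions.

```python
def binned_features(components, rt_bin=0.5, mz_bin=1.0):
    """Map (rt_min, mz) pairs onto a coarse grid so that small shifts
    between runs fall into the same cell. Bin widths are assumed values."""
    return {(int(rt // rt_bin), int(mz // mz_bin)) for rt, mz in components}


def tanimoto(sample_a, sample_b, rt_bin=0.5, mz_bin=1.0):
    """Shared features divided by total distinct features (0.0 to 1.0)."""
    a = binned_features(sample_a, rt_bin, mz_bin)
    b = binned_features(sample_b, rt_bin, mz_bin)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

A low coefficient between a new extract and every extract already screened would mark the new extract as chemically distinct, which is the basis of a diversity assessment. Fixed bins can split a feature that sits on a bin edge, so a production implementation would likely use tolerance-based matching instead.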
Diversity assessment has also been combined with the accurate mass capabilities offered by TOF instruments. He et al. described the use of an LCT instrument with UV detection to rank crude natural product extracts. These data were converted to netCDF format along with the accurate mass data, and processed using the CODA algorithm to extract peaks from the raw data. Accurate mass, RT, UV, and MS data were then combined to generate over 4000 unique "signatures," which were used to prioritize new samples according to their diversity. This approach was later applied to an LCT incorporating a lock-spray inlet, resulting in improved mass accuracy and system stability.
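Prioritizing samples by signature can be sketched as follows. A signature is modeled here as a tuple of rounded accurate mass, RT bin, and UV-maximum bin, and new extracts are ranked by how many of their signatures are absent from the existing collection. The signature definition, the rounding and bin widths, and the function names are hypothetical; the published signatures also incorporated MS data.

```python
def signature(mz, rt_min, uv_max_nm, mz_decimals=2, rt_bin=0.5, uv_bin=5):
    """Reduce one component to a hashable signature (assumed definition):
    accurate mass rounded to `mz_decimals`, plus RT and UV-maximum bins."""
    return (
        round(mz, mz_decimals),
        int(rt_min // rt_bin),
        int(uv_max_nm // uv_bin),
    )


def rank_by_novelty(samples, known_signatures):
    """Rank samples by the number of signatures not yet in the collection.

    `samples` maps a sample name to a list of (mz, rt_min, uv_max_nm)
    tuples. Returns (name, n_novel, n_total) tuples, most novel first.
    """
    ranked = []
    for name, components in samples.items():
        sigs = {signature(*component) for component in components}
        novel = sigs - known_signatures
        ranked.append((name, len(novel), len(sigs)))
    return sorted(ranked, key=lambda row: row[1], reverse=True)
```

Extracts at the top of the ranking contribute the most components that have not been seen before, so they would be processed first.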