U.S. Department of Energy

Pacific Northwest National Laboratory

Sequential projection pursuit principal component analysis--dealing with missing data associated with new -omics technologies.

Title: Sequential projection pursuit principal component analysis--dealing with missing data associated with new -omics technologies.
Publication Type: Journal Article
Year of Publication: 2013
Authors: Webb-Robertson B-JM, Matzke MM, Metz TO, McDermott JE, Walker H, Rodland KD, Pounds JG, Waters KM
Journal: Biotechniques
Keywords: Animals; Chromatography, Liquid; Databases, Protein; Humans; Mass Spectrometry; Metabolomics; Principal Component Analysis; Proteomics
Abstract

Principal Component Analysis (PCA) is a common exploratory tool used to evaluate large complex data sets. The resulting lower-dimensional representations are often valuable for pattern visualization, clustering, or classification of the data. However, PCA cannot be applied directly to many -omics data sets generated by newer technologies such as label-free mass spectrometry due to large numbers of non-random missing values. Here we present a sequential projection pursuit PCA (sppPCA) method for defining principal components in the presence of missing data. Our results demonstrate that this approach generates robust and informative low-dimensional data representations compared to commonly used imputation approaches.
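
To illustrate the problem the abstract describes, the following is a minimal Python sketch of one simple way to obtain principal directions without imputing missing values: estimate each covariance entry from the samples in which both features are observed, then eigendecompose the result. This is not the sppPCA method of the paper, whose details are not given in this record. The function names (`pairwise_covariance`, `leading_components`), the synthetic data, and the 20% missingness rate are all assumptions made for illustration, and the synthetic values are removed at random, unlike the non-random missingness typical of label-free mass spectrometry.

```python
import numpy as np


def pairwise_covariance(X):
    """Covariance estimated from pairwise-complete observations.

    X is a samples-by-features array in which NaN marks a missing value.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    n_features = X.shape[1]
    observed = ~np.isnan(X)
    cov = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i, n_features):
            both = observed[:, i] & observed[:, j]
            n_shared = int(both.sum())
            if n_shared < 2:
                continue  # too few shared samples; leave the entry at 0
            xi = X[both, i] - X[both, i].mean()
            xj = X[both, j] - X[both, j].mean()
            cov[i, j] = cov[j, i] = (xi @ xj) / (n_shared - 1)
    return cov


def leading_components(X, k=2):
    """Return the k largest eigenvalues and their eigenvectors."""
    cov = pairwise_covariance(X)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return eigvals[order], eigvecs[:, order]


# Synthetic example: 50 samples, 5 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=50)
X[rng.random(X.shape) < 0.2] = np.nan  # assumed 20% missing, at random

eigvals, eigvecs = leading_components(X, k=2)
```

A covariance matrix assembled this way is not guaranteed to be positive semidefinite, so small negative eigenvalues can appear when many values are missing. That limitation is one reason dedicated methods for missing data are of interest.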

DOI: 10.2144/000113978
Alternate Journal: BioTechniques
PubMed ID: 23477384
Grant List:
1R0111GM084892 / GM / NIGMS NIH HHS / United States
CA160019 / CA / NCI NIH HHS / United States
DK070146 / DK / NIDDK NIH HHS / United States
U54-016015 / / PHS HHS / United States