# pFind Studio: a computational solution for mass spectrometry-based proteomics

### 2021

###### ABSTRACT: RT-PCR is the primary method to diagnose COVID-19 and is also used to monitor the disease course. This approach, however, suffers from false negatives due to RNA instability and poses a high risk to medical practitioners. Here, we investigated the potential of using serum proteomics to predict viral nucleic acid positivity during COVID-19. We analyzed the proteome of 275 inactivated serum samples from 54 out of 144 COVID-19 patients and shortlisted 42 regulated proteins in the severe group and 12 in the non-severe group. Using these regulated proteins and several key clinical indexes, including days after symptoms onset, platelet counts, and magnesium, we developed two machine learning models to predict nucleic acid positivity, with an AUC of 0.94 in severe cases and 0.89 in non-severe cases, respectively. Our data suggest the potential of using a serum protein-based machine learning model to monitor COVID-19 progression, thus complementing swab RT-PCR tests. More efforts are required to promote this approach into clinical practice since mass spectrometry-based protein measurement is not currently widely accessible in the clinic.
###### ABSTRACT: N-linked glycosylation plays important roles in multiple physiological and pathological processes, while the analysis coverage is still limited due to the insufficient digestion of glycoproteins, as well as incomplete ion fragments for intact glycopeptide determination. Herein, a mirror-cutting-based digestion strategy was proposed by combining two orthogonal proteases, LysargiNase and trypsin, to characterize the macro- and micro-heterogeneity of protein glycosylation. Using these two proteases, the b- or y-ion series of peptide sequences were, respectively, enhanced in MS/MS, generating complementary spectra for peptide sequence identification. More than 27% (489/1778) of the site-specific glycoforms identified by LysargiNase digestion were not covered by trypsin digestion, suggesting the elevated coverage of protein sequences and site-specific glycoforms by the mirror-cutting method. In total, 10,935 site-specific glycoforms were identified from mouse brain tissues in the 18 h MS analysis, which significantly enhanced the coverage of protein glycosylation. Intriguingly, 27 mannose-6-phosphate (M6P) glycoforms were determined with core fucosylation, and 23 of them were found with the "Y-HexNAc-Fuc" ions after manual checking. This is hitherto the first report of M6P and fucosylation co-modifications of glycopeptides, in which the mechanism and function still need further exploration. The mirror-cutting digestion strategy also has great application potential in the exploration of missing glycoproteins from other complex samples to provide rich resources for glycobiology research.
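The mirror-cutting idea in the abstract above can be illustrated with a minimal in silico digestion sketch. It assumes the commonly described specificities (trypsin cleaves C-terminal to K/R, LysargiNase cleaves N-terminal to K/R), ignores missed cleavages and the proline rule, and uses a hypothetical function name `digest` and a made-up sequence; it is not the authors' workflow.

```python
def digest(sequence, protease):
    """Split a protein sequence at K/R without missed cleavages.

    Simplified, assumed rules for illustration only:
    - "trypsin":     cut after each K/R (peptides end with K/R)
    - "lysarginase": cut before each K/R (peptides start with K/R)
    """
    if protease not in ("trypsin", "lysarginase"):
        raise ValueError(f"unknown protease: {protease}")
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue not in "KR":
            continue
        cut = i + 1 if protease == "trypsin" else i
        if cut > start:
            peptides.append(sequence[start:cut])
            start = cut
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical toy sequence: the two digests are "mirror images" around K/R.
print(digest("MAGKNVTRSLEK", "trypsin"))
print(digest("MAGKNVTRSLEK", "lysarginase"))
```

In this sketch the basic residue sits at the C-terminus of tryptic peptides and at the N-terminus of LysargiNase peptides, which is the property the abstract links to enhanced y- and b-ion series, respectively.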
###### ABSTRACT: The RNA binding protein TDP-43 forms intranuclear or cytoplasmic aggregates in age-related neurodegenerative diseases. In this study, we found that RNA binding-deficient TDP-43 (produced by neurodegeneration-causing mutations or posttranslational acetylation in its RNA recognition motifs) drove TDP-43 demixing into intranuclear liquid spherical shells with liquid cores. These droplets, which we named "anisosomes", have shells that exhibit birefringence, thus indicating liquid crystal formation. Guided by mathematical modeling, we identified the primary components of the liquid core to be HSP70 family chaperones, whose adenosine triphosphate (ATP)-dependent activity maintained the liquidity of shells and cores. In vivo proteasome inhibition within neurons, to mimic aging-related reduction of proteasome activity, induced TDP-43-containing anisosomes. These structures converted to aggregates when ATP levels were reduced. Thus, acetylation, HSP70, and proteasome activities regulate TDP-43 phase separation and conversion into a gel or solid phase. [more...]
###### ABSTRACT: Human plasma fibronectin is an adhesive protein that plays a crucial role in wound healing. Many studies have indicated that glycans might mediate the expression and functions of fibronectin, yet a comprehensive understanding of its glycosylation is still missing. Here, we performed a comprehensive N- and O-glycosylation mapping of human plasma fibronectin and quantified the occurrence of each glycoform in a site-specific manner. Intact N-glycopeptides were enriched by zwitterionic hydrophilic interaction chromatography, and N-glycosites were localized by the O-18-labeling method. O-glycopeptide enrichment and O-glycosite identification were achieved by an enzyme-assisted site-specific extraction method. An RP-LC-MS/MS system functionalized with collision-induced dissociation and stepped normalized collision energy (sNCE)-HCD tandem mass was applied to analyze the glycoforms of fibronectin. A total of 6 N-glycosites and 53 O-glycosites were identified, which were occupied by 38 N-glycoforms and 16 O-glycoforms, respectively. Furthermore, 77.31% of N-glycans were sialylated, and O-glycosylation was dominated by the sialyl-T antigen. These site-specific glycosylation patterns on human fibronectin can facilitate functional analyses of fibronectin and therapeutics development.
###### ABSTRACT: The characterization of therapeutic glycoproteins is challenging due to the structural heterogeneity of the therapeutic protein glycosylation. This study presents an in-depth analytical strategy for glycosylation of first-generation erythropoietin (epoetin beta), including a developed mass spectrometric workflow for N-glycan analysis, bottom-up mass spectrometric methods for site-specific N-glycosylation, and an LC-MS approach for O-glycan identification. Permethylated N-glycans, peptides, and enriched glycopeptides of erythropoietin were analyzed by nanoLC-MS/MS, and de-N-glycosylated erythropoietin was measured by LC-MS, enabling the qualitative and quantitative analysis of glycosylation and different glycan modifications (e.g., phosphorylation and O-acetylation). The newly developed Python scripts enabled the identification of 140 N-glycan compositions (237 N-glycan structures) from erythropoietin, notably including 8 phosphorylated N-glycan species. The site-specificity of N-glycans was revealed at the glycopeptide level by pGlyco software using different proteases. In total, 114 N-glycan compositions were identified from glycopeptide analysis. Moreover, LC-MS analysis of de-N-glycosylated erythropoietin species identified two O-glycan compositions based on the mass shifts between non-O-glycosylated and O-glycosylated species. Finally, this integrated strategy was shown to enable the in-depth glycosylation analysis of a therapeutic glycoprotein to understand its pharmacological properties and improve the manufacturing processes.
###### ABSTRACT: Post-translational changes in the redox state of cysteine residues can rapidly and reversibly alter protein functions, thereby modulating biological processes. The nematode C. elegans is an ideal model organism for studying cysteine-mediated redox signaling at a network level. Here we present a comprehensive, quantitative, and site-specific profile of the intrinsic reactivity of the cysteinome in wild-type C. elegans. We also describe a global characterization of the C. elegans redoxome in which we measured changes in three major cysteine redox forms after H2O2 treatment. Our data revealed redox-sensitive events in translation, growth signaling, and stress response pathways, and identified redox-regulated cysteines that are important for signaling through the p38 MAP kinase (MAPK) pathway. Our in-depth proteomic dataset provides a molecular basis for understanding redox signaling in vivo, and will serve as a valuable and rich resource for the field of redox biology. Reversible cysteine oxidative modifications have emerged as important mechanisms that alter protein function. Here the authors globally assess the cysteine reactivity and an array of cysteine oxidative modifications in C. elegans, providing insights into redox signaling at the organismal level. [more...]
###### ABSTRACT: Haptoglobin (Hp) is one of the acute-phase response proteins secreted by the liver, and its aberrant N-glycosylation was previously reported in hepatocellular carcinoma (HCC). Limited studies on Hp O-glycosylation have been previously reported. In this study, we aimed to discover and confirm its O-glycosylation in HCC based on lectin binding and mass spectrometry (MS) detection. First, serum Hp was purified from patients with liver cirrhosis (LC) and HCC, respectively. Then, five lectins with Gal or GalNAc monosaccharide specificity were chosen to perform lectin blot, and the results showed that Hp in HCC bound to these lectins in a much stronger manner than that in LC. Furthermore, label-free quantification based on MS was performed. A total of 26 intact O-glycopeptides were identified on Hp, and most of them were elevated in HCC as compared to LC. Among them, the intensity of HYEGS(316)TVPEK (H1N1S1) on Hp was the highest in HCC patients. Increased HYEGS(316)TVPEK (H1N1S1) in HCC was quantified and confirmed using the MS method based on O-18/O-16 C-terminal labeling and multiple reaction monitoring. This study provided a comprehensive understanding of the glycosylation of Hp in liver diseases. [more...]
###### ABSTRACT: The heterogeneity and complexity of glycosylation hinder the depth of site-specific glycoproteomics analysis. High-field asymmetric-waveform ion-mobility spectrometry (FAIMS) has been shown to improve the scope of bottom-up proteomics. The benefits of FAIMS for quantitative N-glycoproteomics have not been investigated yet. In this work, we optimized FAIMS settings for N-glycopeptide identification, with or without the tandem mass tag (TMT) label. The optimized FAIMS approach significantly increased the identification of site-specific N-glycopeptides derived from the purified immunoglobulin M (IgM) protein or human lymphoma cells. We explored in detail the changes in FAIMS mobility caused by N-glycopeptides with different characteristics, including TMT labeling, charge state, glycan type, peptide sequence, glycan size, and precursor m/z. Importantly, FAIMS also improved multiplexed N-glycopeptide quantification, both with the standard MS2 acquisition method and with our recently developed Glyco-SPS-MS3 method. The combination of FAIMS and Glyco-SPS-MS3 methods provided the highest quantitative accuracy and precision. Our results demonstrate the advantages of FAIMS for improved mass spectrometry-based qualitative and quantitative N-glycoproteomics. [more...]
Use: pGlyco

###### ABSTRACT: Protein N-glycosylation in human milk whey plays a substantial role in infant health during postnatal development. Changes in site-specific glycans in milk whey reflect the needs of infants under different circumstances. However, the conventional glycoproteomics analysis of milk whey cannot reveal the changes in site-specific glycans because the attached glycans are typically enzymatically removed from the glycoproteins prior to analysis. In this study, N-glycoproteomics analysis of milk whey was performed without removing the attached glycans, and 330 and 327 intact glycopeptides were identified in colostrum and mature milk whey, respectively. Label-free quantification of site-specific glycans was achieved by analyzing the identified intact glycopeptides, which revealed 9 significantly upregulated site-specific glycans on 6 glycosites and 11 significantly downregulated site-specific glycans on 8 glycosites. Some interesting change trends in N-glycans attached to specific glycosites in human milk whey were observed. Bisecting GlcNAc was found attached to 11 glycosites on 8 glycoproteins in colostrum and mature milk. The dynamic changes in site-specific glycans revealed in this study provide insights into the role of protein N-glycosylation during infant development.
###### ABSTRACT: The diagnosis of AFP (alpha-fetoprotein)-negative HCC (hepatocellular carcinoma) mostly relies on imaging and pathological examinations, and it lacks valuable and practical markers. Protein N-glycosylation is a crucial post-translational modification process related to many biological functions in an organism. Alteration of N-glycosylation correlates with inflammatory diseases and infectious diseases including hepatocellular carcinoma. Here, serum N-linked intact glycopeptides with molecular weight (MW) of 40-55 kDa were analyzed in a discovery set (n = 40) including AFP-negative HCC and liver cirrhosis (LC) patients using label-free quantification methodology. Quantitative Lens culinaris agglutinin (LCA) ELISA was further used to confirm the difference of glycosylation on serum PON1 in liver diseases (n = 56). Then, the alteration of site-specific intact N-glycopeptides of PON1 was comprehensively assessed by using immunoprecipitation (IP) and a mass spectrometry-based O-16/O-18 C-terminal labeling quantification method to distinguish AFP-negative HCC from LC patients in a validation set (n = 64). In total, 195 glycopeptides were identified using a dedicated search engine, pGlyco. Among them, glycopeptides from APOH, HPT/HPTR, and PON1 were significantly changed in AFP-negative HCC as compared to LC. In addition, the reactivity of PON1 with LCA in HCC patients with negative AFP was significantly higher than that in cirrhosis patients. The two glycopeptides HAN(253)WTLTPLK (H5N4S2) and (H5N4S1) corresponding to PON1 were significantly increased in AFP-negative HCC patients, as compared with LC patients. Variations in PON1 glycosylation may be associated with AFP-negative HCC and might serve as potential glycomic-based biomarkers to distinguish AFP-negative HCC from cirrhosis.
### 2020

###### ABSTRACT: Native peptides from sea bass muscle were analyzed by two different approaches: medium-sized peptides by peptidomics analysis, whereas short peptides by suspect screening analysis employing an inclusion list of exact m/z values of all possible amino acid combinations (from 2 up to 4). The method was also extended to common post-translational modifications potentially interesting in food analysis, as well as non-proteolytic aminoacyl derivatives, which are well-known taste-active building blocks in pseudo-peptides. The medium-sized peptides were identified by de novo and combination of de novo and spectra matching to a protein sequence database, with up to 4077 peptides (2725 modified) identified by database search and 2665 peptides (223 modified) identified by de novo only; 102 short peptide sequences were identified (with 12 modified ones), and most of them had multiple reported bioactivities. The method can be extended to any peptide mixture, either endogenous or by protein hydrolysis, from other food matrices. [more...]
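As a rough illustration of the inclusion list described in the abstract above, the sketch below enumerates every amino acid composition of length 2 to 4 and computes the exact m/z of the singly protonated peptide. It assumes unmodified peptides made of the 20 standard residues and standard monoisotopic residue masses; the function name `inclusion_list` is hypothetical, and the modifications and aminoacyl derivatives mentioned in the abstract are not covered.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of a proton


def inclusion_list(min_len=2, max_len=4, charge=1):
    """Return sorted, de-duplicated m/z values for all residue compositions.

    The mass depends only on composition, not on residue order, so
    combinations with replacement are enough; isobaric compositions
    (e.g. those differing only by L/I) collapse to a single m/z.
    """
    mz_values = set()
    for length in range(min_len, max_len + 1):
        for composition in combinations_with_replacement(sorted(RESIDUE_MASS), length):
            neutral = sum(RESIDUE_MASS[aa] for aa in composition) + WATER
            mz_values.add(round((neutral + charge * PROTON) / charge, 4))
    return sorted(mz_values)


mz_list = inclusion_list()
print(len(mz_list), mz_list[0], mz_list[-1])
```

Enumerating compositions rather than sequences keeps the list small, since every permutation of the same residues shares one precursor m/z and can only be distinguished at the MS/MS level.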
###### ABSTRACT: Protein sequence database search is one of the most commonly used methods for protein identification in shotgun proteomics. Traditionally, searching a protein sequence database first requires constructing a theoretical spectrum for each peptide, which at present considers only mass-to-charge ratio information. However, the information related to isotope peak intensity is neglected. Thanks to the rapid development of artificial intelligence techniques in recent years, deep learning-based MS/MS spectrum prediction tools have shown high accuracy and great potential to improve the sensitivity and accuracy of protein sequence database searching. In this study, we used a deep learning model (pDeep2) to predict the theoretical mass spectrum of all peptides and applied it to a database searching tool (DeepNovo), thus improving the sensitivity and accuracy of peptide identification.
###### ABSTRACT: Steady improvement in Orbitrap-based mass spectrometry (MS) technologies has greatly advanced the peptide sequencing speed and depth. In-depth analysis of the performance of state-of-the-art MS and optimization of key parameters can improve sequencing efficiency. In this study, we first systematically compared the performance of two popular data-dependent acquisition approaches, with Orbitrap as the first-stage (MS1) mass analyzer and the same Orbitrap (high-high approach) or ion trap (high-low approach) as the second-stage (MS2) mass analyzer, on the Orbitrap Fusion mass spectrometer. The high-high approach outperformed the high-low approach in terms of better saturation of the scan cycle and higher MS2 identification rate. However, regardless of the acquisition method, more than 60% of peptide features were still not targeted for MS2 scans. We then systematically optimized the MS parameters using the high-high approach. Increasing the isolation window in the high-high approach could facilitate faster scan speed, but decreased the MS2 identification rate. In contrast, increasing the injection time of the MS2 scan could increase the identification rate but decrease scan speed and the number of identified MS2 spectra. Dynamic exclusion time should be set properly according to the chromatography peak width. Furthermore, we found that the Orbitrap analyzer, rather than the analytical column, was easily saturated with higher loading amount, thus limiting the dynamic range of MS1-based quantification. By using optimized parameters, 10 000 proteins and 110 000 unique peptides were identified by using 20 h of effective liquid chromatography (LC) gradient time. The study therefore illustrated the importance of synchronizing LC-MS precursor ion targeting, fragment ion detection, and chromatographic separation for highly efficient data-dependent proteomics.
###### ABSTRACT: Alk-Ph is a clickable APEX2 substrate developed for spatially restricted protein/RNA labeling in intact yeast cells. Alk-Ph is more water soluble and cell wall permeable than biotin-phenol substrate, allowing more efficient profiling of the subcellular proteome in microorganisms. We describe the protocol for Alk-Ph probe synthesis, APEX2 expression, and protein/RNA labeling in yeast and the workflow for quantitative proteomic experiments and data analysis. Using the yeast mitochondria as an example, we provide guidelines to achieve high-resolution mapping of subcellular yeast proteome and transcriptome. For complete details on the use and execution of this protocol, please refer to Li et al. (2020). © 2020 The Author(s).
###### ABSTRACT: The glycocalyx comprises glycosylated proteins and lipids and forms the outermost layer of cells. It is involved in fundamental inter- and intracellular processes, including non-self-cell and self-cell recognition, cell signaling, cellular structure maintenance, and immune protection. Characterization of the glycocalyx is thus essential to understanding cell physiology and elucidating its role in promoting health and disease. This protocol describes how to comprehensively characterize the glycocalyx N-glycans and O-glycans of glycoproteins, as well as intact glycolipids in parallel, using the same enriched membrane fraction. Profiling of the glycans and the glycolipids is performed using nanoflow liquid chromatography-mass spectrometry (nanoLC-MS). Sample preparation, quantitative LC-tandem MS (LC-MS/MS) analysis, and data processing methods are provided. In addition, we discuss glycoproteomic analysis that yields the site-specific glycosylation of membrane proteins. To reduce the amount of sample needed, N-glycan, O-glycan, and glycolipid analyses are performed on the same enriched fraction, whereas glycoproteomic analysis is performed on a separate enriched fraction. The sample preparation process takes 2-3 d, whereas the time spent on instrumental and data analyses could vary from 1 to 5 d for different sample sizes. This workflow is applicable to both cell and tissue samples. Systematic changes in the glycocalyx associated with specific glycoforms and glycoconjugates can be monitored with quantitation using this protocol. The ability to quantitate individual glycoforms and glycoconjugates will find utility in a broad range of fundamental and applied clinical studies, including glycan-based biomarker discovery and therapeutics. This protocol describes nanoflow liquid chromatography-mass spectrometry (nanoLC-MS) analysis of the N-glycans and O-glycans of glycoproteins and glycolipids, as well as site-specific glycosylation of membrane proteins.
###### ABSTRACT: Cysteine is unique among all protein-coding amino acids, owing to its intrinsically high nucleophilicity. The cysteinyl thiol group can be covalently modified by a broad range of redox mechanisms or by various electrophiles derived from exogenous or endogenous sources. Measuring the response of protein cysteines to redox perturbation or electrophiles is critical for understanding the underlying mechanisms involved. Activity-based protein profiling based on thiol-reactive probes has been the method of choice for such analyses. We therefore adapted this approach and developed a new chemoproteomic platform, termed 'QTRP' (quantitative thiol reactivity profiling), that relies on the ability of a commercially available thiol-reactive probe IPM (2-iodo-N-(prop-2-yn-1-yl)acetamide) to covalently label, enrich and quantify the reactive cysteinome in cells and tissues. Here, we provide a detailed and updated workflow of QTRP that includes procedures for (i) labeling of the reactive cysteinome from cell or tissue samples (e.g., control versus treatment) with IPM, (ii) processing the protein samples into tryptic peptides and tagging the probe-modified peptides with isotopically labeled azido-biotin reagents containing a photo-cleavable linker via click chemistry reaction, (iii) capturing biotin-conjugated peptides with streptavidin beads, (iv) identifying and quantifying the photo-released peptides by mass spectrometry (MS)-based shotgun proteomics and (v) interpreting MS data by a streamlined informatic pipeline using a proteomics software, pFind 3, and an automatic post-processing algorithm. We also exemplified here how to use QTRP for mining H2O2-sensitive cysteines and for determining the intrinsic reactivity of cysteines in a complex proteome. We anticipate that this protocol should find broad applications in redox biology, chemical biology and the pharmaceutical industry. 
The protocol for sample preparation takes 3 d, whereas MS measurements and data analyses require 75 min and <30 min, respectively, per sample. Proteomic cysteines can undergo redox reactions and electrophile-derived modifications. In QTRP, a thiol-reactive probe is used to covalently label, enrich and quantify the reactive cysteinome in cultured cells and tissue samples. [more...]
###### ABSTRACT: Identification of post-translationally or chemically modified peptides in mass spectrometry-based proteomics experiments is a crucial yet challenging task. We have recently introduced a fragment ion indexing method and the MSFragger search engine to empower an open search strategy for comprehensive analysis of modified peptides. However, this strategy does not consider fragment ions shifted by unknown modifications, preventing modification localization and limiting the sensitivity of the search. Here we present a localization-aware open search method, in which both modification-containing (shifted) and regular fragment ions are indexed and used in scoring. We also implement a fast mass calibration and optimization method, allowing optimization of the mass tolerances and other key search parameters. We demonstrate that MSFragger with mass calibration and localization-aware open search identifies modified peptides with significantly higher sensitivity and accuracy. Comparing MSFragger to other modification-focused tools (pFind3, MetaMorpheus, and TagGraph) shows that MSFragger remains an excellent option for fast, comprehensive, and sensitive searches for modified peptides in shotgun proteomics data. Mass spectrometry-based proteomics is the method of choice for the global mapping of post-translational modifications, but matching and scoring peaks with unknown masses remains challenging. Here, the authors present a refined open search strategy to score all peaks with higher sensitivity and accuracy. [more...]
###### ABSTRACT: Proteins carry out the vast majority of functions in all biological domains, but for technological reasons their large-scale investigation has lagged behind the study of genomes. Since the first essentially complete eukaryotic proteome was reported (1), advances in mass-spectrometry-based proteomics (2) have enabled increasingly comprehensive identification and quantification of the human proteome (3-6). However, there have been few comparisons across species (7,8), in stark contrast with genomics initiatives (9). Here we use an advanced proteomics workflow, in which the peptide separation step is performed by a microstructured and extremely reproducible chromatographic system, for the in-depth study of 100 taxonomically diverse organisms. With two million peptide and 340,000 stringent protein identifications obtained in a standardized manner, we double the number of proteins with solid experimental evidence known to the scientific community. The data also provide a large-scale case study for sequence-based machine learning, as we demonstrate by experimentally confirming the predicted properties of peptides from Bacteroides uniformis. Our results offer a comparative view of the functional organization of organisms across the entire evolutionary range. A remarkably high fraction of the total proteome mass in all kingdoms is dedicated to protein homeostasis and folding, highlighting the biological challenge of maintaining protein structure in all branches of life. Likewise, a universally high fraction is involved in supplying energy resources, although these pathways range from photosynthesis through iron sulfur metabolism to carbohydrate metabolism. Generally, however, proteins and proteomes are remarkably diverse between organisms, and they can readily be explored and functionally compared at www.proteomesoflife.org.
###### ABSTRACT: Spectrum prediction using machine learning or deep learning models is an emerging method in computational proteomics. Several deep learning-based MS/MS spectrum prediction tools have been developed and have shown their potential not only for increasing the sensitivity and accuracy of data-dependent acquisition search engines, but also for building spectral libraries for data-independent acquisition analysis. Different tools with their unique algorithms and implementations may result in different performances. Hence, it is necessary to systematically evaluate these tools to find out their preferences and intrinsic differences. In this study, multiple datasets with different collision energies, enzymes, instruments, and species are used to evaluate the performances of the deep learning-based MS/MS spectrum prediction tools, as well as the machine learning-based tool MS2PIP. The evaluations may provide helpful insights and guidelines on spectrum prediction tools for the corresponding researchers.
###### ABSTRACT: The engineered ascorbate peroxidase (APEX) is a powerful tool for the proximity-dependent labeling of proteins and RNAs in live cells. Although widely used in mammalian cells, APEX applications in microorganisms have been hampered by the poor labeling efficiency of its biotin-phenol (BP) substrate. In this study, we sought to address this challenge by designing and screening a panel of alkyne-functionalized substrates. Our best probe, Alk-Ph, substantially improves APEX-labeling efficiency in intact yeast cells, as it is more cell wall-permeant than BP. Through a combination of protein-centric and peptide-centric chemoproteomic experiments, we have identified 165 proteins with a specificity of 94% in the yeast mitochondrial matrix. In addition, we have demonstrated that Alk-Ph is useful for proximity-dependent RNA labeling in yeast, thus expanding the scope of APEX-seq. We envision that this improved APEX-labeling strategy would set the stage for the large-scale mapping of spatial proteome and transcriptome in yeast.
###### ABSTRACT: Plants deploy a variety of secondary metabolites to fend off pathogen attack. Although defense compounds are generally considered toxic to microbes, the exact mechanisms are often unknown. Here, we show that the Arabidopsis defense compound sulforaphane (SFN) functions primarily by inhibiting Pseudomonas syringae type III secretion system (TTSS) genes, which are essential for pathogenesis. Plants lacking the aliphatic glucosinolate pathway, which do not accumulate SFN, were unable to attenuate TTSS gene expression and exhibited increased susceptibility to P. syringae strains that cannot detoxify SFN. Chemoproteomics analyses showed that SFN covalently modified the cysteine at position 209 of HrpS, a key transcription factor controlling TTSS gene expression. Site-directed mutagenesis and functional analyses further confirmed that Cys209 was responsible for bacterial sensitivity to SFN in vitro and sensitivity to plant defenses conferred by the aliphatic glucosinolate pathway. Collectively, these results illustrate a previously unknown mechanism by which plants disarm a pathogenic bacterium.
###### ABSTRACT: Liquid chromatography tandem mass spectrometry (LC-MS/MS) has been the most widely used technology for phosphoproteomics studies. As an alternative to database searching and probability-based phosphorylation site localization approaches, spectral library searching has proved to be effective in the identification of phosphopeptides. However, the incompleteness of experimental spectral libraries limits the identification capability. Herein, we utilize MS/MS spectrum prediction coupled with spectral matching for site localization of phosphopeptides. In silico MS/MS spectra are generated from peptide sequences by deep learning/machine learning models trained with nonphosphopeptides. Then, mass shift according to phosphorylation sites, phosphoric acid neutral loss, and a "budding" strategy are adopted to adjust the in silico mass spectra. In silico MS/MS spectra can also be generated in one step for phosphopeptides using models trained with phosphopeptides. The method is benchmarked on data sets of synthetic phosphopeptides and is used to process real biological samples. It is demonstrated to be a method requiring only computational resources that supplements the probability-based approaches for phosphorylation site localization of singly and multiply phosphorylated peptides.
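The "mass shift according to phosphorylation sites" step in the abstract above can be sketched as follows. This is a simplified illustration, not the published method: it assumes singly charged b/y fragments predicted for the unmodified peptide, a hypothetical tuple layout `(ion_type, index, mz, intensity)`, and the standard monoisotopic mass of HPO3; the neutral loss and "budding" adjustments are not modeled.

```python
PHOSPHO = 79.96633  # monoisotopic mass of HPO3 added to S/T/Y (Da)


def shift_fragments(fragments, peptide_length, site):
    """Shift predicted fragment m/z values for one candidate phosphosite.

    fragments: list of (ion_type, index, mz, intensity) tuples for the
               unmodified peptide, singly charged (hypothetical layout).
    site:      1-based position of the candidate phosphorylated residue.
    """
    adjusted = []
    for ion_type, index, mz, intensity in fragments:
        if ion_type == "b":
            # b_i contains residues 1..i
            has_site = index >= site
        else:
            # y_i contains the last i residues
            has_site = index >= peptide_length - site + 1
        new_mz = mz + PHOSPHO if has_site else mz
        adjusted.append((ion_type, index, new_mz, intensity))
    return adjusted


# Toy fragments for a 5-residue peptide with a candidate site at position 3.
toy = [("b", 2, 200.0, 1.0), ("b", 3, 300.0, 0.5),
       ("y", 2, 250.0, 0.8), ("y", 3, 350.0, 0.3)]
for fragment in shift_fragments(toy, peptide_length=5, site=3):
    print(fragment)
```

Repeating the shift for each candidate S/T/Y position gives one adjusted in silico spectrum per site, which can then be compared with the experimental spectrum to rank the candidate localizations.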
###### ABSTRACT: Precise assignment of sialylation linkages at the glycopeptide level is of importance in bottom-up glycoproteomics and an indispensable step to understand the function of glycoproteins in pathogen-host interactions and cancer progression. Even though some efforts have been dedicated to the discrimination of alpha 2,3/alpha 2,6-sialylated isomers, unambiguous identification of sialoglycopeptide isomers is still needed. Herein, we developed an innovative glycosyltransferase labeling assisted mass spectrometry (GLAMS) strategy. After specific enzymatic labeling, oxonium ions from higher-energy C-trap dissociation (HCD) fragmentation of alpha 2,3-sialoglycopeptides then generate unique reporters to distinctly differentiate those of alpha 2,6-sialoglycopeptide isomers. With this strategy, a total of 1236 linkage-specific sialoglycopeptides were successfully identified from 161 glycoproteins in human serum.
###### ABSTRACT: Regulation of protein N-glycosylation is essential in human cells. However, large-scale, accurate, and site-specific quantification of glycosylation is still technically challenging. We here introduce SugarQuant, an integrated mass spectrometry-based pipeline comprising protein aggregation capture (PAC)-based sample preparation, multi-notch MS3 acquisition (Glyco-SPS-MS3) and a data-processing tool (GlycoBinder) that enables confident identification and quantification of intact glycopeptides in complex biological samples. PAC significantly reduces sample-handling time without compromising sensitivity. Glyco-SPS-MS3 combines high-resolution MS2 and MS3 scans, resulting in enhanced reporter signals of isobaric mass tags, improved detection of N-glycopeptide fragments, and lowered interference in multiplexed quantification. GlycoBinder enables streamlined processing of Glyco-SPS-MS3 data, followed by a two-step database search, which increases the identification rates of glycopeptides by 22% compared with conventional strategies. We apply SugarQuant to identify and quantify more than 5,000 unique glycoforms in Burkitt's lymphoma cells, and determine site-specific glycosylation changes that occurred upon inhibition of fucosylation at high confidence. Comprehensive quantitative profiling of intact glycopeptides remains technically challenging. To address this, the authors here develop an integrated quantitative glycoproteomic workflow, including optimized sample preparation, multiplexed quantification and a dedicated data processing tool. [more...]
###### ABSTRACT: The peptide-spectrum match scoring algorithm plays a key role in peptide sequence identification, but traditional scoring algorithms cannot make full use of the peptide fragmentation pattern when scoring. To solve this problem, a multi-classification probability sum scoring algorithm combined with a peptide sequence information representation, called deepScore-alpha, was proposed. In this algorithm, the second scoring was not performed with the consideration of global information, and there was no limitation on the similarity calculation method of the theoretical mass spectrum and the experimental mass spectrum. A one-dimensional residual network was used to extract the underlying information of the sequence, and then the effects of different peptide bonds on the cleavage of the current peptide bond were integrated through the multi-attention mechanism to generate the final probability matrix of the fragment ion relative intensity distribution. After that, the final peptide-spectrum match score was calculated by combining the actual relative intensities of the fragment ions of the peptide sequence. This algorithm was compared with Comet and MSGF+, two common open source identification tools. The results show that when the False Discovery Rate (FDR) was 0.01 on a human proteome dataset, the number of peptide sequences retained by deepScore-alpha increased by about 14%, and the Top1 hit ratio (the proportion of the correct peptide sequences in the spectrum with the highest score) of this algorithm increased by about 5 percentage points. The generalization performance test of the model trained on the human ProteomeTools2 dataset shows that the number of peptide sequences retained by deepScore-alpha at an FDR of 0.01 improved by about 7%, the Top1 hit ratio of this algorithm increased by about 5 percentage points, and the identification results from the decoy library in the Top1 decreased by about 60%. Experimental results prove that the algorithm can retain more peptide sequences at a lower FDR value, improve the Top1 hit ratio, and has good generalization performance. [more...]
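The comparison above counts peptide identifications retained at an FDR of 0.01. As a minimal sketch of how such a cutoff is commonly applied, the Python below uses a simple target-decoy estimate (decoys divided by targets above the score cutoff). The function name `filter_at_fdr` and the `(score, is_decoy)` tuple layout are illustrative assumptions, not taken from deepScore-alpha, Comet, or MSGF+.

```python
from typing import List, Tuple

# A PSM is represented as (score, is_decoy); this layout is a hypothetical choice.
PSM = Tuple[float, bool]


def filter_at_fdr(psms: List[PSM], fdr: float = 0.01) -> List[PSM]:
    """Keep target PSMs above the lowest score cutoff whose estimated FDR <= fdr.

    The FDR at each rank is estimated as (#decoys) / (#targets) among all PSMs
    scoring at least as well, which is the basic target-decoy estimate.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # number of top-ranked PSMs to keep
    for rank, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr:
            best_cut = rank
    # Report only the target hits inside the accepted region.
    return [p for p in ranked[:best_cut] if not p[1]]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, False), (7.0, False),
               (6.0, False), (5.0, True), (4.0, False)]
    print(len(filter_at_fdr(example, fdr=0.01)))
```

A scoring function that ranks correct matches above decoys more consistently lets more target PSMs pass the same cutoff, which is the kind of gain the abstract reports.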
### 2019

###### ABSTRACT: In this study, we faced the challenge of deciphering a protein that has been designed and expressed by E. coli in such a way that the amino acid sequence encodes two concatenated English sentences. The letters 'O' and 'U' in the sentence are both replaced by 'K' in the protein. The sequence cannot be found online and carried to-be-discovered modifications. With limited information in hand, to solve the challenge, we developed a workflow consisting of bottom-up proteomics, de novo sequencing and a bioinformatics pipeline for data processing and searching for frequently appearing words. We assembled a complete first question: "Have you ever wondered what the most fundamental limitations in life are?" and validated the result by sequence database search against a customized FASTA file. We also searched the spectra against an E. coli proteome database and found close to 600 endogenous, co-purified E. coli proteins and contaminants introduced during sample handling, which made the inference of the sentence very challenging. We conclude that E. coli can express English sentences, and that de novo sequencing combined with clever sequence database search strategies is a promising tool for the identification of uncharacterized proteins. © 2019 Published by Elsevier B.V. on behalf of European Proteomics Association (EuPA). [more...]
###### ABSTRACT: In recent years, high-throughput technologies have contributed to the development of a more precise picture of the human proteome. However, 2129 proteins remain listed as missing proteins (MPs) in the newest neXtProt release (2019-02). The main reasons for MPs are low abundance, low molecular weight, unexpected modifications, membrane characteristics, and so on. Moreover, >50% of the MS/MS data have not been successfully identified in shotgun proteomics. Open-pFind, an efficient open search engine, recently released by the pFind group in China, might provide an opportunity to identify these buried MPs in complex samples. In this study, proteins and potential MPs were identified using Open-pFind and three other search engines to compare their performance and efficiency with three large-scale data sets digested by three enzymes (Glu-C, Lys-C, and trypsin) with specificity on different amino acid (AA) residues. Our results demonstrated that Open-pFind identified 44.7-93.1% more peptide-spectrum matches and 21.3-61.6% more peptide sequences than the second-best search engine. As a result, Open-pFind detected 53.1% more MP candidates than MaxQuant and 8.8% more candidate MPs than Proteome Discoverer. In total, 5 (PE2) of the 124 MP candidates identified by Open-pFind were verified with 2 or 3 unique peptides containing more than 9 AAs by using a spectrum theoretical prediction with pDeep and synthesized peptide matching with pBuild after spectrum quality analysis, isobaric post-translational modification, and single amino acid variant filtering. These five verified MPs can be saved as PE1 proteins. In addition, three other MP candidates were verified with two unique peptides (one peptide containing more than 9 AAs and the other containing only 8 AAs), which was slightly lower than the criteria listed by C-HPP and required additional verification information. More importantly, unexpected modifications were detected in these MPs. 
All MS data sets have been deposited into ProteomeXchange with the identifier PXD015759. [more...]
###### ABSTRACT: The application of database search algorithms with very wide precursor mass tolerances for the "Open Search" paradigm has brought new efforts at post-translational modification discovery in shotgun proteomes. This approach has motivated the acceleration of database search tools by incorporating fragment indexing features. In this report, we compare open searches and sequence tag searches of high-resolution tandem mass spectra to seek a common "palette" of modifications when analyzing multiple formalin-fixed, paraffin-embedded (FFPE) tissues from Thermo Q-Exactive and SCIEX TripleTOF instruments. While open search in MSFragger produced some gains in identified spectra, careful FDR control limited the best result to 24% more spectra than narrow search (worst result: a loss of 9%). Open pFind produced high apparent sensitivity for PSMs, but entrapment sequences hinted that the actual error rate may be higher than reported by the software. Combining sequence tagging, open search, and chemical knowledge, we converged on this set of PTMs for our four FFPE sets: mono- and di-methylation (nTerm and Lys), single and double oxidation (Met and Pro), and variable carbamidomethylation (nTerm and Cys). (C) 2019 Elsevier B.V. All rights reserved. [more...]
###### ABSTRACT: Aims: Cysteine persulfidation (also called sulfhydration or sulfuration) has emerged as a potential redox mechanism to regulate protein functions and diverse biological processes in hydrogen sulfide (H2S) signaling. Due to its intrinsically unstable nature, working with this modification has proven to be challenging. Although methodological progress has expanded the inventory of persulfidated proteins, there is a continued need to develop methods that can directly and unequivocally identify persulfidated cysteine residues in complex proteomes. Results: A quantitative chemoproteomic method termed as low-pH quantitative thiol reactivity profiling (QTRP) was developed to enable direct site-specific mapping and reactivity profiling of proteomic persulfides and thiols in parallel. The method was first applied to cell lysates treated with NaHS, resulting in the identification of overall 1547 persulfidated sites on 994 proteins. Structural analysis uncovered unique consensus motifs that might define this distinct type of modification. Moreover, the method was extended to profile endogenous protein persulfides in cells expressing H2S-generating enzyme, mouse tissues, and human serum, which led to additional insights into mechanistic, structural, and functional features of persulfidation events, particularly on human serum albumin. Innovation and Conclusion: Low-pH QTRP represents the first method that enables direct and unbiased proteomic mapping of cysteine persulfidation. Our method allows to generate the most comprehensive inventory of persulfidated targets of NaHS so far and to perform the first analysis of in vivo persulfidation events, providing a valuable tool to dissect the biological functions of this important modification. Antioxid. Redox Signal. 00, 000-000. [more...]
###### ABSTRACT: Rheumatoid arthritis (RA) is an autoimmune disease in which certain immune cells are dysfunctional and attack the body's own healthy tissues. There has been great difficulty in finding an accurate and efficient method for the diagnosis of early-stage RA. The present shortage of diagnostic methods leads to rough treatments of patients in the late stages, such as joint removal. Nowadays, there is an increasing focus on glyco-biomarker discovery for malignant disease via MS-based strategies. In this study, we present an integrated proteomics and glycoproteomics approach to uncover the pathological changes of some RA-related glyco-biomarkers and glyco-checkpoints involved in RA onset. Among 39 differentially expressed N-glycoproteins, 27 N-glycoproteins were discovered with over twofold expression changes. On the other hand, 13 of the 53 differentially expressed proteins identified in this study showed significant differences. Such an integrated approach will provide a comprehensive strategy for the discovery of new potential glyco-biomarkers and checkpoints in rheumatoid arthritis. [more...]
###### ABSTRACT: De novo peptide sequencing for large-scale proteomics remains challenging because of the lack of full coverage of ion series in tandem mass spectra. We developed a mirror protease of trypsin, acetylated LysargiNase (Ac-LysargiNase), with superior activity and stability. The mirror spectrum pairs derived from the Ac-LysargiNase and trypsin treated samples can generate full b and y ion series, which provide mutual complementarity of each other, and allow us to develop a novel algorithm, pNovoM, for de novo sequencing. Using pNovoM to sequence peptides of purified proteins, the accuracy of the sequence was close to 100%. More importantly, from a large-scale yeast proteome sample digested with trypsin and Ac-LysargiNase individually, 48% of all tandem mass spectra formed mirror spectrum pairs, 97% of which contained full coverage of ion series, resulting in precision de novo sequencing of full-length peptides by pNovoM. This enabled pNovoM to successfully sequence 21,249 peptides from 3,753 proteins and interpreted 44-152% more spectra than pNovo+ and PEAKS at a 5% FDR at the spectrum level. Moreover, the mirror protease strategy had an obvious advantage in sequencing long peptides. We believe that the combination of mirror protease strategy and pNovoM will be an effective approach for precision de novo sequencing on both single proteins and proteome samples. [more...]
###### ABSTRACT: The peptide components of defatted walnut (Juglans regia L.) meal hydrolysate (DWMH) remain unclear, hindering the investigation of biological mechanisms and exploitation of bioactive peptides. The present study aims to identify the peptide composition of DWMH, then to evaluate the in vitro antioxidant effects of selected peptides and investigate the mechanisms of their antioxidative effect. First, more than 1 000 peptides were identified by de novo sequencing in DWMH. Subsequently, a scoring method was established to select promising bioactive peptides by structure-based screening. Eight brand new peptides were selected due to their highest scores in two different batches of DWMH. All of them showed potent in vitro antioxidant effects on H2O2-injured nerve cells. Four of them even possessed significantly stronger effects than DWMH, making the selected bioactive peptides useful for further research as new bioactive entities. Two mechanisms, hydroxyl radical scavenging and ROS reduction, were involved in their antioxidative effects to different degrees. The results showed that peptides possessing similar capacity for hydroxyl radical scavenging or ROS reduction may have significantly different in vitro antioxidative effects. Therefore, comprehensive consideration of different antioxidative mechanisms is suggested when selecting antioxidative peptides from DWMH. [more...]
###### ABSTRACT: Glycosylation, as a biologically important protein post-translational modification, often alters on both glycosites and glycans simultaneously. However, most current approaches focus on biased profiling of either glycosites or glycans, and are limited by time-consuming processing and the need for milligrams of starting protein material. We describe here a simple and integrated spintip-based glycoproteomics technology (termed Glyco-SISPROT) for achieving a comprehensive view of the glycoproteome with shorter sample processing time and low-microgram starting material. By carefully integrating and optimizing SCX, C18 and Concanavalin A (Con A) packing material and their combination in spintip format, both predigested peptides and protein lysates could be processed by Glyco-SISPROT with high efficiency. More importantly, deglycopeptides, intact glycopeptides and glycans released by multiple glycosidases could be readily collected from the same Glyco-SISPROT workflow for LC-MS analysis. In total, above 1850 glycosites in (1) over tilde 770 unique deglycopeptides were characterized from mouse liver by using either 100 μg of predigested peptides or directly using 100 μg of protein lysates, in which about 30% of glycosites were released by both PNGase F and Endos. To the best of our knowledge, this approach should be one of the most comprehensive glycoproteomic approaches using limited protein starting material. One significant benefit of Glyco-SISPROT is that the whole processing time is dramatically reduced from a few days to less than 6 h with good reproducibility when protein lysates are directly processed by Glyco-SISPROT. We expect that this method will be suitable for multi-level glycoproteome analysis of rare biological samples with high sensitivity. (C) 2019 Elsevier B.V. All rights reserved. [more...]
###### ABSTRACT: Aberrant sialylation of glycoproteins is closely related to many malignant diseases, and analysis of sialylation has great potential to reveal the status of these diseases. However, in-depth analysis of sialylation is still challenging because of the high microheterogeneity of protein glycosylation, as well as the low abundance of sialylated glycopeptides (SGPs). Herein, an integrated strategy was developed for the detailed characterization of glycoprotein sialylation on the levels of glycosites and site-specific glycoforms by employing the SGP enrichment method. This strategy enabled the identification of up to 380 glycosites, as well as 414 intact glycopeptides corresponding to 383 site-specific glycoforms, from only 6 μL of initial serum sample, indicating the high sensitivity of the method for the detailed analysis of glycoprotein sialylation. This strategy was further applied to the differential analysis of glycoprotein sialylation between hepatocellular carcinoma patients and control samples, leading to the simultaneous quantification of 344 glycosites and 405 site-specific glycoforms. Among these, 43 glycosites and 55 site-specific glycoforms were found to have significant changes on the glycosite and site-specific glycoform levels, respectively. Interestingly, several glycoforms attached to the same glycosite were found with different change tendencies. This strategy was demonstrated to be a powerful tool to reveal subtle differences in the macro- and microheterogeneity of glycoprotein sialylation. [more...]
###### ABSTRACT: N-glycosylation alteration has been reported in liver diseases. Characterizing N-glycopeptides that correspond to N-glycan structure with specific site information enables better understanding of the molecular pathogenesis of liver damage and cancer. Here, unbiased quantification of N-glycopeptides of a cluster of serum glycoproteins with 40-55 kDa molecular weight (40-kDa band) was investigated in hepatitis B virus (HBV)-related liver diseases. We used an N-glycopeptide method based on O-18/O-16 C-terminal labeling to obtain 82 comparisons of serum from patients with HBV-related hepatocellular carcinoma (HCC) and liver cirrhosis (LC). Then, multiple reaction monitoring (MRM) was performed to quantify N-glycopeptide relative to the protein content, especially in the healthy donor-HBV-LC-HCC cascade. TPLTAN(205)ITK (H5N5S1F1) and (H5N4S2F1) corresponding to the glycopeptides of IgA(2) were significantly elevated in serum from patients with HBV infection and even higher in HBV-related LC patients, as compared with healthy donor. In contrast, the two glycopeptides of IgA(2) fell back down in HBV-related HCC patients. In addition, the variation in the abundance of two glycopeptides was not caused by its protein concentration. The altered N-glycopeptides might be part of a unique glycan signature indicating an IgA-mediated mechanism and providing potential diagnostic clues in HBV-related liver diseases. [more...]
### 2018

###### ABSTRACT: The open (mass tolerant) search of tandem mass spectra of peptides shows great potential in the comprehensive detection of post-translational modifications (PTMs) in shotgun proteomics. However, this search strategy has not been widely used by the community, and one bottleneck is the lack of appropriate algorithms for automated and reliable post-processing of the coarse and error-prone search results. Here we present PTMiner, a software tool for confident filtering and localization of modifications (mass shifts) detected in an open search. After mass-shift-grouped false discovery rate (FDR) control of peptide-spectrum matches (PSMs), PTMiner uses an empirical Bayesian method to localize modifications through iterative learning of the prior probabilities of each type of modification occurring on different amino acids. The performance of PTMiner was evaluated on three data sets, including simulated data, chemically synthesized peptide library data and modified-peptide spiked-in proteome data. The results showed that PTMiner can effectively control the PSM FDR and accurately localize the modification sites. At 1% real false localization rate (FLR), PTMiner localized 93%, 84%, and 83% of the modification sites in the three data sets, respectively, far higher than two open search engines we used and an extended version of the Ascore localization algorithm. We then used PTMiner to analyze a draft map of the human proteome containing 25 million spectra from 30 tissues, and confidently identified over 1.7 million modified PSMs at 1% FDR and 1% FLR, which provided a system-wide view of both known and unknown PTMs in the human proteome. [more...]
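The abstract above describes localizing a mass shift with an empirical Bayesian method whose amino acid priors are learned iteratively. The sketch below illustrates that general idea with a small expectation-maximization-style loop. It is an assumption-laden toy, not PTMiner's actual model: the function `localize`, the `(amino_acid, likelihood)` candidate layout, and the fixed iteration count are all hypothetical, and the per-site likelihoods are taken as given rather than derived from spectra.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One candidate site: (amino acid at the site, likelihood of the spectrum if the
# mass shift sits on that site). Likelihoods are assumed to be strictly positive.
Candidate = Tuple[str, float]


def localize(psms: List[List[Candidate]],
             n_iter: int = 20) -> Tuple[Dict[str, float], List[List[float]]]:
    """Iteratively learn amino acid priors for one mass shift and return
    (prior, per-PSM posterior site probabilities)."""
    amino_acids = sorted({aa for psm in psms for aa, _ in psm})
    # Start from a flat prior over the amino acids seen at candidate sites.
    prior = {aa: 1.0 / len(amino_acids) for aa in amino_acids}
    posteriors: List[List[float]] = []
    for _ in range(n_iter):
        counts: Dict[str, float] = defaultdict(float)
        posteriors = []
        for psm in psms:
            # Bayes rule per PSM: posterior is proportional to prior * likelihood.
            weights = [prior[aa] * lik for aa, lik in psm]
            total = sum(weights)
            post = [w / total for w in weights]
            posteriors.append(post)
            for (aa, _), p in zip(psm, post):
                counts[aa] += p
        # Re-estimate the prior from the soft site assignments of all PSMs.
        norm = sum(counts.values())
        prior = {aa: counts[aa] / norm for aa in amino_acids}
    return prior, posteriors


if __name__ == "__main__":
    data = [[("M", 0.9), ("P", 0.1)],
            [("M", 0.8), ("K", 0.2)],
            [("M", 0.5), ("P", 0.5)]]  # last PSM is ambiguous on its own
    learned_prior, site_probs = localize(data)
    print(learned_prior, site_probs[2])
```

In this toy input the third PSM has no spectral evidence favoring either site, so the learned prior (dominated by methionine) is what breaks the tie.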
###### ABSTRACT: Cysteine sulfinic acid or S-sulfinylation is an oxidative post-translational modification (OxiPTM) that is known to be involved in redox-dependent regulation of protein function but has been historically difficult to analyze biochemically. To facilitate the detection of S-sulfinylated proteins, we demonstrate that a clickable, electrophilic diazene probe (DiaAlk) enables capture and site-centric proteomic analysis of this OxiPTM. Using this workflow, we revealed a striking difference between sulfenic acid modification (S-sulfenylation) and the S-sulfinylation dynamic response to oxidative stress, which is indicative of different roles for these OxiPTMs in redox regulation. We also identified >55 heretofore-unknown protein substrates of the cysteine sulfinic acid reductase sulfiredoxin, extending its function well beyond those of 2-cysteine peroxiredoxins (2-Cys PRDX1-4) and offering new insights into the role of this unique oxidoreductase as a central mediator of reactive oxygen species-associated diseases, particularly cancer. DiaAlk therefore provides a novel tool to profile S-sulfinylated proteins and study their regulatory mechanisms in cells. [more...]
###### ABSTRACT: Confident characterization of intact glycopeptides is a challenging task in mass spectrometry-based glycoproteomics due to microheterogeneity of glycosylation, complexity of glycans, and insufficient fragmentation of peptide backbones. Open mass spectral library search is a promising computational approach to peptide identification, but its potential in the identification of glycopeptides has not been fully explored. Here we present pMatchGlyco, a new spectral library search tool for intact N-linked glycopeptide identification using high-energy collisional dissociation (HCD) tandem mass spectrometry (MS/MS) data. In pMatchGlyco, (1) MS/MS spectra of deglycopeptides are used to create a spectral library, (2) MS/MS spectra of glycopeptides are matched to the spectra in the library in an open (precursor tolerant) manner and the glycans are inferred, and (3) a false discovery rate is estimated for top-scored matches above a threshold. The efficiency and reliability of pMatchGlyco were demonstrated on a data set of a mixture of six standard glycoproteins and a complex glycoprotein data set generated from the human cancer cell line OVCAR3. [more...]
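Step (2) above infers the glycan from an open, precursor-tolerant match. One plausible reading is that the mass difference between the glycopeptide precursor and the matched peptide is explained by a monosaccharide composition. The sketch below enumerates compositions under that assumption. The function `infer_glycans`, the tolerance, and the count limits are hypothetical choices, and the sketch ignores the small mass change (about +0.984 Da, Asn to Asp) that PNGase F deglycosylation introduces in library peptides. The residue masses are standard monoisotopic values.

```python
from itertools import product
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da) of common N-glycan building blocks.
RESIDUE_MASS: Dict[str, float] = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}


def infer_glycans(precursor_mass: float,
                  peptide_mass: float,
                  tol_da: float = 0.02,
                  max_count: Tuple[int, int, int, int] = (12, 8, 3, 4)
                  ) -> List[Dict[str, int]]:
    """Return monosaccharide compositions whose summed mass explains the
    difference between a glycopeptide precursor and its peptide backbone."""
    delta = precursor_mass - peptide_mass
    names = list(RESIDUE_MASS)
    hits: List[Dict[str, int]] = []
    # Brute-force enumeration is fine here: the search space is a few thousand.
    for counts in product(*(range(m + 1) for m in max_count)):
        mass = sum(c * RESIDUE_MASS[n] for c, n in zip(counts, names))
        if abs(mass - delta) <= tol_da:
            hits.append(dict(zip(names, counts)))
    return hits


if __name__ == "__main__":
    peptide = 1000.0  # hypothetical peptide backbone mass
    man5 = 5 * RESIDUE_MASS["Hex"] + 2 * RESIDUE_MASS["HexNAc"]
    print(infer_glycans(peptide + man5, peptide))
```

Several compositions can fall within the same tolerance, which is one reason an FDR estimate such as step (3) is still needed after the open match.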
### 2017

###### ABSTRACT: Proteins can undergo oxidative cleavage by in-vitro metal-catalyzed oxidation (MCO) in either the α-amidation or the diamide pathway. However, whether oxidative cleavage of polypeptide chains occurs in biological systems remains unexplored. We describe a chemoproteomic approach to globally and site-specifically profile electrophilic protein degradants formed from peptide backbone cleavages in human proteomes, including the known N-terminal alpha-ketoacyl products and >1000 unexpected N-terminal formyl products. Strikingly, such cleavages predominantly occur at the carboxyl side of lysine (K) and arginine (R) residues across native proteomes in situ, while MCO-induced oxidative cleavages randomly distribute on peptide/protein sequences in vitro. Furthermore, ionizing radiation-induced reactive oxygen species (ROS) also generate random oxidative cleavages in situ. These findings suggest that the endogenous formation of N-formyl and N-alpha-ketoacyl degradants in biological systems is more likely regulated by a previously unknown mechanism with a trypsin-like specificity, rather than the random oxidative damage as previously thought. More generally, our study highlights the utility of quantitative chemoproteomics in combination with unrestricted search tools as a viable strategy to discover unexpected chemical modifications of proteins labeled with activity-based probes. [more...]
###### ABSTRACT: Reactive metabolites (RM) formed from bioactivation of drugs can covalently modify liver proteins and cause mechanism-based inactivation of major cytochrome P450 (CYP450) enzymes. Risk of bioactivation of a test compound is routinely examined as part of lead optimization efforts in drug discovery. Here we describe a chemoproteomic platform to assess the in vitro and in vivo bioactivation potential of drugs. This platform enabled us to determine the reactivity of thousands of proteomic cysteines toward RMs of diclofenac formed in human liver microsomes and living animals. We pinpointed numerous reactive cysteines as the targets of RMs of diclofenac, including the active (heme-binding) sites on several key CYP450 isoforms (1A2, 2E1 and 3A4 for human, 2C39 and 3A11 for mouse). This general platform should be applicable to other drugs, drug candidates, and xenobiotics with potential hepatotoxicity, including environmental organic substances, bioactive natural products, and traditional Chinese medicine. [more...]
###### ABSTRACT: Identifying missing proteins (MPs) has been one of the critical missions of the Chromosome-Centric Human Proteome Project (C-HPP). Since 2012, over 30 research teams from 17 countries have been trying to search for adequate and accurate evidence of MPs through various biochemical strategies. MPs mainly fall into the following classes: (1) low-molecular-weight (LMW) proteins, (2) membrane proteins, (3) proteins that contain various post-translational modifications (PTMs), (4) nucleic acid associated proteins, (5) low-abundance proteins, and (6) unexpressed genes. In this study, kidney cancer and adjacent tissues were used for phosphoproteomics research, and 8962 proteins were identified, including 6415 phosphoproteins and 44 728 phosphosites, of which 10 266 were previously unreported. In total, 75 candidate detections were found, including 45 phosphoproteins. GO analysis of these 75 candidate detections revealed that these proteins mainly clustered as membrane proteins and took part in nephron and kidney development. After rigorous screening and manual checking, 9 of them were verified with synthesized peptides. Finally, only one missing protein was confirmed. All mass spectrometry data from this study have been deposited in PRIDE with identifier PXD006482. [more...]
###### ABSTRACT: Although 5 years of the missing proteins (MPs) study have been completed, searching for MPs remains one of the core missions of the Chromosome-Centric Human Proteome Project (C-HPP). Following the next-50-MPs challenge of the C-HPP, we have focused on the testis-enriched MPs by various strategies since 2015. On the basis of the theoretical analysis of MPs (2017-01, neXtProt) using multiprotease digestion, we found that nonconventional proteases (e.g. LysargiNase, GluC) could improve the peptide diversity and sequence coverage compared with trypsin. Therefore, a multiprotease strategy was used to search for more MPs in the same human testis tissues separated by 10% SDS-PAGE, followed by a high-resolution LC-MS/MS system (Q Exactive HF). A total of 7838 proteins were identified. Among them, three PE2 MPs in neXtProt 2017-01 have been identified: beta-defensin 123 (Q8N688, chr 20q), cancer/testis antigen family 45 member A10 (P0DMU9, chr Xq), and Histone H2A-Bbd type 2/3 (P0C5Z0, chr Xq). However, because only one unique peptide of >= 9 AA was identified in beta-defensin 123 and Histone H2A-Bbd type 2/3, respectively, further analysis indicates that each falls under the exceptions clause of the HPP Guidelines v2.1. After a spectrum quality check, isobaric PTM and single amino acid variant (SAAV) filtering, and verification with a synthesized peptide, and based on overlapping peptides from different proteases, these three MPs should be considered as exemplary examples of MPs found by exceptional criteria. Other MPs were considered as candidates but need further validation. All MS data sets have been deposited to the ProteomeXchange with identifier PXD006465. [more...]
###### ABSTRACT: Markers are needed to facilitate early detection of pancreatic ductal adenocarcinoma (PDAC), which is often diagnosed too late for effective therapy. Starting with a PDAC cell reprogramming model that recapitulated the progression of human PDAC, we identified secreted proteins and tested a subset as potential markers of PDAC. We optimized an enzyme-linked immunosorbent assay (ELISA) using plasma samples from patients with various stages of PDAC, from individuals with benign pancreatic disease, and from healthy controls. A phase 1 discovery study (n = 20), a phase 2a validation study (n = 189), and a second phase 2b validation study (n = 537) revealed that concentrations of plasma thrombospondin-2 (THBS2) discriminated among all stages of PDAC consistently. The receiver operating characteristic (ROC) c-statistic was 0.76 in the phase 1 study, 0.84 in the phase 2a study, and 0.87 in the phase 2b study. The plasma concentration of THBS2 was able to discriminate resectable stage I cancer as readily as stage III/IV PDAC tumors. THBS2 plasma concentrations combined with those for CA19-9, a previously identified PDAC marker, yielded a c-statistic of 0.96 in the phase 2a study and 0.97 in the phase 2b study. THBS2 data improved the ability of CA19-9 to distinguish PDAC from pancreatitis. With a specificity of 98%, the combination of THBS2 and CA19-9 yielded a sensitivity of 87% for PDAC in the phase 2b study. A THBS2 and CA19-9 blood marker panel measured with a conventional ELISA may improve the detection of patients at high risk for PDAC. [more...]
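The abstract above reports ROC c-statistics and a sensitivity at a fixed specificity. As a minimal sketch of what those two quantities measure, the Python below computes a c-statistic as the fraction of case/control pairs ranked correctly and a sensitivity at a cutoff chosen from the control distribution. The function names and the toy concentrations are hypothetical and do not reproduce the study's data or statistical software.

```python
import math
from typing import List


def c_statistic(cases: List[float], controls: List[float]) -> float:
    """Probability that a random case has a higher marker value than a random
    control (ties count one half); this equals the area under the ROC curve."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_at_specificity(cases: List[float],
                               controls: List[float],
                               specificity: float) -> float:
    """Sensitivity when the cutoff is the lowest control value that keeps at
    least the requested fraction of controls at or below it; a sample is called
    positive when its value is strictly above the cutoff."""
    ordered = sorted(controls)
    index = max(math.ceil(specificity * len(ordered)) - 1, 0)
    cutoff = ordered[index]
    return sum(1 for x in cases if x > cutoff) / len(cases)


if __name__ == "__main__":
    # Hypothetical marker concentrations, arbitrary units.
    patient_values = [5.0, 6.0, 7.0, 8.0]
    control_values = [1.0, 2.0, 3.0, 6.0]
    print(c_statistic(patient_values, control_values))
    print(sensitivity_at_specificity(patient_values, control_values, 0.75))
```

Raising the required specificity moves the cutoff up through the control values, so sensitivity can only stay the same or fall, which is the trade-off behind reporting the two numbers together.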
###### ABSTRACT: Detailed characterization of glycoprotein structures requires determining both the sites of glycosylation as well as the glycan structures associated with each site. In this work, we developed an analytical strategy for characterization of intact N-glycopeptides in complex proteome samples. In the first step, tryptic glycopeptides were enriched using ZIC-HILIC. Secondly, a portion of the glycopeptides was treated with endoglycosidase H (Endo H) to remove high-mannose (Man) and hybrid N-linked glycans. Thirdly, a fraction of the Endo H-treated glycopeptides was further subjected to PNGase F treatment in O-18 water to remove the remaining complex glycans. The intact glycopeptides and deglycosylated peptides were analyzed by nano-RPLC-MS/MS, and the glycan structures and the peptide sequences were identified by using the Byonic or pFind tools. Sequential digestion by endoglycosidase provided candidate glycosites information and indication of the glycoforms on each glycopeptide, thus helping to confine the database search space and improve the confidence regarding intact glycopeptide identification. We demonstrated the effectiveness of this approach using RNase B and IgG and applied this sequential digestion strategy for the identification of glycopeptides from the HepG2 cell line. We identified 4514 intact glycopeptides coming from 947 glycosites and 1011 unique peptide sequences from HepG2 cells. The intensity of different glycoforms at a specific glycosite was obtained to reach the occupancy ratios of site-specific glycoforms. These results indicate that our method can be used for characterizing site-specific protein glycosylation in complex samples. [more...]
### 2016

###### ABSTRACT: Plant growth is controlled by integration of hormonal and light-signaling pathways. BZS1 is a B-box zinc finger protein previously characterized as a negative regulator in the brassinosteroid (BR)-signaling pathway and a positive regulator in the light-signaling pathway. However, the mechanisms by which BZS1/BBX20 integrates light and hormonal pathways are not fully understood. Here, using a quantitative proteomic workflow, we identified several BZS1-associated proteins, including light-signaling components COP1 and HY5. Direct interactions of BZS1 with COP1 and HY5 were verified by yeast two-hybrid and co-immunoprecipitation assays. Overexpression of BZS1 causes a dwarf phenotype that is suppressed by the hy5 mutation, while overexpression of BZS1 fused with the SRDX transcription repressor domain (BZS1-SRDX) causes a long-hypocotyl phenotype similar to hy5, indicating that BZS1's function requires HY5. BZS1 positively regulates HY5 expression, whereas HY5 negatively regulates BZS1 protein level, forming a feedback loop that potentially contributes to signaling dynamics. In contrast to BR, strigolactone (SL) increases BZS1 level, whereas the SL responses of hypocotyl elongation, chlorophyll and HY5 accumulation are diminished in the BZS1-SRDX seedlings, indicating that BZS1 is involved in these SL responses. These results demonstrate that BZS1 interacts with HY5 and plays a central role in integrating light and multiple hormone signals for photomorphogenesis in Arabidopsis. Copyright (C) 2016, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Limited and Science Press. All rights reserved. [more...]
###### ABSTRACT: Protein phosphorylation, one of the most common and important modifications of acute and reversible regulation of protein function, plays a dominant role in almost all cellular processes. These signaling events regulate cellular responses, including proliferation, differentiation, metabolism, survival, and apoptosis. Several studies have been successfully used to identify phosphorylated proteins and dynamic changes in phosphorylation status after stimulation. Nevertheless, it is still rather difficult to elucidate precise complex phosphorylation signaling pathways. In particular, how signal transduction pathways directly communicate from the outer cell surface through cytoplasmic space and then directly into chromatin networks to change the transcriptional and epigenetic landscape remains poorly understood. Here, we describe the optimization and comparison of methods based on thiophosphorylation affinity enrichment, which can be utilized to monitor phosphorylation signaling into chromatin by isolation of phosphoprotein-containing nucleosomes, a method we term phosphorylation-specific chromatin affinity purification (PS-ChAP). We utilized this PS-ChAP approach in combination with quantitative proteomics to identify changes in the phosphorylation status of chromatin-bound proteins on nucleosomes following perturbation of transcriptional processes. We also demonstrate that this method can be employed to map phosphoprotein signaling into chromatin containing nucleosomes through identifying the genes those phosphorylated proteins are found on via thiophosphate PS-ChAP-qPCR. Thus, our results showed that PS-ChAP offers a new strategy for studying cellular signaling and chromatin biology, allowing us to directly and comprehensively investigate phosphorylation signaling into chromatin to investigate if these pathways are involved in altering gene expression.
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD002436.
###### ABSTRACT: Detection of differentially abundant proteins in label-free quantitative shotgun liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments requires a series of computational steps that identify and quantify LC-MS features. It also requires statistical analyses that distinguish systematic changes in abundance between conditions from artifacts of biological and technical variation. The 2015 study of the Proteome Informatics Research Group (iPRG) of the Association of Biomolecular Resource Facilities (ABRF) aimed to evaluate the effects of the statistical analysis on the accuracy of the results. The study used LC tandem mass spectra acquired from a controlled mixture, and made the data available to anonymous volunteer participants. The participants used methods of their choice to detect differentially abundant proteins, estimate the associated fold changes, and characterize the uncertainty of the results. The study found that multiple strategies (including the use of spectral counts versus peak intensities, and various software tools) could lead to accurate results, and that the performance was primarily determined by the analysts' expertise. This manuscript summarizes the outcome of the study, and provides representative examples of good computational and statistical practice. The data set generated as part of this study is publicly available.
###### ABSTRACT: N-Glycosylation is one of the most prevalent protein post-translational modifications and is involved in many biological processes, such as protein folding, cellular communications, and signaling. Alteration of N-glycosylation is closely related to the pathogenesis of diseases. Thus, the investigation of protein N-glycosylation is crucial for the diagnosis and treatment of disease. In this research, we applied diethylaminoethanol (DEAE) Sepharose solid-phase extraction microcolumns for N-glycopeptide enrichment. This method integrated the advantages of Click Maltose and zwitterionic HILIC (ZIC-HILIC) and showed a relatively higher specificity for N-glycosylated peptides. This strategy was then applied to tryptic digests of normal human serum, followed by deglycosylation using peptide-N-glycosidase F (PNGase F) in H₂¹⁸O. Subsequent LC-MS/MS analysis allowed for the assignment of 219 N-glycosylation sites from 115 serum N-glycoproteins. This study provides an alternative approach for N-glycopeptide enrichment and the method employed is effective for large-scale N-glycosylation site identification.
###### ABSTRACT: Since 2012, missing proteins (MPs) investigation has been one of the critical missions of the Chromosome-Centric Human Proteome Project (C-HPP) through various biochemical strategies. On the basis of our previous testis MPs study, faster scanning and higher resolution mass-spectrometry-based proteomics might be conducive to MPs exploration, especially for low-abundance proteins. In this study, Q-Exactive HF (HF) was used to survey proteins from the same testis tissues separated by two separation methods (tricine- and glycine-SDS-PAGE), as previously described. A total of 8526 proteins were identified, of which more low-abundance proteins were uniquely detected in HF data but not in our previous LTQ Orbitrap Velos (Velos) reanalysis data. Further transcriptomics analysis showed that these proteins uniquely identified by HF also had lower expression at the mRNA level. Of the 81 total identified MPs, 74 and 39 proteins were listed as MPs in HF and Velos data sets, respectively. Among the above MPs, 47 proteins (43 neXtProt PE2 and 4 PE3) were ranked as confirmed MPs after verifying with the stringent spectra match and isobaric and single amino acid variants filtering. Functional investigation of these 47 MPs revealed that 11 MPs were testis-specific proteins and 7 MPs were involved in the spermatogenesis process. Therefore, we concluded that higher scanning speed and resolution of HF might be factors for improving the low-abundance MP identification in future C-HPP studies. All mass-spectrometry data from this study have been deposited in the ProteomeXchange with identifier PXD004092.
###### ABSTRACT: A membrane protein enrichment method composed of ultracentrifugation and detergent-based extraction was first developed based on the MCF7 cell line. Then, in-solution digestion with detergents and eFASP (enhanced filter-aided sample preparation) with detergents were compared with the time-consuming in-gel digestion method. Among the in-solution digestion strategies, the eFASP combined with RapiGest identified 1125 membrane proteins. Similarly, the eFASP combined with sodium deoxycholate identified 1069 membrane proteins; however, the in-gel digestion characterized 1091 membrane proteins. In total, with the five digestion methods, 1390 membrane proteins were identified with >= 1 unique peptide, among which 1345 membrane proteins contained >= 2 unique peptides. This is the largest membrane protein data set for the MCF7 cell line and even breast cancer tissue samples. Interestingly, we identified 13 unique peptides belonging to 8 missing proteins (MPs). Finally, eight unique peptides were validated by synthesized peptides. Two proteins were confirmed as MPs, and another two proteins were candidate detections.
###### ABSTRACT: Core-fucosylation (CF) plays important roles in regulating biological processes in eukaryotes. Alterations of CF-glycosites or CF-glycans in bodily fluids correlate with cancer development. Therefore, global research of protein core-fucosylation with an emphasis on proteomics can explain pathogenic and metastasis mechanisms and aid in the discovery of new potential biomarkers for early clinical diagnosis. In this study, a precise and high throughput method was established to identify CF-glycosites from human plasma. We found that alternating HCD and ETD fragmentation (AHEF) can provide a complementary method to discover CF-glycosites. A total of 407 CF-glycosites among 267 CF-glycoproteins were identified in a mixed sample made from six normal human plasma samples. Among the 407 CF-glycosites, 10 are without the N-X-S/T/C consensus motif, representing 2.5% of the total number identified. All identified CF-glycopeptide results from HCD and ETD fragmentation were filtered with neutral loss peaks and characteristic ions of GlcNAc from HCD spectra, which assured the credibility of the results. This study provides an effective method for CF-glycosites identification and a valuable biomarker reference for clinical research. Biological significance: CF-glycosylation plays an important role in regulating biological processes in eukaryotes. Alterations of the glycosites and attached CF-glycans are frequently observed in various types of cancers. Thus, it is crucial to develop a strategy for mapping human CF-glycosylation. Here, we developed a complementary method via alternating HCD and ETD fragmentation (AHEF) to analyze CF-glycoproteins. This strategy reveals an excellent complementarity of HCD and ETD in the analysis of CF-glycoproteins, and provides a valuable biomarker reference for clinical research.
###### ABSTRACT: Over the past decades, protein O-GlcNAcylation has been found to play a fundamental role in cell cycle control, metabolism, transcriptional regulation, and cellular signaling. Nevertheless, quantitative approaches to determine in vivo GlcNAc dynamics at a large scale are still not readily available. Here, we have developed an approach to isotopically label O-GlcNAc modifications on proteins by producing ¹³C-labeled UDP-GlcNAc from ¹³C₆-glucose via the hexosamine biosynthetic pathway. This metabolic labeling was combined with quantitative mass spectrometry-based proteomics to determine protein O-GlcNAcylation turnover rates. First, an efficient enrichment method for O-GlcNAc peptides was developed with the use of phenylboronic acid solid-phase extraction and anhydrous DMSO. The near-stoichiometric reaction between the diol of GlcNAc and boronic acid dramatically improved the enrichment efficiency. Additionally, our kinetic model for turnover rates integrates both metabolomic and proteomic data, which increases the accuracy of the turnover rate estimation. Other advantages of this metabolic labeling method include in vivo application, direct labeling of the O-GlcNAc sites and higher confidence for site identification. Concentrating only on nuclear-localized GlcNAc-modified proteins, we are able to identify 105 O-GlcNAc peptides on 42 proteins and determine turnover rates of 20 O-GlcNAc peptides from 14 proteins extracted from HeLa nuclei. In general, we found O-GlcNAcylation turnover rates are slower than those published for phosphorylation or acetylation. Nevertheless, the rates widely varied depending on both the protein and the residue modified.
We believe this methodology can be broadly applied to reveal turnovers/dynamics of protein O-GlcNAcylation from different biological states and will provide more information on the significance of O-GlcNAcylation, enabling us to study the temporal dynamics of this critical modification for the first time.
###### ABSTRACT: O-linked beta-N-acetylglucosamine (O-GlcNAc) is emerging as an essential protein post-translational modification in a range of organisms. It is involved in various cellular processes such as nutrient sensing, protein degradation, gene expression, and is associated with many human diseases. Despite its importance, identifying O-GlcNAcylated proteins is a major challenge in proteomics. Here, using peracetylated N-azidoacetylglucosamine (Ac₄GlcNAz) as a bioorthogonal chemical handle, we described a gel-based mass spectrometry method for the identification of proteins with O-GlcNAc modification in A549 cells. In addition, we made a labeling efficiency comparison between two modes of azide-alkyne bioorthogonal reactions in click chemistry: copper-catalyzed azide-alkyne cycloaddition (CuAAC) with Biotin-Diazo-Alkyne and strain-promoted azide-alkyne cycloaddition (SPAAC) with Biotin-DIBO-Alkyne. After conjugation with click chemistry in vitro and enrichment via streptavidin resin, proteins with O-GlcNAc modification were separated by SDS-PAGE and identified with mass spectrometry. Proteomics data analysis revealed that 229 putative O-GlcNAc modified proteins were identified with Biotin-Diazo-Alkyne conjugated sample and 188 proteins with Biotin-DIBO-Alkyne conjugated sample, among which 114 proteins were overlapping. Interestingly, 74 proteins identified from Biotin-Diazo-Alkyne conjugates and 46 verified proteins from Biotin-DIBO-Alkyne conjugates could be found in the O-GlcNAc modified proteins database dbOGAP (http://cbsb.lombardi.georgetown.edu/hulab/OGAP.html). These results suggested that CuAAC with Biotin-Diazo-Alkyne represented a more powerful method in proteomics with higher protein identification and better accuracy compared to SPAAC. The proteomics credibility was also confirmed by the molecular function and cell component gene ontology (GO).
Together, the method we reported here combining metabolic labeling, click chemistry, affinity-based enrichment, SDS-PAGE separation, and mass spectrometry, would be adaptable for other post-translationally modified proteins in proteomics.
