Proteomics 2013, 13, 341–354. DOI 10.1002/pmic.201200149
REVIEW
Glycobioinformatics: Current strategies and tools
for data mining in MS-based glycoproteomics
Feng Li1,2, Olga V. Glinskii1,3 and Vladislav V. Glinsky1,2
1 Research Service, Harry S. Truman Memorial Veterans Hospital, Columbia, MO, USA
2 Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO, USA
3 Department of Medical Pharmacology and Physiology, University of Missouri, Columbia, MO, USA
Glycobioinformatics is a rapidly developing field providing vital support for MS-based glycoproteomics
research. Recent advances in MS have greatly increased technological capabilities for
high-throughput glycopeptide analysis. However, interpreting MS output in terms of identifying
glycan structures, attachment sites, and glycosylation linkages still presents multiple
challenges. Here, we discuss current strategies used in MS-based glycoproteomics and bioinformatics
tools available for MS-based glycopeptide and glycan analysis. We also provide a
brief overview of recent efforts in glycobioinformatics such as the new initiative UniCarbKB
directed toward developing more comprehensive and uniﬁed glycobioinformatics platforms.
With regard to glycobioinformatics tools and applications, we do not express our personal
preferences or biases, but rather focus on providing a concise description of main features
and functionalities of each application with the goal of assisting readers in making their own
choices and identifying and locating glycobioinformatics tools most suitable for achieving their
experimental objectives.
Keywords:
Bioinformatics / Glycan analysis / Glycomics / Glycoproteomics / MS
Received: April 10, 2012
Revised: October 6, 2012
Accepted: November 6, 2012
PTMs of proteins are the primary means used by prokaryotic
and eukaryotic cells to regulate the activity of key proteins
[1–3]. PTMs may involve both chemical alterations of
protein side chains and a cleavage of the main chain peptide
bonds. The dynamic modification and diversification enabled
by PTMs greatly increase the number of molecular variants of cellular proteins,
by an estimated one or two orders of magnitude over
the number encoded by the genome [4]. Therefore, characterizing
structures, sites, and dynamics of the protein PTMs
is essential for understanding their diversity, structure, and
function in the “-omics” age [5].
Correspondence: Dr. Vladislav V. Glinsky, M263 Medical Sciences
Bldg., Department of Pathology and Anatomical Sciences, University
of Missouri, Columbia, MO 65212, USA
E-mail: glinskiivl@missouri.edu
Fax: +1-573-814-6551
Abbreviations: CFG, consortium for functional glycomics; ECD,
electron capture dissociation; ETD, electron transfer dissociation;
IGAP, intact glycopeptide analysis pipeline
1 Glycosylation: Most abundant and structurally diverse posttranslational modification
Covalent attachment of sugars or glycans to proteins or
lipids is deﬁned as glycosylation. Glycosylation represents
not only the most abundant protein PTM, but also by far
the most structurally diverse one [6]. Although monosaccharides
are not an overly complicated chemical
group, the 13 different monosaccharides and eight different
amino acids involved in glycoprotein linkages give rise
to a total of at least 41 different glycan-protein bonds.
Comprehensive information about various monosaccharide
residues can be found in the MonosaccharideDB database
(http://www.monosaccharidedb.org), which is being developed
as part of the EUROCarbDB and GLYCOSCIENCES.de
projects [7]. Furthermore, due to additional modiﬁcations on
terminal glycans of oligosaccharide branches such as fucosylation,
sulfation, acetylation, and/or sialylation (about 50
different sialic acids are known), the molecular diversity of
Colour Online: See the article online to view Figs. 1–3 in colour.
© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.proteomics-journal.com
glycosylation increases exponentially and becomes extremely
complicated [8–10]. There are an estimated 250–500
“glycogenes” in the human genome (about 1–2% of the total
genome), which are directly involved in glycan assembly [11].
Glycogenes are involved in the attachment and subsequent
processing of the sugar portion of glycoconjugates and can be
divided into several categories including glycosyltransferases,
glycolytic enzymes, sugar nucleotide synthetases, and sugar
nucleotide transporters. Glycosyltransferases comprise
one of the largest and most diverse groups of enzymes, with
over 180 glycogenes cloned and characterized to date
[12,13].
Consequently, it has been estimated that more than 50%
of the entire human proteome is covalently modified with glycans,
although more recent reports suggest that less than one-fifth of
proteins are glycosylated [14,15]. In addition to the diversity of
the attached glycan structures, macroheterogeneity (variable
occupancy of several glycosylation sites), and microheterogeneity
(variable degree of type, trimming, and elongation of
the glycan attached to a single glycosylation site) contribute
further to the complexity of protein glycosylation [16]. The
macroheterogeneity and microheterogeneity of the glycosylation
are controlled by multiple factors. For example, within
a particular cell, not only the primary and secondary structure
of the protein affects the location of the glycosylation
sites and the level of their occupancy, but also the tertiary and
quaternary structures inﬂuence the subsequent processing
of the attached glycans [17]. Recent precision mapping of the
in vivo N-glycoproteome clearly reveals rigid topological and
sequence constraints of N-linked glycosylation of glycoproteins
[18]. Thus, glycosylation is one of the most common
PTMs in nature and is characterized by extreme
structural diversity. Glycosylation plays an essential role
in many biological processes such as cell recognition, cell–cell
communication, signaling, embryo development, and immunity
[19,20]. Therefore, identifying the glycosylation sites, their
occupancy, and the attached glycan structures is crucial for
proper understanding of the glycoprotein biological function.
Yet tools for systematic identiﬁcation and analysis of protein
glycosylation are greatly underdeveloped.
2 Glycoproteomics: Systematic identification of glycosylation on proteome level
Based on the concept of the proteome, the complete subset of
glycosylated proteins (glycoproteins) generated by a cell or
an organism under specific conditions is defined as the “glycoproteome.”
Therefore, the term “glycoproteomics” refers to
studies that aim to deﬁne or quantify the complete set of proteins
containing glycosylation modiﬁcations in a cell, tissue,
or organism [19]. Rather than separating glycan and protein
analyses into glycomics and proteomics, major tasks of glycoproteomics
include not only identifying the protein main
chains modiﬁed by glycosylation, but also assigning and/or
mapping the structures and sites of these modiﬁcations [21].
Compared with identifying protein main chains, assigning
or mapping glycan structures and sites in glycoproteomics
is much more challenging. First, the biosynthesis of glycans
is a nontemplate driven process involving coordinated expression
of several glycosyltransferases, some of which have
additional tissue-speciﬁc isoforms [22]. Consequently, the inherent
heterogeneity and large diversity of glycan structures
cannot be predicted from any reference database due to a complex
biosynthesis and lack of proofreading machinery [23],
although N-linked glycoproteins do have quality control
“proofreading,” whereby misfolded proteins or those proteins
carrying Glc1–3Man9GlcNAc2 are typically recycled with the
help of calreticulin and calnexin in the ER [24, 25]. Second,
based on the backbone chemical structure of glycosylation,
glycans can be classiﬁed broadly as linear and branched sugars.
Because glycans are often branched, their chemical
heterogeneity and diversity challenge the development
of analytical techniques such as MS that can accurately define
their chemical structures. Some glycosylation PTMs can be
very difficult (if not impossible) to differentiate with MS technology
alone, either when individual monosaccharides with the
same mass are involved, or when the differences between glycans
are caused not by the attachment of distinct chemical groups
but by different glycan-protein linkages or glycosidic linkages
between individual monosaccharides, as in
core 5 O-glycans (GalNAcα1–3GalNAcα-Ser/Thr) and
core 7 O-glycans (GalNAcα1–6GalNAcα-Ser/Thr), or in core
1 O-glycans (Galβ1–3GalNAcα-Ser/Thr) and core 8 O-glycans
(Galα1–3GalNAcα-Ser/Thr). In addition, the presence of heterogeneous
mixtures of different chemical structures within
glycans always needs to be considered [26].
Based on the mode of attachment, two types of protein glycosylation
with important biological functions are most common
in nature: “N-linked” glycosylation (N-glycosylation),
where glycans are attached to asparagine residues in the consensus
sequence N-X-S/T (where X can be any amino acid except proline)
via an N-acetylglucosamine (GlcNAc) residue; and “O-linked”
glycosylation (O-glycosylation), where glycans are attached
to serine or threonine through O-glycosidic linkages. N-Linked
glycosylation is initiated via en bloc transfer of a tetradecasaccharide
(Glc3Man9GlcNAc2) from the lipid-linked oligosaccharide
Glc3Man9GlcNAc2-PP-dolichol to speciﬁc asparagine
residues of nascent proteins in the ER by oligosaccharyltransferases
[27]. Subsequent trimming or further elaboration to
form one of the three standard types of N-glycan cores (high
mannose, complex, or hybrid) is mediated by a series of
glycosidases and glycosyltransferases in the Golgi apparatus
based on this tetradecasaccharide [28]. O-linked glycans are
formed by a series of glycosyltransferase-catalyzed steps that
begin with the transfer of the first glycan from UDP-GalNAc
directly to either a serine or threonine residue by one of a
large number of polypeptide GalNAc-transferases found in
the Golgi [29]. Different O-glycan core structures are synthesized
by subsequent glycan addition based on the ﬁrst glycan
O-GalNAc (there are eight O-GalNAc-based glycan core
structures, most of which may be further glycosylated) [30].
N- and O-glycosylation classiﬁcation leads to further subdivision
of glycoproteomics into N-linked and O-linked glycoproteomics
focusing on these two important biologically
functional PTMs.
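The N-X-S/T consensus sequence described above lends itself to a simple pattern search. The following is a minimal sketch, assuming plain one-letter protein sequences; the function name `find_sequons` and the test sequence are hypothetical and used only for illustration. A match marks a potential site only, since sequon presence does not imply occupancy.

```python
import re

# N-X-S/T sequon with X != proline. The lookahead makes the match zero-width,
# so overlapping sequons (e.g. "NNST") are each reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]

# Hypothetical sequence: NGT (pos 2), NPS (excluded, X = Pro), NNS (pos 11), NST (pos 12).
print(find_sequons("MNGTAANPSKNNST"))  # [2, 11, 12]
```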
Of note, it is important to remember that O-GlcNAc modiﬁcation
or O-GlcNAcylation is in many ways distinct from
“classical” N- and O-linked glycosylation. O-GlcNAcylation is
found mostly within the cytoplasm or nucleoplasm (with a
few exceptions) and it is not elongated or further processed
into a complex oligosaccharide [31]. O-GlcNAcylation is similar
to protein phosphorylation: O-GlcNAc can be attached by
O-GlcNAc transferase or removed by O-GlcNAcase, and this
process is a dynamic response to changes in the cellular environment
triggered by stress, hormones, or nutrients [32].
O-GlcNAcylation is very labile upon ionization in a mass spectrometer,
and much of the O-GlcNAc is often lost at the source,
which makes it very difficult to detect O-GlcNAcylation and
map the corresponding attachment sites [33]. In addition to N-
and O-linked glycosylation, proteins can cross-link with reducing
sugars and form advanced glycation end-products in
a process called glycation or nonenzymatic glycosylation [34].
Strategies and tools for analyzing other known but less common
forms of glycosylation, such as C- and S-linked as well
as others, will not be discussed here.
3 MS technology: Method of choice in glycoproteomics
The general workﬂow in glycoproteomics consists of glycoprotein
or glycopeptide enrichment, multidimensional protein
or peptide separation, tandem mass spectrometric analysis,
and bioinformatic data interpretation [35,36]. Within
this general pipeline, two complementary
strategies (“bottom-up” and “top-down”) are currently
widely used to identify proteins in glycoproteomics (Fig. 1),
and each strategy has its own strengths and weaknesses (although
sometimes a combination of the two
is employed) [37–39]. In the “bottom-up”
approach, the peptides resulting from proteolytic digestion
or chemical cleavage of proteins are used for identification of
peptide sequences and PTMs. Because the “bottom-up” strategy
is peptide-centric, some PTMs may
be ultimately unobtainable by MS, as only a portion of the entire
protein is generally detected. Further, PTMs and proteins
resulting from alternative splicing, as well as identical enzymatic
peptide sequences from highly related protein families,
are difficult to probe completely by the “bottom-up”
approach. Nonetheless, the “bottom-up” strategy plays a dominant
role in current glycoproteomics research because it
is more suitable for automation and high sample throughput
and is easier to implement with current MS technologies
[40,41].
The advantage of the “top-down” approach is that sequence
information and PTM data are acquired from intact proteins
allowing for identiﬁcation of protein isoforms, alternatively
spliced isoforms, and PTMs once full-length protein
sequence coverage has been demonstrated. The most significant
recent advancements in the “top-down” approach are the
electron capture dissociation (ECD) and electron transfer dissociation
(ETD) MS techniques, which are discussed in more
detail below. However, the “top-down” strategy
currently still suffers from limited sensitivity and throughput.
No matter whether “bottom-up,” “top-down,” or a combination
of the two approaches is used, MS is the method of
choice for rapidly identifying the protein main chains and
the structures and the sites of glycan attachment [42]. To elucidate
peptide sequences, glycan structures, and the sites of
their attachment, intact glycopeptides need to be isolated
and dissociated (typically by a gas-phase reaction) into
smaller product ions in an MS instrument so that the product
ions can be subsequently isolated and subjected to further
dissociation and mass analysis [43]. Thus, control
of the dissociation of parent ions into product
ions is the key aspect of the MS/MS technique in glycopeptide
analysis. Currently, CID, ECD, ETD, and infrared multiphoton
dissociation are the dissociation methods most widely employed to
identify intact glycopeptides by MS/MS
[44]. CID, also known as collision-activated dissociation,
is a common fragmentation method utilizing a vibrational
activation process that breaks the weak
bonds within peptides. However, if used to analyze O-linked
glycopeptides, CID will normally return little or no
information on the sites of O-linked modifications.
Because O-linkages between glycans and serines or threonines
are considerably more labile than peptide bonds, glycan
residues will be generally eliminated during CID before
peptide fragmentation [5,45]. On the other hand, N-linkages
are relatively more stable than O-linkages. Consequently, N-linked
modifications have been successfully analyzed using
CID, including identiﬁcation of the peptide backbones and
the sites of N-linked glycosylation [46].
To overcome these limitations, ECD and ETD dissociation
techniques were developed to complement CID in MS analysis
of O-glycopeptides and/or N-glycopeptides [47,48]. ECD
is a “mild” fragmentation technique based on partial recombination
of multiply protonated polypeptide molecules with
low-energy electrons (<0.2 eV), so the intact glycopeptides
can be recorded by the mass analyzer [49,50]. However, the need
for FT-ICR-MS has greatly limited the use of ECD, though it has been
reported on an ion trap MS as well [51]. ETD is a more recently
developed dissociation technique that offers promising alternative
fragmentation pathways; it fragments peptides by
transferring an electron from a radical anion (e.g. fluoranthene)
to a protonated peptide [52]. Analogous to ECD, ETD
results preferentially in cleavage of the N–Cα bonds
of the peptide backbone to generate homologous series of c′-
and z′-type fragment ions without loss of the glycan moiety
(Fig. 2). Compared to CID, ETD preserves glycosylation
PTMs, which are often removed by CID, while sequence information
is still obtained for peptide identification [52, 53].
Figure 1. “Top-down” and “bottom-up” workflows in glycoproteomics.
Thus, glycosylation can be identified through tandem
spectra analysis of the mass shift (the m/z increase due to the attached
glycans), as glycan structures are still attached to the c′ or
z′ fragment ions after dissociation by ETD. For example, as
shown in Fig. 2, ETD spectra have been used successfully to
deduce the glycosylation sites based on the mass shift in comparison
to the CID spectra from the same precursor ions [54].
Consequently, integrated CID and ETD tandem mass spectra
in data-dependent LC-MS/MS are expected to play an increasingly
important role in intact glycopeptide analysis [55].
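The mass-shift reasoning above can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the peptide "GTASAK" and the two candidate sites are hypothetical (they are not the APP695 peptide of Fig. 2), the residue and glycan masses are standard monoisotopic values, and only singly charged c ions are modeled. It shows which c ions differ between two candidate attachment sites and therefore localize the glycan.

```python
# Monoisotopic residue masses in Da (only the residues needed for the demo).
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
           "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259}
NH3, PROTON = 17.02655, 1.00728
# NeuAc-Hex-HexNAc trisaccharide: sum of the three glycan residue masses.
GLYCAN = 291.09542 + 162.05282 + 203.07937

def c_ions(peptide, site, glycan_mass=GLYCAN):
    """Singly charged c-ion m/z values (c1..c(n-1)) with a glycan on 0-based `site`."""
    series, running = [], NH3 + PROTON
    for index, residue in enumerate(peptide[:-1]):
        running += RESIDUE[residue]
        if index == site:
            running += glycan_mass  # every c ion from this residue onward carries the glycan
        series.append(running)
    return series

def discriminating_c_ions(peptide, site_a, site_b, tol=0.01):
    """Return the c-ion numbers whose m/z differs between the two candidate sites."""
    ions_a, ions_b = c_ions(peptide, site_a), c_ions(peptide, site_b)
    return [n + 1 for n, (a, b) in enumerate(zip(ions_a, ions_b)) if abs(a - b) > tol]

# Glycan on Thr (index 1) versus Ser (index 3): only c2 and c3 tell them apart.
print(discriminating_c_ions("GTASAK", site_a=1, site_b=3))  # [2, 3]
```

In this toy case, observing c2 and c3 at their glycan-shifted m/z values would place the glycan on Thr rather than Ser.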
In addition to MS, other approaches including HPLC,
NMR, chemical reactions, radioactive labeling, as well as detection
with speciﬁc lectins or antibodies have been adopted
to probe the monosaccharide composition or determine the
structure of glycans released from glycoproteins. As these
approaches, which are often labor intensive and time consuming,
have been recently reviewed elsewhere [56,57] they
will not be covered in this review.
4 Glycoinformatics: Data interpretation strategies and tools in glycoproteomics
The next important challenge in MS-based glycoproteomics
is efﬁcient interpretation of tandem mass spectra data generated
from intact glycopeptide analysis using adequate bioinformatics
tools. At present, analyzing the large amounts of data
generated in high-throughput MS-based glycoproteomics experiments
constitutes a major bottleneck in glycoproteomics
research.
In general, two main strategies (“two-step” and “one-step”)
are most commonly utilized in glycoproteomics to ascertain
glycosylation of intact glycopeptides. The two-step strategy
involves stripping glycans from intact glycopeptides using
enzymatic or chemical methods, including widely used N-glycanase
enzymes (e.g. PNGase F and PNGase A) to release
N-linked glycans, or alternative chemical cleavage (e.g. β-elimination
or hydrazine) to release O- or N-linked glycans
Figure 2. CID (A) and ETD (B) spectra of the precursor ion m/z 690.8 (3+), corresponding to
glycopeptide 574–587 of the full-length APP695, showing amino acid Thr 576 occupied with the
indicated Core 1 type trisaccharide. The ETD spectrum was obtained using data-dependent
acquisition and the activation energy was 0.10 V. Color code: yellow, N-acetyl galactosamine;
blue, galactose; purple, sialic acid. The fragment ions relevant for determination of the
glycosylation site are indicated with black boxes. Adapted with permission from [54].
and identifying deglycosylated peptides and glycans separately
by MS [58]. Releasing an attached N-glycan moiety by
PNGase F or A results in the conversion of Asn to Asp on
the attachment site, which in turn causes a mass shift for
each N-glycosylation site on the mass spectrum. Thus, when
a deglycosylated protein is further digested with trypsin, the
peptides that were bound to the glycan moiety will be about
1 Da heavier than the expected theoretical mass. After subjecting
these peptides to MS/MS, each peptide that possesses
Asp (instead of Asn) is identiﬁed as formerly attached to the
glycan moiety [59]. A similar approach involves glycan release
with PNGase F in the presence of H₂¹⁸O. The deglycosylated
Asn is then labeled with ¹⁸O, and the resulting mass shift of about
3 Da (1 Da for the Asn-to-Asp conversion and 2 Da contributed
by ¹⁸O) can be followed [60,61]. However, the subsequent
bioinformatics (glycoinformatics) analysis in this approach
may require significant human intervention in data
interpretation for assigning glycan structures, as information
about the sites of glycan attachment cannot be inferred from
MS results automatically.
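The approximate 1 Da and 3 Da shifts described above can be checked against a measured peptide mass. The following is a minimal sketch, assuming monoisotopic masses and a single formerly glycosylated Asn per peptide; the function name, labels, and tolerance are hypothetical choices for illustration. Note that spontaneous deamidation produces the same 1 Da shift, so a shift alone does not prove prior glycosylation.

```python
DEAMIDATION = 0.984016    # Asn -> Asp: NH2 replaced by OH (monoisotopic, Da)
O18_MINUS_O16 = 2.004245  # mass difference between 18O and 16O (Da)

def classify_shift(observed_mass, theoretical_mass, tol=0.01):
    """Label the difference between observed and unmodified theoretical peptide mass."""
    delta = observed_mass - theoretical_mass
    if abs(delta) <= tol:
        return "unmodified"
    if abs(delta - DEAMIDATION) <= tol:
        return "Asn->Asp (16O)"
    if abs(delta - (DEAMIDATION + O18_MINUS_O16)) <= tol:
        return "Asn->Asp (18O-labeled)"
    return "unexplained"

print(classify_shift(1500.984, 1500.000))  # Asn->Asp (16O)
print(classify_shift(1502.988, 1500.000))  # Asn->Asp (18O-labeled)
```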
The one-step strategy is to introduce intact glycopeptides into the
MS instrument and resolve peptide backbones and/or glycan
structures using different dissociation modes such as CID
or CID combined with ECD or ETD. In this approach,
peptide sequences can be identified and the sites and structures
of glycosylation modifications assigned through the
analysis of MS data using database search tools or integrated
analysis software platforms. Some of the features of bioinformatics
tools utilized in one-step and two-step glycoproteomics
experiments to identify glycopeptide sequences and assign
the sites and structures of attached glycans are summarized
below.
5 Bioinformatics tools used in one-step strategy approach
IGAP (intact glycopeptide analysis pipeline) is an automated
data analysis pipeline that can be utilized in one-step
strategy glycoproteomics analysis for identiﬁcation of
N-glycopeptides and glycan attachment sites (Table 1). In
IGAP, the raw data of intact glycopeptide MS/MS acquired
without stripping down the attached glycans are used to mine
for possible sites and structures of attached glycans. The
raw ﬁle generated by Xcalibur is extracted to mzXML format
ﬁle by ReAdw, which is a tool used for converting Thermo
Scientiﬁc RAW formats into the open format mzXML and
depends on Windows-only vendor libraries from Thermo.
Table 1. Partial list of software tools currently used in glycopeptide analysis utilising a one-step strategy
Software | Function | Reference a) | Availability
IGAP | Analysis of N-linked glycopeptides | [65] b) | Free c)
Protein prospector | General tandem MS search engine featured with partially predefined glycosylation | http://prospector.ucsf.edu/prospector/mshome.htm | Access through the web
GlyDB (sequest) | Analysis of N-linked glycopeptides | [72] | NAD
Peptoonist | Analysis of N-linked glycopeptides with single-MS and tandem MS | [73] | NAD
N-glycopeptide library | Analysis of N-glycopeptides with custom generated human N-glycopeptides library | [76] | NAD
GP finder (glycox) | Analysis of N-, or O-linked glycopeptides based on tandem MS with diagnostic ions | [74,75] | NAD
GlycoPep ID | Analysis of N-linked glycosylation based on target protein and CID spectra | http://hexose.chem.ku.edu/predictiontable.php | Access through the web
a) If the software, program, or web service is an open access resource, the web address is provided, otherwise the paper reference numbers
are provided.
b) This software can be downloaded through supplementary program from
http://www.nature.com/nprot/journal/v6/n3/full/nprot.2010.176.html#/supplementary-information.
c) This software is free for academic use. For commercial use, the developers should be contacted.
NAD, not accessible directly, the authors of the papers should be contacted.
However, ReAdw was last released in 2009 and is no longer
supported. Although instructions on how to install
and run ReAdw can be found at
http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW,
the raw format files can also be converted
to the mzXML format by other format converters such as
MSConvert [62]. The mzXML files are then
searched against a protein database with the open source X!Tandem database
search engine (other search engines such as MASCOT or
SEQUEST may also be used), and the identified proteins
are saved as a local Refinement protein database. The search
results are validated with PeptideProphet and ProteinProphet,
which use an expectation-maximization algorithm
to derive a mixture model of correct and incorrect peptide
Figure 3. Bioinformatics workflow in IGAP. (A) The refinement protein database is created from
the protein IDs confidently identified with the LC-ESI-MS/MS data. Each ID is recorded on
a separate line. This database is experiment specific. Protein identification may be performed
using other search engines. (B) The N-glycan database is constructed with the data from
consortium for functional glycomics (CFG) glycan structures database. This database only
needs to be updated when new entries are found at CFG. (C) MALDI-DIT MS/MS spectrum
is searched by IGAP, which generates the top 20 glycan composition results in the
tab-delimited file “Output.txt” and the matched peaks annotation in “Output.txt.annotation.”
Manual validation of the results is performed. DITViewer provides visual access to the
acquired spectra. Reproduced with permission from [65].
identiﬁcations from the data [63]. Next, IGAP utilities are
used to construct a local N-glycan database based on the CFG
(Consortium for Functional Glycomics) glycan structure
database, which offers detailed structural and chemical information
on thousands of glycans, including both synthetic
glycans and glycans isolated from biological sources [64]. For
example, as of May 2012, searching the glycan structures
database with the N-linked “core” substructure containing
three mannose and two N-acetylglucosamine residues
returned 4605 N-linked glycan records
[http://www.functionalglycomics.org/glycomics/molecule/jsp/carbohydrate/structure/searchThisStructure.jsp?lincode=Ma3%28Ma6%29Mb4GNb4GN].
The acquired glycopeptide MS/MS spectra
are processed by IGAP, which computes the possible
theoretical glycopeptides from the protein sequences in the
local Refinement protein database and the local CFG glycan
database. IGAP digests the proteins in the Refinement
protein database in silico into tryptic peptides with 0, 1,
or 2 missed cleavages, and only peptides containing the consensus
sequence N-X-S/T (X except proline) are considered
as potential N-glycosylated peptides. IGAP considers the
fragmentation of the glycosidic bonds (with Y ions and X0–2
of N-glycans on the intact potential N-linked glycopeptides)
and the fragmentation of the peptide bonds (with both b and
y series ions of the potential N-linked glycopeptides along
with an attached GlcNAc). These two kinds of fragment
ions are then merged to generate the theoretical MS/MS
spectrum. The experimental spectrum is normalized to cumulative
intensity, the measured m/z range is divided into 100
Th (Thomson) regions, and the eight most intense peaks in
each region are extracted to form the intensity spectrum,
which is matched against the theoretical spectrum.
In this process, the A-Score algorithm is used to score the
matches, and the collected matches are ranked by glycan
moiety score, glycopeptide probability of random matches,
and glycopeptide score. The top 20 glycan composition
results are retained and further narrowed down based on
other available information. Thus, without stripping
the attached glycans, data generated by MS/MS can be
used to identify intact glycopeptide sequences and the sites
and structures of the attached glycans based on in-depth
tandem mass spectra data mining. A typical
bioinformatics workflow in IGAP is depicted in Fig. 3.
The IGAP program can be obtained from the Nature website
(http://www.nature.com/nprot/journal/v6/n3/full/nprot.2010.176.html#supplementary-information)
as part of the supplementary
files of the article [65]. However,
IGAP can currently be used for N-glycoproteomics analysis
only.
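Two of the preprocessing steps described above, the in silico tryptic digestion with sequon filtering and the reduction of an experimental spectrum to its most intense peaks per 100 Th region, can be sketched as follows. This is not IGAP's code. It is a minimal illustration with hypothetical function names, and it assumes the conventional trypsin rule (cleavage after K/R but not before P), counts only sequons lying entirely within a peptide, and interprets intensity normalization as division by the summed intensity.

```python
import re

SEQUON = re.compile(r"N[^P][ST]")  # N-X-S/T with X != proline

def tryptic_peptides(protein, max_missed=2):
    """Cleave C-terminal to K/R (assumed: not before P), allowing missed cleavages."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for start in range(len(pieces)):
        last = min(start + max_missed, len(pieces) - 1)
        for end in range(start, last + 1):
            peptides.append("".join(pieces[start:end + 1]))
    return peptides

def candidate_glycopeptides(protein, max_missed=2):
    """Keep only peptides that contain a complete N-X-S/T sequon."""
    return [p for p in tryptic_peptides(protein, max_missed) if SEQUON.search(p)]

def top_peaks_per_window(peaks, window=100.0, top_n=8):
    """Normalize (m/z, intensity) peaks to total intensity; keep top_n per m/z window."""
    total = sum(intensity for _, intensity in peaks) or 1.0
    bins = {}
    for mz, intensity in peaks:
        bins.setdefault(int(mz // window), []).append((mz, intensity / total))
    kept = []
    for key in sorted(bins):
        ranked = sorted(bins[key], key=lambda peak: peak[1], reverse=True)
        kept.extend(ranked[:top_n])
    return sorted(kept)  # back in m/z order

# Hypothetical protein: pieces are MNGTK, AAK and NPSRPLNK (no cleavage before P).
print(candidate_glycopeptides("MNGTKAAKNPSRPLNK"))
```

Zero-width splitting with `re.split` requires Python 3.7 or later.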
Protein Prospector is also a tandem mass spectra database
search program used for identifying glycosylation sites
in one-step strategy glycoproteomics analysis. It has been
optimized and successfully utilized for O-glycoproteomics
[66,67]. MS data acquired using CID combined with ECD or
ETD have been used to identify O-GlcNAc modification sites
on native peptides with concomitant identification of peptide
sequences by the Protein Prospector database search
engine [5, 68, 69]. Compared with ECD, ETD markedly increases
the number of O-GlcNAc modification sites determined
in a single experiment, which could be related to the
higher charge and lower m/z components in the ETD fragmentation
process [70]. Unlike many of the modification site
scoring tools implemented in other database search engines,
the Batch-Tag scoring used in Protein Prospector is applicable
to all modifications, and the scoring algorithm named SLIP
(site localization in peptide) has been designed to calculate
and compare probability and expectation values for the same
peptide with different site assignments [71].
GlyDB annotates tandem mass spectra of N-linked glycopeptides
using an in-house custom-built linearized glycan
structure database and utilizes a general peptide database
search engine Sequest to assign experimental tandem mass
spectra to individual glycoforms [72]. Some other in-house
developed software tools or scripts like Peptoonist, GlycoPep
ID, GlycoX, or the upgraded version of GP finder can also
be used for interpreting glycopeptide MS data in one-step
strategy glycoproteomics [73–77].
In addition to using fully automated software tools like
IGAP or Protein Prospector, identification of many N- and O-glycosylation
sites has been reported based on manual analysis of
tandem mass spectra or using in-house developed scripts. For
example, in the analysis of O-glycosylation, diagnostic glycan
oxonium ions such as m/z 163 (Hex+), m/z 292 (NeuAc+), m/z
204 (HexNAc+), or m/z 366 (HexHexNAc+) are formed
in the tandem mass spectra by B-type and Y-type
cleavages of glycosidic bonds [78,79]. By tracking these
diagnostic oxonium ions in MS2, the glycosidic linkage
fragmentation patterns of MS2 and MS3 spectra can be
used to determine the sites and structures of the
attached glycans. However, this kind of manual interpretation
is time consuming, and potential errors including false
positives and false negatives cannot be accounted for due to
the lack of a statistical validation process.
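Screening a peak list for the diagnostic ions mentioned above is straightforward to script. The sketch below is a hypothetical illustration rather than any published tool: the nominal m/z values 163, 204, 292, and 366 come from the text, while the accurate monoisotopic values, the tolerance, and the relative intensity threshold are assumptions chosen for the example.

```python
# Singly charged oxonium ions (nominal m/z 163, 204, 292 and 366 in the text);
# accurate values are glycan residue mass plus one proton.
OXONIUM_IONS = {
    "Hex": 163.0601,
    "HexNAc": 204.0867,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}

def find_oxonium_ions(peaks, tol=0.02, min_rel_intensity=0.01):
    """Map each diagnostic ion found in `peaks` [(mz, intensity), ...] to its observed m/z."""
    base = max((intensity for _, intensity in peaks), default=0.0)
    if base <= 0:
        return {}
    hits = {}
    for name, target in OXONIUM_IONS.items():
        matches = [(mz, intensity) for mz, intensity in peaks
                   if abs(mz - target) <= tol and intensity / base >= min_rel_intensity]
        if matches:
            # Report the most intense peak inside the tolerance window.
            hits[name] = max(matches, key=lambda peak: peak[1])[0]
    return hits

# Hypothetical MS2 peak list: 292.5 lies outside the tolerance and is ignored.
spectrum = [(204.087, 500.0), (366.140, 200.0), (292.5, 300.0), (500.25, 1000.0)]
print(sorted(find_oxonium_ions(spectrum)))  # ['HexHexNAc', 'HexNAc']
```

A spectrum flagged this way is only a candidate glycopeptide spectrum and still needs the downstream site and structure assignment discussed in the text.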
6 Bioinformatics tools used in two-step strategy approach
The two-step strategy is another glycosylation analysis strategy
commonly used in glycoproteomics research. In this approach,
glycans are stripped from intact glycopeptides,
and the “deglycosylated” peptides and recovered glycans
are then identified separately by MS. Glycopeptide
backbone sequences and the structures of the attached glycans
can thus be identified, but information regarding the sites
of glycan attachment cannot be acquired [80]. The first step,
identifying “deglycosylated” peptide sequences,
is the same as in general proteomics research, and
the software tools or algorithms for analyzing the MS/MS
data have been well reviewed [81–84]. Mascot and Sequest are
currently the two database search engines most commonly
used for identifying protein sequences with MS, which are
Table 2. General database search engines used in MS/MS
Software | Function | Reference | Availability
MASCOT | MS data search engine | http://www.matrixscience.com/ | Commercial
OMSSA | MS data search engine | http://pubchem.ncbi.nlm.nih.gov/omssa/ | Open source free
Protein prospector | MS data search engine | http://prospector.ucsf.edu/prospector/mshome.htm | Access through the web
SEQUEST | MS data search engine | http://fields.scripps.edu/sequest/ | Commercial
Spectrum mill | MS data search engine | http://www.chem.agilent.com/ | Commercial
X!Tandem | MS data search engine | http://www.thegpm.org/tandem/ | Open source free
publicly available (Table 2). Other open source software tools
like X!tandem or OMSSA are also available and could be
downloaded and installed on local computers offering much
ﬂexibility in conﬁguring the search parameters and databases.
The second step in the two-step strategy of glycopeptide analysis
is to identify the structures of the separated glycans.
Although automatic interpretation of glycan MS data is still a
challenging task, several software applications are available
to deduce probable carbohydrate compositions from MS
data [85, 86] generated either by a single round of mass analysis
or by tandem mass analysis (MS/MS or higher-order MSn).
Accordingly, these glycoinformatics tools can be broadly
divided into single-MS glycan analysis software and tandem
MS glycan analysis software (Table 3).
GlycoMod is one of the first such tools developed to compute
all possible glycan compositions from experimentally
derived mass spectral data. It compares the measured mass of
the glycan with a list of precomputed masses of glycan compositions;
the composition of a glycan attached
to a peptide can also be computed if the sequence or the mass of
the peptide is known [87]. Cartoonist is another program designed
for assigning glycans based on data generated from a
single round of mass analysis; it is used by the CFG to profile
glycans from various organisms and tissues [88,89]. GlyPeps
is software that unravels the information encrypted
in the deltamass value of accurate peptide masses. When the
deltamass value is indicative of a glycopeptide, GlyPeps can
be used to compute a list of proposed N-glycan structures if
the sequence of the peptide is known [90]. GlycoSpectrumScan
is a web-based tool for identifying the glycoheterogeneity on
a peptide from mass spectral data. It uses single-MS data and
two supporting datasets, namely the oligosaccharide compositions
of the N- and/or O-linked glycans present in the
sample and the in silico derived peptide masses of proteolytically
digested proteins, to identify glycopeptides and determine the
relative distribution of N- and O-glycoforms at each site [91].
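The principle of deducing glycan compositions from a single measured mass can be sketched in a few lines. This is a simplified illustration and not GlycoMod's actual implementation: the function name, the mass tolerance, and the upper limits on residue counts are assumptions, and only four monosaccharide classes are considered. The residue masses are monoisotopic; a free glycan carries one additional water, whereas a glycan attached to a peptide adds only the sum of its residues to the peptide mass.

```python
from itertools import product

# Monoisotopic residue masses (monosaccharide minus water), in Da.
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}
WATER = 18.01056


def glycan_compositions(observed_mass, peptide_mass=None, tol=0.02, max_counts=None):
    """List compositions whose mass agrees with a neutral observed mass.

    observed_mass: neutral monoisotopic mass of a free glycan or glycopeptide.
    peptide_mass: neutral peptide mass; None means a free (released) glycan.
    tol: absolute mass tolerance in Da (assumed value).
    max_counts: upper limit per residue type (assumed search bounds).
    Returns a list of (composition_dict, mass_error), best match first.
    """
    limits = max_counts or {"Hex": 12, "HexNAc": 8, "dHex": 4, "NeuAc": 5}
    names = list(limits)
    # Mass that must be explained by the sum of monosaccharide residues.
    target = observed_mass - (WATER if peptide_mass is None else peptide_mass)
    hits = []
    for counts in product(*(range(limits[name] + 1) for name in names)):
        mass = sum(c * RESIDUE_MASS[name] for c, name in zip(counts, names))
        error = mass - target
        if any(counts) and abs(error) <= tol:
            hits.append((dict(zip(names, counts)), round(error, 5)))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

For example, `glycan_compositions(1234.4334)` proposes Hex5HexNAc2 for a released glycan. A composition alone does not define a structure, so such a list is only the starting point for the structural assignment discussed below.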
The structures of the attached glycans can also be
identified with MS/MS. This likewise requires
advanced data interpretation tools to decipher complicated
glycan structures. GlycoFragment and GlycoSearchMS
are two web tools that support the interpretation of MS/MS
data of complex carbohydrates. GlycoFragment computes all
theoretically possible MS-relevant
fragments of oligosaccharides as defined by the extended
IUPAC nomenclature, while GlycoSearchMS takes
the experimental mass spectral peak values as input and
searches for matches with the calculated fragments of all
structures contained in the SweetDB database [92]. GlycosidIQ
interprets oligosaccharide mass spectra by
matching experimental data with theoretically fragmented
oligosaccharides generated from the GlycoSuiteDB database
[93]. GlycoWorkbench is a tool developed by the EUROCarbDB
initiative to support
the routine interpretation of MS data. It evaluates a set of
structures proposed by the user by matching the list of peaks
derived from the tandem mass spectra against the corresponding
theoretical list of fragment masses [7,94]. GlyQuest
determines asparagine-linked glycan (N-glycan) structures
from tandem mass spectra of glycopeptides using a built-in
N-glycan structure database and an integrated database
search engine [95]. GlycoMiner identifies N-glycans from tandem
mass spectra with an empirical algorithm, which
detects the low-mass oxonium ions, deduces oligosaccharide
losses from the protonated molecule, and identifies
the mass of the peptide residue [96]. SimGlycan™ is a desktop
tool designed to predict glycan structures from MS/MS
spectra using database searching and a proprietary scoring
algorithm against its own database of theoretical fragmentation
of over 9000 glycans. SimGlycan™ can predict the
attached glycan structures if the mass of the peptide or the
peptide sequence is known [97].
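The database-matching principle shared by these tools, comparing an experimental peak list with theoretical fragment masses of candidate structures, can be sketched as follows. This is an illustrative simplification and not the scoring scheme of any tool named above: the function names, the tolerance, and the score (the fraction of theoretical fragments that are observed) are assumptions, and the fragment masses of each candidate are taken as given input rather than computed.

```python
from bisect import bisect_left


def match_peaks(observed_mz, theoretical, tol=0.02):
    """Match theoretical fragments to observed peaks within a tolerance.

    observed_mz: iterable of experimental m/z values.
    theoretical: dict mapping a fragment label to its theoretical m/z.
    Returns a list of (label, theoretical_mz, closest_observed_mz).
    """
    observed = sorted(observed_mz)
    matches = []
    for label, theo_mz in theoretical.items():
        i = bisect_left(observed, theo_mz - tol)
        best = None
        # Scan the peaks inside the tolerance window and keep the closest one.
        while i < len(observed) and observed[i] <= theo_mz + tol:
            if best is None or abs(observed[i] - theo_mz) < abs(best - theo_mz):
                best = observed[i]
            i += 1
        if best is not None:
            matches.append((label, theo_mz, best))
    return matches


def rank_candidates(observed_mz, candidates, tol=0.02):
    """Rank candidate structures by the fraction of their fragments observed.

    candidates: dict mapping a structure name to {fragment_label: m/z}.
    Returns a list of (name, score, matches), highest score first.
    """
    scored = []
    for name, fragments in candidates.items():
        hits = match_peaks(observed_mz, fragments, tol)
        score = len(hits) / len(fragments) if fragments else 0.0
        scored.append((name, score, hits))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A score of this kind favors candidates with few predicted fragments and ignores peak intensities, which is one reason why the published tools rely on more elaborate scoring functions.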
In contrast to the bioinformatics tools mentioned above,
which match oligosaccharide mass spectra against databases of
theoretically fragmented structures,
several software tools interpret tandem mass spectral
data by de novo glycan sequencing (Table 4). STAT is a web-based
tool for saccharide topology analysis. It extracts
information from a set of MSn spectra and computes
all possible structures, which are then evaluated
against the MSn data so that each candidate structure is
assigned a rating based on the likelihood that it is the correct
sequence [98]. GlySpy is the prototype tool that implements
the OSCAR algorithm. It accepts user-selected MSn
ion fragment paths and applies logical constraints to produce
the full set of glycan structures that could yield the selected
ions [99]. The StrOligo algorithm first builds a relationship tree accounting
for each observed loss of a monosaccharide moiety
and then scores the agreement between the tree and each
proposed structure derived from combinations of adducts
and fragment ion types generated by MS/MS.
Table 3. Partial list of software tools currently used in glycopeptide analysis utilising a two-step strategy

Tools for assigning glycan structures based on single-MS

| Software | Function | Reference a) | Availability |
|---|---|---|---|
| GlycoMod | Predict glycan structures based on single-MS | http://web.expasy.org/glycomod | Access through the web |
| Cartoonist | Annotate permethylated N-glycans with single-MS | [88,89] | NAD |
| GlyPeps | Annotate N-glycans with single-MS when peptide sequences are known | http://www.glycosciences.de/spec/glypeps/ | Access through the web |
| GlycoSpectrumScan | Web-based tool to identify the glycoheterogeneity on a peptide from MS data | http://www.glycospectrumscan.org/ | Access through the web |

Tools for assigning glycan structures based on tandem MS

| Software | Function | Reference a) | Availability |
|---|---|---|---|
| GlycoFragment | Annotate glycan structures based on theoretically possible fragments and tandem MS | http://www.glycosciences.de/tools/GlycoFragments/ | Access through the web |
| GlycoSearchMS | Annotate glycan structures based on theoretically possible fragments from SweetDB | http://www.glycosciences.de/sweetdb/start.php?action=form_ms_search | Access through the web |
| GlycosidIQ | Annotate glycan structures based on theoretically possible fragments from GlycoSuite DB | http://glycosuitedb.expasy.org/glycosuite/query | Access through the web |
| GlycoWorkbench | Designed for rapid drawing of glycan structures; can automatically match them to tandem MS data | http://www.glycoworkbench.org/ | Free |
| GlyQuest | Determine N-glycan structures with built-in database and search engine | [95] | NAD |
| GlycoMiner | Determine N-glycans from tandem MS data with empirical algorithm | http://www.chemres.hu/ms/glycominer/index.php | Free |

a) If the software, program, or web service is an open access resource, the web address is provided; otherwise the paper reference numbers are provided.
NAD, not accessible directly; the authors of the papers should be contacted.

Subsequently, the best combination is selected based on the
score, and the relevant peaks are labeled in the experimental
mass spectrum using a modified nomenclature [100]. GlycoMaster
uses a heuristic dynamic programming technique to
compute the best possible sequence structure among all possible
monosaccharide combinations [101]. GLYCH interprets
tandem mass spectra of oligosaccharides based on the appearance
pattern of cross-ring ions, taking double
fragment ions into account as well [102]. Glyco-Peakfinder is a tool for fast
annotation of glycan MS spectra; it can
detect differently and/or multiply charged ions in one calculation
cycle and also accounts for modifications at the reducing
end or within the oligosaccharide sequence [103].
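A common starting point of de novo approaches is to look for pairs of peaks whose mass difference corresponds to the loss of one monosaccharide residue and to chain such losses together. The sketch below illustrates only this idea; it is not the StrOligo, GlycoMaster, or GLYCH algorithm. The function names and the tolerance are assumptions, all ions are assumed to be singly charged, and branching, cross-ring fragments, and adducts are ignored.

```python
# Monoisotopic residue masses (monosaccharide minus water), in Da.
RESIDUE_MASS = {
    "Hex": 162.05282,
    "HexNAc": 203.07937,
    "dHex": 146.05791,
    "NeuAc": 291.09542,
}


def residue_losses(peaks_mz, tol=0.02):
    """Find peak pairs separated by exactly one monosaccharide residue.

    Returns a list of (heavier_mz, lighter_mz, residue_name) tuples.
    """
    peaks = sorted(set(peaks_mz), reverse=True)
    edges = []
    for i, heavier in enumerate(peaks):
        for lighter in peaks[i + 1:]:
            delta = heavier - lighter
            for name, mass in RESIDUE_MASS.items():
                if abs(delta - mass) <= tol:
                    edges.append((heavier, lighter, name))
    return edges


def longest_loss_path(peaks_mz, tol=0.02):
    """Return (start_mz, [residues]) for the longest chain of residue losses.

    Losses always lead to a lighter peak, so the pairs form a directed
    acyclic graph that can be solved by visiting peaks in ascending order.
    """
    edges = residue_losses(peaks_mz, tol)
    best = {}  # peak m/z -> residues lost along the longest chain from it
    for peak in sorted(set(peaks_mz)):
        best[peak] = []
        for heavier, lighter, name in edges:
            if heavier == peak and len(best[lighter]) + 1 > len(best[peak]):
                best[peak] = [name] + best[lighter]
    if not best:
        return None, []
    start = max(best, key=lambda p: len(best[p]))
    return start, best[start]
```

The chain of losses obtained in this way gives a linear reading of the spectrum only; resolving branched topologies requires the additional constraints and scoring used by the tools listed in Table 4.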
7 Other issues related to glycan analysis
pipeline and concluding remarks
The interpretation of glycopeptide data generated by MS/MS,
and the implementation of bioinformatic solutions for it, is still a very
challenging task due to the overlap of peptide and glycan
fragmentation. Evaluating the matches between experimental
and theoretical spectra (peptide-spectrum matches,
PSMs) is vital for identifying correct PSMs with statistical
confidence, especially in glycoproteomics based on the
bottom-up pipeline. However, to the best of our knowledge,
only two algorithms have so far been implemented for the
evaluation of PSMs in glycoproteomics. IGAP evaluates the
match between experimental spectra and combined
theoretical spectra with the A-score algorithm, which originated
in phosphorylation analysis, to discriminate N-linked
glycopeptides from unmodified peptides. Similarly, the OScore
algorithm has been reported to validate the identification of
O-GlcNAc-modified peptides in data-dependent
ETD tandem MS experiments [104]. However, due to the extreme
complexity of glycosylation, additional efforts are urgently
needed to develop more reliable tools and algorithms that assist
with the evaluation of PSMs, the assignment of glycosylation
sites on glycopeptides, and the identification of the structures
of the attached glycan moieties.
As glycan structures cannot be predicted from theoretical
sequence databases, a glycomics database is necessary for glycan
identification in MS-based glycoproteomics no matter which
glycan site assignment strategy is used. Currently,
several carbohydrate databases are developed and maintained
by different academic and commercial organizations, including
the CFG Glycan database, the GLYCOSCIENCES.de Glycan
database, the KEGG Glycan database, and other glycan databases
and resources such as EUROCarbDB, UniCarb-DB, GMDB,
the GlycoSuite database, Glycominds, the bacterial carbohydrate
Table 4. Partial list of software tools currently used for de novo sequencing based on tandem MS

| Software | Function | Reference a) | Availability |
|---|---|---|---|
| STAT | De novo assign tandem MS spectra for an oligosaccharide of up to 10 residues | [98] | NAD |
| GlySpy (OSCAR) | Assign glycan structures based on user-selected MSn ion fragment paths and logical constraints | [99] | NAD |
| StrOligo | Interpret tandem MS data with relationship tree model to fit the experimental data | [100] | NAD |
| GlycoMaster | De novo assign N-glycan structures with heuristic dynamic programming technique and Branch-and-Bound algorithm | [101] | NAD |
| GLYCH | Interpret glycan structures from tandem MS data based on the appearance pattern of cross-ring ions and dynamic programming algorithm | [102] | NAD |
| Glyco-Peakfinder b) | De novo assign tandem MS data with all types of fragment ions including monosaccharide cross-ring cleavage products and multiply charged ions | http://www.glyco-peakfinder.org/ | Access through the web |

a) If the software, program, or web service is an open access resource, the web address is provided; otherwise the paper reference numbers are provided.
b) This tool can be accessed through the website http://www.glyco-peakfinder.org/.
NAD, not accessible directly; the authors of the papers should be contacted.

structure database (BCSDB), and CAZy (Carbohydrate-Active
enZYmes Database), which have been introduced
and/or reviewed elsewhere [7, 23, 105, 106]. However, due
to the different data formats used to encode carbohydrate
structures, there is almost no direct cross-referencing
between these established carbohydrate databases, leading
to the existence of multiple disconnected and incompatible
islands in glycomics. Although several efforts to correct
this situation (generation of the GlyDE data exchange standard,
the GlycoCT sequence format, and the GlycomeDB database) have
been reported [85, 107–109], it currently presents a major
roadblock to efficient communication and data sharing
within the glycoscience community. In this situation,
closer collaboration on the development of glycobioinformatic
concepts between major North American,
European, and Asian bioinformatics centers could be
expected to remedy the lack of commonly recognized
standards for glycan definition formats, data exchange
formats, and shared databases. The 2nd Beilstein Symposium
on glycobioinformatics, “Cracking the Sugar Code
by Navigating the Glycospace,” showed trends of cooperation
between several large bioinformatics centers and toward
integration of glycoprotein resources
(http://www.beilstein-institut.de/en/symposia/overview/proceedings/2011glycobioinformatics/).
The new initiative UniCarbKB
represents recent efforts of the glycobioinformatics community
toward integrating different resources into one universal
glycomics knowledgebase, which could provide a comprehensive,
publicly accessible catalogue of information
about carbohydrates [110]. Also, developing proteomic
resources like the Tranche repository should improve data
exchange and tool sharing in glycoproteomics as well [111].
The EUROCarbDB project has published its software libraries
and bioinformatics tools through Google Code at
http://eurocarb.googlecode.com under the terms of the
Lesser General Public License, like many other open-source
projects. However, many glycobioinformatics tools can still
only be downloaded from the developers’ websites
or obtained from the supplementary material of the corresponding
papers. Enhancing the rapidly
developing field of glycobioinformatics, which provides vital
support for glycomics and glycoproteomics research, with
new comprehensive and universal tools for data mining
and structural analysis will greatly improve glycopeptide
decoding.
This work was supported in part by Award Number
1I01BX000609 from the Biomedical Laboratory Research & Development
Service of the VA Office of Research and Development to
VVG and by American Heart Association National SDG 0830287N
to OVG.
The authors have declared no conﬂict of interest.
8 References
[1] Jensen, O. N., Interpreting the protein language using proteomics.
Nat. Rev. Mol. Cell Biol. 2006, 7, 391–403.
[2] Walsh, C. T., Posttranslational Modiﬁcation of Proteins: Expanding
Nature’s Inventory, Roberts and Company Publishers,
Greenwood Village 2006.
[3] Ribet, D., Cossart, P., Pathogen-mediated posttranslational
modiﬁcations: a re-emerging ﬁeld. Cell 2010, 143, 694–702.
[4] Walsh, C. T., Garneau-Tsodikova, S., Gatto, G. J., Jr.,
Protein posttranslational modiﬁcations: the chemistry of
proteome diversiﬁcations. Angew. Chem. Int. Ed. Engl.
2005, 44, 7342–7372.
[5] Chalkley, R. J., Thalhammer, A., Schoepfer, R., Burlingame,
A. L., Identiﬁcation of protein O-GlcNAcylation sites using
electron transfer dissociation mass spectrometry on native
peptides. Proc. Natl. Acad. Sci. USA 2009, 106, 8894–8899.
[6] Turnbull, J. E., Field, R. A., Emerging glycomics technologies.
Nat. Chem. Biol. 2007, 3, 74–77.
[7] von der Lieth, C. W., Freire, A. A., Blank, D., Campbell, M. P.
et al., EUROCarbDB: an open-access platform for glycoinformatics.
Glycobiology 2011, 21, 493–502.
[8] Hart, G. W., Copeland, R. J., Glycomics hits the big time.
Cell 2010, 143, 672–676.
[9] Schauer, R., Sialic acids as regulators of molecular and cellular
interactions. Curr. Opin. Struct. Biol. 2009, 19, 507–514.
[10] Spiro, R. G., Protein glycosylation: nature, distribution, enzymatic
formation, and disease implications of glycopeptide
bonds. Glycobiology 2002, 12, 43R–56R.
[11] Schachter, H., Freeze, H. H., Glycosylation diseases: quo
vadis? Biochim. Biophys. Acta 2009, 1792, 925–930.
[12] Hansen, S. F., Bettler, E., Rinnan, A., Engelsen, S. B. et al.,
Exploring genomes for glycosyltransferases. Mol. Biosyst.
2010, 6, 1773–1781.
[13] Narimatsu, H., Construction of a human glycogene library
and comprehensive functional analysis. Glycoconj. J. 2004,
21, 17–24.
[14] Khoury, G. A., Baliban, R. C., Floudas, C. A., Proteome-wide
post-translational modiﬁcation statistics: frequency analysis
and curation of the swiss-prot database. Sci. Rep. 2011,
1, 90.
[15] Apweiler, R., Hermjakob, H., Sharon, N., On the frequency
of protein glycosylation, as deduced from analysis of the
SWISS-PROT database. Biochim. Biophys. Acta 1999, 1473,
4–8.
[16] Colgrave, M. L., Snelling, H. J., Shiell, B. J., Feng, Y. R. et al.,
Site occupancy and glycan compositional analysis of two
soluble recombinant forms of the attachment glycoprotein
of Hendra virus. Glycobiology 2012, 22, 572–584.
[17] Arnold, J. N., Wormald, M. R., Sim, R. B., Rudd, P. M., Dwek,
R. A., The impact of glycosylation on the biological function
and structure of human immunoglobulins. Annu. Rev.
Immunol. 2007, 25, 21–50.
[18] Zielinska, D. F., Gnad, F., Wisniewski, J. R., Mann, M., Precision
mapping of an in vivo N-glycoproteome reveals
rigid topological and sequence constraints. Cell 2010, 141,
897–907.
[19] Bertozzi, C. R., Sasisekharan, R., in: Varki, A., Cummings,
R. D., Esko, J. D., Freeze, H. H., Stanley, P., Bertozzi, C. R.,
Hart, G. W., Etzler, M. E. (Eds.), Essentials of Glycobiology,
2nd Edn., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, NY 2009, pp. 679–690.
[20] Varki, A., Glycan-based interactions involving vertebrate
sialic-acid-recognizing proteins. Nature 2007, 446,
1023–1029.
[21] Tian, Y., Zhang, H., Glycoproteomics and clinical applications.
Proteomics Clin. Appl. 2010, 4, 124–132.
[22] Rademacher, T. W., Parekh, R. B., Dwek, R. A., Glycobiology.
Annu. Rev. Biochem. 1988, 57, 785–838.
[23] Raman, R., Raguram, S., Venkataraman, G., Paulson,
J. C. et al., Glycomics: an integrated systems approach to
structure-function relationships of glycans. Nat. Methods
2005, 2, 817–824.
[24] Moremen, K. W., Molinari, M., N-linked glycan recognition
and processing: the molecular basis of endoplasmic
reticulum quality control. Curr. Opin. Struct. Biol. 2006, 16,
592–599.
[25] Maattanen, P., Gehring, K., Bergeron, J. J., Thomas, D. Y.,
Protein quality control in the ER: the recognition of misfolded
proteins. Semin. Cell Dev. Biol. 2010, 21, 500–511.
[26] Mulloy, B., Hart, G. W., Stanley, P., in: Varki, A., Cummings,
R. D., Esko, J. D., Freeze, H. H., Stanley, P., Bertozzi, C. R.,
Hart, G. W., Etzler, M. E. (Eds.), Essentials of Glycobiology,
2nd Edn., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, NY 2009, pp. 661–678.
[27] Dempski, R. E., Jr., Imperiali, B., Oligosaccharyl transferase:
gatekeeper to the secretory pathway. Curr. Opin. Chem.
Biol. 2002, 6, 844–850.
[28] Stanley, P., Schachter, H., Taniguchi, N., in: Varki, A., Cummings,
R. D., Esko, J. D., Freeze, H. H., Stanley, P., Bertozzi,
C. R., Hart, G. W., Etzler, M. E. (Eds.), Essentials of Glycobiology,
2nd Edn., Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY 2009, pp. 101–114.
[29] McDonald, C. A., Yang, J. Y., Marathe, V., Yen, T. Y. et al.,
Combining results from lectin afﬁnity chromatography and
glycocapture approaches substantially improves the coverage
of the glycoproteome. Mol. Cell Proteomics 2009, 8,
287–301.
[30] Brockhausen, I., Schachter, H., Stanley, P., in: Varki, A., Cummings,
R. D., Esko, J. D., Freeze, H. H., Stanley, P., Bertozzi,
C. R., Hart, G. W., Etzler, M. E. (Eds.), Essentials of Glycobiology,
2nd Edn., Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY 2009, pp. 115–128.
[31] Hart, G. W., Slawson, C., Ramirez-Correa, G., Lagerlof, O.,
Cross talk between O-GlcNAcylation and phosphorylation:
roles in signaling, transcription, and chronic disease. Annu.
Rev. Biochem. 2011, 80, 825–858.
[32] Hanover, J. A., Krause, M. W., Love, D. C., Bittersweet
memories: linking metabolism to epigenetics through O-GlcNAcylation.
Nat. Rev. Mol. Cell Biol. 2012, 13, 312–321.
[33] Hart, G. W., Housley, M. P., Slawson, C., Cycling of O-linked
beta-N-acetylglucosamine on nucleocytoplasmic proteins.
Nature 2007, 446, 1017–1022.
[34] Zhang, Q., Ames, J. M., Smith, R. D., Baynes, J. W. et al.,
A perspective on the Maillard reaction and the analysis
of protein glycation by mass spectrometry: probing the
pathogenesis of chronic disease. J. Proteome Res. 2009, 8,
754–769.
[35] Pan, S., Chen, R., Aebersold, R., Brentnall, T. A., Mass spectrometry
based glycoproteomics–from a proteomics perspective.
Mol. Cell Proteomics 2011, 10, R110.003251.
[36] Lazar, I. M., Lazar, A. C., Cortes, D. F., Kabulski, J. L., Recent
advances in the MS analysis of glycoproteins: theoretical
considerations. Electrophoresis 2011, 32, 3–13.
[37] Chait, B. T., Chemistry. Mass spectrometry: bottom-up or
top-down? Science 2006, 314, 65–66.
[38] Borchers, C. H., Thapar, R., Petrotchenko, E. V., Torres,
M. P. et al., Combined top-down and bottom-up proteomics
identiﬁes a phosphorylation site in stem-loop-binding proteins
that contributes to high-afﬁnity RNA binding. Proc.
Natl. Acad. Sci. USA 2006, 103, 3094–3099.
[39] Wu, S., Tolic, N., Tian, Z., Robinson, E. W. et al., An integrated
top-down and bottom-up strategy for characterization
of protein isoforms and modiﬁcations. Methods Mol.
Biol. 2011, 694, 291–304.
[40] Carpentieri, A., Giangrande, C., Pucci, P., Amoresano, A.,
Glycoproteome study in myocardial lesions serum by
integrated mass spectrometry approach: preliminary insights.
Eur. J. Mass Spectrom. (Chichester, Eng) 2010, 16,
123–149.
[41] Chen, G., Pramanik, B. N., Application of LC/MS to proteomics
studies: current status and future prospects. Drug
Discov. Today 2009, 14, 465–471.
[42] Siuti, N., Kelleher, N. L., Decoding protein modiﬁcations
using top-down mass spectrometry. Nat. Methods 2007, 4,
817–821.
[43] Palumbo, A. M., Smith, S. A., Kalcic, C. L., Dantus, M.
et al., Tandem mass spectrometry strategies for phosphoproteome
analysis. Mass Spectrom. Rev. 2011, 30, 600–625.
[44] Wuhrer, M., Catalina, M. I., Deelder, A. M., Hokke, C. H.,
Glycoproteomics based on tandem mass spectrometry of
glycopeptides. J. Chromatogr. B Analyt. Technol. Biomed.
Life Sci. 2007, 849, 115–128.
[45] Greis, K. D., Hayes, B. K., Comer, F. I., Kirk, M. et al., Selective
detection and site-analysis of O-GlcNAc-modiﬁed glycopeptides
by beta-elimination and tandem electrospray
mass spectrometry. Anal. Biochem. 1996, 234, 38–49.
[46] Ito, H., Takegawa, Y., Deguchi, K., Nagai, S. et al., Direct
structural assignment of neutral and sialylated N-glycans
of glycopeptides using collision-induced dissociation MSn
spectral matching. Rapid Commun. Mass Spectrom. 2006,
20, 3557–3565.
[47] Renfrow, M. B., Mackay, C. L., Chalmers, M. J., Julian,
B. A. et al., Analysis of O-glycan heterogeneity in IgA1
myeloma proteins by Fourier transform ion cyclotron resonance
mass spectrometry: implications for IgA nephropathy.
Anal. Bioanal. Chem. 2007, 389, 1397–1407.
[48] Kjeldsen, F., Haselmann, K. F., Budnik, B. A., Sorensen, E. S.
et al., Complete characterization of posttranslational modiﬁcation
sites in the bovine milk protein PP3 by tandem
mass spectrometry with electron capture dissociation as
the last stage. Anal. Chem. 2003, 75, 2355–2361.
[49] Zubarev, R. A., Kelleher, N. L., McLafferty, F. W., Electron
capture dissociation of multiply charged protein cations.
A non-ergodic process. J. Am. Chem. Soc. 1998, 120,
3265–3266.
[50] Mirgorodskaya, E., Roepstorff, P., Zubarev, R. A., Localization
of O-glycosylation sites in peptides by electron capture
dissociation in a Fourier transform mass spectrometer.
Anal. Chem. 1999, 71, 4431–4436.
[51] Baba, T., Hashimoto, Y., Hasegawa, H., Hirabayashi, A. et al.,
Electron capture dissociation in a radio frequency ion trap.
Anal. Chem. 2004, 76, 4263–4266.
[52] Mikesh, L. M., Ueberheide, B., Chi, A., Coon, J. J. et al.,
The utility of ETD mass spectrometry in proteomic analysis.
Biochim. Biophys. Acta 2006, 1764, 1811–1822.
[53] Wiesner, J., Premsler, T., Sickmann, A., Application of electron
transfer dissociation (ETD) for the analysis of posttranslational
modiﬁcations. Proteomics 2008, 8, 4466–4483.
[54] Perdivara, I., Petrovich, R., Allinquant, B., Deterding, L.
J. et al., Elucidation of O-glycosylation structures of the
beta-amyloid precursor protein by liquid chromatography-mass
spectrometry using electron transfer dissociation and
collision induced dissociation. J. Proteome Res. 2009, 8,
631–642.
[55] Harvey, D. J., Proteomic analysis of glycosylation: structural
determination of N- and O-linked glycans by mass
spectrometry. Expert. Rev. Proteomics 2005, 2, 87–101.
[56] Geyer, H., Geyer, R., Strategies for analysis of glycoprotein
glycosylation. Biochim. Biophys. Acta 2006, 1764,
1853–1869.
[57] Rakus, J. F., Mahal, L. K., New technologies for glycomic
analysis: toward a systematic understanding of the glycome.
Annu. Rev. Anal. Chem. (Palo Alto Calif) 2011, 4,
367–392.
[58] Goetz, J. A., Novotny, M. V., Mechref, Y., Enzymatic/chemical
release of O-glycans allowing MS analysis
at high sensitivity. Anal. Chem. 2009, 81, 9546–9552.
[59] Roth, Z., Parnes, S., Wiel, S., Sagi, A. et al., N-glycan moieties
of the crustacean egg yolk protein and their glycosylation
sites. Glycoconj. J. 2010, 27, 159–169.
[60] Kuster, B., Mann, M., 18O-labeling of N-glycosylation sites
to improve the identiﬁcation of gel-separated glycoproteins
using peptide mass mapping and database searching. Anal.
Chem. 1999, 71, 1431–1440.
[61] Kaji, H., Saito, H., Yamauchi, Y., Shinkawa, T. et al., Lectin
afﬁnity capture, isotope-coded tagging and mass spectrometry
to identify N-linked glycoproteins. Nat. Biotechnol.
2003, 21, 667–672.
[62] Deutsch, E. W., Mendoza, L., Shteynberg, D., Farrah, T. et al.,
A guided tour of the Trans-Proteomic Pipeline. Proteomics
2010, 10, 1150–1159.
[63] Nesvizhskii, A. I., Keller, A., Kolker, E., Aebersold, R., A
statistical model for identifying proteins by tandem mass
spectrometry. Anal. Chem. 2003, 75, 4646–4658.
[64] Raman, R., Venkataraman, M., Ramakrishnan, S., Lang, W.
et al., Advancing glycomics: implementation strategies
at the consortium for functional glycomics. Glycobiology
2006, 16, 82R–90R.
[65] Wang, H., Wong, C. H., Chin, A., Taguchi, A. et al., Integrated
mass spectrometry-based analysis of plasma glycoproteins
and their glycan modiﬁcations. Nat. Protoc. 2011, 6,
253–269.
[66] Chalkley, R. J., Baker, P. R., Huang, L., Hansen, K. C.
et al., Comprehensive analysis of a multidimensional liquid
chromatography mass spectrometry dataset acquired
on a quadrupole selecting, quadrupole collision cell, time-of-flight
mass spectrometer: II. New developments in
Protein Prospector allow for reliable and comprehensive
automatic analysis of large datasets. Mol. Cell Proteomics
2005, 4, 1194–1204.
[67] Chalkley, R. J., Baker, P. R., Medzihradszky, K. F., Lynn,
A. J. et al., In-depth analysis of tandem mass spectrometry
data from disparate instrument types. Mol. Cell Proteomics
2008, 7, 2386–2398.
[68] Darula, Z., Chalkley, R. J., Baker, P., Burlingame, A. L.
et al., Mass spectrometric analysis, automated identiﬁcation
and complete annotation of O-linked glycopeptides.
Eur. J. Mass Spectrom. (Chichester, Eng) 2010, 16, 421–428.
[69] Vosseller, K., Trinidad, J. C., Chalkley, R. J., Specht, C. G.
et al., O-linked N-acetylglucosamine proteomics of postsynaptic
density preparations using lectin weak afﬁnity chromatography
and mass spectrometry. Mol. Cell Proteomics
2006, 5, 923–934.
[70] Baker, P. R., Medzihradszky, K. F., Chalkley, R. J., Improving
software performance for peptide electron transfer dissociation
data analysis by implementation of charge state- and
sequence-dependent scoring. Mol. Cell Proteomics 2010, 9,
1795–1803.
[71] Baker, P. R., Trinidad, J. C., Chalkley, R. J., Modification site
localization scoring integrated into a search engine. Mol.
Cell Proteomics 2011, 10, M111.008078.
[72] Ren, J. M., Rejtar, T., Li, L., Karger, B. L., N-Glycan structure
annotation of glycopeptides using a linearized glycan
structure database (GlyDB). J. Proteome Res. 2007, 6,
3162–3173.
[73] Goldberg, D., Bern, M., Parry, S., Sutton-Smith, M. et al.,
Automated N-glycopeptide identiﬁcation using a combination
of single- and tandem-MS. J. Proteome Res. 2007, 6,
3995–4005.
[74] Nwosu, C. C., Seipert, R. R., Strum, J. S., Hua, S. S.
et al., Simultaneous and extensive site-specific N- and O-glycosylation
analysis in protein mixtures. J. Proteome Res.
2011, 10, 2612–2624.
[75] An, H. J., Tillinghast, J. S., Woodruff, D. L., Rocke, D. M.
et al., A new computer program (GlycoX) to determine
simultaneously the glycosylation sites and oligosaccharide
heterogeneity of glycoproteins. J. Proteome Res. 2006, 5,
2800–2808.
[76] Joenvaara, S., Ritamo, I., Peltoniemi, H., Renkonen, R.,
N-glycoproteomics - an automated workflow approach. Glycobiology
2008, 18, 339–349.
[77] Irungu, J., Go, E. P., Dalpathado, D. S., Desaire, H., Simpliﬁcation
of mass spectral analysis of acidic glycopeptides
using GlycoPep ID. Anal. Chem. 2007, 79, 3065–3074.
[78] Nilsson, J., Larson, G., Grahn, A., Characterization of site-specific
O-glycan structures within the mucin-like domain
of alpha-dystroglycan from human skeletal muscle. Glycobiology
2010, 20, 1160–1169.
[79] Chen, Y., Liu, M., Yan, G., Lu, H. et al., One-pipeline approach
achieving glycoprotein identiﬁcation and obtaining
intact glycopeptide information by tandem mass spectrometry.
Mol. Biosyst. 2010, 6, 2417–2422.
[80] Wada, Y., Dell, A., Haslam, S. M., Tissot, B. et al., Comparison
of methods for proﬁling O-glycosylation: human proteome
organisation human disease glycomics/proteome
initiative multi-institutional study of IgA1. Mol. Cell Proteomics
2010, 9, 719–727.
[81] Kapp, E., Schutz, F., Overview of tandem mass spectrometry
(MS/MS) database search algorithms. Curr. Protoc. Protein
Sci. 2007, 49, 25.2.1–25.2.19.
[82] Nesvizhskii, A. I., Protein identiﬁcation by tandem mass
spectrometry and sequence database searching. Methods
Mol. Biol. 2007, 367, 87–119.
[83] Lu, B., Xu, T., Park, S. K., Yates, J. R., 3rd, Shotgun protein
identiﬁcation and quantiﬁcation by mass spectrometry.
Methods Mol. Biol. 2009, 564, 261–288.
[84] Nesvizhskii, A. I., Vitek, O., Aebersold, R., Analysis and validation
of proteomic data generated by tandem mass spectrometry.
Nat. Methods 2007, 4, 787–797.
[85] Frank, M., Schloissnig, S., Bioinformatics and molecular
modeling in glycobiology. Cell Mol. Life Sci. 2010, 67,
2749–2772.
[86] Perez, S., Mulloy, B., Prospects for glycoinformatics. Curr.
Opin. Struct. Biol. 2005, 15, 517–524.
[87] Cooper, C. A., Gasteiger, E., Packer, N. H., GlycoMod—
a software tool for determining glycosylation compositions
from mass spectrometric data. Proteomics 2001, 1,
340–349.
[88] Goldberg, D., Sutton-Smith, M., Paulson, J., Dell, A.,
Automatic annotation of matrix-assisted laser desorption/ionization
N-glycan spectra. Proteomics 2005, 5, 865–875.
[89] Goldberg, D., Bern, M., North, S. J., Haslam, S. M., Dell,
A., Glycan family analysis for deducing N-glycan topology
from single MS. Bioinformatics 2009, 25, 365–371.
[90] Lehmann, W. D., Bohne, A., von Der Lieth, C. W., The information
encrypted in accurate peptide masses-improved
protein identiﬁcation and assistance in glycopeptide identiﬁcation
and characterization. J. Mass Spectrom. 2000, 35,
1335–1341.
[91] Deshpande, N., Jensen, P. H., Packer, N. H., Kolarich, D., GlycoSpectrumScan:
ﬁshing glycopeptides from MS spectra
of protease digests of human colostrum sIgA. J. Proteome
Res. 2010, 9, 1063–1075.
[92] Lohmann, K. K., von der Lieth, C. W., GlycoFragment and
GlycoSearchMS: web tools to support the interpretation of
mass spectra of complex carbohydrates. Nucleic Acids Res.
2004, 32, W261–266.
[93] Joshi, H. J., Harrison, M. J., Schulz, B. L., Cooper, C. A. et al.,
Development of a mass fingerprinting tool for automated
interpretation of oligosaccharide fragmentation data. Proteomics
2004, 4, 1650–1664.
[94] Ceroni, A., Maass, K., Geyer, H., Geyer, R. et al., GlycoWorkbench:
a tool for the computer-assisted annotation of mass
spectra of glycans. J. Proteome Res. 2008, 7, 1650–1659.
[95] Gao, H. Y., Generation of asparagine-linked glycan structure
databases and their use. J. Am. Soc. Mass Spectrom. 2009,
20, 1739–1742.
[96] Ozohanics, O., Krenyacz, J., Ludanyi, K., Pollreisz, F. et al.,
GlycoMiner: a new software tool to elucidate glycopeptide
composition. Rapid Commun. Mass Spectrom. 2008, 22,
3245–3254.
[97] Apte, A., Meitei, N. S., Bioinformatics in glycomics: glycan
characterization with mass spectrometric data using SimGlycan.
Methods Mol. Biol. 2010, 600, 269–281.
[98] Gaucher, S. P., Morrow, J., Leary, J. A., STAT: a saccharide
topology analysis tool used in combination with tandem
mass spectrometry. Anal. Chem. 2000, 72, 2331–2336.
[99] Lapadula, A. J., Hatcher, P. J., Hanneman, A. J., Ashline,
D. J. et al., Congruent strategies for carbohydrate sequencing.
3. OSCAR: an algorithm for assigning oligosaccharide
topology from MSn data. Anal. Chem. 2005, 77,
6271–6279.
[100] Ethier, M., Saba, J. A., Spearman, M., Krokhin, O. et al.,
Application of the StrOligo algorithm for the automated
structure assignment of complex N-linked glycans from
glycoproteins using tandem mass spectrometry. Rapid
Commun. Mass Spectrom. 2003, 17, 2713–2720.
[101] Shan, B., Ma, B., Zhang, K., Lajoie, G., Complexities and algorithms
for glycan sequencing using tandem mass spectrometry.
J. Bioinform. Comput. Biol. 2008, 6, 77–91.
[102] Tang, H., Mechref, Y., Novotny, M. V., Automated interpretation
of MS/MS spectra of oligosaccharides. Bioinformatics
2005, 21(Suppl 1), i431–i439.
[103] Maass, K., Ranzinger, R., Geyer, H., von der Lieth, C. W.
et al., “Glyco-peakfinder”–de novo composition analysis
of glycoconjugates. Proteomics 2007, 7, 4435–4444.
[104] Hahne, H., Kuster, B., A novel two-stage tandem mass
spectrometry approach and scoring scheme for the identification
of O-GlcNAc modified peptides. J. Am. Soc. Mass
Spectrom. 2011, 22, 931–942.
[105] Hayes, C. A., Karlsson, N. G., Struwe, W. B., Lisacek, F. et al.,
UniCarb-DB: a database resource for glycomic discovery.
Bioinformatics 2011, 27, 1343–1344.
[106] Marino, K., Bones, J., Kattla, J. J., Rudd, P. M., A systematic
approach to protein glycosylation analysis: a path through
the maze. Nat. Chem. Biol. 2010, 6, 713–723.
[107] Herget, S., Ranzinger, R., Maass, K., von der Lieth, C. W., GlycoCT—
a unifying sequence format for carbohydrates. Carbohydr.
Res. 2008, 343, 2162–2171.
[108] Ranzinger, R., Herget, S., Wetter, T., von der Lieth,
C. W., GlycomeDB—integration of open-access carbohydrate
structure databases. BMC Bioinformatics 2008, 9, 384.
[109] Sahoo, S. S., Thomas, C., Sheth, A., Henson, C. et al.,
GLYDE—an expressive XML standard for the representation
of glycan structure. Carbohydr. Res. 2005, 340,
2802–2807.
[110] Campbell, M. P., Hayes, C. A., Struwe, W. B., Wilkins, M. R.
et al., UniCarbKB: putting the pieces together for glycomics
research. Proteomics 2011, 11, 4117–4121.
[111] Hill, J. A., Smith, B. E., Papoulias, P. G., Andrews, P. C., ProteomeCommons.org
collaborative annotation and project
management resource integrated with the Tranche repository.
J. Proteome Res. 2010, 9, 2809–2811.