Protein characterization by mass spectrometry
C7250, Part II

Zbyněk Zdráhal
RG Proteomics, CEITEC-MU
Proteomics CF, CEITEC-MU
NCBR FS MU
zdrahal@sci.muni.cz

Functional Genomics and Proteomics
National Centre for Biomolecular Research
Faculty of Science, Masaryk University

Basic methods of protein characterization by mass spectrometry

Protein formation
[Figure: peptide bond formation. Alanine (Ala, A) and serine (Ser, S) form the dipeptide Ala-Ser; N-terminus and C-terminus are labelled.]

Primary structure (amino acid sequence):
MAPLLAAAMNHAAAHPGLRSHLVGPNNESFSRHHLPSSSPQSSKRRCNLSFTTRSARVGS
QNGVQMLSPSEIPQRDWFPSDFTFGAATSAYQIEGAWNEDGKGESNWDHFCHNHPERILD
GSNSDIGANSYHMYKTDVRLLKEMGMDAYRFSISWPRILPKGTKEGGINPDGIKYYRNLI
NLLLENGIEPYVTIFHWDVPQALEEKYGGFLDKSHKSIVEDYTYFAKVCFDNFGDKVKNW
LTFNEPQTFTSFSYGTGVFAPGRCSPGLDCAYPTGNSLVEPYTAGHNILLAHAEAVDLYN
KHYKRDDTRIGLAFDVMGRVPYGTSFLDKQAEERSWDINLGWFLEPVVRGDYPFSMRSLA
RERLPFFKDEQKEKLAGSYNMLGLNYYTSRFSKNIDISPNYSPVLNTDDAYASQEVNGPD
GKPIGPPMGNPWIYMYPEGLKDLLMIMKNKYGNPPIYITENGIGDVDTKETPLPMEAALN
DYKRLDYIQRHIATLKESIDLGSNVQGYFAWSLLDNFEWFAGFTERYGIVYVDRNNNCTR
YMKESAKWLKEFNTAKKPSKKILTPA
Secondary structure (local sub-structures, hydrogen bonds)
Tertiary structure (3-D, folded protein)
Quaternary structure (subunit assembly)

Protein structure and mass spectrometry
Beta-glucosidase (maize), 566 amino acids, 8280 atoms, 128 474 Da
Primary structure (amino acid sequence):
MAPLLAAAMNHAAAHPGLRSHLVGPNNESFSRHHLPSSSPQSSKRRCNLSFTTRSARVGSQNGVQMLSPSEIPQRDWFPSDFTFGAATSAYQIEGAWNE
DGKGESNWDHFCHNHPERILDGSNSDIGANSYHMYKTDVRLLKEMGMDAYRFSISWPRILPKGTKEGGINPDGIKYYRNLINLLLENGIEPYVTIFHWD
VPQALEEKYGGFLDKSHKSIVEDYTYFAKVCFDNFGDKVKNWLTFNEPQTFTSFSYGTGVFAPGRCSPGLDCAYPTGNSLVEPYTAGHNILLAHAEAVD
LYNKHYKRDDTRIGLAFDVMGRVPYGTSFLDKQAEERSWDINLGWFLEPVVRGDYPFSMRSLARERLPFFKDEQKEKLAGSYNMLGLNYYTSRFSKNID
ISPNYSPVLNTDDAYASQEVNGPDGKPIGPPMGNPWIYMYPEGLKDLLMIMKNKYGNPPIYITENGIGDVDTKETPLPMEAALNDYKRLDYIQRHIATL
KESIDLGSNVQGYFAWSLLDNFEWFAGFTERYGIVYVDRNNNCTRYMKESAKWLKEFNTAKKPSKKILTPA

Determination of protein mass
Intact mass analysis

[Figure: MALDI MS spectrum of a protein (BSA, ≈ 15 pmol); x-axis m/z 20000-140000, y-axis absolute intensity. Labelled peaks: [M+2H]2+ 33219, [M+H]+ 66465, [2M+H]+ 132721. RSD ≈ 0.1 %, i.e. ± 66 Da for BSA.]

[Figure: MALDI-MS spectra of modified BSA vs. standard BSA (≈ 5 pmol); x-axis m/z 20000-140000, y-axis intensity (a.i.). Mass difference ~ 3 kDa; MW(ligand) = 250; number of bound ligands N = 12.]

[Figure: MALDI-MS spectrum of a glycoprotein; the mass difference between peaks corresponds to a glycan (protein with another glycoform).]

[Figure: ESI spectrum of myoglobin (16 951 Da).]

Determination of protein mass
* useful information for initial characterization
* does not enable protein identification, nor is it necessary for identification
* allows characterization of modifications to a limited extent (mainly with high-resolution MS)

Protein identification

Protein identification using mass spectrometric data
Bottom up: protein (mix) -> separation -> digestion (specific protease) -> MS, MS/MS analysis -> identification (DB search, de novo)
Top down: protein (mix) -> separation -> MS/MS analysis -> identification (DB search, de novo)

Bottom up
Identification of known proteins is mainly performed at the peptide level (the primary sequence is in a database).
1st step: specific digestion of proteins
MS: Peptide Mass Fingerprinting
MS/MS: MS/MS Ion Search

Specific protein digestion
- Enzymatic digestion at selected amino acids
  - examples of specific proteases:
    trypsin    K-X, R-X (except X = P)
    Glu-C      E-X (except X = P)
    Asp-N      X-D
  - "low-specificity" proteases (pepsin, thermolysin)
- Chemical digestion
    CNBr (FA)  M-X
http://www.expasy.ch/tools/peptidecutter/

Tryptic digestion
QNGVQMLSPSEIPQRDWFPSDFTFGAATSAYQIEGAWNEDGKGESNWDHFCHNHPERILDGSNSDIGANSYHMYKTDVRPLLKPMGMDAYRFSISWPRI
LPKGTKEGGINPDGIKYYRNLINLLLENGIEP
Trypsin always cleaves after lysine (K) or arginine (R), unless the next amino acid is proline.
Specificity of digestion is one of the crucial prerequisites of identification reliability.

Peptide map
The set of masses of the peptides formed (i.e. the peptide map) is characteristic of a given protein, much as a fingerprint is of a human individual.

Peptide                       Position   Mass
QNGVQMLSPSEIPQR               1-15       1683.848 Da
DWFPSDFTFGAATSAYQIEGAWNEDGK   16-42      3010.317 Da
GESNWDHFCHNHPER               43-57      1864.757 Da
ILDGSNSDIGANSYHMYK            58-75      1984.907 Da
TDVR                          76-79      490.262 Da
...

Tryptic digestion
[Figure: MALDI-MS (spectrum detail), example of unspecific digestion; x-axis m/z 1600-2100, y-axis absolute intensity × 1000. Sequences of the unspecifically digested peptides were verified by MALDI-MS/MS.]
m/z       Peptide            Note
2097.97   KLLTQDWVQENYLEYR   151-166
1969.88   LLTQDWVQENYLEYR    152-166 (-K)
1856.80   LTQDWVQENYLEYR     -L
1743.73   TQDWVQENYLEYR      -LL
1642.67   QDWVQENYLEYR       -LLT
1625.66   QDWVQENYLEYR       Pyro-Glu
Also labelled in the spectrum: 2027.93.

MS Peptide Mass Fingerprinting
* method applicable to individual (separated) proteins; separation of proteins is necessary
principle:
- MS analysis of specifically digested peptides; the obtained set of "peptide masses" (peptide map) is specific information suitable for protein identification
- database search

The measured peptide map is searched against a database of protein sequences using a database search engine. The search engine calculates a theoretical peptide map for each protein sequence in the database (applying the cleavage rules of the selected protease) and compares, one by one, the experimentally obtained peptide map of the analysed protein with the in-silico calculated peptide maps.
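The in-silico step described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular search engine: the function names (`tryptic_digest`, `mh_plus`, `count_matches`) and the 0.2 Da tolerance are assumptions made for this example, and the residue masses are standard monoisotopic values. Applied to the first 79 residues of the sequence fragment from the tryptic digestion slide (QNGVQ...TDVR), it yields the five peptides of the peptide map; the listed masses agree with singly protonated [M+H]+ values.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O carried by every free peptide
PROTON = 1.00728   # charge carrier of the singly protonated ion


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages).

    Returns (start, end, peptide) tuples with 1-based positions.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append((start + 1, i + 1, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):  # C-terminal peptide not ending in K/R
        peptides.append((start + 1, len(sequence), sequence[start:]))
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(measured, theoretical, tol=0.2):
    """Number of measured masses that have a theoretical mass within tol Da."""
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in measured)


# First 79 residues of the fragment shown on the slide (hypothetical input).
FRAGMENT = ("QNGVQMLSPSEIPQRDWFPSDFTFGAATSAYQIEGAWNEDGK"
            "GESNWDHFCHNHPERILDGSNSDIGANSYHMYKTDVR")

peptide_map = [(s, e, p, round(mh_plus(p), 3))
               for s, e, p in tryptic_digest(FRAGMENT)]
for start, end, pep, mass in peptide_map:
    print(f"{pep:<28} {start}-{end:<4} {mass:.3f}")
```

A real search engine additionally considers missed cleavages, modifications and a statistical score; `count_matches` only counts masses that agree within the tolerance.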
The search returns a list of proteins with the most similar in-silico peptide maps. The extent of similarity is expressed as a score; all protein candidates whose score exceeds the significance threshold (calculated by the software) are considered identified by the search engine.

MS Peptide Mass Fingerprinting
Experimental design
cells, tissue -> 2-D gel electrophoresis (gel spot) or purified protein (solution) -> digestion (trypsin) -> set of peptides specific for the analyzed protein -> MS (m/z) -> ???

MS Peptide Mass Fingerprinting
The MS spectrum contains masses of peptides formed by digestion of the selected protein.
[Figure: MALDI-TOF MS spectrum of peptides after protein digestion.]

MS Peptide Mass Fingerprinting
http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=PMF

Mascot Search Results
1. S18600   Mass: 47780   Total score: 165   Peptides matched: 12
   glutamate-ammonia ligase (EC 6.3.1.2) precursor, chloroplast (clone lambdaAtgsl1) - Arabidopsis thaliana
2. S32228   Mass: 47714   Total score: 76   Peptides matched: 7
   glutamate-ammonia ligase (EC 6.3.1.2) precursor - rape - Brassica napus
Database  : MSDB 20021127 (1019653 sequences)
Timestamp : 26 Jan 2003 at 10:36:50 GMT
Top Score : 165 for S18600, glutamate-ammonia ligase ....
Sequence Coverage: 44%
  1 MAQILAASPT CQMRVPKHSS VIASSSKLWS SVVLKQKKQS NNKVRGFRVL
 51 ALQSDNSTVN RVETLLNLDT KPYSDRIIAE YIWIGGSGID LRSKSRTIEK
101 PVEDPSELPK WNYDGSSTGQ APGEDSEVIL YPQAIFRDPF RGGNNILVIC
151 DTWTPAGEPI PTNKRAKAAE IFSNKKVSGE VPWFGIEQEY TLLQQNVKWP
201 LGWPVGAFPG PQGPYYCGVG ADKIWGRDIS DAHYKACLYA GINISGTNGE
251 VMPGQWEFQV GPSVGIDAGD HVWCARYLLE RITEQAGVVL TLDPKPIEGD
301 WNGAGCHTNY STKSMREEGG FEVIKKAILN LSLRHKEHIS AYGEGNERRL
351 TGKHETASID QFSWGVANRG CSIRVGRDTE AKGKGYLEDR RPASNMDPYI
401 VTSLLAETTL LWEPTLEAEA LAAQKLSLNV
www.matrixscience.com

MS Peptide Mass Fingerprinting
✓ fast identification technique
✓ usually MALDI-MS (storage of samples)
* only known proteins
* identification based only on the mass of whole peptides
* detailed structural information is not available (modification type: yes?, localization: no)
* protein separation always necessary

MS/MS Ion Search
* method suitable for protein mixtures; separation of intact proteins is not necessary
principle:
- MS/MS analysis of specifically digested peptides; the obtained set of "masses" (m/z) of fragments of individual peptides is specific information suitable for protein identification
- database search

MS/MS Ion Search
Measured fragmentation maps (i.e. sets of masses, or m/z values, of fragments formed during MS/MS of individual peptides) are searched against a database of protein sequences by a search engine. First, the search engine prepares a theoretical peptide map for a protein sequence in the database. It then calculates a theoretical fragmentation map for each peptide of that peptide map (according to the given fragmentation rules), and these in-silico fragmentation maps are compared with the experimentally obtained fragmentation maps of the analyzed peptides. The engine performs this operation for each protein sequence in the database.
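As a rough sketch of how a theoretical fragmentation map can be calculated, the Python snippet below computes singly charged b and y ions from standard monoisotopic residue masses. The function names (`fragment_ions`, `shared_ions`) and the 0.01 Da tolerance are assumptions for illustration only. Ions are computed as b_i = sum of the first i residues + H+ and y_i = sum of the last i residues + H2O + H+, the convention that reproduces the b and y values tabulated for GYEEWLLNEIR and WFSLDEINELR later in these slides. The example also shows why those two peptides are hard to tell apart: they share a large part of their fragment ions.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return ([b1..b(n-1)], [y1..y(n-1)]) as singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b ions grow from the N-terminus, y ions from the C-terminus.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


def shared_ions(ions_a, ions_b, tol=0.01):
    """Ions of the first list that have a partner within tol in the second."""
    return [round(a, 4) for a in ions_a
            if any(abs(a - b) <= tol for b in ions_b)]


b_g, y_g = fragment_ions("GYEEWLLNEIR")
b_w, y_w = fragment_ions("WFSLDEINELR")
print("shared b ions:", shared_ions(b_g, b_w))
print("shared y ions:", shared_ions(y_g, y_w))
```

Because leucine and isoleucine have the same mass, and GYE has the same residue composition mass as FSD, the two peptides share b6 to b10 and y1 to y5; only the remaining ions discriminate between them.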
The software calculates an individual score for each peptide; a score higher than the peptide score threshold indicates significant similarity between the theoretical and the measured fragmentation map, i.e. a significant peptide identification. Finally, the search engine assigns peptides to the corresponding protein sequences (the more peptides with a significant score per protein, the more reliable the protein identification). The software also calculates a protein score, derived from the individual peptide scores, as a tool for ordering the results.

MS/MS Ion Search
MS/MS fragmentation of peptides
* peptides consist of individual amino acids connected by peptide bonds
* during fragmentation (e.g. CID), a peptide fragments preferentially at the peptide bonds, and thus:
  - any peptide bond may be cleaved (a different one in each precursor molecule), forming a set of fragments with various numbers of amino acids
  - the difference in m/z (or mass) between "neighbouring fragments" determines the type of the terminal amino acid in the longer fragment
* series of fragment ions are formed (b - y, a - x, c - z), which can be used for de novo primary structure elucidation; moreover, they are predictable and can be used for database-search-based protein identification even if they are incomplete

[Figure: outline of tripeptide fragmentation, N-terminus to C-terminus. Roepstorff P. and Fohlman J., Biomed. Mass Spectrom., 11 (11), 601 (1984).]

MS/MS Ion Search
Peptide fragmentation - CID
[Figure: MS of peptides gives [M+H]+; MS/MS of peptides gives the b-ion series (b1, b2, b3, b4, from the N-terminus) and the y-ion series (y1, y2, y3, y4, from the C-terminus), i.e. fragmentation maps for individual peptides.]
by courtesy of Dr. Arnd Ingendoh (Bruker)

Peptide fragmentation - ETD
* lower preference of weak bonds
* uniform fragmentation of all bonds

Peptide fragmentation - ETD
by courtesy of Dr.
Arnd Ingendoh (Bruker)
[Figure: CID vs. ETD fragmentation; ETD gives the c-series from the N-terminus and the z-series from the C-terminus.]

MS/MS Ion Search
MS/MS fragmentation of peptide, in silico: ALELFR

a        b        c        #  Res  #  x        y        z
44.050   72.045   89.071   1  Ala  6  -        -        -
157.134  185.129  202.156  2  Leu  5  701.363  675.384  658.358
286.177  314.172  331.198  3  Glu  4  588.279  562.300  545.273
399.261  427.256  444.282  4  Leu  3  459.237  433.257  416.231
546.329  574.324  591.351  5  Phe  2  346.153  320.173  303.147
-        -        -        6  Arg  1  199.084  173.105  156.078

[Figure: CID spectrum of peptide QGFGNVATNTDGK (b, y ions).]

[Figure: ETD spectrum of peptide QGFGNVATNTDGK (c, y, z ions).]

MS/MS Ion Search
Experimental design
cells, tissues -> protein mixture -> digestion -> set of digested peptides of all proteins in the mixture -> LC-MS/MS (separation of peptides and MS/MS analysis, m/z)

MS/MS Ion Search
[Figure: separation of myoglobin tryptic peptides.]

MS/MS Ion Search
[Figure: MS/MS spectrum of a peptide, m/z 374.8, 2+.]

MS/MS Ion Search
Mascot Search Results
1. MYBD   Mass: 16955   Score: 29   Peptides matched: 1
   myoglobin - Eurasian badger (tentative sequence)
   Query  Observed  Mr(expt)  Mr(calc)  Delta  Miss  Score  Expect  Rank  Peptide
   1.     374.81    747.60    747.43    0.18   0     29     0.014   1     ALELFR
Database  : MSDB 20040329 (1457190 sequences)
Taxonomy  : Other mammalia (30839 sequences)
Timestamp : 20 May 2004 at 06:55:04 GMT

   Accession  Mass   Score  Description
1. Q865L4     3798   55     Myoglobin (Fragment).- Bos taurus (Bovine).
2. 1A6K       17004  50     myoglobin - sperm whale
3. 1MNJB      16734  50     myoglobin (met, ph 7.1) mutant with his 64 replaced by val …..
4. 1MNKA      16722  50     myoglobin (aquomet, ph 7.1) mutant with his 64 replaced by val…..
5. 1DTMA      17052  50     recombinant sperm whale myoglobin variant h93g mutant YES - sperm whale

MS/MS Ion Search
[Figure: MS/MS spectrum of peptide ALELFR (747 Da); the fragment ladder is labelled R, F, L, E.]

MS/MS Ion Search
PD, taxonomy: All entries
Individual ions scores > 25 indicate identity or extensive homology
1.
ct74_rgenePd06_2913   Mass: 49299   Score: 46   Queries matched: 1   emPAI: 0.07
   ct74_rgenePd06_2913
   Query  Observed  Mr(expt)  Mr(calc)  Delta  Miss  Score  Expect    Rank  Peptide
   272    711.4     1420.7    1420.698  0.10   0     46     0.00025   1     R.WFSLDEINELR.R

6. gi|94982457   Mass: 105502   Score: 229   Queries matched: 7   emPAI: 0.20
   actinin alpha 1 isoform b [Homo sapiens]
   Query  Observed  Mr(expt)  Mr(calc)  Delta  Miss  Score  Expect    Rank  Peptide
   157    600.3     1198.6    1198.623  0.04   0     40     0.057     1     R.DLLLDPAWEK.Q
   219    663.4     1324.8    1324.648  0.20   1     15     16        7     R.RDQALTEEHAR.Q
   241    686.9     1371.8    1371.779  0.11   0     72     3.4e-005  1     K.LMLLLEVISGER.L
   248    693.9     1385.9    1385.766  0.14   0     87     1.1e-006  1     R.VGWEQLLTTIAR.T
   272    711.4     1420.7    1420.698  0.10   0     60     0.00046   1     K.GYEEWLLNEIR.R
   274    715.4     1428.8    1428.757  0.05   0     79     5.3e-006  1     R.TINEVENQILTR.D
   334    780.4     2338.3    2338.180  0.16   0     66     0.0001    1     K.IDQLEGDHQLIQEALIFDNK.H
NCBInr, taxonomy: Homo Sapiens
Individual ions scores > 39 indicate identity or extensive homology

Probability is mathematics only

#   b          Seq.  y          #
1   58.0287    G                11
2   221.0921   Y     1364.6845  10
3   350.1347   E     1201.6212  9
4   479.1773   E     1072.5786  8
5   665.2566   W     943.5360   7
6   778.3406   L     757.4567   6
7   891.4247   L     644.3726   5
8   1005.4676  N     531.2885   4
9   1134.5102  E     417.2456   3
10  1247.5943  I     288.2030   2
11             R     175.1190   1
GYEEWL LNEI R

# b Seq.
y #
1   187.0866   W                11
2   334.1550   F     1235.6266  10
3   421.1870   S     1088.5582  9
4   534.2711   L     1001.5262  8
5   649.2980   D     888.4421   7
6   778.3406   E     773.4152   6
7   891.4247   I     644.3726   5
8   1005.4676  N     531.2885   4
9   1134.5102  E     417.2456   3
10  1247.5943  L     288.2030   2
11             R     175.1190   1
WFSLDE INEL R

MS/MS Ion Search
[Figures: Error Distribution (ppm), two panels.]

MS/MS Ion Search
✓ more reliable identification based on peptide fragmentation (a protein is identifiable from the MS/MS spectrum of one peptide)
✓ MS/MS data allow sequence determination of unknown proteins (de novo sequencing)
✓ MS/MS techniques are suitable for detailed characterization of sequence and PTMs
* more technically (financially) demanding and more time-consuming than MS techniques

MS/MS De novo sequencing
Determination of the structure of unknown proteins
* different proteases: overlapping peptides establish the order of peptides in the sequence
* MS/MS of peptides (MALDI-MS/MS, LC-ESI-MS/MS)
* MS/MS spectra interpretation (manual, automatic by software)
* supporting information (BLAST…)

De novo sequencing of peptides - MS/MS
+ digestion in H₂¹⁸O: confirmation of y fragments, +4 (+2)

The top-down approach uses the mass of the intact proteins, individually or in mixtures, and then fragments the intact proteins inside the mass spectrometer without prior enzymatic digestion [3]. The advantages of top-down proteomics are the ability to measure the actual intact protein molecular weight, preserving both the entire protein sequence and the integrity of post-translational modifications. Currently, top-down proteomics is limited to FTICR instruments because of the requirements for high resolving power, mass accuracy and complementary fragmentation methods. Intact protein and fragment molecular weights can be searched against a corresponding database, in a manner similar to that of the bottom-up approach, to provide protein identification [4-6].

3. Reid GE, McLuckey SA. J Mass Spectrom. 2002; 37: 663.
4.
Senko MW, Beu SC, McLafferty FW. Anal Chem. 1994; 66: 415.
5. Mortz E, O'Connor PB, Roepstorff P, Kelleher NL, Wood TD, McLafferty FW, Mann M. Proc Natl Acad Sci U S A. 1996; 93: 8264.
6. Meng F, Cargile BJ, Patrie SM, Johnson JR, McLoughlin SM, Kelleher NL. Anal Chem. 2002; 74: 2923.
ThermoElectron's poster at HUPO 05

Top Down
[Figure: ETD/PTR (ESI-IT); by courtesy of Dr. Arnd Ingendoh (Bruker).]

[Figure: ETD/PTR (ESI-IT), ubiquitin (8559.6 Da), sequence coverage 87 %; by courtesy of Dr. Arnd Ingendoh (Bruker).]

MALDI-MS Top-Down ISD
In-source decay, ISD (RNase B, 13.7 kDa)
only pure protein
[Figure: ISD spectrum; x-axis m/z 1500-2400, y-axis absolute intensity × 1000. The c-ion series reads the sequence M D S S T S A A S S S N; labelled peaks: c13 1545.953, c14 1660.959, c15 1747.989, c16 1835.017, c17 1936.065, c18 2023.098, c19 2094.126, c20 2165.124, c21 2252.152, c22 2339.196, c23 2426.217.]

ISD
[Figure: histone H3 (15.3 kDa), MALDI-ISD MS; fragments up to 4.5 kDa; N-terminus: c series, C-terminus: y series.]

[Figure: histone H3 (15.3 kDa), MALDI-ISD MS, spectrum detail; sequence ARTKQTARKSTGGKAPR... .... KDIQLARRIRGERA.]

LC-MALDI setup
Characterization of therapeutic antibodies
Assessment of N- and C-terminal modification status. LC fractions containing, e.g., the Fd, Fc/2 and LC domains.
TDS - Top-Down-Sequencing

End