BioMed Central

BMC Genomics

Open Access
Research article

Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

Sook Jung*1, Dorrie Main2, Margaret Staton1, Ilhyung Cho3, Tatyana Zhebentyayeva1, Pere Arús4 and Albert Abbott1

Address: 1Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA, 2Department of Horticulture and Landscape Architecture, Washington State University, Pullman, WA 99164, USA, 3Department of Computer Science, Saginaw Valley State University, University Center, MI 48710, USA and 4Departament de Genètica Vegetal, Laboratori de Genètica Molecular Vegetal, CSIC-IRTA, 08348 Cabrils, Spain

Email: Sook Jung* - sook@genome.clemson.edu; Dorrie Main - dorrie@wsu.edu; Margaret Staton - meg@genome.clemson.edu; Ilhyung Cho - icho@svsu.edu; Tatyana Zhebentyayeva - tzhebe@clemson.edu; Pere Arús - pere.arus@irta.es; Albert Abbott - aalbert@clemson.edu

* Corresponding author

Abstract

Background: Because large genomic sequences are not available for peach or other Prunus species, the degree of synteny conservation between Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to the peach physical map, we analyzed the extent of conserved synteny between the Prunus and Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, which existed prior to the proposed recent polyploidy event, was also used in our analysis to further elucidate the evolutionary relationship.

Results: We analyzed the synteny conservation between the Prunus and Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched to more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions were more conserved in the ancestral genome than in the present Arabidopsis region. The statistical significance of each syntenic group was calculated using a simulated Arabidopsis genome.

Conclusion: We report here the results of the first extensive analysis of conserved microsynteny using DNA sequences across the Prunus genome and their Arabidopsis homologs. Our study also illustrates that both the ancestral and present Arabidopsis genomes can provide a useful resource for marker saturation and candidate gene search, as well as for elucidating evolutionary relationships between species.

Published: 14 April 2006
BMC Genomics 2006, 7:81 doi:10.1186/1471-2164-7-81
Received: 13 December 2005
Accepted: 14 April 2006
This article is available from: http://www.biomedcentral.com/1471-2164/7/81
© 2006 Jung et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Eukaryotic genome size varies enormously and does not depend on genetic or organismal complexity. Most of the DNA in large genomes, however, is non-coding, and the gene content is relatively constant [1,2]. Arabidopsis thaliana (estimated haploid size of 115 Mb) contains more than 25,000 genes [3], and the human genome (estimated haploid size of 3200 Mb) contains 20,000-25,000 genes [4]. In addition to gene content, conservation of synteny (the presence of two or more genes on the same chromosome) and of gene order has been observed among many plant species. One of the earliest observations of conserved macrosynteny was between potato and tomato in Solanaceae, where cDNA markers along the 12 chromosomes were largely collinear [5]. Significant conservation of marker and gene order has been observed among grass species, despite their diverse genome sizes and chromosome numbers [6-8]. Similar conserved macrosynteny has also been observed in Rosaceae. Comparisons of anchor markers of the Prunus reference map with those of 13 maps constructed with other Prunus populations showed that the genomes of seven Prunus diploid species are essentially collinear [9]. Large collinear blocks were also detected among different genera in Rosaceae, such as Prunus and Malus [9].

On the other hand, genome sequence comparisons have revealed that plant genome evolution involved various small chromosomal rearrangements, such as insertions, deletions, inversions and translocations [10].
For example, Kilian and coworkers have shown that a barley gene in regions of high microsynteny with rice is in fact transposed to a position that is no longer syntenous with rice [11]. In addition to small chromosomal rearrangements, large segmental duplications and polyploidy are prevalent in plant genome evolution [12-14]. Genome duplication has been well documented in Brassicaceae: the Brassica genome is extensively triplicated [15], and the Arabidopsis genome contains numerous large duplicated chromosomal segments [3,16]. Comparative physical mapping between Brassica species and Arabidopsis showed high conservation in gene order but not in gene content, possibly resulting from random gene loss after extensive genome duplication in both genomes [14].

The degree of synteny conservation has also been examined between Arabidopsis and less closely related species. Rosid I and rosid II comparisons (Figure 1) have been made by sequence homology between soybean marker sequences and Arabidopsis sequences [17]. Shared linkages were identified along with signs of extensive genome duplication and reorganization. A few microsyntenic regions were also identified by comparative physical mapping between Arabidopsis and soybean [18]. A gene-containing BAC sequence of tomato (asterid I) had conserved synteny with four different segments of Arabidopsis chromosomes 2-5 [19].

Synteny between Arabidopsis and four dicotyledonous species from three major groups, caryophyllids, rosids and asterids, has also been explored by constructing genetic maps based on ESTs that are homologous to Arabidopsis genes [20]. Some syntenic blocks were conserved in all five maps (Arabidopsis, sugar beet, potato, sunflower and Prunus), suggesting their evolutionary significance. The syntenic blocks usually contained only several loci, however, and each linkage group of the crop genetic maps matched to multiple Arabidopsis genome regions.
Complex syntenic relationships, suggestive of chromosome rearrangement, selective gene loss and genome duplication, were also observed [20]. Synteny between the rice and Arabidopsis genomes, after 200 million years of divergence [21], was also observed, but the syntenic regions were scarce and separated by intervening proteins, as previously suggested [20]. Also, most of the rice syntenic regions map to more than one Arabidopsis chromosome [21], supporting the theme of large-scale genome duplication and selective gene loss in plant genome evolution.

Figure 1. A dendrogram depicting the phylogenetic relationship of peach, Arabidopsis and many other crop species. The probable position of the recent polyploidization event identified by Blanc and coworkers [22] is marked by an arrow. The figure is based on Figure 1 in reference 19 and Figure 5 in reference 22.

A recent study has systematically analyzed the timing and number of segmental duplications in the Arabidopsis genome and suggested a recent polyploidy superimposed on older large-scale duplication [22]. The recent polyploidy appeared to have occurred during the early emergence of the Brassicaceae family and the older set of duplicated blocks between rosid I and rosid II groups. One of the interesting outcomes of this study is the reconstruction of the approximate gene order of the ancestral genome that existed prior to the recent polyploidy event. The reconstruction was done by merging the genes in both sister regions duplicated at the time of polyploidy.

Rosaceae contains numerous important fruit crops such as peach, apple, cherry, pear, raspberry, blackberry and strawberry [23].
Because large genomic sequences are not available for peach or other Rosaceae species, little information has been available to study the degree of synteny conservation between Rosaceae species and Arabidopsis. A recent study detected fragmentary macrosynteny between the Prunus general map and Arabidopsis from comparisons of the genetic marker sequences and their Arabidopsis homologs [9]. When sequences of three peach genomic regions were used, only short (two or three genes) blocks that are collinear with the Arabidopsis genome were found [24]. With the international effort to make peach the reference species for the Rosaceae family, peach physical mapping is underway and peach ESTs are being anchored to both the genetic and physical maps [25].

The objective of this study was to assess the degree of conserved synteny between Prunus and Arabidopsis using these extensive EST sequences anchored to the genetic and physical maps. We also used the reconstructed ancestral Arabidopsis genome to see if we could find additional syntenic regions. This study demonstrates that comparative genome analyses between the reconstructed Arabidopsis genome and other plant species can further facilitate the utilization of the genetic resources of both species and help us to understand the evolutionary relationship between these species.

Table 1: Number of conserved syntenic regions between Arabidopsis and Prunus genetic maps.
Map name | No. anchored ESTs | No. syntenic regions (no. with three or more gene pairs)
TxE^1 (almond × peach) | 306 | 68 (12)
PxF^2 (peach × peach × P. ferganensis) | 188 | 9 (1)
JxF^3 (peach) | 78 | 7 (1)
GxN^4 (almond × peach) | 82 | 1 (0)
FxT^5 (almond) | 171 | 45 (6)
FxB^6 (almond) | 119 | 9 (0)
All maps | 475 | 139 (20)
1 Dirlewanger et al. 2004 [9]; 2 Dettori et al. 2001 [33]; 3 Dirlewanger et al. 1999 [34]; 4 Jáuregui et al. 2001 [35]; 5 Joobeur et al. 2004 [36]; 6 Ballester et al. 2001 [37]

Figure 2. Number of syntenic groups in each TxE linkage group that match to each Arabidopsis chromosome.

Table 2: Conserved syntenic regions with three or more gene pairs between the Arabidopsis genome and Prunus genetic maps.
Group | # Pairs | Arabidopsis gene | Putative function | Peach EST name | Peach linkage group
gp15 | 3 | AT1G02460 | glycoside hydrolase family 28 protein | PP_LEa0030E14f | FxT-G3F
 |  | AT1G02130 | Ras-related protein (ARA-5) | PP_LEa0010O05f |
 |  | AT1G03000 | AAA-type ATPase family protein | PP_LEa0001O24f |
gp21 | 3 | AT1G53750 | 26S proteasome AAA-ATPase subunit (RPT1a) | PP_LEa0010K05f | PxF-G6
 |  | AT1G54080 | oligouridylate-binding protein | PP_LEa0012K19f |
 |  | AT1G54110 | cation exchanger, putative (CAX10) Ca2+ | PP_LEa0007O07f |
gp33 | 3 | AT1G66540 | cytochrome P450 | PP_LEa0013L12f | TxE-G5
 |  | AT1G66250 | glycosyl hydrolase family 17 protein | PP_LEa0012I12f |
 |  | AT1G66680 | S locus-linked protein | PP_LEa0003H24f |
gp42 | 3 | AT2G35330 | zinc finger (C3HC4-type RING finger) protein | PP_LEa0017P13f | JxF-G7
 |  | AT2G35930 | U-box domain-containing protein | PP_LEa0004C12f |
 |  | AT2G36530 | enolase | PP_LEa0003M24f |
gp54 | 3 | AT2G36530 | enolase | PP_LEa0003M24f | TxE-G7
 |  | AT2G35930 | U-box domain-containing protein | PP_LEa0004C12f |
 |  | AT2G35330 | zinc finger (C3HC4-type RING finger) protein-related | PP_LEa0017P13f |
gp74 | 3 | AT3G60340 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f | TxE-G5
 |  | AT3G60510 | enoyl-CoA hydratase/isomerase family protein | PP_LEa0009I06f |
 |  | AT3G60030 | squamosa promoter-binding protein-like 12 (SPL12) | PP_LEa0002J03f |
gp75 | 3 | AT3G07160 | glycosyl transferase family 48 protein | PP_LEa0004K19f | TxE-G5
 |  | AT3G06650 | ATP-citrate synthase, ATP-citrate (pro-S-)-lyase | PP_LEa0005D13f |
 |  | AT3G06880 | transducin family protein | PP_LEa0009A14f |
gp76 | 3 | AT3G02770 | dimethylmenaquinone methyltransferase | PP_LEa0030G03f | TxE-G5
 |  | AT3G01930 | nodulin family protein similar to nodulin-like protein | PP_LEa0012O21f |
 |  | AT3G02420 | expressed protein | PP_LEa0037N22f |
gp80 | 3 | AT3G08560 | vacuolar ATP synthase subunit E | PP_LEa0009M17f | TxE-G6
 |  | AT3G08710 | thioredoxin family protein | PP_LEa0016G12f |
 |  | AT3G08770 | lipid transfer protein 6 (LTP6) | PP_LEa0029C22f |
gp85 | 3 | AT4G17720 | RNA recognition motif (RRM)-containing protein | PP_LEa0027L14f | FxT-G2F
 |  | AT4G16900 | disease resistance protein (TIR-NBS-LRR class) | PP_LEa0003A21f |
 |  | AT4G17483 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f |
gp98 | 3 | AT4G17483 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f | TxE-G5
 |  | AT4G17486 | expressed protein | PP_LEa0005J05f |
 |  | AT4G17615 | calcineurin B-like protein 1 (CBL1) | PP_LEa0009N08f |
gp101 | 3 | AT4G32450 | pentatricopeptide (PPR) repeat-containing protein | PP_LEa0009C16f | TxE-G5
 |  | AT4G31970 | cytochrome P450 family protein | PP_LEa0013L12f |
 |  | AT4G31810 | enoyl-CoA hydratase/isomerase family protein | PP_LEa0009I06f |
gp106 | 3 | AT5G61790 | calnexin 1 (CNX1) | PP_LEa0006I23f | FxT-G1F
 |  | AT5G62310 | incomplete root hair elongation (IRE)/protein kinase | PP_LEa0009I05f |
 |  | AT5G62090 | expressed protein | PP_LEa0030I08f |
gp109 | 3 | AT5G47350 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f | FxT-G2F
 |  | AT5G47710 | C2 domain-containing protein contains | PP_LEa0011F23f |
 |  | AT5G46870 | RNA recognition motif (RRM)-containing protein | PP_LEa0027L14f |
gp114 | 3 | AT5G03520 | Ras-related GTP-binding protein | PP_LEa0010O05f | FxT-G3F
 |  | AT5G03340 | cell division cycle protein 48, putative/CDC48 | PP_LEa0001O24f |
 |  | AT5G03650 | 1,4-alpha-glucan branching enzyme | PP_LEa0009P15f |
gp115 | 3 | AT5G07990 | flavonoid 3'-monooxygenase | PP_LEa0007M11f | FxT-G3F
 |  | AT5G07340 | calnexin | PP_LEa0006I23f |
 |  | AT5G08470 | peroxisome biogenesis protein (PEX1) | PP_LEa0001O24f |
gp126 | 3 | AT5G08390 | transducin family protein | PP_LEa0010I06f | TxE-G1
 |  | AT5G07990 | flavonoid 3'-monooxygenase | PP_LEa0007M11f |
 |  | AT5G07340 | calnexin | PP_LEa0006I23f |
gp128 | 4 | AT5G47350 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f | TxE-G2
 |  | AT5G46870 | RNA recognition motif (RRM)-containing protein | PP_LEa0027L14f |
 |  | AT5G47810 | phosphofructokinase family protein | PP_LEa0001K06f |
 |  | AT5G47710 | C2 domain-containing protein | PP_LEa0011F23f |
gp132 | 3 | AT5G47100 | calcineurin B-like protein 9 (CBL9) | PP_LEa0009N08f | TxE-G5
 |  | AT5G47350 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f |
 |  | AT5G47310 | expressed protein | PP_LEa0005J05f |
gp133 | 3 | AT5G10840 | endomembrane protein 70, putative TM4 family | PP_LEa0015M20f | TxE-G5
 |  | AT5G11110 | sucrose-phosphate synthase | PP_LEa0003F22f |
 |  | AT5G10430 | arabinogalactan-protein (AGP4) | PP_LEa0008B15f |

Results

Conserved synteny between Prunus and Arabidopsis

We searched for conserved syntenic regions between the Prunus maps and the Arabidopsis genome using 475 peach ESTs anchored to the Prunus maps and their Arabidopsis homologs detected by a FASTX sequence similarity search (E value less than 10^-5). Syntenic groups were selected when the distance between two adjacent matches was less than 250 kb in the Arabidopsis genome and less than 10 cM in the Prunus maps. We detected 139 conserved syntenic regions, and 20 of them had three or more gene pairs.
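The selection rule stated above (adjacent matches closer than 250 kb in Arabidopsis and closer than 10 cM on the Prunus map) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is described in the Methods. It assumes that matches on the same linkage group and chromosome are linked whenever both distances fall under the thresholds, and that a group needs at least two linked gene pairs involving different ESTs and different genes. The `Match` record, the function name and the demonstration positions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Match(NamedTuple):
    """One peach EST / Arabidopsis gene pair (hypothetical record layout)."""
    est: str            # peach EST name
    linkage_group: str  # Prunus linkage group carrying the EST
    cm: float           # EST position on the Prunus map, in cM
    gene: str           # Arabidopsis homolog
    chromosome: int     # Arabidopsis chromosome
    kb: float           # gene position on the chromosome, in kb


def syntenic_groups(matches, max_kb=250.0, max_cm=10.0):
    """Group matches whose neighbours lie within both distance thresholds."""
    # Only matches on the same linkage group and chromosome can be syntenic.
    buckets = defaultdict(list)
    for m in matches:
        buckets[(m.linkage_group, m.chromosome)].append(m)

    groups = []
    for bucket in buckets.values():
        parent = list(range(len(bucket)))  # union-find forest

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        # Link two matches when they are close in both genomes.
        for i, j in combinations(range(len(bucket)), 2):
            a, b = bucket[i], bucket[j]
            if (a.est != b.est and a.gene != b.gene
                    and abs(a.kb - b.kb) < max_kb
                    and abs(a.cm - b.cm) < max_cm):
                parent[find(i)] = find(j)

        components = defaultdict(list)
        for i, m in enumerate(bucket):
            components[find(i)].append(m)
        for members in components.values():
            if len(members) >= 2:  # a group needs at least two gene pairs
                groups.append(sorted(members, key=lambda m: m.kb))
    return groups


# Made-up positions, for illustration only.
demo_matches = [
    Match("est_a", "G1", 5.0, "gene_1", 1, 100.0),
    Match("est_b", "G1", 9.0, "gene_2", 1, 300.0),
    Match("est_c", "G1", 17.0, "gene_3", 1, 500.0),
    Match("est_d", "G1", 60.0, "gene_4", 1, 600.0),  # too far away in cM
    Match("est_e", "G2", 5.0, "gene_5", 1, 150.0),   # other linkage group
]

for group in syntenic_groups(demo_matches):
    print([m.est for m in group])
```

Because matches are chained through their neighbours, a group can span more than 250 kb or 10 cM in total, which is consistent with the largest reported block covering 20 cM and 342 kb.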
The number of syntenic regions between Arabidopsis and each of the Prunus maps is shown in Table 1. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus TxE reference map. All of the TxE linkage groups that contained syntenic regions matched to more than two different Arabidopsis chromosomes (Figure 2). The gene pairs in the syntenic regions showed significant sequence similarity; 78% had E values less than 10^-15, and 88% had E values less than 10^-10.

There were 20 conserved syntenic regions with three or more gene pairs between the Prunus maps and the Arabidopsis genome, 12 of them on the TxE map (Table 1, Figure 3). Table 2 lists these syntenic regions with the putative functions of the Arabidopsis genes. The largest block (group gp128) had four gene pairs and covered 20 cM in G2 of the TxE Prunus map and 342 kb in chromosome 5 of Arabidopsis (Figure 3). Among the 20 regions with three or more gene pairs, five groups showed conserved gene order. In two groups, collinearity could not be assessed because two different peach ESTs were anchored to the same BAC, probably by hybridizing to different gene sequences in the same BAC. In the rest of the syntenic groups, the gene order was not conserved, suggesting many chromosomal rearrangement events.

Reflecting the synteny conservation among Prunus maps, we detected many Arabidopsis regions matching to more than one Prunus map region. In groups gp42 and gp54, the Arabidopsis genes matched to ESTs that were anchored to the same markers present in linkage group G7 of both the TxE Prunus map and the JxF peach map (Table 2). In groups gp85 and gp98, Arabidopsis genes within 350 kb matched to ESTs anchored to G2F of the FxT almond map and G5 of the TxE Prunus map (Table 2).
Most of the peach ESTs showed strong similarity to more than one Arabidopsis gene, and we were able to detect Prunus blocks that map to more than one site in the Arabidopsis genome. Interestingly, some of these putative duplicated Arabidopsis regions were located in the Arabidopsis paralogous blocks (duplicated blocks within a genome) reported in the previous study [21]. Figure 4 shows those Prunus blocks, syntenic to two different Arabidopsis regions, juxtaposed to the plot of the paralogous blocks of Arabidopsis. All three paralogons were generated by a recent polyploidy event that occurred during the early emergence of the Brassicaceae. Arabidopsis blocks with conserved synteny to a region in FxT-G1F and JxF-G1 belong to the paralogons in chromosomes 1 and 4, and those with conserved synteny to a region in FxT-G2T belong to the paralogons in two different arms of chromosome 5 (Figure 4). Three distinct regions in TxE (linkage groups G2, G4 and G5) showed conserved synteny to three overlapping blocks in each paralogon on chromosomes 4 and 5 (Figure 4). These TxE map regions may represent triplicated Prunus regions that subsequently went through selective gene loss.

Figure 3. Conserved syntenic regions with three or more gene pairs between the Arabidopsis genome and the Prunus genome. Bolded blocks are the ones with conserved gene order.

Synteny between Prunus and the pseudo-ancestral Arabidopsis genome

To further analyze the evolutionary relationship between the Arabidopsis and Prunus genomes, we searched for conserved syntenic regions between the Prunus maps and the ancestral Arabidopsis genome [22].
The pseudo-ancestral genome contained 20187 genes, which is about 69% of the genes in the present genome, arranged in a linear array. We used the same 475 peach ESTs and their Arabidopsis homologs detected by FASTX sequence similarity searching (E value less than 10^-5) in our search for conserved syntenic regions. Syntenic groups were selected when the number of genes between two adjacent matches was less than 61 in the Arabidopsis genome and the distance was less than 10 cM in the Prunus maps. The estimated number of genes in 250 kb was used as the maximum distance between two matches in the Arabidopsis genome, since only the gene order, rather than the position in kb, was available along the ancestral genome (see Methods). We detected 101 conserved syntenic regions, and 12 of them had three or more gene pairs. The details, including the putative functions of the syntenic blocks with three or more gene pairs, are shown in Table 3.

Figure 4. Prunus genomic blocks that map to two distinct Arabidopsis regions. Shown are the Prunus blocks that identified Arabidopsis sister regions generated by the proposed polyploidy event. The Prunus blocks with the same color (red or green) are homologous regions that share more than two anchored ESTs.

Fewer syntenic blocks were detected in the ancestral genome using these criteria, but far fewer blocks matched to duplicated regions of the Arabidopsis genome. In the present Arabidopsis genome, 20 syntenic blocks with three or more conserved genes matched to 14 distinct Prunus regions, whereas in the ancestral genome, 12 syntenic blocks matched to 10 distinct Prunus regions. Some groups contained the same Arabidopsis gene and peach EST pairs as the syntenic groups detected in the analysis of Prunus against the present Arabidopsis genome.
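The 61-gene threshold appears to follow from the average gene density of one gene per 4.1 kb that the article quotes for the Arabidopsis genome. The short Python sketch below works through that arithmetic; the function names are hypothetical, and rounding to the nearest whole gene is an assumption about how the estimate was made.

```python
KB_PER_GENE = 4.1  # average Arabidopsis gene spacing quoted in the article


def kb_to_gene_count(kb: float) -> int:
    """Approximate number of genes in a window of the given size in kb."""
    return round(kb / KB_PER_GENE)


def gene_count_to_kb(genes: float) -> float:
    """Approximate physical size in kb of a block spanning `genes` genes."""
    return genes * KB_PER_GENE


# 250 kb / 4.1 kb per gene is about 60.98, i.e. roughly 61 genes.
print(kb_to_gene_count(250))
```

The same factor converts block sizes counted in genes back to approximate kb, for example `gene_count_to_kb(97)` for a block of 97 genes.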
Several new Prunus regions were found to have conserved synteny with the ancestral Arabidopsis genome. The Arabidopsis genes in these syntenic blocks were apparently relocated to distinct regions after the putative Arabidopsis genome duplication event. For example, group ga54 in the ancestral genome is composed of two genes from chromosome 5 and one from chromosome 3, and they were paired with ESTs anchored to linkage group G1 of the TxE map. Groups ga28 and ga79 represent regions where three genes were closely located in the ancestral genome but were rearranged into two different regions of the present Arabidopsis chromosome 5.

We also found examples where the gene content of the Prunus genome is more conserved in the ancestral genome than in the present Arabidopsis genome. For example, group ga81 in the ancestral genome contains four gene pairs that match to linkage group G5 of the TxE map (Figure 5). Groups gp48 and gp101 in the present genome match to the same region in TxE-G5, but contain only part of the gene pairs. Figure 5 illustrates the proposed evolutionary steps that may have occurred in these regions: large-scale genome duplication followed by selective gene loss and gene duplication. The genomic regions in chromosomes 2 and 4 were part of the previously reported duplicated regions with 68 gene pairs [22], supporting our proposed evolutionary steps.

Synteny analysis between the peach physical transcriptome map and the Arabidopsis genome

We also used peach EST sequences that are anchored to the developing peach physical map to search for conserved syntenic regions between peach and Arabidopsis. Our data were composed of 1097 peach ESTs that are anchored to 431 BAC contigs, and their Arabidopsis homologs detected by FASTX sequence similarity searching (E value less than 10^-5). The sequence similarity search produced 4448 peach-Arabidopsis sequence pairs that consist of 904 distinct ESTs and 3747 distinct Arabidopsis proteins.
These sequence pairs were used to detect syntenic regions between peach and Arabidopsis. Syntenic groups were selected when the distance between two adjacent matches was less than 250 kb in the Arabidopsis genome and the ESTs were anchored to the same BAC contig. Our analysis identified 287 Arabidopsis genes and 204 peach ESTs in 140 syntenic blocks with at least two gene pairs. The syntenic blocks were found in all five Arabidopsis chromosomes. In peach, the syntenic blocks were found in a total of 77 BAC contigs. The synteny conservation was fragmentary; 16 of the 18 BAC contigs with multiple syntenic regions matched to more than one Arabidopsis chromosome.

The number of gene pairs in the syntenic blocks was small: two blocks with four gene pairs, 14 blocks with three gene pairs and 124 blocks with two gene pairs. The syntenic blocks with three or more gene pairs are shown in Table 4 and Figure 6. Only two of the 16 blocks were collinear. It is possible that the content of a block is conserved but the gene order has evolved differently in the two genomes. On the other hand, the order of the peach ESTs was estimated from the positions of the EST-hybridizing BACs in a BAC contig, which may not represent the actual order of the ESTs in the genome. The average size of the syntenic blocks in the Arabidopsis genome was 97 kb, with a maximum of 360 kb (group pp96: Arabidopsis chromosome 4 and ctg2264) and a minimum of 2.7 kb. Groups pp129 and pp130 were close enough to be combined into one syntenic region containing five gene pairs, and together they covered 451 kb in the Arabidopsis genome (Figure 6).

Ctg2264 is the BAC contig that has the most anchored ESTs. It is composed of only five BACs but has 70 anchored ESTs, suggesting that it represents a gene-rich region. Ctg2264 and the Arabidopsis genome had a number of syntenic regions, including nine with three gene pairs and 22 with two gene pairs.
In eight cases, the same peach EST sets in ctg2264 matched to two distinct Arabidopsis regions. It is notable that a relatively small contig, composed of only five overlapping BACs, had numerous microsyntenic regions found in all five Arabidopsis chromosomes. Ctg1502 has the second most anchored ESTs, and all 48 anchored ESTs are limited to three of the 14 BACs composing the contig. Despite the many anchored ESTs in ctg1502, only three syntenic regions with two gene pairs were found. Only 11 of the 48 anchored ESTs had Arabidopsis homologs, suggesting that the rest of the ESTs may represent genes that do not exist in the Arabidopsis gene repertoire. However, it is also possible that we will detect more Arabidopsis homologs, and hence more microsyntenic regions, when the entire gene sequences are available instead of short EST sequences.

Table 3: Conserved syntenic regions with three or more gene pairs between the pseudo-ancestral Arabidopsis genome and Prunus genetic maps.
Group | # Pairs | Arabidopsis gene | Putative function | Peach EST name | Peach linkage group
ga18 | 3 | AT5G47350 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f | FxT-G2F
 |  | AT4G17720 | RNA recognition motif (RRM)-containing protein | PP_LEa0027L14f |
 |  | AT5G47710 | C2 domain-containing protein contains | PP_LEa0011F23f |
ga28 | 3 | AT5G07340 | calnexin, putative | PP_LEa0006I23f | FxT-G3F
 |  | AT5G07990 | flavonoid 3'-monooxygenase | PP_LEa0007M11f |
 |  | AT5G61580 | phosphofructokinase family protein | PP_LEa0001K06f |
ga29 | 3 | AT5G14650 | polygalacturonase, putative/pectinase, putative | PP_LEa0030E14f | FxT-G3F
 |  | AT3G01610 | AAA-type ATPase family protein | PP_LEa0001O24f |
 |  | AT5G14370 | expressed protein | PP_LEa0011N22f |
ga54 | 3 | AT5G59180 | DNA-directed RNA polymerase II | PP_LEa0026O17f | TxE-G1
 |  | AT5G59840 | Ras-related GTP-binding family protein | PP_LEa0036D15f |
 |  | AT3G46540 | epsin N-terminal homology (ENTH) domain-containing | PP_LEa0003I01f |
ga60 | 4 | AT2G24640 | ubiquitin carboxyl-terminal hydrolase family protein | PP_LEa0006J17f | TxE-G1
 |  | AT4G32400 | mitochondrial substrate carrier family protein | PP_LEa0009H16f |
 |  | AT2G25420 | transducin family protein | PP_LEa0009H21f |
 |  | AT2G25160 | cytochrome P450 | PP_LEa0013L12f |
ga66 | 3 | AT4G17720 | RNA recognition motif (RRM)-containing protein | PP_LEa0027L14f | TxE-G2
 |  | AT5G47350 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f |
 |  | AT5G47710 | C2 domain-containing protein | PP_LEa0011F23f |
ga77 | 3 | AT4G17486 | expressed protein | PP_LEa0005J05f | TxE-G5
 |  | AT5G47350 | palmitoyl protein thioesterase family protein | PP_LEa0012C18f |
 |  | AT4G17615 | calcineurin B-like protein 1 (CBL1) | PP_LEa0009N08f |
ga79 | 3 | AT5G25170 | expressed protein | PP_LEa0005J05f | TxE-G5
 |  | AT5G11110 | sucrose-phosphate synthase | PP_LEa0003F22f |
 |  | AT5G10840 | endomembrane protein 70, putative TM4 family | PP_LEa0015M20f |
ga81 | 4 | AT4G31940 | cytochrome P450 | PP_LEa0013L12f | TxE-G5
 |  | AT2G25190 | expressed protein | PP_LEa0005J05f |
 |  | AT2G25160 | cytochrome P450 | PP_LEa0013L12f |
 |  | AT4G31810 | enoyl-CoA hydratase/isomerase family protein | PP_LEa0009I06f |
ga83 | 3 | AT1G66540 | cytochrome P450 | PP_LEa0013L12f | TxE-G5
 |  | AT1G66250 | glycosyl hydrolase family 17 protein | PP_LEa0012I12f |
 |  | AT1G66680 | S locus-linked protein | PP_LEa0003H24f |
ga94 | 3 | AT5G58160 | formin homology 2 domain-containing protein | PP_LEa0035A24f | TxE-G6
 |  | AT5G57990 | ubiquitin-specific protease 23 | PP_LEa0006J17f |
 |  | AT5G58590 | Ran-binding protein 1, putative/RanBP1, putative | PP_LEa0003G19f |
ga95 | 3 | AT5G01870 | lipid transfer protein, putative | PP_LEa0029C22f | TxE-G6
 |  | AT3G08560 | vacuolar ATP synthase subunit E | PP_LEa0009M17f |
 |  | AT3G08710 | thioredoxin family protein | PP_LEa0016G12f |

In addition to the blocks in ctg2264, we found many other peach blocks corresponding to more than one syntenic region in Arabidopsis, reflecting the fact that the Arabidopsis genome contains numerous large duplicated segments [21].
In our data set, there were 21 peach segments that each correspond to more than one distinct Arabidopsis segment. As expected, the Arabidopsis genes that matched to the same peach ESTs in these duplicated regions had similar putative functions or belonged to the same protein family. Some of the syntenic blocks, especially those duplicated in the Arabidopsis genome, were composed of genes with related functions, suggesting that related genes that tend to cluster in Arabidopsis also do so in peach. For example, all four Arabidopsis genes in groups pp77 and pp110 were FAD-binding domain-containing proteins, similar to reticuline oxidase precursor. A similar observation has been reported in the analysis between Arabidopsis and rice [25]. We also observed two Arabidopsis segments that each correspond to more than one distinct peach segment. Groups pp113 and pp132 involve an Arabidopsis region with three genes in chromosome 5 matching three peach ESTs in two different contigs (ctg1505 and ctg2269), and groups pp114 and pp123 involve an Arabidopsis region that matches to two different peach contigs (ctg1565 and ctg2287).

Synteny analysis between the peach physical transcriptome map and the reconstructed Arabidopsis ancestral genome

The evolutionary relationship between Arabidopsis and peach was further analyzed by searching for conserved syntenic regions between the ancestral Arabidopsis genome and the peach physical transcriptome map. Syntenic groups were selected when the number of genes between two adjacent matches was less than 61 in the Arabidopsis genome and the ESTs were anchored to the same BAC contig. This analysis identified 231 Arabidopsis proteins and 179 peach ESTs in 111 conserved gene blocks. The average block size in the Arabidopsis genome was 27.6 genes, with a maximum of 97 genes and a minimum of two genes.
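For the physical map, the peach-side criterion is membership in the same BAC contig rather than a distance in cM, so block detection reduces to sorting each contig's matches by position in the ancestral gene order and splitting wherever the gap reaches the threshold. The Python sketch below illustrates this under stated assumptions: the gap is taken as the difference between gene indices, a block must involve at least two different ESTs and two different genes, and the record layout, names and positions are hypothetical. The authors' actual procedure is described in the Methods.

```python
from collections import defaultdict
from typing import NamedTuple


class ContigMatch(NamedTuple):
    """One peach EST / ancestral-genome gene pair (hypothetical layout)."""
    est: str         # peach EST name
    contig: str      # BAC contig the EST is anchored to
    gene: str        # Arabidopsis homolog
    gene_index: int  # ordinal position of the gene in the ancestral order


def _is_block(members):
    """A block needs at least two different ESTs and two different genes."""
    return (len({m.est for m in members}) >= 2
            and len({m.gene for m in members}) >= 2)


def contig_blocks(matches, max_gap=61):
    """Chain matches on one contig while the gene-index gap stays small."""
    by_contig = defaultdict(list)
    for m in matches:
        by_contig[m.contig].append(m)

    blocks = []
    for contig_matches in by_contig.values():
        contig_matches.sort(key=lambda m: m.gene_index)
        current = [contig_matches[0]]
        for m in contig_matches[1:]:
            if m.gene_index - current[-1].gene_index < max_gap:
                current.append(m)       # still inside the same block
            else:
                if _is_block(current):
                    blocks.append(current)
                current = [m]           # gap too large: start a new block
        if _is_block(current):
            blocks.append(current)
    return blocks


def block_span_in_genes(block):
    """Number of ancestral-genome genes covered by a block, ends included."""
    return block[-1].gene_index - block[0].gene_index + 1


# Made-up gene indices, for illustration only.
demo_contig_matches = [
    ContigMatch("est_a", "ctg_x", "gene_1", 100),
    ContigMatch("est_b", "ctg_x", "gene_2", 130),
    ContigMatch("est_c", "ctg_x", "gene_3", 185),
    ContigMatch("est_d", "ctg_x", "gene_4", 400),  # gap of 215 genes
    ContigMatch("est_e", "ctg_y", "gene_5", 110),  # alone on its contig
]

for block in contig_blocks(demo_contig_matches):
    print([m.est for m in block], block_span_in_genes(block))
```

Because each gap is checked only between neighbouring matches, a block can span more genes than the threshold itself, which is consistent with the reported maximum block size of 97 genes.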
The estimated size of the syntenic blocks, assuming the Arabidopsis genome-wide average of one gene per 4.1 kb (see Methods), is 113.2 kb on average, with a maximum of 397.7 kb and a minimum of 8.2 kb. The syntenic blocks were distributed quite evenly across the ancestral genome. In peach, the syntenic blocks were found in a total of 69 contigs.

Figure 5: Proposed evolutionary steps involving some syntenic blocks between the Arabidopsis and Prunus genomes. Blocks in the putative ancestral Arabidopsis genome and in Arabidopsis chromosomes 2 and 4 that match the same block in the Prunus TxE map are illustrated. Red and green colors were used to help track the genes. Dashed lines indicate the weaker homology relationship when the same EST was homologous to more than one Arabidopsis gene.

Table 4: Conserved syntenic regions with three or more gene pairs between the Arabidopsis genome and EST-anchored peach BAC contigs.

| Group | # Pairs | Arabidopsis | Putative Function | Peach EST Name | Peach BAC Contig |
|---|---|---|---|---|---|
| pp23 | 3 | AT1G19570 | dehydroascorbate reductase | PP_LEa0036C16f | ctg2264 |
| | | AT1G20010 | tubulin beta-5 chain (TUB5) | PP_LEa0035B10f | |
| | | AT1G20450 | dehydrin (ERD10) | PP_LEa0035C17f | |
| pp48 | 3 | AT2G18470 | protein kinase family protein | PP_LEa0036C20f | ctg2264 |
| | | AT2G18840 | integral membrane Yip1 family protein | PP_LEa0034N14f | |
| | | AT2G18280 | tubby-like protein 2 (TULP2) | PP_LEa0034J18f | |
| pp52 | 4 | AT2G40280 | Putative methyltransferase | PP_LEa0017H06f | ctg58 |
| | | AT2G39750 | Putative methyltransferase | PP_LEa0017H06f | |
| | | AT2G39770 | GDP-mannose pyrophosphorylase (GMP1) | PP_LEa0005L09f | |
| | | AT2G40060 | expressed protein | PP_LEa0017F24f | |
| pp54 | 3 | AT2G19740 | 60S ribosomal protein L31 (RPL31A) | PP_LEa0008A18f | ctg9 |
| | | AT2G19680 | mitochondrial ATP synthase g subunit | PP_LEa0025C15f | |
| | | AT2G19730 | 60S ribosomal protein L28 (RPL28A) | PP_LEa0001M19f | |
| pp69 | 3 | AT3G02200 | proteasome family protein | PP_LEa0025D12f | ctg2264 |
| | | AT3G02310 | developmental protein SEPALLATA2 | PP_LEa0035H10f | |
| | | AT3G01520 | universal stress protein (USP) family | PP_LEa0025L13f | |
| pp94 | 3 | AT4G27880 | seven in absentia (SINA) family protein | PP_LEa0035M04f | ctg2264 |
| | | AT4G27560 | glycosyltransferase family protein | PP_LEa0036D18f | |
| | | AT4G27740 | Yippee putative zinc-binding protein | PP_LEa0035H22f | |
| pp96 | 3 | AT4G10710 | transcriptional regulator-related | PP_LEa0034P24f | ctg2264 |
| | | AT4G11450 | expressed protein | PP_LEa0035H16f | |
| | | AT4G11030 | long-chain-fatty-acid-CoA ligase | PP_LEa0034M07f | |
| pp113 | 3 | AT5G66460 | | PP_LEa0003M21f | ctg1505 |
| | | AT5G66140 | 20S proteasome alpha subunit D2 | PP_LEa0027M15f | |
| | | AT5G66510 | bacterial transferase | PP_LEa0009C17f | |
| pp114 | 4 | AT5G08400 | expressed protein | PP_LEa0011C13f | ctg1565 |
| | | AT5G08380 | alpha-galactosidase | PP_LEa0009B18f | |
| | | AT5G08540 | expressed protein | PP_LEa0027N06f | |
| | | AT5G08410 | ferredoxin-thioredoxin reductase | PP_LEa0009N05f | |
| pp119 | 3 | AT5G47040 | Lon protease homolog 1 | PP_LEa0001P13f | ctg190 |
| | | AT5G47020 | glycine-rich protein | PP_LEa0012O09f | |
| | | AT5G47010 | RNA helicase | PP_LEa0010E19f | |
| pp126 | 3 | AT5G54010 | glycosyltransferase family protein | PP_LEa0036D18f | ctg2264 |
| | | AT5G53940 | Yippee putative zinc-binding protein | PP_LEa0035H22f | |
| | | AT5G53770 | nucleotidyltransferase family protein | PP_LEa0025D10f | |
| pp127 | 3 | AT5G51050 | mitochondrial substrate carrier family protein | PP_LEa0034P07f | ctg2264 |
| | | AT5G50550 | WD-40 repeat family protein/St12p protein | PP_LEa0036H23f | |
| | | AT5G51180 | expressed protein | PP_LEa0035K24f | |
| pp128 | 3 | AT5G43830 | similar to auxin down-regulated protein ARG10 | PP_LEa0034K23f | ctg2264 |
| | | AT5G44340 | tubulin beta-4 chain (TUB4) | PP_LEa0035B10f | |
| | | AT5G44090 | calcium-binding EF hand family protein | PP_LEa0035H07f | |
| pp130 | 3 | AT5G15160 | bHLH family protein | PP_LEa0035P14f | ctg2264 |
| | | AT5G14680 | universal stress protein (USP) family protein | PP_LEa0025L13f | |
| | | AT5G14590 | isocitrate dehydrogenase | PP_LEa0034O16f | |
| pp132 | 3 | AT5G66460 | | PP_LEa0003M21f | ctg2269 |
| | | AT5G66510 | bacterial transferase | PP_LEa0009C17f | |
| | | AT5G66140 | 20S proteasome alpha subunit | PP_LEa0027M15f | |
| pp137 | 3 | AT5G53280 | expressed protein | PP_LEa0027O13f | ctg378 |
| | | AT5G53310 | myosin heavy chain-related | PP_LEa0013H04f | |
| | | AT5G53340 | galactosyltransferase family protein | PP_LEa0003L02f | |

Among the 111 syntenic blocks, two blocks had four gene pairs, 12 blocks had three gene pairs and the rest had two gene pairs. The details of the 14 blocks with three or more gene pairs are shown in Table 5. Four of these blocks were collinear. Five groups contained the same Arabidopsis gene and peach EST pairs as the syntenic groups detected in the peach-present Arabidopsis genome analysis. Four groups involved the same regions as those observed in the peach-present Arabidopsis genome analysis, except that one or two peach ESTs were paired with Arabidopsis proteins from other duplicated regions. The rest of the blocks disclose peach regions that have conserved synteny with the ancestral Arabidopsis genome but not with the present one.
In group pa3, AT5G60910 and the other two genes are closer in the ancestral genome, with only four genes in between, than in the present genome, where they are 21 Mbp apart. Groups pa5 and pa35 show a similar situation in which three genes are far apart on the same chromosome of the present genome but much closer in the ancestral genome.

Ctg2264, the contig containing the most anchored ESTs, had one block with four unordered gene pairs, four with three unordered gene pairs and 18 with two gene pairs. Upon close examination, the syntenic block with five unordered genes observed in the present Arabidopsis genome (Figure 6) was also detected in the ancestral genome (Figure 7). The block was not detected in our original analysis because some of the gaps between the genes were larger than the limit set by the search parameters. The comparison revealed a syntenic block with six gene pairs in the ancestral genome and two blocks containing rearranged gene pairs in chromosomes 3 and 5 of the present Arabidopsis genome (Figure 7). Figure 7 illustrates the proposed evolutionary steps that may have occurred in these regions: large-scale genome duplication followed by selective gene loss in chromosome 3 and inversion in chromosome 5.

Figure 6: Conserved syntenic regions with three or more gene pairs between Arabidopsis genome and EST-anchored peach BAC contigs.

Since the reconstructed ancestral Arabidopsis genome has been reported to contain a considerable number of duplicated regions [22], we searched for peach EST segments that paired with more than one distinct Arabidopsis region. In this data set, there were eleven peach segments that each corresponded to two distinct Arabidopsis segments.
It is notable, however, that twice as many duplicated blocks were identified by the peach EST segments in the present genome as in the ancestral genome. We also observed three Arabidopsis segments that each corresponded to more than one distinct peach segment. Two Arabidopsis segments identified the same duplicated peach segments that were detected in the analysis with the present Arabidopsis genome. Another Arabidopsis region identified duplicated peach regions in ctg1112 and ctg2175.

Simulation study

To determine whether the syntenic groups we report were detected by chance, we tested the statistical significance of each group. Both the current and the putative ancestral Arabidopsis genomes were randomized by leaving the locations the same but permuting the gene names. We analyzed 1000 simulated Arabidopsis genomes for the occurrence of each conserved syntenic group and calculated the probability of the match occurring by chance. The probability of association by chance was less than 1% for all the syntenic groups with three or more gene pairs. The numbers of syntenic groups at various significance thresholds are shown in Table 6.

Discussion

We surveyed the degree of synteny conservation between the Prunus and the Arabidopsis genomes using extensive EST sequences anchored to several Prunus genetic maps and the developing peach physical map. Our study is the first to systematically examine the conserved microsynteny using DNA sequences across the Prunus genome and their Arabidopsis homologs. We detected a considerable number of conserved microsyntenic regions even with our stringent parameters. Among the 475 genetically anchored ESTs, 142 distinct ESTs belong to syntenic groups that were conserved with either the present or the ancestral Arabidopsis genome. However, the syntenic blocks were rather small in size and contained only a few gene pairs.
In addition, most of the BAC contigs with more than two conserved syntenic regions matched more than one Arabidopsis chromosome. Our finding agrees with the previous study of peach BAC sequences, which found that segments with a gene order congruent with Arabidopsis were short in every peach region studied and that the corresponding segments were found in diverse locations in the Arabidopsis genome [24]. From the analysis with the genetically anchored ESTs, the largest block we detected had four gene pairs and covered 20 cM in G2 of the TxE Prunus map and 342 kb in chromosome 5 of Arabidopsis. From the analysis with the physical map-anchored ESTs, the largest block we detected contained five gene pairs and spanned 451 kb in the Arabidopsis genome. We may be able to find more syntenic blocks with over three gene pairs when more ESTs are hybridized to map-anchored BACs and longer BAC contigs are available. We may also find more syntenic blocks when the entire gene sequences are available. The results from the BAC contig rich in anchored ESTs, however, suggest that the syntenic regions between Arabidopsis and peach are typically small and contain several gene pairs at most. For example, ctg2264, with five BACs and 70 anchored ESTs, has numerous microsyntenic regions in all five Arabidopsis chromosomes rather than a few relatively large syntenic regions.

Figure 7: Proposed evolutionary steps involving some syntenic blocks between the Arabidopsis and peach genomes. Blocks in the putative ancestral Arabidopsis genome and in Arabidopsis chromosomes 3 and 5 that match the same peach BAC contig are illustrated. Red color was used to help track the genes. The order of the ESTs in the BAC contig is not shown because the ESTs were anchored to overlapping BACs.

Table 5: Conserved syntenic regions with three or more gene pairs between the pseudo-ancestral Arabidopsis genome and EST-anchored peach BAC contigs.

| Group | # Pairs | Arabidopsis | Putative Function | Peach EST Name | Peach BAC Contig |
|---|---|---|---|---|---|
| pa3 | 3 | AT5G07990 | flavonoid 3'-monooxygenase | PP_LEa0010I09f | ctg1172 |
| | | AT5G08100 | L-asparaginase/L-asparagine amidohydrolase | PP_LEa0007L05f | |
| | | AT5G60910 | agamous-like MADS box protein AGL8 | PP_LEa0002N13f | |
| pa4 | 3 | AT2G45560 | cytochrome P450 family protein | PP_LEa0010I09f | ctg1172 |
| | | AT3G61040 | cytochrome P450 family protein | PP_LEa0010I09f | |
| | | AT2G45650 | MADS-box protein (AGL6) | PP_LEa0002N13f | |
| pa5 | 3 | AT1G68020 | glycosyl transferase family 20 protein | PP_LEa0001F16f | ctg1172 |
| | | AT1G23870 | glycosyl transferase family 20 protein | PP_LEa0001F16f | |
| | | AT1G24260 | MADS-box protein (AGL9) | PP_LEa0002N13f | |
| pa23 | 3 | AT5G66510 | contains bacterial transferase hexapeptide repea | PP_LEa0009C17f | ctg1505 |
| | | AT5G66140 | 20S proteasome alpha subunit D2 | PP_LEa0027M15f | |
| | | AT5G66460 | | PP_LEa0003M21f | |
| pa26 | 4 | AT5G08380 | alpha-galactosidase/melibiase | PP_LEa0009B18f | ctg1565 |
| | | AT5G08540 | expressed protein | PP_LEa0027N06f | |
| | | AT5G08400 | expressed protein predicted proteins | PP_LEa0011C13f | |
| | | AT5G23440 | ferredoxin-thioredoxin reductase | PP_LEa0009N05f | |
| pa35 | 3 | AT5G26030 | ferrochelatase I | PP_LEa0004A06f | ctg1823 |
| | | AT5G11710 | epsin N-terminal homology domain-containing protein | PP_LEa0003I01f | |
| | | AT5G11770 | NADH-ubiquinone oxidoreductase 20 kDa subunit | PP_LEa0001H16f | |
| pa37 | 3 | AT5G47010 | RNA helicase | PP_LEa0010E19f | ctg190 |
| | | AT5G47040 | Lon protease homolog 1, mitochondrial (LON) | PP_LEa0001P13f | |
| | | AT5G47020 | glycine-rich protein | PP_LEa0012O09f | |
| pa59 | 3 | AT4G27740 | yippee family protein | PP_LEa0035H22f | ctg2264 |
| | | AT4G27880 | seven in absentia (SINA) family protein | PP_LEa0035M04f | |
| | | AT4G27560 | glycosyltransferase family protein | PP_LEa0036D18f | |
| pa61 | 3 | AT5G51050 | mitochondrial substrate carrier family protein | PP_LEa0034P07f | ctg2264 |
| | | AT5G51180 | expressed protein | PP_LEa0035K24f | |
| | | AT5G50550 | WD-40 repeat family protein/St12p protein | PP_LEa0036H23f | |
| pa64 | 3 | AT4G14960 | tubulin alpha-6 chain (TUA6) | PP_LEa0035B10f | ctg2264 |
| | | AT3G22170 | far-red impaired responsive protein | PP_LEa0036G03f | |
| | | AT3G22850 | similar to auxin down-regulated protein ARG10 | PP_LEa0034K23f | |
| pa71 | 3 | AT2G18280 | tubby-like protein 2 (TULP2) | PP_LEa0034J18f | ctg2264 |
| | | AT4G30260 | integral membrane Yip1 family protein | PP_LEa0034N14f | |
| | | AT2G18470 | protein kinase family protein | PP_LEa0036C20f | |
| pa82 | 3 | AT5G66510 | contains bacterial transferase hexapeptide repea | PP_LEa0009C17f | ctg2269 |
| | | AT5G66460 | | PP_LEa0003M21f | |
| | | AT5G66140 | 20S proteasome alpha subunit D2 | PP_LEa0027M15f | |
| pa103 | 4 | AT3G56080 | dehydration-responsive protein-related | PP_LEa0017H06f | ctg58 |
| | | AT2G40060 | expressed protein | PP_LEa0017F24f | |
| | | AT2G39750 | dehydration-responsive family protein | PP_LEa0017H06f | |
| | | AT3G55590 | GDP-mannose pyrophosphorylase | PP_LEa0005L09f | |
| pa108 | 3 | AT4G29410 | 60S ribosomal protein L28 (RPL28C) | PP_LEa0001M19f | ctg9 |
| | | AT4G29480 | mitochondrial ATP synthase g subunit family protein | PP_LEa0025C15f | |
| | | AT2G19740 | 60S ribosomal protein L31 (RPL31A) | PP_LEa0008A18f | |

We also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome that existed prior to the recent polyploidy event. We did not find markedly different results in the conserved synteny with the ancestral genome compared to the present genome, which was to be expected given that the polyploidization event that differentiated the present and ancestral Arabidopsis genomes occurred 24-40 million years ago, which is relatively recent compared to the peach-Arabidopsis divergence 90 million years ago. We did find, however, a number of syntenic regions in the ancestral genome that do not exist in the present genome. We also found some examples where gene content and gene order are more conserved in the ancestral genome than in the present genome.
Our study illustrates that comparative genome analysis of both the ancestral and present Arabidopsis genomes with other plant species can provide a useful resource for marker saturation in a specific region and for candidate gene searches, as well as for elucidating evolutionary relationships between species.

Conclusion

We report the results of a systematic examination of conserved microsynteny between the Prunus and Arabidopsis genomes. Our study is the first to systematically examine conserved microsynteny using extensive DNA sequences across the Prunus genome and their Arabidopsis homologs. More importantly, this study used the pseudo-ancestral Arabidopsis genome, as well as the present Arabidopsis genome, in the comparison of Arabidopsis with other plant genomes. This approach helped us to find more conserved microsyntenic regions between the ancestral Arabidopsis and Prunus genomes and also to delineate the putative evolutionary steps in the microsyntenic regions. We believe that this report will give new insight into the study of evolutionary relationships among plants and provide a new way to use the resources of the model genome more efficiently.

Methods

Data description

For the synteny analysis between the Prunus and Arabidopsis genomes, we used peach EST sequences anchored to the Prunus genetic maps [25]. Among the 475 genetically anchored peach ESTs used in this analysis, 306 ESTs were hybridized to BACs that have been hybridized to genetic markers, and the rest were hybridized to BACs belonging to a contig containing other BACs hybridized to genetic markers. The positions (cM) of the genetic markers were used as the positions of the genetically anchored ESTs. For the synteny analysis between the peach physical transcriptome map and Arabidopsis, we used peach EST sequences that are anchored to the developing peach physical map.
The data set is composed of 1097 sequences that are anchored to 431 BAC contigs containing at least two anchored ESTs. The positions of the individual BACs in the BAC contigs were used as the positions of the physical map-anchored ESTs. For ESTs anchored to multiple overlapping BACs in a BAC contig, the innermost left and right positions were assigned. All the sequences and positions of the peach ESTs were obtained from the Genome Database for Rosaceae (GDR) [27,28].

The sequence data (ATH1_pep_cm_20040228) and the chromosome coordinate data (sv_gene.data) of the 29161 Arabidopsis translated proteins were downloaded from The Arabidopsis Information Resource (TAIR) database [29,30] in March 2005. The ordered list of 20187 gene names in the reconstructed ancestral Arabidopsis genome was downloaded from the Paralogons in Arabidopsis thaliana web site [22,31].

Detection of the conserved syntenic regions

Mapped peach ESTs that are homologous to Arabidopsis proteins were identified using the FASTX 3.4 algorithm [27]. Matches with E values less than 10^-5 were selected for further analysis. For the comparison between the Arabidopsis genome and the Prunus maps, syntenic groups were selected when the distance between two adjacent matches was less than 250 kb in the Arabidopsis genome and less than 10 cM in the Prunus maps. For the comparison between the Arabidopsis genome and the peach physical map, syntenic groups were selected when the matches were located within 250 kb in the current Arabidopsis genome and belonged to the same BAC contig. In the analysis of conserved synteny between the ancestral Arabidopsis genome and the peach physical map or the Prunus genetic maps, we used the estimated number of genes in 250 kb (61 genes) as the maximum distance between two adjacent matches in the Arabidopsis genome. The estimate was obtained by dividing 250 kb by the average size per gene (4.1 kb) in Arabidopsis, which is the total genome length in kb divided by the number of genes in the Arabidopsis genome.

We used a program called DAGchainer [32] to detect collinear chromosomal segments conserved in the peach/Prunus and Arabidopsis genomes. DAGchainer was run with parameters set to detect any collinear blocks with two or more gene pairs and with the maximum distance between two adjacent matches specified above. Since DAGchainer detects only regions with conserved order, we developed scripts to detect both collinear and non-collinear regions from the output.

Evaluation of the conserved syntenic regions

To determine whether the syntenic groups we report were detected by chance, we tested the statistical significance of each group. Both the current and the putative ancestral Arabidopsis genomes were randomized by leaving the locations the same but permuting the gene names. We analyzed 1000 simulated Arabidopsis genomes for the occurrence of each conserved syntenic group and calculated the probability of the match occurring by chance.

Table 6: Number of syntenic groups between Prunus/peach and Arabidopsis that are detected at various significance thresholds.

| Syntenic Group | 99.90% | 99% | 95% | 90% | 80% | Total |
|---|---|---|---|---|---|---|
| gp | 21 (17) | 27 (20) | 56 | 81 | 108 | 139 (20) |
| ga | 11 (8) | 22 (12) | 39 | 64 | 86 | 101 (12) |
| pp | 18 (11) | 36 (16) | 65 | 85 | 102 | 140 (16) |
| pa | 13 (10) | 25 (14) | 50 | 70 | 93 | 111 (14) |

Numbers in parentheses are the numbers of syntenic groups with three or more gene pairs.

Authors' contributions

SJ designed the protocol for synteny analysis and the statistical analysis, designed and developed scripts, performed the research, analyzed the data and wrote the paper.
DM conceived of the study, participated in its design and coordination, and critically revised the manuscript. MS performed the sequence similarity search and wrote the scripts for statistical analysis. IC wrote the scripts for detecting non-collinear syntenic regions and duplicate syntenic regions and for parsing the DAGchainer outputs. TZ provided the EST data hybridized to peach BAC contigs. PA critically revised the manuscript. AA conceived of the study and critically revised the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by an award (#0320544) from the National Science Foundation.

References

1. Cavalier-Smith T: Economy, speed and size matter: evolutionary forces driving nuclear genome miniaturization and expansion. Ann Bot (Lond) 2005, 95:147-175.
2. Bennetzen JL, Coleman C, Liu R, Ma J, Ramakrishna W: Consistent over-estimation of gene number in complex plant genomes. Curr Opin Plant Biol 2004, 7:732-736.
3. Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408:796-815.
4. International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature 2004, 431:931-945.
5. Bonierbale MW, Plaisted RL, Tanksley SD: RFLP Maps Based on a Common Set of Clones Reveal Modes of Chromosomal Evolution in Potato and Tomato. Genetics 1988, 120:1095-1103.
6. Devos KM, Gale MD: Comparative genetics in the grasses. Plant Mol Biol 1997, 35:3-15.
7. Gale MD, Devos KM: Comparative genetics in the grasses. Proc Natl Acad Sci USA 1998, 95:1971-1974.
8. Keller B, Feuillet C: Colinearity and gene density in grass genomes. Trends Plant Sci 2000, 5:246-251.
9. Dirlewanger E, Graziano E, Joobeur T, Garriga-Caldere F, Cosson P, Howad W, Arus P: Comparative mapping and marker-assisted selection in Rosaceae fruit crops. Proc Natl Acad Sci USA 2004, 101:9891-9896.
10. Bennetzen JL: Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions. Plant Cell 2000, 12:1021-1029.
11. Kilian A, Chen J, Han F, Steffenson B, Kleinhofs A: Towards map-based cloning of the barley stem rust resistance genes Rpg1 and rpg4 using rice as an intergenomic cloning vehicle. Plant Mol Biol 1997, 35:187-195.
12. Helentjaris T, Weber D, Wright S: Identification of the Genomic Locations of Duplicate Nucleotide Sequences in Maize by Analysis of Restriction Fragment Length Polymorphisms. Genetics 1988, 118:353-363.
13. Lagercrantz U: Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics 1998, 150:1217-1228.
14. McCouch SR: Genomics and synteny. Plant Physiol 2001, 125:152-155.
15. O'Neill CM, Bancroft I: Comparative physical mapping of segments of the genome of Brassica oleracea var. alboglabra that are homoeologous to sequenced regions of chromosomes 4 and 5 of Arabidopsis thaliana. Plant J 2000, 23:233-243.
16. Vision TJ, Brown DG, Tanksley SD: The origins of genomic duplications in Arabidopsis. Science 2000, 290:2114-2117.
17. Grant D, Cregan P, Shoemaker RC: Genome organization in dicots: genome duplication in Arabidopsis and synteny between soybean and Arabidopsis. Proc Natl Acad Sci USA 2000, 97:4168-4173.
18. Yan HH, Mudge J, Kim DJ, Shoemaker RC, Cook DR, Young ND: Comparative physical mapping reveals features of microsynteny between Glycine max, Medicago truncatula, and Arabidopsis thaliana. Genome 2004, 47:141-155.
19. Ku HM, Vision T, Liu J, Tanksley SD: Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny. Proc Natl Acad Sci USA 2000, 97:9121-9126.
20. Dominguez I, Graziano E, Gebhardt C, Barakat A, Berry S, Arus P, Delseny M, Barnes S: Plant genome archaeology: evidence for conserved ancestral chromosome segments in dicotyledonous plant species. Plant Biotechnology Journal 2003, 1:91-99.
21. Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, et al.: A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 2002, 296:92-100.
22. Blanc G, Hokamp K, Wolfe KH: A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res 2003, 13:137-144.
23. Georgi L, Wang Y, Yvergniaux D, Ormsbee T, Inigo M, Reighard G, Abbott G: Construction of a BAC library and its application to the identification of simple sequence repeats in peach [Prunus persica (L.) Batsch]. Theor Appl Genet 2002, 105:1151-1158.
24. Georgi LL, Wang Y, Reighard GL, Mao L, Wing RA, Abbott AG: Comparison of peach and Arabidopsis genomic sequences: fragmentary conservation of gene neighborhoods. Genome 2003, 46:268-276.
25. Horn R, Lecouls AC, Callahan A, Dandekar A, Garay L, McCord P, Howad W, Chan H, Verde I, Main D, et al.: Candidate gene database and transcript map for peach, a model species for fruit trees. Theor Appl Genet 2005, 110:1419-1428.
26. Liu H, Sachidanandam R, Stein L: Comparative genomics between rice and Arabidopsis shows scant collinearity in gene order. Genome Res 2001, 11:2020-2026.
27. Jung S, Jesudurai C, Staton M, Du Z, Ficklin S, Cho I, Abbott A, Tomkins J, Main D: GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research. BMC Bioinformatics 2004, 5:130.
28. Genome Database for Rosaceae (GDR) [http://www.rosaceae.org/]
29. Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, et al.: The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 2003, 31:224.
30. The Arabidopsis Information Resource [http://www.arabidopsis.org/]
31. The Paralogons in Arabidopsis thaliana web site [http://wolfe.gen.tcd.ie/athal/]
32. Haas BJ, Delcher AL, Wortman JR, Salzberg SL: DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 2004, 20:3643-3646.
33. Dettori MT, Quarta R, Verde I: A peach linkage map integrating RFLPs, SSRs, RAPDs, and morphological markers. Genome 2001, 44:783-790.
34. Dirlewanger E, Moing A, Rothan C, Svanella L, Pronier V, Guye A, Plomion C, Monet R: Mapping QTLs controlling fruit quality in peach (Prunus persica (L) Batsch). Theor Appl Genet 1999, 98:18-31.
35. Jáuregui B, de Vicente MC, Messeguer R, Felipe A, Bonnet A, Salesses G, Arús P: A reciprocal translocation between 'Garfi' almond and 'Nemared' peach. Theor Appl Genet 2001, 102:1169-1176.
36. Joobeur T, Periam N, de Vicente MC, King GJ, Arus P: Development of a second generation linkage map for almond using RAPD and SSR markers. Genome 2000, 43:649-655.
37. Ballester J, Socias I, Company R, Arus P, De Vicente MC: Genetic mapping of a major gene delaying blooming time in almond. Plant Breeding 2001, 120:268-270.