Genomics is the scientific discipline concerned with the analysis of
genomes. The genome of an organism is the complete set of its genetic
material, including all of its genes. Genes may be located in the cytoplasm
(prokaryotes), in the nucleus (most eukaryotic organisms), in mitochondria,
or in chloroplasts (in plants).
A critical prerequisite of genomics is knowledge of gene sequences.
Functional genomics is concerned with the function of individual genes.
Knowledge of gene sequences (or of the full gene sets of individual
organisms, i.e. of their genomes) gave rise to reverse genetics, which allows
the function of those genes to be studied.
In contrast to "classical" or forward genetics, which starts with the phenotype,
reverse genetics starts with a sequence identified as a gene in the
sequenced genome. Gene identification using bioinformatics approaches
will be described later (see Lesson 02).
Reverse genetics uses a range of approaches, described in Lesson 03, that
allow sequence-specific mutants to be isolated and their phenotypes to be
analysed.
The need for a phenotype alteration in the forward genetics approach
marks an important difference between the two approaches. In reverse genetics,
the gene is no longer understood as a factor (trait) determining a phenotype,
but rather as a piece of DNA characterized by a unique string of nucleotides,
i.e. a physical DNA molecule.
NIH WORKING DEFINITION OF BIOINFORMATICS AND COMPUTATIONAL
BIOLOGY
July 17, 2000
The following working definition of bioinformatics and computational
biology were developed by the BISTIC Definition Committee and released
on July 17, 2000. The committee was chaired by Dr. Michael Huerta of the
National Institute of Mental Health and consisted of the following
members:
Bioinformatics Definition Committee
BISTIC Members: Michael Huerta (Chair), Florence Haseltine, Yuan Liu
Expert Members: Gregory Downing, Belinda Seto
Preamble
Bioinformatics and computational biology are rooted in life sciences as
well as computer and information sciences and technologies. Both of
these interdisciplinary approaches draw from specific disciplines such as
mathematics, physics, computer science and engineering, biology, and behavioral
science. Bioinformatics and computational biology each maintain close
interactions with life sciences to realize their full potential. Bioinformatics applies
principles of information sciences and technologies to make the vast, diverse, and
complex life sciences data more understandable and useful. Computational
biology uses mathematical and computational approaches to address theoretical
and experimental questions in biology. Although bioinformatics and
computational biology are distinct, there is also significant overlap and activity at
their interface.
Definition
The NIH Biomedical Information Science and Technology Initiative Consortium
agreed on the following definitions of bioinformatics and computational biology
recognizing that no definition could completely eliminate overlap with other
activities or preclude variations in interpretation by different individuals and
organizations.
Bioinformatics: Research, development, or application of computational tools and
approaches for expanding the use of biological, medical, behavioral or health
data, including those to acquire, store, organize, archive, analyze, or visualize such
data.
Computational Biology: The development and application of data-analytical and
theoretical methods, mathematical modeling and computational simulation
techniques to the study of biological, behavioral, and social systems.
Many online resources are available.
Nowadays, the resources are interconnected and can be accessed via dedicated
web pages. Among the best and most widely used web resources integrating many
databases are the portals of the European Bioinformatics Institute
(EBI) in Europe (Hinxton, United Kingdom) and the National Center for
Biotechnology Information (NCBI) in the USA.
Shotgun sequencing allows a scientist to rapidly determine the sequence of very long
stretches of DNA. The key to this process is fragmenting of the genome into smaller
pieces that are then sequenced side by side, rather than trying to read the entire
genome in order from beginning to end. The genomic DNA is usually first divided into its
individual chromosomes. Each chromosome is then randomly broken into small strands
of hundreds to several thousand base pairs, usually accomplished by mechanical
shearing of the purified genetic material. Each of the short DNA pieces is then inserted
into a DNA vector (a viral genome), resulting in a viral particle containing "cloned"
genomic DNA (Fig. 1).
The collection of all the viral particles with all the different genomic DNA pieces is
referred to as a library. Just as a library consists of a set of books that together make up
all of human knowledge, a genomic library consists of a set of DNA pieces that together
make up the entire genome sequence. Placing the genomic DNA within the viral genome
allows bacteria infected with the virus to faithfully replicate the genomic DNA pieces.
Additionally, since a little bit of known sequence is needed to start the sequencing
reaction, the reaction can be primed off the known flanking viral DNA.
In order to read all the nucleotides of one organism, millions of individual clones are
sequenced. The data is sorted by computer, which compares the sequences of all the
small DNA pieces at once (in a "shotgun" approach) and places them in order by virtue
of their overlapping sequences to generate the full-length sequence of the genome (Fig.
2). To statistically ensure that the whole genome sequence is acquired by this method,
an amount of DNA equal to five to ten times the length of the genome must be
sequenced. (Interactive concepts in biochemistry, Rodney Boyer, Wiley, 2002,
http://www.wiley.com//college/boyer/0470003790/)
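The ordering of fragments by their overlapping sequences, and the reason for sequencing several genome equivalents, can be illustrated with a minimal sketch. It is not the algorithm used by any real assembler: it assumes error-free reads from a single strand, uses a simple greedy merge, and the function names (overlap, greedy_assemble, expected_fraction_covered) are hypothetical. The coverage estimate assumes reads land uniformly at random, in which case the expected fraction of the genome covered at c-fold coverage is approximately 1 - e^(-c).

```python
import math
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest match is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembly: repeatedly merge the best-overlapping pair.

    Assumes error-free, same-strand reads (a simplification).
    Returns the remaining contigs; a single contig means everything joined.
    """
    unique = sorted(set(reads))
    # Drop reads that are fully contained in another read.
    frags = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(frags) > 1:
        best_k, best_a, best_b = 0, None, None
        for a, b in permutations(frags, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # no overlaps left: fragments stay as separate contigs
        frags.remove(best_a)
        frags.remove(best_b)
        frags.append(best_a + best_b[best_k:])  # merge on the shared overlap
    return frags


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of the genome covered at the given fold coverage.

    Assumes uniformly random read placement (Poisson approximation).
    """
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    reads = ["ATGCGTA", "CGTACGT", "ACGTTAGC"]  # made-up example fragments
    print(greedy_assemble(reads))
    for c in (1, 5, 10):
        print(c, round(expected_fraction_covered(c), 5))
```

Under these assumptions, one-fold coverage leaves roughly a third of the genome unsequenced, while five-fold coverage leaves well under one percent, which is consistent with the five to ten times figure quoted above.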
DDBJ/EMBL/GenBank accepts both complete and incomplete genomes. Whole
Genome Shotgun (WGS) sequencing projects are incomplete genomes or
incomplete chromosomes that are being sequenced by a whole genome shotgun
strategy. WGS projects may be annotated, but annotation is not required.
The pieces of a WGS project are the contigs (overlapping reads), and they do not
include any gaps. An AGP file can be submitted to indicate how the contig
sequences are assembled together into scaffolds (contig sequences separated by
gaps) and/or chromosomes. We must have the contig sequences without gaps as
the basic units for all WGS projects.
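The relationship between gap-free contigs and scaffolds can be sketched as a small data structure. This is only an illustration, not an AGP parser or the database's own representation: the Contig and Gap classes and the build_scaffold function are hypothetical, gaps are rendered as runs of N, and any N inside a contig is treated as a gap (a simplification).

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    sequence: str  # contiguous sequence; must not contain gaps


@dataclass
class Gap:
    length: int  # estimated gap size in bases


def build_scaffold(parts: list) -> str:
    """Join contigs in the given order, writing each gap as a run of N."""
    pieces = []
    for part in parts:
        if isinstance(part, Contig):
            # Simplification: treat any N in a contig as a gap and reject it.
            if "N" in part.sequence.upper():
                raise ValueError(f"contig {part.name} contains gap characters")
            pieces.append(part.sequence)
        elif isinstance(part, Gap):
            pieces.append("N" * part.length)
        else:
            raise TypeError(f"unexpected scaffold part: {part!r}")
    return "".join(pieces)


if __name__ == "__main__":
    # Made-up contigs separated by a gap of estimated length 5.
    scaffold = build_scaffold(
        [Contig("contig1", "ACGT"), Gap(5), Contig("contig2", "TTGA")]
    )
    print(scaffold)
```

Keeping the contigs as the basic units and describing the gaps separately means the scaffold can be rebuilt when gap estimates change, without touching the submitted contig sequences.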
- Add further databases (Blanka P.)
S/MARt DB (scaffold/matrix attached region transaction database). This database
collects information about S/MARs and the nuclear matrix proteins that are
thought to be involved in the interaction of these elements with the nuclear
matrix (http://transfac.gbf.de/SMARtDB/index.html).
BLINK is a link to the pre-computed BLAST search results for the respective
sequence (see the next slide).