Dr. Prativa Deka
Associate Professor
Department of Botany
Mangaldai College, Mangaldai
E-Mail: pdeka.mld@gmail.com
Bioinformatics
Biologists
Collect Molecular Data:
DNA & Protein Sequences,
Gene Expression, etc.
Computer scientists
(+Mathematicians, Statisticians, etc.)
Develop Tools, Software, Algorithms
to Store and Analyze the Data.
Bioinformaticians
Study Biological Questions by
Analyzing Molecular Data
Bioinformatics: The field of science in which biology, computer
science and information technology merge into a single discipline
Paulien Hogeweg
From DNA to Genome
Timeline of milestones, 1955 to 1985 (figure):
• Watson and Crick DNA model
• Sanger sequences insulin protein
• Sanger dideoxy DNA sequencing
• PCR (Polymerase Chain Reaction)
• ARPANET (early Internet)
• PDB (Protein Data Bank)
• Sequence alignment
• GenBank database
• Dayhoff’s Atlas
05/05/2025 5
Biological Databases
What is a database?
– A collection of related data elements
• tables
• columns (fields)
• rows (records)
– Records retrieved using a query language
– Database technology is well established
• Tables (entities)
• basic elements of information to track, e.g.,
gene, organism, sequence, citation
• Columns (fields)
• attributes of tables, e.g. for citation table, title,
journal, volume, author
• Rows (records)
• actual data
• whereas fields describe what data is stored, the rows
of a table are where the actual data is stored
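The table, column and row vocabulary above can be made concrete with a small sketch. It assumes Python's built-in sqlite3 module and a hypothetical "citation" table whose columns follow the fields named above; the three rows are made-up placeholders, not real citations.

```python
import sqlite3

# Hypothetical "citation" table: the columns (fields) mirror the
# attributes listed above; the rows (records) are placeholder data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE citation (title TEXT, journal TEXT, volume INTEGER, author TEXT)"
)
rows = [
    ("Example title A", "Journal X", 1, "Author One"),
    ("Example title B", "Journal Y", 2, "Author Two"),
    ("Example title C", "Journal X", 3, "Author Three"),
]
conn.executemany("INSERT INTO citation VALUES (?, ?, ?, ?)", rows)

# Records are retrieved with a query language (SQL in this sketch).
journal_x_titles = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM citation WHERE journal = ? ORDER BY volume",
        ("Journal X",),
    )
]
print(journal_x_titles)
```

The fields describe what is stored; each tuple in `rows` is one record holding the actual data.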
How do online databases work?
When you query an online database, your query is translated into SQL, the database
is interrogated, and the answer is displayed in your web browser.
• Your computer and browser (the “client”)
• Software (on the “server”) to receive and translate the instructions you enter into your browser
• The database itself
Image source: David Lane and Hugh E. Williams. Web Database Applications with PHP & MySQL. O’Reilly (2002).
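The client, server and database steps can be sketched in a few lines. This is only an illustration under stated assumptions: the `gene` table, the `organism` form field, and the helper names `build_sql` and `handle_request` are hypothetical, and an in-memory SQLite database stands in for the real database server.

```python
import sqlite3
from urllib.parse import parse_qs


def build_sql(query_string):
    """Server side: translate the browser's query string into parameterized SQL."""
    params = parse_qs(query_string)
    organism = params.get("organism", [""])[0]
    sql = "SELECT name FROM gene WHERE organism = ? ORDER BY name"
    return sql, (organism,)


def handle_request(conn, query_string):
    """Interrogate the database and return the rows to display in the browser."""
    sql, args = build_sql(query_string)
    return [name for (name,) in conn.execute(sql, args)]


# Stand-in database with made-up placeholder records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (name TEXT, organism TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("geneA", "Organism one"), ("geneB", "Organism two"), ("geneC", "Organism one")],
)

# What the client would send after submitting a search form.
result = handle_request(conn, "organism=Organism+one")
print(result)
```

Passing the user's value as a bound parameter (the `?` placeholder) rather than pasting it into the SQL string is the usual way to keep the translation step safe.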
Why biological databases?
• Make biological data available to scientists
– Consolidation of data (gather data from different sources)
– Provide access to large datasets that cannot be published
explicitly (genome, proteome,…)
• Make biological data available in computer-readable format
– Make data accessible for automated analysis
Bioinformatics: “To extract, store and analyze
biological data”
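As an illustration of what a computer-readable format makes possible, the sketch below parses FASTA text and computes GC content automatically. FASTA is an assumption chosen for the example (the slides do not name a format), the function names `parse_fasta` and `gc_content` are hypothetical, and the two sequences are made up.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the identifier is the first word after ">".
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


example = """>seq1 made-up example
ATGC
GGCC
>seq2 made-up example
ATAT
"""
sequences = parse_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq), gc_content(seq))
```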
Biological Databases
• Over 1000 biological databases
• Vary in size, quality, coverage, level of interest
• Many of the major ones covered in the annual
Database Issue of Nucleic Acids Research
• What makes a good database?
• comprehensiveness
• accuracy
• is up-to-date
• good interface
• batch search/download
• API (web services, DAS, etc.)
Ten Important Bioinformatics Databases
• GenBank      www.ncbi.nlm.nih.gov   nucleotide sequences
• Ensembl      www.ensembl.org        human/mouse/plant genomes
• PubMed       www.ncbi.nlm.nih.gov   literature references
• NR           www.ncbi.nlm.nih.gov   protein sequences
• SWISS-PROT   www.expasy.ch          protein sequences
• InterPro     www.ebi.ac.uk          protein domains
• OMIM         www.ncbi.nlm.nih.gov   genetic diseases
• Enzymes      www.chem.qmul.ac.uk    enzymes
• PDB          www.rcsb.org/pdb/      protein structures
• KEGG         www.genome.ad.jp       metabolic pathways
• In 1965, Dayhoff gathered all the available sequence data to create the first
bioinformatics database (Atlas of Protein Sequence and Structure).
NCBI (National Center for Biotechnology Information)
• Over 30 databases, including GenBank, PubMed, OMIM, and GEO
• Access all NCBI resources via Entrez (www.ncbi.nlm.nih.gov/Entrez/)
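Entrez can also be queried programmatically. The sketch below assumes NCBI's E-utilities web interface (its `esearch.fcgi` endpoint with the `db`, `term`, `retmax` and `retmode` parameters), which the slides do not mention. It only builds the request URL and sends nothing over the network; the helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities (an assumption; not given in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for one Entrez database and one search term."""
    query = urlencode(
        {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


url = esearch_url("pubmed", "bioinformatics", retmax=5)
print(url)
```

Opening the printed URL in a browser, or fetching it with an HTTP client, would return the matching record identifiers; NCBI's usage guidelines apply to automated requests.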
BLAST for Sequence Alignment
• Basic Local Alignment Search Tool
– Altschul et al. 1990, 1994, 1997
• One of the best methods for local alignment
• Designed specifically for database searches
• Benefits: speed, user-friendliness, statistical rigor,
greater sensitivity
• Types of BLAST: BLASTN, BLASTP, BLASTX,
TBLASTN, TBLASTX
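To show what "local alignment" means, here is a minimal sketch of Smith-Waterman dynamic programming. This is not BLAST: BLAST is a faster heuristic designed for database searches, while Smith-Waterman is the exact method it approximates. The scoring values (match 2, mismatch -1, gap -2) and the function name are arbitrary choices for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (Smith-Waterman)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, so an alignment can start
            # anywhere in either sequence; this is what makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared segment "ACG" is found even though the flanking regions differ.
print(local_alignment_score("TTACGTTT", "GGACGGG"))
```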
The purpose of databases is to curate the increasing amount of experimental data from various disciplines of biology.
Bioinformatics can be viewed as a circle of science in which high-throughput experimental procedures produce large quantities of data.
This data is stored and processed in databases, from which it can be extracted for analysis with computational methods.
The results from the computational methods in turn inspire additional experiments and hypotheses concerning biological phenomena.