The Protein Data Bank (PDB) is a repository for 3-D structural data of proteins and nucleic acids. This data, typically obtained by X-ray crystallography or NMR spectroscopy, is submitted by biologists and biochemists from around the world, is released into the public domain, and can be accessed for free. The database is the central repository for biological structural data.
In 2003, the Worldwide Protein Data Bank (wwPDB) was formed, consisting of three member organizations that act as deposition, data processing and distribution centers for PDB data. The founding members are RCSB PDB (USA), MSD-EBI (Europe) and PDBj (Japan). The mission of the wwPDB is to maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community.
The PDB is a key resource in structural biology and is critical to more recent work in structural genomics.
Countless derived databases and projects have been developed to integrate and classify the PDB in terms of protein structure, protein function and protein evolution.
The growth rate of the PDB has been the subject of fairly extensive analysis.
Note that the database stores information about the exact location of all atoms in a large biomolecule; if one is only interested in sequence data, i.e. the list of amino acids making up a particular protein or the list of nucleotides making up a particular nucleic acid, the much larger databases from Swiss-Prot and the International Nucleotide Sequence Database Collaboration should be used.
| Experimental method | Proteins | Nucleic acids | Protein/NA complexes | Other | Total |
|---|---|---|---|---|---|
| X-ray diffraction | 29258 | 902 | 1353 | 28 | 31541 |
| NMR | 4690 | 705 | 121 | 6 | 5522 |
| Electron microscopy | 88 | 9 | 29 | 0 | 126 |
| Other | 73 | 4 | 3 | 0 | 80 |
| Total | 34109 | 1620 | 1506 | 34 | 37269 |
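The totals in the table above can be rechecked with a few lines of Python. This is only an illustrative sketch: the counts are copied from the table, while the variable names and the percentage breakdown are assumptions made for this example and are not part of any PDB tooling.

```python
# Holdings by experimental method, copied from the table above.
# Column order: proteins, nucleic acids, protein/NA complexes, other.
counts = {
    "X-ray diffraction": [29258, 902, 1353, 28],
    "NMR": [4690, 705, 121, 6],
    "Electron microscopy": [88, 9, 29, 0],
    "Other": [73, 4, 3, 0],
}

# Sum across each row to get the per-method totals.
row_totals = {method: sum(values) for method, values in counts.items()}

# Sum down each column to get the per-molecule-type totals.
column_totals = [sum(column) for column in zip(*counts.values())]

grand_total = sum(row_totals.values())

# Share of all structures contributed by each method.
for method, total in row_totals.items():
    print(f"{method}: {total} structures ({100 * total / grand_total:.1f}%)")
```

Both the row sums and the column sums reproduce the figures in the Total row and Total column.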
The legacy PDB file format has caused many problems, and consequently the PDB is the subject of three distinct 'clean-up' projects: the MMDB, the MSD and the Data Uniformity Project.
Each of these grant-funded projects has attempted to achieve the same goal via a different route. The Data Uniformity Project is hosted by the RCSB (the current home of the PDB). Each uses the original PDB data to derive a new format: the MMDB uses ASN.1 (and an XML conversion of this format), the MSD uses a relational database, and the Data Uniformity Project uses mmCIF (and another XML conversion of this format).
Some regard this diversity as a good thing; others argue that, without a universal repository of information (i.e., a common dictionary), it is hard to be sure that everyone is talking about the same thing.
Each structure published in the PDB receives a four-character alphanumeric identifier, its PDB ID. This should not be used as an identifier for biomolecules, because the PDB often contains several structures of the same molecule (in different environments or conformations), each with a different PDB ID.
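The identifier rule can be sketched in Python as follows. This is a minimal illustration, not the PDB's own validation logic: the pattern encodes only the "four alphanumeric characters" description given here, the function name `is_pdb_id` is hypothetical, and the IDs in the example mapping are invented placeholders rather than real entries.

```python
import re

# Assumption: an ID is exactly four ASCII letters or digits, as described above.
# The real archive may impose further constraints that are not modelled here.
PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def is_pdb_id(text: str) -> bool:
    """Return True if text has the shape of a four-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


# One molecule can map to several structures, so the ID identifies a
# structure, not a molecule. The IDs below are made-up placeholders.
structures_by_molecule = {
    "example protein": ["1ABC", "2XYZ", "3DEF"],
}

for molecule, ids in structures_by_molecule.items():
    valid = [pdb_id for pdb_id in ids if is_pdb_id(pdb_id)]
    print(f"{molecule}: {len(valid)} structure IDs")
```

Keying a lookup on the molecule and storing a list of IDs, rather than the reverse, reflects the one-to-many relationship between molecules and deposited structures.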
When a biologist submits structure data for a protein or nucleic acid, PDB staff review and annotate it. The data are then automatically checked for plausibility. The source code for this validation software has been released for free. The main database accepts only experimentally derived structures, not theoretically predicted ones (see protein structure prediction).
Various funding agencies and scientific journals now require scientists to submit their structure data to PDB.
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Protein Data Bank".