Uniprot/Swiss-Prot Data Base
A Uniprot/Swiss-Prot Data Base is a broad-coverage curated biomedical knowledge base of protein data in the form of annotated Swiss-Prot records.
- AKA: SwissProt KB.
- Context:
- It can (typically) be a Disambiguated Database.
- It can (typically) provide a high level of Integration with other Databases.
- It can be a part of the Universal Protein Resource (UniProt), alongside the other UniProtKB component, TrEMBL.
- It can be distributed by the European Molecular Biology Laboratory (EMBL).
- It can be updated regularly by the PSC.
- It can be published in Swiss-Prot Accession Format[1].
- It can include a Controlled Vocabulary for Biological Entities, such as the Swiss-Prot SCL Controlled Vocabulary.
- Example(s):
- Release 52.3 of 17-Apr-2007 of UniProtKB/Swiss-Prot contains 264,492 sequence entries, comprising 96,880,444 amino acids abstracted from 154,049 references.
- Release 49.0 of 07-Feb-2006 of UniProtKB/Swiss-Prot contains 207,132 sequence entries, comprising 75,438,310 amino acids abstracted from 139,151 references and covering more than 9,731 different species.
- Counter-Example(s):
- See: PPLRE Swiss-Prot Table, Protein Sequence.
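The two release examples above imply an average sequence length per entry (amino acids divided by sequence entries). A minimal Python sketch of that arithmetic, using only the figures quoted in the examples; the names `RELEASES` and `mean_sequence_length` are illustrative and not part of any UniProt tool:

```python
# Release statistics quoted in the examples above (UniProtKB/Swiss-Prot).
RELEASES = {
    "52.3 (17-Apr-2007)": {"entries": 264_492, "amino_acids": 96_880_444, "references": 154_049},
    "49.0 (07-Feb-2006)": {"entries": 207_132, "amino_acids": 75_438_310, "references": 139_151},
}


def mean_sequence_length(entries: int, amino_acids: int) -> float:
    """Average number of amino acids per sequence entry."""
    return amino_acids / entries


for name, stats in RELEASES.items():
    avg = mean_sequence_length(stats["entries"], stats["amino_acids"])
    print(f"Release {name}: {avg:.1f} residues per entry")
```

Both releases come out at roughly 364 to 366 residues per entry, which suggests that the 57,360 entries added between them were sequences of typical length rather than unusually long ones.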
References
- http://www.ebi.ac.uk/swissprot/
- ftp://ftp.ebi.ac.uk/pub/databases/swissprot/release_compressed/
- http://www.expasy.org/sprot/relnotes/relstat.html
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/UniProt#UniProtKB Retrieved:2017-6-8.
- UniProtKB/Swiss-Prot is a manually annotated, non-redundant protein sequence database. It combines information extracted from scientific literature and biocurator-evaluated computational analysis. The aim of UniProtKB/Swiss-Prot is to provide all known relevant information about a particular protein. Annotation is regularly reviewed to keep up with current scientific findings. The manual annotation of an entry involves detailed analysis of the protein sequence and of the scientific literature.[1]
Sequences from the same gene and the same species are merged into the same database entry. Differences between sequences are identified, and their cause documented (for example alternative splicing, natural variation, incorrect initiation sites, incorrect exon boundaries, frameshifts, unidentified conflicts). A range of sequence analysis tools is used in the annotation of UniProtKB/Swiss-Prot entries. Computer predictions are manually evaluated, and relevant results selected for inclusion in the entry. These predictions include post-translational modifications, transmembrane domains and topology, signal peptides, domain identification, and protein family classification.
Relevant publications are identified by searching databases such as PubMed. The full text of each paper is read, and information is extracted and added to the entry. Annotation arising from the scientific literature includes, but is not limited to:
- Protein and gene names
- Function
- Enzyme-specific information such as catalytic activity, cofactors and catalytic residues
- Subcellular location
- Protein-protein interactions
- Pattern of expression
- Locations and roles of significant domains and sites
- Ion-, substrate- and cofactor-binding sites
- Protein variant forms produced by natural genetic variation, RNA editing, alternative splicing, proteolytic processing, and post-translational modification
- Annotated entries undergo quality assurance before inclusion into UniProtKB/Swiss-Prot. When new data becomes available, entries are updated.
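The merging rule quoted above (one entry per gene and species, with sequence differences identified and documented) can be pictured as a grouping step. The Python sketch below is a toy model under stated assumptions: submissions are compared by exact string identity, and the gene names, sequences, and submission identifiers are hypothetical. In UniProtKB/Swiss-Prot the cause of each difference (splicing, natural variation, and so on) is assigned by curators; this code only flags that a difference exists.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Submission:
    gene: str
    species: str
    sequence: str
    source: str  # hypothetical submission identifier


@dataclass
class MergedEntry:
    gene: str
    species: str
    # Maps each distinct sequence to the submissions that reported it.
    sequences: dict[str, list[str]] = field(default_factory=dict)

    def has_conflicts(self) -> bool:
        """True when submissions disagree, i.e. a curator would need to document the cause."""
        return len(self.sequences) > 1


def merge_by_gene_and_species(submissions: list[Submission]) -> dict[tuple[str, str], MergedEntry]:
    """Collapse submissions so that each (gene, species) pair yields one entry."""
    entries: dict[tuple[str, str], MergedEntry] = {}
    for sub in submissions:
        key = (sub.gene, sub.species)
        entry = entries.setdefault(key, MergedEntry(sub.gene, sub.species))
        entry.sequences.setdefault(sub.sequence, []).append(sub.source)
    return entries


# Hypothetical submissions: two agree, one differs by a single residue, one is from another species.
submissions = [
    Submission("geneA", "Homo sapiens", "MKTAYIAK", "sub-1"),
    Submission("geneA", "Homo sapiens", "MKTAYIAK", "sub-2"),
    Submission("geneA", "Homo sapiens", "MKTAYLAK", "sub-3"),
    Submission("geneA", "Mus musculus", "MKTAYIAK", "sub-4"),
]

for (gene, species), entry in merge_by_gene_and_species(submissions).items():
    status = "conflicts" if entry.has_conflicts() else "consistent"
    print(gene, species, len(entry.sequences), status)
```

Keying on the (gene, species) pair is what makes the result non-redundant: the four submissions collapse into two entries, and only the human entry carries a difference that would need an explanation.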
2009
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Swiss-Prot
- … From the March and April 2007 releases
3.2 Table of the most represented species
Rank  Entries  Species
----  -------  -----------------------------------
   1    15803  Homo sapiens (Human)
   2    12577  Mus musculus (Mouse)
   6     4931  Escherichia coli
  10     2848  Bacillus subtilis
  12     1882  Escherichia coli O157:H7
  13     1782  Methanococcus jannaschii
  14     1774  Haemophilus influenzae
  17     1624  Salmonella typhimurium
  19     1550  Escherichia coli O6
  20     1521  Shigella flexneri
  21     1416  Mycobacterium tuberculosis
  22     1220  Salmonella typhi
  23     1158  Mycobacterium bovis
  26     1106  Pseudomonas aeruginosa
  28      976  Synechocystis sp. (strain PCC 6803)
  29      971  Archaeoglobus fulgidus
  31      886  Vibrio cholerae
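A species ranking like the one above can be approximated by tallying the OS (organism species) lines of a Swiss-Prot flat file. The Python sketch below assumes the standard flat-file layout (a two-letter line code, data starting at column 6, each entry closed by a `//` line); the sample entries are hypothetical, and the official release statistics may have been computed differently. Because the full OS text is used as the key, strains such as Escherichia coli O157:H7 are counted separately from Escherichia coli, as they are in the table.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def iter_entries(lines: Iterable[str]) -> Iterator[list[str]]:
    """Yield the lines of each entry; a line starting with '//' closes an entry."""
    current: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            yield current
            current = []
        else:
            current.append(line)


def species_of(entry: list[str]) -> str:
    """Join an entry's OS lines (data starts at column 6) and drop the final period."""
    text = " ".join(line[5:].strip() for line in entry if line.startswith("OS   "))
    return text.rstrip(".")


def species_counts(lines: Iterable[str]) -> Counter[str]:
    """Count entries per species string."""
    return Counter(species_of(entry) for entry in iter_entries(lines))


# Hypothetical three-entry flat file.
SAMPLE = """\
ID   HYPO1_HUMAN
OS   Homo sapiens (Human).
//
ID   HYPO2_MOUSE
OS   Mus musculus (Mouse).
//
ID   HYPO3_HUMAN
OS   Homo sapiens (Human).
//
"""

for rank, (species, n) in enumerate(species_counts(SAMPLE.splitlines()).most_common(), start=1):
    print(rank, n, species)
```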
Line type / subtype               Total number  Number of entries  Average per entry
--------------------------------  ------------  -----------------  -----------------
References (RL)                         511783                                  1.97
  Journal                               446039             233211               1.71
  Submitted to EMBL/GenBank/DDBJ         61712              54310               0.24
  Submitted to Swiss-Prot                 1137               1119              <0.01
...
Comments (CC)                          1051386                                  4.04
  SIMILARITY                            295447             237694               1.14
  FUNCTION                              183661             177123               0.71
  SUBCELLULAR LOCATION                  142311             142311               0.55
...
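The averages in the line-type table are consistent with dividing each total by the number of entries in the whole release, not by the number of entries that carry the line (SIMILARITY, for instance, has 295447 occurrences in 237694 entries yet averages 1.14). The Python sketch below computes the same three columns for comment topics, assuming that each topic opens with a `CC   -!- TOPIC:` line and that entries end with `//`; the sample entries are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def comment_topic_stats(lines: Iterable[str]) -> tuple[int, Counter[str], Counter[str]]:
    """Return (entries in file, total blocks per topic, entries containing each topic)."""
    n_entries = 0
    totals: Counter[str] = Counter()
    entries_with: Counter[str] = Counter()
    topics_in_entry: set[str] = set()
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End of entry: each topic counts at most once toward "number of entries".
            n_entries += 1
            entries_with.update(topics_in_entry)
            topics_in_entry = set()
        elif line.startswith("CC   -!- "):
            # A comment block opens with "CC   -!- TOPIC: text"; continuation lines do not.
            topic = line[9:].split(":", 1)[0].strip()
            totals[topic] += 1
            topics_in_entry.add(topic)
    return n_entries, totals, entries_with


# Hypothetical two-entry flat file.
SAMPLE = """\
ID   HYPO1_HUMAN
CC   -!- FUNCTION: Hypothetical function text.
CC   -!- SIMILARITY: Belongs to a hypothetical family.
CC   -!- SIMILARITY: Contains one hypothetical domain.
//
ID   HYPO2_MOUSE
CC   -!- FUNCTION: Hypothetical function text that
CC       continues on a second line.
//
"""

n_entries, totals, entries_with = comment_topic_stats(SAMPLE.splitlines())
for topic in sorted(totals):
    average = totals[topic] / n_entries  # divide by ALL entries, as the table appears to
    print(f"{topic:<12} {totals[topic]:>4} {entries_with[topic]:>4} {average:>6.2f}")
```

Keeping the per-entry topic set separate from the running totals is what lets one entry contribute several SIMILARITY blocks to "Total number" but only one to "Number of entries".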
5.2 List of the most cited journals in UniProtKB/Swiss-Prot
Rank  Citations  Journal name
----  ---------  -------------------------------------------------------------
   1      14709  Journal of Biological Chemistry
   2       7044  Proceedings of the National Academy of Sciences of the U.S.A.
   3       4488  Journal of Bacteriology
   4       4199  Gene
   5       4063  Nucleic Acids Research
   6       3794  Biochemical and Biophysical Research Communications
   7       3524  FEBS Letters
   8       3274  Biochemistry
   9       3238  The EMBO Journal
  10       2931  European Journal of Biochemistry
  11       2801  Nature
User Interfaces
Basic UniProtKB Entry Viewer
- http://www.expasy.org/cgi-bin/niceprot.pl?P23189
- http://www.expasy.org/uniprot/ELAS_PSEAE
- http://www.expasy.org/sprot/userman.html
PIR
- http://www.pir.uniprot.org/cgi-bin/upEntry?id=P23189
- http://pir.georgetown.edu/cgi-bin/ipcEntry?id=GSHR_PSEAE
- http://www.pir.uniprot.org/start/faq.shtml
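The viewer links above address an entry either by accession (P23189) or by entry name (ELAS_PSEAE, GSHR_PSEAE). The Python sketch below picks the matching URL template for an identifier. The accession pattern follows the format UniProt documents for UniProtKB accessions, the entry-name pattern is a deliberately loose approximation, and the templates are copied from the legacy links above, which may no longer resolve.

```python
from __future__ import annotations

import re

# UniProtKB accession format (e.g. P23189).
ACCESSION_RE = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")
# Loose approximation of an entry name (e.g. ELAS_PSEAE): mnemonic, underscore, species code.
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]+_[A-Z0-9]+")

# Legacy viewer templates taken from the links above; they may no longer resolve.
TEMPLATES = {
    "accession": [
        "http://www.expasy.org/cgi-bin/niceprot.pl?{id}",
        "http://www.pir.uniprot.org/cgi-bin/upEntry?id={id}",
    ],
    "entry_name": [
        "http://www.expasy.org/uniprot/{id}",
        "http://pir.georgetown.edu/cgi-bin/ipcEntry?id={id}",
    ],
}


def classify_identifier(identifier: str) -> str:
    """Return 'accession' or 'entry_name', or raise ValueError if neither pattern fits."""
    if ACCESSION_RE.fullmatch(identifier):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(identifier):
        return "entry_name"
    raise ValueError(f"not a recognizable UniProtKB identifier: {identifier!r}")


def viewer_urls(identifier: str) -> list[str]:
    """Fill every template that accepts this kind of identifier."""
    kind = classify_identifier(identifier)
    return [template.format(id=identifier) for template in TEMPLATES[kind]]


for ident in ("P23189", "ELAS_PSEAE", "GSHR_PSEAE"):
    print(ident, viewer_urls(ident))
```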