Databases maintained locally by ACHS

A list of locally stored databases and their latest updates. The databases are stored on an Apple Xserve RAID mounted on the ACHS bioinformatics server, watson.achs.virginia.edu, and are accessed through GCG/SeqWeb.

| Database | Release # | Release or Upgrade date |
|---|---|---|
| GenBank (gb_new division updated nightly) | Release 166.0 | 06/15/08 |
| NCBI RefSeq, complete release (DNA & protein - updated nightly) | Release 27.0 | 01/06/08 |
| GenPept (gp_new division updated nightly) | Release 166.0 | 06/16/08 |
| PIR_Protein - no longer updated - use UniProt instead | Release 80 | 01/2005 |
| SWISS-PROT - no longer updated - use UniProt instead | Release 44 | 01/2005 |
| UniProt (both uniprot_swissprot and uniprot_trembl - updated weekly). UniProt is a combination of the information previously contained in the separate databases SwissProt, TrEMBL and PIR | Release 11.0, new (07/04) | 05/29/07 |
| H. sapiens reference genome (build 36, version 2) | NCBI constructs | 09/2006 |
| M. musculus reference genome (build 36, version 1) | NCBI constructs | 05/2006 |
| D. rerio reference genome (build 1, version 1) | NCBI constructs | 07/2005 |
| C. elegans genome | NCBI constructs | 01/2003 |
| Yeast Genome Database | original | 05/1998 |
| Yeast - genbank entries (yeast_gb) | N/A | 03/2006 |
| Yeast - complete genome (yeast_chr) from NCBI RefSeq constructs | N/A | 03/2006 |
| Yeast Genome Peptide database | original | 05/1998 |
| Yeast - non-redundant protein entries (yeastpep_nr) | N/A | 03/2006 |
| Yeast ORF Database (no introns) | N/A | 03/2006 |
| PROSITE | Release 18.26 | 05/11/04 |
| Transcription Factor Database (no longer updated) | 3.4 | 06/01/99 |
| Pfam | 21.0 | 11/2006 |
| Restriction Enzyme | REBASE 612 | 11/30/2006 |

    notes on referencing databases from within GCG

    notes on referencing BLAST databases within GCG

    GenBank

    A copy of the GenBank DNA database is maintained on site. The database is updated daily.

    RefSeq

    RefSeq provides reference sequence standards for genomes, transcripts and proteins. Updated weekly.

    GenPept

    A protein database consisting of the translated protein sequences contained in the GenBank sequence headers. Most sequences are hypothetical and not confirmed experimentally. The database is divided into the same subdivisions as GenBank. Updated daily.

    UniProt *NEW*

    The Universal Protein Resource is a repository of protein sequence and function. The data in UniProt comes from combining the information contained in Swiss-Prot, TrEMBL, and PIR. It consists of two major sections - uniprot_sw (manually curated and annotated entries from SwissProt and PIR), and uniprot_tr (computer analyzed entries from TrEMBL and PIR, awaiting full manual curation and annotation).

    Since UniProt is wholly redundant with PIR, production and updates of the PIR protein sequence database will eventually cease (i.e., at the end of 2004 or in early 2005). UniProt is updated bi-weekly.

    PIR

    A protein database. Discontinued in January 2005 and replaced with UniProt. Version 80 will remain available, but users should access UniProt for the most recent and accurate protein sequences.

    SWISSPROT

    A well annotated protein database.

    As of January 2005, SwissProt has been replaced by UniProt. The last update of SwissProt was version 44.2, in January 2005. This last version will remain available, but users should access UniProt for the most recent and accurate protein sequences.

    H. sapiens

    Individual chromosome builds from a reference assembly for the whole genome.

    M. musculus

    Individual chromosome builds from a reference assembly for the whole genome (includes mitochondrial genes and unannotated fragments)

    D. rerio

    Individual chromosome builds from a reference assembly for the whole genome (includes mitochondrial genes and unannotated fragments)

    Caenorhabditis elegans

    Yeast Genome Databases

    The yeast_orf database has been updated and reinstated. This version of the Sanger ORF database contains ORF sequences with no introns.

    Three new databases have been added locally: yeast_gb, yeastpep_nr, and yeast_chr. These are:

    1. The full contingent of GenBank entries taken from the complete yeast genome.
    2. The full contingent of non-redundant S. cerevisiae protein entries from SWISSPROT, PIR, and GenPept.
    3. The complete yeast genome (16 chromosomes plus mtDNA), annotated by chromosome number.
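
    To illustrate what "non-redundant" means for a set such as yeastpep_nr, here is a minimal Python sketch. It is not the procedure actually used to build the local database, which this page does not describe. It assumes that two entries count as redundant when their sequences are identical after removing whitespace and ignoring case. The ProteinEntry type, its field names, and the accessions in the usage example are hypothetical.

```python
from typing import Iterable, NamedTuple


class ProteinEntry(NamedTuple):
    source: str      # e.g. "SWISSPROT", "PIR", or "GenPept"
    accession: str   # hypothetical identifier, for illustration only
    sequence: str    # amino-acid sequence in one-letter codes


def non_redundant(entries: Iterable[ProteinEntry]) -> list[ProteinEntry]:
    """Keep the first entry seen for each distinct sequence.

    Assumption: redundancy means exact sequence identity after stripping
    whitespace and upper-casing; the real build criterion is not documented.
    """
    seen: set[str] = set()
    kept: list[ProteinEntry] = []
    for entry in entries:
        key = "".join(entry.sequence.split()).upper()
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept


if __name__ == "__main__":
    # Made-up entries: the first two carry the same sequence.
    demo = [
        ProteinEntry("SWISSPROT", "A1", "MKT LLV"),
        ProteinEntry("PIR", "B2", "mktllv"),
        ProteinEntry("GenPept", "C3", "MKTAAV"),
    ]
    for entry in non_redundant(demo):
        print(entry.source, entry.accession)
```

    Because the first occurrence wins, the order in which the source databases are read determines which accession is kept for a duplicated sequence.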

    Because the format of these new yeast databases differs slightly from our older databases, the original databases are still available:

    • The original (and incomplete!) yeast genome nucleotide database can be searched (as before) from GCG through FASTA by specifying: yeast:*
    • The original (and out-of-date!) yeast genome peptide database can be searched from GCG through FASTA by specifying: yeastpro:*, or yeastpep:*

    Restriction enzyme database

    The file containing all the restriction enzyme sites used in GCG can be downloaded with the GCG command: fetch enzyme.dat. The file is updated monthly.

    Transcription factor database

    The file containing all the transcription factor sites used in GCG can be downloaded with the GCG command: fetch tfsites.dat.

    TRANSFAC is no longer freely available for updates or downloads, so our local copy is very out of date.

    For the most current version of TRANSFAC, please go to the BioBase Corp. web site.

    other Web-based TRANSFAC search engines

    Pfam: Multiple alignments and profile HMMs of protein domains

    Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. Version 7.2 of Pfam (April 2002) contains alignments and models for 3735 protein families, based on the Swissprot 40 and SP-TrEMBL 18 protein sequence databases. It is intended for use with the HMMERSEARCH program in GCG.

    Protein motifs database

    Please email comments or suggestions about the ACHS MolBiol pages to mblack@virginia.edu.

    Academic Computing Health Sciences
    Box 800555
    Charlottesville, VA 22908
    (434) 982-4025

    © 2008 by the Rector and Visitors of the University of Virginia.