

THE GRADUATE PROGRAM IN BIOMEDICAL SCIENCES AND ENGINEERING
COMPUTATIONAL TOOLS
-
Computing Farms: Three parallel processor clusters and a bank of web servers support the UCSC Genome Browser and its associated tools and databases. These facilities also support much of the computational genome research conducted at UCSC. The newest of the three clusters, the Swarm, consists of 256 nodes, each with quad-core Intel processors and 8 gigabytes of memory, housed in 4 Rackable storage units. The older PitaKluster consists of 198 compute nodes, each with dual AMD Opteron processors and 4 gigabytes of memory, housed in 3 Rackable storage units. For jobs requiring large amounts of memory, the Memk cluster consists of 8 machines, each with 32 gigabytes of memory. All of these systems run the Linux operating system.
-
The Genome Browser Database (http://genome.ucsc.edu/): UCSC's Genome Browser Database is used for genome-wide analysis and comparison. Originally created to support the Human Genome Project, it now also contains annotated sequence data for the mouse and rat genomes. Other vertebrate genomes will be added as their sequencing approaches completion. Creating the database has required the collaborative efforts of numerous institutions, as well as contributions from individual annotators around the world. UCSC provides this critical information to academic, nonprofit, and personal users at no cost. The database is maintained by the UCSC Genome Bioinformatics Group. Specially designed computational tools (e.g., BLAT, Parasol) enable a wide range of rapid and complex genomic analyses. The database can be accessed via the web for partial sequence analysis, while large-scale research projects generally run on the Kilokluster compute farm.
-
The Yeast Intron Database: The Yeast Intron Database is a web-based tool with genome-level information about the spliceosomal introns of the yeast Saccharomyces cerevisiae. Developed by the research groups of Professors Manuel Ares (Dept. of MCD Biology) and David Haussler (Dept. of Biomolecular Engineering), the database lists the known spliceosomal introns of yeast and documents the splice sites this organism uses. This information is used to understand splicing patterns, how they are regulated globally, and how they change during evolution. The website also contains graphs, histograms, images, and hidden Markov model information. Data can be both downloaded and submitted online.
-
BLAT (BLAST-Like Alignment Tool): BLAT was designed by Jim Kent (a former MCD Training Program student and current research scientist in the Genome Bioinformatics Group) for the Genome Browser Database. BLAT enables rapid sequence alignment and can be run either directly on the computing clusters or through the Genome Browser. When comparing vertebrate sequences, BLAT is more accurate than popular alignment tools and 500 times faster for DNA alignments, and 50 times faster for protein alignments.
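In broad outline, BLAT gets much of its speed by keeping an in-memory index of the non-overlapping k-mers of the target genome and then looking up every overlapping k-mer of the query. The Python sketch below illustrates only that seed-finding step under simplified assumptions: the function names `build_index`, `find_seeds`, and `candidate_diagonals` are hypothetical, the toy k of 4 is chosen for readability, and the real program's extension of seed hits into full alignments is omitted.

```python
from collections import defaultdict

def build_index(target, k):
    """Map each non-overlapping k-mer of target (starts 0, k, 2k, ...) to its positions."""
    index = defaultdict(list)
    for pos in range(0, len(target) - k + 1, k):
        index[target[pos:pos + k]].append(pos)
    return index

def find_seeds(query, index, k):
    """Look up every overlapping k-mer of query; return (query_pos, target_pos) hits."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for tpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, tpos))
    return hits

def candidate_diagonals(hits):
    """Count hits per diagonal (target_pos - query_pos); dense diagonals suggest an alignment."""
    counts = defaultdict(int)
    for qpos, tpos in hits:
        counts[tpos - qpos] += 1
    return dict(counts)

index = build_index("AAAACCCCGGGGTTTT", k=4)
hits = find_seeds("CCCCGGGG", index, k=4)
print(hits)                       # [(0, 4), (4, 8)]
print(candidate_diagonals(hits))  # {4: 2}
```

Indexing only every k-th position keeps the index small, yet any exact match of at least 2k-1 bases still contains one complete indexed k-mer, so such a match cannot be missed.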
-
Parasol: Aligning whole genomes against each other is one of the most compute-intensive problems in bioinformatics. Breaking genomes into pieces and distributing the smaller jobs to many CPUs greatly reduces processing time. Unfortunately, this approach can queue hundreds of thousands of jobs at once, and traditional schedulers cannot handle queues that large effectively. Parasol, also designed by Jim Kent, schedules extremely large batches of jobs and responds rapidly to the inevitable system failures on such large clusters by automatically removing a machine from service when it detects a problem. Although originally written for UCSC's computing cluster, Parasol is portable to other operating systems; it is an open-source project and is available without restriction.
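The split-and-distribute pattern described above can be sketched in a few lines of Python. This is not Parasol's code or interface: `split_into_chunks`, `run_batch`, and the retire-after-failure policy are hypothetical illustrations, and a sequential loop stands in for jobs that would really run concurrently across a cluster.

```python
from collections import deque

def split_into_chunks(length, chunk_size, overlap):
    """Cover [0, length) with windows of chunk_size; neighbours share `overlap` bases."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks, start = [], 0
    while True:
        end = min(start + chunk_size, length)
        chunks.append((start, end))
        if end == length:
            return chunks
        start += step

def run_batch(jobs, machines, run_job, max_failures=1):
    """Hand queued jobs to healthy machines in turn; a machine that reaches
    max_failures is retired and the job it failed goes back on the queue."""
    queue = deque(jobs)
    healthy = deque(machines)
    failures = {m: 0 for m in machines}
    done = {}
    while queue:
        if not healthy:
            raise RuntimeError("every machine has been removed from service")
        machine = healthy.popleft()
        job = queue.popleft()
        if run_job(machine, job):
            done[job] = machine
            healthy.append(machine)          # machine is free for the next job
        else:
            failures[machine] += 1
            queue.appendleft(job)            # retry the job on another machine
            if failures[machine] < max_failures:
                healthy.append(machine)      # still trusted; keep it in rotation
    retired = [m for m in machines if failures[m] >= max_failures]
    return done, retired

chunks = split_into_chunks(25, chunk_size=10, overlap=2)
print(chunks)   # [(0, 10), (8, 18), (16, 25)]
done, retired = run_batch(chunks, ["node1", "node2", "bad"],
                          run_job=lambda machine, job: machine != "bad")
print(retired)  # ['bad']
```

Overlapping the chunks means a feature that spans a chunk boundary still falls entirely inside at least one chunk, provided it is no longer than the overlap.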
-
Protein Structure Prediction Webserver (SAM-T08): SAM is a web-based tool for predicting the fold and secondary structure of a target protein sequence. It uses multi-track hidden Markov models and neural nets trained on multiple alignments generated by an iterated search procedure. SAM was developed by the research groups of Kevin Karplus and Richard Hughey and is maintained at the SOE Center for Biomolecular Science and Engineering.
-
The Intronerator: The Intronerator is a collection of web-based tools for exploring the molecular biology and genomics of C. elegans, with a special emphasis on alternative splicing. Developed at the Department of MCD Biology by Professor Al Zahler and Jim Kent (now in the Genome Bioinformatics Group), it includes a catalog of alternatively spliced genes, an intron database, software for genome alignment comparisons between species, and many other useful tools for molecular biology studies.
-
The Improbizer: The Improbizer is a software tool for detecting regulatory motifs in DNA or RNA sequences. It uses a variation of the expectation maximization (EM) algorithm to find sequence patterns that occur more frequently than would be expected by chance (i.e., above background levels). An assortment of hidden Markov models can be used to adjust for the differing nucleotide background and foreground levels of different species. Designed by Jim Kent for the SOE computing clusters, the program can also be downloaded.
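The Improbizer's exact model is not described here, so the following is a generic sketch of EM motif discovery rather than its implementation. It assumes one motif occurrence per sequence, a zero-order background estimated from the input, uppercase `ACGT` sequences, and a starting seed k-mer; `em_motif`, `background`, and `consensus` are hypothetical names. Each E-step weighs every possible motif position by how much better the current profile explains it than the background does, and each M-step rebuilds the profile from those weighted counts.

```python
import math

ALPHABET = "ACGT"

def background(seqs, pseudo=0.1):
    """Zero-order background: overall base frequencies across all input sequences."""
    counts = {b: pseudo for b in ALPHABET}
    for s in seqs:
        for base in s:
            counts[base] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}

def em_motif(seqs, seed, iterations=30, pseudo=0.1):
    """EM assuming one motif occurrence per sequence, started from a seed k-mer."""
    width = len(seed)
    bg = background(seqs, pseudo)
    # Soft initial profile: 0.7 on the seed base, 0.1 on each other base.
    pwm = [{b: 0.7 if b == c else 0.1 for b in ALPHABET} for c in seed]
    for _ in range(iterations):
        counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
        for s in seqs:
            # E-step: log-odds (profile vs. background) of a motif starting at each offset.
            scores = [
                sum(math.log(pwm[j][s[i + j]] / bg[s[i + j]]) for j in range(width))
                for i in range(len(s) - width + 1)
            ]
            top = max(scores)
            weights = [math.exp(x - top) for x in scores]  # shift by max for stability
            z = sum(weights)
            # Accumulate expected base counts per motif column, weighted by posterior.
            for i, w in enumerate(weights):
                for j in range(width):
                    counts[j][s[i + j]] += w / z
        # M-step: renormalise each column into a probability distribution.
        pwm = [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]
    return pwm

def consensus(pwm):
    """Most probable base in each column of the profile."""
    return "".join(max(col, key=col.get) for col in pwm)

seqs = ["ATATGGCGTATA", "TTAGGCGATTA", "GGCGATATAT", "ATATATAGGCG"]
print(consensus(em_motif(seqs, seed="GGCA")))
```

Starting from the one-mismatch seed `GGCA`, the profile should settle on the planted `GGCG` in these four toy sequences. EM only reaches a local optimum, so practical motif finders typically try many starting seeds and keep the best-scoring result.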
-
Splicing Microarray Database: The UCSC microarray database enables researchers to store and browse their data via a web interface. This database focuses on representing alternatively spliced forms of mRNAs and data about their expression as detected in DNA microarray experiments. It integrates experimental data obtained from functional genomics research with information about gene structure that is stored in genomic databases.
UCSC's graduate program in Biomedical Sciences and Engineering is supported by training grants from the National Institute of General Medical Sciences.
Website design by David States, last reviewed 11/4/09