Yale University.  
Computer Science.  
   
     
 

Mark Gerstein
Associate Professor of Molecular Biophysics & Biochemistry and Computer Science

A.B. 1989, Harvard University
Ph.D. 1993, Cambridge University
Joined Yale Faculty 1997



Professor Gerstein does research in the new field of bioinformatics, which involves applying quantitative approaches to problems in molecular biology. His research involves a range of computational techniques, including database design, systematic data mining, and molecular simulation. He is interested in large-scale surveys of the rapidly expanding number of genome sequences, protein structures, and expression datasets. The aim is for these surveys to address a number of statistical questions about macromolecules relating to their physical properties, cellular function, and phylogenetic distribution.

More specifically, Professor Gerstein has three research foci.

1. Comparative Genomics. Here he is interested in comparing genomes in terms of "a finite parts list" of protein folds and families. This involves developing systems for large-scale genome annotation that attempt to give an integrated, "global" perspective on the large amounts of heterogeneous information associated with a genome. An important part of this is developing ontologies for protein function and statistically reliable methods for predicting protein function based on sequence similarity, functional genomics data, and automated analysis of the literature. Also important are approaches for clustering the many small microbial genomes based on features of the entire genome sequence (rather than just the sequence of ribosomal RNA), ways of assessing the degree of bias in the databanks, and methods for identifying genes and pseudogenes.

2. Expression Analysis. Here the focus is on analyzing patterns of gene expression and interrelating these with important properties of proteins and nucleic acids, such as their structure, function, localization, and interactions. This work involves extensive application of machine learning approaches such as Bayesian networks, decision trees and unsupervised clustering.

3. Macromolecular Geometry. Here he concentrates on the relationship between packing and motions and tries to find standardized ways to describe the conformational variability of a given macromolecular "part." This involves developing ways of aligning structures, clustering related ones into fold families, analyzing packing with volumes derived from Voronoi polyhedra, and simulating motions using molecular-mechanics potentials.

Representative Publications:

"Integrative database analysis in structural genomics," Nature Structural Biology 7:960-3, 2000.

"A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome," with A Drawid, J. Molecular Biology 301:1059-75, 2000.

"A unified statistical framework for sequence comparison and structure comparison," with M Levitt, Proc. Natl. Acad. Sci. USA 95:5913-20, 1998.

"Simulating water and the molecules of life," with M Levitt, Scientific American 279:100-5, 1998.
