|
Bio
Prof. Abadi's research interests are in database system architecture and
implementation, cloud computing, and the Semantic Web. Before joining the Yale
computer science faculty, he spent four years at the Massachusetts Institute of
Technology where he received his Ph.D. Abadi has been a recipient of a
Churchill Scholarship, an NSF CAREER Award, the 2008 SIGMOD Jim Gray Doctoral
Dissertation Award, and the 2007 VLDB best paper award.
Current Projects
- Petascale Parallel Database Systems (HadoopDB)
HadoopDB is:
- A hybrid of DBMS and MapReduce technologies targeting analytical query workloads
- Designed to run on a shared-nothing cluster of commodity machines, or in the cloud
- An attempt to fill the gap in the market for a free and open source parallel DBMS
- Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems
- As scalable as Hadoop, while achieving superior performance on structured data analysis workloads
This project builds on our paper in VLDB 2009.
- Data management for the Semantic Web (SW-Store)
SW-Store is a recently launched project whose goal is to manage and query
Semantic Web data. We are starting from a clean slate and designing a DBMS
specifically for this type of data and the prevalent Semantic Web data
model, the Resource Description Framework, or RDF. We explore how common SW
queries and applications such as reasoning and biological data integration
can be built into the database. This work builds on a publication that won
the best paper award at VLDB 2007.
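
As a rough illustration of the RDF data model mentioned above, the sketch below stores data as (subject, predicate, object) triples and answers a query by joining two triple patterns. It is not SW-Store's design or code; the identifiers (`protein:P53`, `bio:interactsWith`, and so on) and the helper functions are hypothetical and chosen only to echo the biological data integration use case.

```python
# Hypothetical RDF-style data: each fact is a (subject, predicate, object) triple.
triples = [
    ("protein:P53", "rdf:type", "bio:Protein"),
    ("protein:P53", "bio:interactsWith", "protein:MDM2"),
    ("protein:MDM2", "rdf:type", "bio:Protein"),
    ("protein:MDM2", "bio:locatedIn", "cell:Nucleus"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def interactions_located_in(triples, location):
    """Join two patterns: ?a interactsWith ?b, and ?b locatedIn <location>."""
    located = {s for s, _, _ in match(triples, p="bio:locatedIn", o=location)}
    return [
        (a, b)
        for a, _, b in match(triples, p="bio:interactsWith")
        if b in located
    ]


pairs = interactions_located_in(triples, "cell:Nucleus")
```

Even this small query needs a join between two triple patterns, which hints at why a DBMS designed specifically for RDF may be worth exploring.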
- Column-Oriented Database Systems (C-Store)
As companies increasingly use analytic data marts and data warehouses for
their customer relationship management and business intelligence
applications, the use of column-oriented DBMS technology is
growing. Column-oriented databases store DBMS tables column-by-column
(instead of row-by-row) and tend to perform better on analytical
applications. These applications typically read only a subset of table
attributes at a time, so a column-store can read just the columns a query
needs and is thus more I/O efficient.
Consequently, column-stores have recently seen great momentum in industry
with at least a half-dozen new start-up companies, and in the research
community with a rapidly increasing number of recent publications.
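
The I/O argument can be made concrete with a toy sketch. The table, its column names, and the "values read" counter below are all hypothetical; the counter is only a stand-in for I/O cost, and real systems read disk pages rather than individual values.

```python
# Hypothetical table with four attributes per row.
COLUMNS = ("id", "name", "country", "amount")
rows = [
    (1, "alice", "US", 120.0),
    (2, "bob", "DE", 80.5),
    (3, "carol", "US", 42.0),
]


def to_columns(rows):
    """Pivot row tuples into one list per attribute (a column layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_amount_row_store(rows):
    """Row layout: whole rows are fetched, so every attribute is read."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)  # all four attributes come along
        total += row[3]
    return total, values_read


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(to_columns(rows))
```

Both layouts produce the same sum, but the row layout touches all four attributes of every row while the column layout touches one value per row. The gap grows with the number of attributes the query does not need.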
At Yale, we're interested in a variety of research topics within the context
of column-store database systems, including:
- How to build a massively parallel, shared-nothing column-store database?
- How to make column-stores elastically expand or contract across
resources in the cloud?
- How to maximize column-store write performance?
- How to turn PostgreSQL into a hybrid row-store/column-store DBMS?
- High-Performance OLTP Databases (H-Store)
Current OLTP database designs, which date largely from the 1970s, are based on
several assumptions about the architecture of database
applications and hardware that are less true today than they were 30 years
ago. Some examples include:
- OLTP used to be dominated by disk-based systems. Today, most OLTP
applications can fit completely in the main memory of a cluster of
machines arranged in a shared-nothing architecture
- Many locking-based pessimistic concurrency control schemes designed to keep
the CPUs busy during disk and user stalls are no longer necessary
- The number of CPU cores available to process transactions is rapidly
expanding, and legacy DBMS code is struggling to keep up (i.e., it does
not scale)
The goal of the H-Store project is to investigate how these architectural and
application shifts affect the performance of OLTP databases,
and to study what performance benefits would be possible
with a complete redesign of OLTP systems in light of these trends. Our early
results show that a simple prototype built from scratch using
modern assumptions can outperform current commercial DBMS offerings by around a
factor of 80 on OLTP workloads. We are currently working
to build a full-featured system that demonstrates these performance wins in a
more robust prototype. This work is a collaboration between MIT, Yale, and Brown.
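
To illustrate how the trends above might combine, here is a minimal sketch under stated assumptions: data lives entirely in main memory, each partition owns its data exclusively (shared-nothing), and each partition runs its transactions one at a time so that no locks are needed. Serial execution per partition is an assumption made for this example, and the `Partition` and `Cluster` classes are hypothetical, not the H-Store implementation.

```python
class Partition:
    """One shared-nothing partition: private in-memory data, serial execution."""

    def __init__(self):
        self.balances = {}  # main-memory storage owned only by this partition
        self.queue = []     # transactions waiting to run here

    def submit(self, txn):
        self.queue.append(txn)

    def run(self):
        """Run queued transactions one at a time; no locks are taken."""
        results = []
        while self.queue:
            txn = self.queue.pop(0)
            results.append(txn(self.balances))
        return results


class Cluster:
    """Routes each transaction to the single partition that owns its key."""

    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def partition_for(self, account_id):
        return self.partitions[account_id % len(self.partitions)]

    def deposit(self, account_id, amount):
        def txn(balances):
            balances[account_id] = balances.get(account_id, 0) + amount
            return balances[account_id]

        self.partition_for(account_id).submit(txn)

    def run_all(self):
        return [p.run() for p in self.partitions]


cluster = Cluster(2)
cluster.deposit(1, 100)
cluster.deposit(2, 50)
cluster.deposit(1, 25)
results = cluster.run_all()
```

Because no transaction ever waits on disk, and each one touches a single partition, running them serially keeps the CPU busy without pessimistic locking. Transactions that span partitions are deliberately left out of this sketch.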
|