Research

Research Interests

My research interests lie in data management and analytics. Specifically, I focus on collaborative data management, i.e., the efficient management and manipulation of collaborative datasets. I am also interested, more broadly, in interactive data analytics.

Selected Research Projects

OrpheusDB

OrpheusDB is a hosted platform that bolts versioning onto traditional relational databases. Compared to existing version control systems such as GitHub, OrpheusDB has two crucial advantages: (a) compact storage and (b) a rich query language. Since OrpheusDB is built on top of PostgreSQL, it inherits many of the benefits of relational databases, while also compactly storing versions, keeping track of them, and recreating them on demand. In our current implementation, users can interact with OrpheusDB via the command line using both standard Git-style APIs and a SQL-like query language, as shown in the figure below. Please refer to our website for more details.

[Figure: OrpheusDB]

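The idea of bolting versioning onto a relational database can be illustrated with a small sketch. This is not OrpheusDB's actual schema or API: it uses SQLite from the Python standard library instead of PostgreSQL, and the table names (`records`, `version_records`), the functions `commit` and `checkout`, and the sample data are all hypothetical. It only shows one plausible way to store each distinct record once and let versions share records, while keeping everything queryable with plain SQL.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a record table and a version index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE records (rid INTEGER PRIMARY KEY, name TEXT, value REAL);
        CREATE TABLE version_records (vid INTEGER, rid INTEGER,
                                      PRIMARY KEY (vid, rid));
    """)
    return conn


def commit(conn, vid, rows, parent=None):
    """Store `rows` as version `vid`, reusing records already stored for `parent`.

    Assumes the (name, value) rows within one version are distinct.
    """
    existing = {}
    if parent is not None:
        cur = conn.execute(
            "SELECT r.rid, r.name, r.value FROM records r "
            "JOIN version_records v ON v.rid = r.rid WHERE v.vid = ?",
            (parent,))
        existing = {(name, value): rid for rid, name, value in cur}
    for row in rows:
        rid = existing.get(row)
        if rid is None:  # only records that changed are physically stored
            rid = conn.execute(
                "INSERT INTO records (name, value) VALUES (?, ?)", row).lastrowid
        conn.execute("INSERT INTO version_records VALUES (?, ?)", (vid, rid))
    conn.commit()


def checkout(conn, vid):
    """Recreate the rows of version `vid` on demand."""
    cur = conn.execute(
        "SELECT r.name, r.value FROM records r "
        "JOIN version_records v ON v.rid = r.rid "
        "WHERE v.vid = ? ORDER BY r.rid", (vid,))
    return cur.fetchall()


conn = open_store()
commit(conn, 1, [("a", 0.9), ("b", 0.4)])
commit(conn, 2, [("a", 0.9), ("b", 0.7)], parent=1)

# Two versions of two rows each, but only three records are stored.
stored = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

# Plain SQL can query across versions: which versions have b above 0.5?
versions = [vid for (vid,) in conn.execute(
    "SELECT v.vid FROM version_records v JOIN records r ON r.rid = v.rid "
    "WHERE r.name = 'b' AND r.value > 0.5")]
```

In this sketch the unchanged record `("a", 0.9)` is shared by both versions, which is the kind of saving that compact storage refers to, and the last query hints at why a SQL-capable store can answer questions across versions that a file-based version control system cannot easily express.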
GenVisage

GenVisage focuses on the rapid identification of discriminative features for genomic data analysis. Given two different classes of objects and an object-feature matrix, our goal is to find the top-k feature pairs that separate these two classes. Many biological applications fit this framework. For instance, when exposed to some drug, one set of genes may be overexpressed while another set is not. Given the results of such a drug-response experiment, biologists often want to find features that characterize the differentially expressed genes. Our design principle is to prioritize running time over accuracy, so that GenVisage can serve as a data exploration tool before users invest in more time-consuming methods. As shown in the figure below, we first propose a Rocchio-based separability metric for a given feature pair. Furthermore, we have been developing different optimization strategies to reduce the running time of finding the best feature pairs, as depicted by the optimization modules in the figure below.

[Figure: GenVisage]
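
The problem statement above can be sketched in a few lines of Python. The exact GenVisage metric is not given here, so the code below assumes one simple Rocchio-style definition: project both classes onto a feature pair, compute each class centroid, and score the pair by the fraction of objects that lie strictly closer to their own class centroid. The function names, the brute-force search over all pairs, and the toy matrix are illustrative assumptions, and none of the optimization strategies mentioned above are reproduced.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def separability(pos, neg, i, j):
    """Assumed metric: share of objects nearer to their own class centroid
    in the plane spanned by features i and j (ties count as misses)."""
    pos2 = [(row[i], row[j]) for row in pos]
    neg2 = [(row[i], row[j]) for row in neg]
    cp, cn = centroid(pos2), centroid(neg2)
    correct = sum(sq_dist(p, cp) < sq_dist(p, cn) for p in pos2)
    correct += sum(sq_dist(p, cn) < sq_dist(p, cp) for p in neg2)
    return correct / (len(pos2) + len(neg2))


def top_k_pairs(pos, neg, k):
    """Brute-force search: score every feature pair and keep the best k."""
    n_features = len(pos[0])
    scored = [(separability(pos, neg, i, j), (i, j))
              for i, j in combinations(range(n_features), 2)]
    # Highest score first; break ties by feature indices for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]


# Toy object-feature matrix: feature 0 differs between the classes,
# features 1 and 2 are noise.
pos = [(1.0, 0.2, 0.5), (0.9, 0.9, 0.3), (1.1, 0.4, 0.8)]
neg = [(0.0, 0.3, 0.6), (0.1, 0.8, 0.2), (-0.1, 0.5, 0.7)]

best = top_k_pairs(pos, neg, 2)
```

On this toy input the two pairs that include feature 0 separate the classes perfectly, while the pair made of the two noise features does no better than chance. The brute-force search scores every pair against every object, which is the cost that faster, approximate strategies would aim to cut.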