The overall goal of our research is to understand how genes and phenotypes are expressed from the genetic information encoded in genomes.

More specifically, we are working on:

  • Decoding and characterizing the regulatory genome and proteome
  • Elucidating the regulatory programs underlying complex processes in normal cells and tissues, as well as in aging and in diseases such as cancer and neurodegenerative disorders
  • Creating innovative computational approaches for analyzing data from high-throughput experiments (microarrays, proteomics, high-throughput sequencing, etc.)
  • Mapping genotype to complex phenotypes

    Here are some of the current projects in the lab:

    Regulatory element discovery from gene expression

    We have recently developed FIRE, a universal framework for detecting regulatory DNA and RNA motifs from gene expression data. FIRE works across all data types (clustered microarray datasets, single arrays, in situs, etc.) and genomes, with exceptional sensitivity and near-zero false-positive rates.
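
To make the idea of detecting motifs from expression data concrete, here is a minimal sketch that scores a candidate motif by the mutual information between its presence in a gene's promoter and that gene's expression cluster. This is only one common way to quantify such an association and is not the FIRE implementation; the sequences, cluster labels, and function names below are hypothetical toy values chosen for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def motif_presence(promoters, motif):
    """1 if the motif occurs in the promoter sequence, else 0 (exact match only)."""
    return [1 if motif in seq else 0 for seq in promoters]


# Hypothetical toy data: four promoters and the expression cluster of each gene.
promoters = ["ACGTGCAA", "TTACGTGC", "GGGTTTAA", "CCCTTTGG"]
clusters = [0, 0, 1, 1]

# "ACGTG" occurs only in cluster 0 promoters, so it is informative about expression.
score_informative = mutual_information(motif_presence(promoters, "ACGTG"), clusters)
# "TTA" occurs once in each cluster, so it carries no information about expression.
score_uninformative = mutual_information(motif_presence(promoters, "TTA"), clusters)
```

In practice a score like this would need a significance test (for example, by shuffling the cluster labels) before a motif is called informative.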

    FIRE-predicted regulatory elements are currently being experimentally tested in labs studying P. falciparum gene expression, Drosophila development, cellular quiescence, root development in Arabidopsis and several others at Princeton and elsewhere. Ongoing work includes applying FIRE to many cancer datasets in order to discover transcriptional and post-transcriptional processes that are dysregulated in cancer. Finally, we have started extending FIRE to discover protein motifs that may be involved in post-translational modifications, protein-protein interactions and cellular localization.

    Identifying conserved regulatory elements

    We have developed Fastcompare, an alignment-free approach for discovering regulatory elements that are globally conserved between two genomes. We have used Fastcompare to discover DNA and RNA motifs in many species, including yeasts, worms, flies and humans.
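
As a rough illustration of what an alignment-free conservation test can look like, the sketch below asks whether the set of genes carrying a k-mer upstream in one genome overlaps the set of orthologous genes carrying it in a second genome more than expected by chance, using a hypergeometric tail probability. It is a simplified sketch under stated assumptions, not the Fastcompare code: orthologs are assumed to share a dictionary key, and all sequences and function names are hypothetical.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when drawing n items without replacement from N, of which K are successes."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total


def genes_with_kmer(upstream, kmer):
    """Genes whose upstream sequence contains the k-mer (exact match only)."""
    return {gene for gene, seq in upstream.items() if kmer in seq}


def conservation_pvalue(up_a, up_b, kmer):
    """Chance probability of the observed overlap between the two genomes' gene sets."""
    genes = up_a.keys() & up_b.keys()  # assumption: ortholog pairs share a key
    set_a = genes_with_kmer(up_a, kmer) & genes
    set_b = genes_with_kmer(up_b, kmer) & genes
    overlap = len(set_a & set_b)
    return hypergeom_tail(len(genes), len(set_a), len(set_b), overlap)


# Hypothetical upstream regions for six ortholog pairs in two genomes.
up_a = {"g1": "CCTGACTCAA", "g2": "TGACTCGGTA", "g3": "AATGACTCCC",
        "g4": "GGGCCCATAT", "g5": "CATCATCATG", "g6": "TTGGCCAAGG"}
up_b = {"g1": "TGACTCTTGG", "g2": "GGATGACTCA", "g3": "CTGACTCGTT",
        "g4": "ATATCCGGCA", "g5": "GGTTAACCAT", "g6": "CAGGTTGCAA"}

p_conserved = conservation_pvalue(up_a, up_b, "TGACTC")  # same three genes in both genomes
p_control = conservation_pvalue(up_a, up_b, "CAT")       # weaker overlap between gene sets
```

No sequence alignment is needed here because the comparison is between sets of genes, not between positions within the upstream regions.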

    Current work includes comprehensive multi-clade discovery of transcription factor binding sites in bacteria. We are also extending Fastcompare to support multi-species comparisons and to incorporate some of the principles and algorithms developed for FIRE (such as motif optimization). In addition, we are using the elements discovered by FIRE to search metazoan genomes for distal enhancers.

    Genotype to phenotype mapping

    How are complex phenotypes, such as bacterial motility, cell wall, sporulation and respiration, encoded in genomes? We have started to answer this question in bacteria, using phylogenetic and phenotypic profiles; we have found many functionally coherent modules of genes that are robustly co-inherited in bacteria sharing the same phenotypic traits. Current and future work includes using this approach to analyze the human microbiome, and extending our analysis to eukaryotic organisms. We also plan on extending our framework to analyze the wealth of data generated by whole-genome association projects in humans.
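
The comparison of phylogenetic and phenotypic profiles can be sketched as follows: each gene is represented by the set of genomes in which it is present, a phenotype by the set of genomes that display it, and genes are retained when the two sets largely coincide. The Jaccard index and the 0.8 threshold are illustrative choices, not necessarily the measure used in our analysis, and the gene and genome names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def genes_tracking_phenotype(profiles, phenotype_genomes, threshold=0.8):
    """Genes whose presence profile closely matches the phenotype profile."""
    return sorted(
        gene
        for gene, genomes in profiles.items()
        if jaccard(genomes, phenotype_genomes) >= threshold
    )


# Hypothetical phylogenetic profiles: gene -> genomes in which the gene is present.
gene_profiles = {
    "geneA": {"sp1", "sp2", "sp3"},
    "geneB": {"sp1", "sp2", "sp3"},
    "geneC": {"sp1", "sp2", "sp3", "sp4", "sp5"},  # present everywhere, uninformative
    "geneD": {"sp4"},
}
# Hypothetical phenotypic profile: genomes that display the trait of interest.
trait_genomes = {"sp1", "sp2", "sp3"}

module = genes_tracking_phenotype(gene_profiles, trait_genomes)
```

Genes that pass the filter together, such as geneA and geneB in this toy example, are candidates for a co-inherited module associated with the trait.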

    Future research

    Future research will involve studying additional facets of gene regulation: non-coding RNAs and their regulatory potential, A-to-I editing (and its potential to alter gene regulation) and RNA structural motifs. I am also interested in how computational approaches for automated image analysis and literature mining can help us decipher regulatory networks. Finally, I will investigate the link between regulatory elements and disease by exploiting data generated from whole-genome association studies.

    On the experimental side, I am very interested in using or developing systematic, high-throughput approaches for identifying the proteins or RNAs targeting the DNA and RNA regulatory elements we predict. This includes yeast one-hybrid (for protein-DNA interactions) and yeast three-hybrid (protein-RNA) screening, high-throughput transient transfection reporter assays, etc. More modestly, I also plan to test some of the most interesting predictions from FIRE and Fastcompare, using reporter constructs.