NEW: Mutation detection from RNA-seq data using SNPseeqer and INDELseeqer
NEW: ChIPseeqer, a comprehensive framework for analysis of ChIP-seq data, is now available for download.

The overall goal of our research is to understand how genes and phenotypes are expressed from the genetic information encoded in genomes, and how we can use this improved molecular understanding to treat and diagnose diseases such as cancer.

More specifically, we are working on:

  • Systems biology of regulatory networks in normal and malignant cells, with a strong focus on blood cancers (lymphomas and leukemias)
  • Decoding and characterizing the regulatory genome and proteome
  • Developing innovative computational approaches for analysis of high-throughput experiments (microarrays, proteomics, high-throughput sequencing, etc.)

    Here are some of the current projects in the lab:

    Decoding the B cell regulatory genome

    In collaboration with Ari Melnick's lab at WCMC, we are involved in a large-scale, ambitious effort to understand how the genome and the epigenome control gene expression in B cells, and how we can use this knowledge to understand tumorigenesis and ultimately design more targeted therapeutic strategies against lymphoma and other hematological malignancies. This effort uses deep sequencing technologies (ChIP-seq, RNA-seq, nucleosome locations) to gain new insights into cellular activity. We are using these data to decode the regulatory rules underlying combinatorial regulation by master regulators of B cell phenotypes, to build multi-scale sequence-based predictive models of gene expression, and to use these models to predict the effect of regulatory mutations on cellular activity.

    Regulatory element discovery from gene expression

    We have recently developed FIRE, a universal framework for detecting regulatory DNA and RNA motifs from gene expression data. FIRE works across all data types (clustered microarray datasets, single arrays, in situs, etc.) and genomes, with exceptional sensitivity and near-zero false-positive rates.

    FIRE-predicted regulatory elements are currently being experimentally tested in labs studying P. falciparum gene expression, Drosophila development, cellular quiescence, root development in Arabidopsis and several others at Princeton and elsewhere. Ongoing work includes applying FIRE to many cancer datasets in order to discover transcriptional and post-transcriptional processes that are dysregulated in cancer. Finally, we have started extending FIRE to discover protein motifs that may be involved in post-translational modifications, protein-protein interactions and cellular localization.

    Identifying conserved regulatory elements

    We have developed Fastcompare, an alignment-free approach for discovering regulatory elements that are globally conserved between two genomes. We have used Fastcompare to discover DNA and RNA motifs in many species, including yeasts, worms, flies and humans.

    Current work includes comprehensive multi-clade discovery of transcription factor binding sites in bacteria. We are also extending Fastcompare to allow multi-species comparisons and to incorporate some of the powerful principles and algorithms developed in FIRE (such as motif optimization). In addition, we are using the elements discovered by FIRE to search metazoan genomes for distal enhancers.
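As an illustration of what "alignment-free" conservation between two genomes can mean, here is a minimal Python sketch. It is not the Fastcompare implementation: the function names, the input layout (dictionaries mapping an ortholog identifier to an upstream sequence) and the choice of a hypergeometric tail as the score are all assumptions made for this example. The idea is to ask, for a given k-mer, whether the genes carrying it in one genome are the orthologs of the genes carrying it in the other more often than chance would predict, without ever aligning the sequences.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms drop out of the sum automatically.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def conservation_pvalue(kmer, upstream_a, upstream_b):
    """Score how well the gene set containing `kmer` is conserved across two genomes.

    upstream_a, upstream_b: hypothetical dicts mapping an ortholog id to the
    upstream sequence of that gene in genome A and genome B respectively.
    Returns (number of ortholog pairs with the k-mer in both, p-value).
    """
    shared = set(upstream_a) & set(upstream_b)  # only genes with an ortholog
    in_a = {g for g in shared if kmer in upstream_a[g]}
    in_b = {g for g in shared if kmer in upstream_b[g]}
    overlap = len(in_a & in_b)
    return overlap, hypergeom_sf(overlap, len(shared), len(in_a), len(in_b))


# Toy data, invented for the example.
genome_a = {"g1": "ACGTTT", "g2": "TTACGT", "g3": "GGGGGG", "g4": "CCCCCC"}
genome_b = {"g1": "AAACGT", "g2": "ACGTAA", "g3": "TTTTTT", "g4": "GGGCCC"}
overlap, pvalue = conservation_pvalue("ACGT", genome_a, genome_b)
print(overlap, pvalue)
```

This sketch only checks the forward strand and a single exact k-mer; a real analysis would at least consider reverse complements and correct for the number of k-mers tested.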

    Genotype to phenotype mapping

    How are complex phenotypes, such as bacterial motility, cell wall, sporulation and respiration, encoded in genomes? We have started to answer this question in bacteria, using phylogenetic and phenotypic profiles; we have found many functionally coherent modules of genes that are robustly co-inherited in bacteria sharing the same phenotypic traits. Current and future work includes using this approach to analyze the human microbiome, and extending our analysis to eukaryotic organisms. We also plan on extending our framework to analyze the wealth of data generated by whole-genome association projects in humans.
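The profile comparison described above can be sketched in a few lines of Python. This is an assumed, simplified formulation rather than the lab's actual method: each gene family gets a phylogenetic profile (present or absent in each genome), the trait gets a phenotypic profile over the same genomes, and mutual information is used here as one possible association score. The gene names and profiles are invented for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed (a, b) pairs of p(a,b) * log2(p(a,b) / (p(a) p(b))).
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def rank_genes_by_trait(gene_profiles, trait_profile):
    """Return (gene, score) pairs sorted from most to least trait-associated."""
    scores = {g: mutual_information(p, trait_profile) for g, p in gene_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# One entry per genome: 1 = trait (or gene) present, 0 = absent. Hypothetical data.
motile = [1, 1, 1, 0, 0, 0]
gene_profiles = {
    "flagellin_like": [1, 1, 1, 0, 0, 0],  # co-inherited with the trait
    "housekeeping": [1, 1, 1, 1, 1, 1],    # present everywhere, uninformative
    "unrelated": [1, 0, 1, 0, 1, 0],
}
ranked = rank_genes_by_trait(gene_profiles, motile)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

Note that this score ignores the phylogenetic relatedness of the genomes, so closely related species would be counted as independent observations; any serious analysis would need to account for that.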

    Other research areas in the lab

    We are also involved in studying additional facets of gene regulation: non-coding RNAs and their regulatory potential, A-to-I editing (and its potential to alter gene regulation) and RNA structural motifs. We are also interested in how computational approaches for automated image analysis and literature mining can help us decipher regulatory networks. Finally, we are investigating the link between regulatory elements and disease, by exploiting data generated from whole-genome association studies.