Current research interests
RNA secondary structure analyses.
The eukaryotic translation initiation factor eIF4E is a critical modulator of cellular growth with
functions in the nucleus and cytoplasm. In the cytoplasm, recognition of the 5' m7G cap moiety on all
mRNAs is sufficient for their functional interaction with eIF4E. In contrast, in the
nucleus eIF4E associates with and promotes the nuclear export of only some mRNAs, such as cyclin D1, and not
others, such as GAPDH or actin mRNAs.
In collaboration with Dr. Kathy Borden,
we have determined that the basis of this discriminatory interaction is a structurally conserved,
~50-nt sequence in the 3' untranslated region (UTR) of cyclin D1 mRNA, which we refer to as an
eIF4E sensitivity element. This element is sufficient
for localization of capped mRNAs to eIF4E nuclear bodies, for formation
of eIF4E-specific ribonucleoproteins in the nucleus, and for eIF4E-dependent mRNA export.
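To make the notion of an RNA secondary structure concrete, the sketch below folds a short sequence with the Nussinov base-pair maximization algorithm and returns the result in dot-bracket notation. It is a textbook stand-in chosen for brevity, not the method used in the work described above; practical analyses of an element such as the ~50-nt cyclin D1 sequence rely on thermodynamic (minimum free energy) models. The function name `nussinov`, the minimum hairpin-loop length of 3 and the set of allowed pairs (Watson-Crick plus G-U wobble) are illustrative assumptions.

```python
from typing import Tuple

# Allowed base pairs: Watson-Crick plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[str, int]:
    """Maximize the number of nested base pairs; return (dot-bracket, pair count).

    Expects an uppercase RNA sequence (A, C, G, U).
    """
    n = len(seq)
    if n == 0:
        return "", 0
    # dp[i][j] = maximum number of nested pairs within seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]

    def pair_score(i: int, k: int, j: int) -> int:
        # Pairs within seq[i..k-1], plus the pair (k, j), plus pairs it encloses.
        left = dp[i][k - 1] if k > i else 0
        return left + 1 + dp[k + 1][j - 1]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                 # case 1: j is unpaired
            for k in range(i, j - min_loop):    # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, pair_score(i, k, j))
            dp[i][j] = best

    # Traceback to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and pair_score(i, k, j) == dp[i][j]:
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return "".join(structure), dp[0][n - 1]
```

For example, `nussinov("GGGAAACCC")` returns `("(((...)))", 3)`: a three-pair stem closing a three-nucleotide loop. Maximizing pair count ignores stacking energies, which is why it serves only as an illustration here.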
Use of secondary structure to better define protein motifs.
Sequence signature databases such as PROSITE, which include protein motifs indicative of a protein's function,
are widely used for function prediction studies, cellular localization annotation and sequence classification.
Patterns are typically described as profiles (weighted matrices) or
motifs (regular expressions specifying allowed residues at particular positions).
Although profiles are becoming increasingly sophisticated
and offer greater sensitivity and quantitative scoring, they are used mainly
by computational biologists.
Protein motifs, on the other hand, are simple to read and apply and can describe
confined patterns pertinent to enzymatic activities; this explains their abundance and continued usefulness, and why the biological community uses them more widely than profiles.
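As an illustration of how a PROSITE-style motif acts as a regular expression, the sketch below translates the core pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the chain termini) into a Python regex and reports 1-based match positions. The helper names are hypothetical, and less common syntax (for example `>` inside brackets) is deliberately not handled. The example uses the N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re
from typing import List

# One PROSITE element: a residue, [allowed], {forbidden} or x, plus an optional (n) / (n,m) repeat.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate core PROSITE pattern syntax into a Python regular expression."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")   # '<' = N-terminus
        anchor_end = element.endswith(">")       # '>' = C-terminus
        element = element.lstrip("<").rstrip(">")
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            token = "."                          # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"      # any residue except those listed
        else:
            token = core                         # single residue or [allowed] class
        if lo is not None:
            token += ("{%s}" % lo) if hi is None else ("{%s,%s}" % (lo, hi))
        pieces.append(("^" if anchor_start else "") + token + ("$" if anchor_end else ""))
    return "".join(pieces)


def find_motif(seq: str, pattern: str) -> List[int]:
    """Return 1-based start positions of (possibly overlapping) motif matches."""
    # A zero-width lookahead lets overlapping matches be reported.
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + 1 for m in rx.finditer(seq)]
```

Here `prosite_to_regex("N-{P}-[ST]-{P}.")` gives `N[^P][ST][^P]`, and `find_motif("MNGTAKNPSV", "N-{P}-[ST]-{P}.")` returns `[2]`; the second asparagine is rejected because it is followed by proline.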
In collaboration with Dr. Masha Niv, we are refining protein motifs using secondary structure information.
To this end, we have developed Scan2S,
which we have successfully applied to increase the precision of detecting Type II restriction endonucleases (REases).
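Purely to illustrate the idea of adding secondary structure to a motif (this is not a description of how Scan2S itself works), the sketch below keeps a sequence-motif hit only if the secondary structure assigned to the matched residues also satisfies a second pattern. It assumes a per-residue structure string aligned to the sequence and written in a three-state alphabet (H helix, E strand, C coil); the function name, the toy sequence, the toy structure string and both patterns are hypothetical.

```python
import re
from typing import List, Tuple


def scan_with_structure(seq: str, ss: str, seq_regex: str,
                        ss_regex: str) -> List[Tuple[int, str, str]]:
    """Return (1-based start, matched residues, structure over the match) for
    every sequence-motif hit whose structure segment fully matches ss_regex."""
    if len(seq) != len(ss):
        raise ValueError("sequence and structure strings must be the same length")
    seq_rx = re.compile("(?=(" + seq_regex + "))")  # lookahead allows overlapping hits
    ss_rx = re.compile(ss_regex)
    hits = []
    for m in seq_rx.finditer(seq):
        start, end = m.start(1), m.end(1)
        ss_segment = ss[start:end]
        # Keep the hit only if the aligned structure segment satisfies the constraint.
        if ss_rx.fullmatch(ss_segment):
            hits.append((start + 1, m.group(1), ss_segment))
    return hits


# Toy example: the motif occurs twice, once in a strand and once in a helix.
seq = "MDGLEKAADGLEKA"
ss = "CEEEEECCHHHHHC"
all_hits = scan_with_structure(seq, ss, r"D.{3}K", r".+")     # no structural constraint
strand_hits = scan_with_structure(seq, ss, r"D.{3}K", r"E+")  # hit must lie in a strand
print(all_hits)
print(strand_hits)
```

The sequence pattern alone reports two hits; the structural constraint keeps only the one lying in a strand. Discarding hits in an implausible structural context is one way a structure-aware motif can raise precision.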
Data mining and biomarker discovery.
Gene expression data are a rich source of information about the transcriptional dysregulation
of genes in cancer. Genes that are differentially regulated in cancer are one type of cancer biomarker.
In collaboration with Dr. Fabien Campagne,
we used TissueInfo to mine expressed sequence tags for cancer biomarkers.
We identified the 200 genes most consistently differentially expressed in cancer in both human and mouse (HM200).
When used for prediction in a variety of cancer classification tasks (24
independent cancer microarray datasets, 59 classifications in total), HM200 achieved the
best or second-best classification performance in 79% of the classifications considered, compared
with 13 published cancer marker gene lists.
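The comparison above ("best or second best in a given fraction of classifications") is a top-k rate. A minimal sketch of that calculation follows, assuming each classification task yields one performance score per gene list with higher being better, and resolving ties in the target list's favour. The function name and all scores in the usage example are invented for illustration and are not results from the study.

```python
from typing import Dict, List


def top_k_fraction(tasks: List[Dict[str, float]], target: str, k: int = 2) -> float:
    """Fraction of classification tasks in which `target` ranks within the top k.

    Each task maps gene-list name -> performance score (higher is better).
    Rank = 1 + number of other lists that strictly outperform the target,
    so ties are resolved in the target's favour (an assumption of this sketch).
    """
    if not tasks:
        raise ValueError("no classification tasks given")
    in_top_k = 0
    for scores in tasks:
        rank = 1 + sum(1 for name, s in scores.items()
                       if name != target and s > scores[target])
        if rank <= k:
            in_top_k += 1
    return in_top_k / len(tasks)


# Invented scores for four tasks and three lists (not data from the study).
tasks = [
    {"HM200": 0.90, "listA": 0.85, "listB": 0.80},  # HM200 ranks 1st
    {"HM200": 0.70, "listA": 0.75, "listB": 0.72},  # ranks 3rd
    {"HM200": 0.60, "listA": 0.65, "listB": 0.58},  # ranks 2nd
    {"HM200": 0.88, "listA": 0.88, "listB": 0.70},  # tied for 1st
]
print(top_k_fraction(tasks, "HM200"))  # 0.75
```

Counting strict wins only is one of several reasonable tie-breaking rules; a stricter convention would rank tied lists below each other and could lower the reported fraction.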
We are actively involved in the development of several databases, including
a database of the tissue profiles of the human and mouse mRNA transcript sets from Ensembl;
a database aimed at assembling all known PDZ-domain-mediated protein-protein interactions;
a repository of mutants of G protein-coupled receptor (GPCR) and transporter proteins, with associated information; and
a repository of organized, curated, and detailed information about GPCR
dimerization/oligomerization and its structural context.
Applications and utilities developed