Misfolded, non-functional proteins can place a significant burden on a cell by competing with properly folded proteins and thus disrupting their function. To prevent such negative consequences, cells need to get rid of misfolded proteins, either by degrading them or by refolding them into a functional conformation. However, these processes consume a significant proportion of a cell’s total energy budget, energy that could otherwise have been directed towards growth and, ultimately, towards increasing an organism’s fitness. Research in this field has focused mostly on the consequences of erroneous protein synthesis, which is relatively rare, especially when compared with “chronically” misfolded, or sub-optimally folded, proteins that result from missense mutations in an organism’s genome. In my current research, I use protein stability prediction models to assess the impact of such mutations on gene expression levels and to estimate their evolutionary significance. This work continues and extends my previous research on protein expression.
I am collaborating with the OneKP consortium to annotate sequenced transcripts using HPC resources. I am also involved in the gene tree/species tree reconciliation analysis and in the development of an online discovery platform.
Invalid taxonomic names pollute virtually every dataset, greatly limiting the reuse of these datasets to answer new questions. These problems are caused by misspellings and contamination, but also by out-of-date names and synonyms. The Taxonomic Name Resolution Service (TNRS) gives users the ability to scrub their lists of taxonomic names and validate them against an authority of their choice. The service is available at http://tnrs.iplantcollaborative.org/.
We are now working to provide a high-throughput service that can handle millions of names.
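The basic idea behind name scrubbing, matching each submitted name against a reference list, mapping synonyms to accepted names and catching misspellings, can be illustrated with fuzzy string matching. This is a minimal sketch, not the TNRS algorithm: the tiny `ACCEPTED` list and `SYNONYMS` table stand in for a real taxonomic authority, and the function `resolve_name` and its `cutoff` value are hypothetical choices made for illustration only.

```python
import difflib

# Hypothetical mini "authority": accepted names plus a synonym table
# mapping out-of-date names to their currently accepted name.
ACCEPTED = ["Vitis vinifera", "Quercus robur", "Arabidopsis thaliana"]
SYNONYMS = {"Quercus pedunculata": "Quercus robur"}


def resolve_name(name, cutoff=0.85):
    """Return (accepted_name, status) for a submitted taxon name.

    status is one of 'exact', 'synonym', 'fuzzy' or 'unmatched'.
    """
    # Normalize whitespace and capitalization (genus capitalized, epithet lower case).
    cleaned = " ".join(name.split()).capitalize()
    if cleaned in ACCEPTED:
        return cleaned, "exact"
    if cleaned in SYNONYMS:
        return SYNONYMS[cleaned], "synonym"
    # Fuzzy match against accepted names and synonyms to catch misspellings.
    candidates = difflib.get_close_matches(
        cleaned, ACCEPTED + list(SYNONYMS), n=1, cutoff=cutoff
    )
    if candidates:
        best = candidates[0]
        # If the closest hit is a synonym, report its accepted name instead.
        return SYNONYMS.get(best, best), "fuzzy"
    return None, "unmatched"


for raw in ["vitis  vinifera", "Quercus pedunculata", "Arabidopsis thalianna", "Homo sapiens"]:
    print(raw, "->", resolve_name(raw))
```

A production service would match genus and epithet separately, handle authors and infraspecific ranks, and draw on a large curated authority rather than a hard-coded list; the similarity cutoff trades false matches against missed misspellings.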
Researchers who want to add a phylogenetic component to their work are often limited by the lack of a phylogenetic tree for their clade of interest. Various pieces of the “Tree of Life” already exist, but they are scattered across different resources and are often incompatible. Phylotastic is a set of loosely coupled components that can be strung together to obtain a phylogenetic tree that includes a user-supplied list of species.
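One step in such a pipeline, extracting from a larger source tree only the part that relates the requested species, can be sketched as tree pruning. This is a minimal illustration assuming the source tree is available as nested tuples of species names; the functions `prune` and `to_newick` and the example `TREE` are hypothetical, not part of Phylotastic's actual interface, and branch lengths are ignored.

```python
from typing import Optional, Union

Tree = Union[str, tuple]  # a leaf (species name) or a tuple of child subtrees


def prune(tree: Tree, keep: set) -> Optional[Tree]:
    """Return the subtree induced by the taxa in `keep`, or None if none remain.

    Internal nodes left with a single child are collapsed so the result
    stays a proper phylogeny.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]  # collapse the now-unary node
    return tuple(children)


def to_newick(tree: Tree) -> str:
    """Serialize a nested-tuple tree as a topology-only Newick string."""
    def rec(t: Tree) -> str:
        if isinstance(t, str):
            return t
        return "(" + ",".join(rec(c) for c in t) + ")"
    return rec(tree) + ";"


# Hypothetical source tree covering more species than the user asked for.
TREE = ((("Homo_sapiens", "Pan_troglodytes"), "Gorilla_gorilla"),
        ("Mus_musculus", "Rattus_norvegicus"))

subtree = prune(TREE, {"Homo_sapiens", "Gorilla_gorilla", "Mus_musculus"})
print(to_newick(subtree))  # ((Homo_sapiens,Gorilla_gorilla),Mus_musculus);
```

Collapsing single-child nodes keeps the pruned result a valid tree; in a real pipeline the species list would first be reconciled with the source tree's names, for example with a name-resolution step.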
Numerical optimization is a key operation in most comparative analyses. However, the performance of the optimization routines implemented in the statistical language R has never been assessed for phylogenetic problems. This project aims to close this knowledge gap and to identify the numerical optimization routines best suited to common phylogenetic analyses.
Using a custom-made genotyping array comprising 9,000 high-quality probes, we are investigating the origin of common grape cultivars and their phylogenetic relationships.
Very little is known about the evolution and function of long non-coding RNAs in plants, and so far they have been studied only in model organisms. The availability of high-performance computing resources, as well as expression data from species sampled across a wide phylogenetic range, gives us the opportunity to study non-coding RNAs in a phylogenetic framework.
Another aspect of molecular evolution I'm particularly interested in is the establishment and maintenance of cell-type-specific protein expression. On the one hand, a cell's identity is defined by the set of proteins it (or its neighbors) expresses. On the other hand, a cell needs to maintain its function and must therefore regulate which proteins it expresses; it is in this context that evolutionary innovations arise.
The prevalence of neutral evolution at the DNA and mRNA level makes it particularly difficult to detect instances of positive selection. Because proteins are closer to the phenotype, on which selection acts directly, they may be less affected by drift and therefore provide clearer signals of past selective events. For this reason, I've been studying the evolution of protein expression and localization in humans and other primates using tissue microarrays. This technology allows me to detect and quantify protein expression in dozens of different tissues and cell types in parallel, and thus permits a better interpretation of observed differences. Another project involving the proteome aims to integrate information from the different molecular levels in order to build a more comprehensive picture of how molecular evolution works.
One key driver of evolution is environmental change, as a novel environment requires new tools to explore it. Humans have not escaped this general trend, and the environment we live in today is probably radically different from the one in which our ancestors lived. Accordingly, genes involved in interactions with the environment might have undergone positive selection, which would have left traces in our genome. In this respect, I have studied the evolution of the olfactory receptor gene repertoire in domesticated species, as well as in humans and other primates. Similarly, I looked for evidence of past positive selection on human genes involved in hearing.