Quantitative Assessment of the Relationship between Sequence Similarity and Function Similarity
Function prediction by comparative sequence analysis is widely used in genome annotation. However, if it is not performed carefully, it can create and propagate errors in function assignment. In this study, we quantified the relationship between sequence similarity and function similarity for each of the three aspects of Gene Ontology (GO) annotation: biological process, molecular function, and cellular component (subcellular localization). Our study provides a benchmark for estimating the confidence of function assignments based purely on sequence similarity. We analyze this relationship for well-characterized proteins from four different genomes. Using a simple measure of functional similarity based on the Gene Ontology classification, we show that functional similarity correlates well with sequence similarity, whether sequence similarity is measured by sequence identity or by the statistical significance of the alignment score. We also highlight differences in this relationship between annotations based on experimental evidence and annotations based on sequence similarity. The data and results are available for download at this site.
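The abstract does not give the exact formula of its GO-based similarity measure, so the following is only a minimal sketch of the general idea under stated assumptions. It substitutes a common stand-in measure, the Jaccard index of two proteins' GO term sets after each term is propagated to all of its ancestors, and then computes a Spearman rank correlation between percent sequence identity and that similarity, plus a mean similarity per identity bin as a crude confidence benchmark. The GO graph (`GO:A` to `GO:E`), the protein pairs, and every helper name (`functional_similarity`, `mean_similarity_by_bin`, and so on) are hypothetical toy illustrations, not the authors' method or data.

```python
from statistics import mean

# Hypothetical toy GO graph: child term -> set of parent terms.
# A real analysis would load the Gene Ontology DAG from an OBO file instead.
GO_PARENTS = {
    "GO:A": set(),           # root
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
    "GO:E": {"GO:B", "GO:C"},
}


def ancestors(term, parents=GO_PARENTS):
    """Return the term together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def functional_similarity(terms1, terms2):
    """Assumed stand-in measure: Jaccard index of ancestor-propagated GO sets."""
    a = set().union(*(ancestors(t) for t in terms1))
    b = set().union(*(ancestors(t) for t in terms2))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation (undefined if either input is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mean_similarity_by_bin(pairs, bin_width=20):
    """Average functional similarity within sequence-identity bins."""
    bins = {}
    for identity, t1, t2 in pairs:
        lower = int(identity // bin_width) * bin_width
        bins.setdefault(lower, []).append(functional_similarity(t1, t2))
    return {lower: mean(v) for lower, v in sorted(bins.items())}


# Hypothetical protein pairs: (percent identity, GO terms of protein 1, of protein 2).
PAIRS = [
    (95.0, {"GO:D"}, {"GO:D"}),
    (60.0, {"GO:D"}, {"GO:E"}),
    (35.0, {"GO:D"}, {"GO:C"}),
    (20.0, {"GO:B"}, {"GO:C"}),
]

if __name__ == "__main__":
    width = 20
    identities = [p[0] for p in PAIRS]
    sims = [functional_similarity(t1, t2) for _, t1, t2 in PAIRS]
    print(f"Spearman rho = {spearman(identities, sims):.2f}")  # prints "Spearman rho = 0.80"
    for lower, s in mean_similarity_by_bin(PAIRS, width).items():
        print(f"{lower}-{lower + width}% identity: mean similarity {s:.2f}")
```

A rank correlation is used here because it only assumes a monotone relationship, not a linear one. Because Spearman's coefficient depends only on ranks, correlating against alignment E-values instead of identity would simply flip its sign (a smaller E-value means a more significant alignment); transforming to `-log10(E)` keeps the convention that larger values mean greater similarity.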
Annotations based on experimental evidence are shown with the species name in green, while annotations based on sequence-similarity evidence are shown with the species name in purple.