Co-Occurrence of Domain Analysis (CODA)
CODA is based on domain fusion analysis. Gene fusion methods infer protein-protein interactions or, more generally, functional associations between pairs of separate protein chains in a genome of interest whose orthologues have become fused into a single chain in another species. Enright et al. (1999) and Marcotte et al. (1999) were the first groups to introduce this approach. CODA uses a Multi-Domain Architecture (MDA) representation of the proteins in complete genomes (target genomes), provided by the Gene3D Multi-Domain Architecture datasets. The Gene3D database contains protein sequences for all complete genomes, with predictions for CATH and Pfam domains as well as functional annotations including GO. The CATH and Pfam MDA datasets were created from 527 complete genomes (50 eukaryotes, 438 eubacteria and 39 archaea). CODA predictions were run on each of these two datasets, generating the CODAcath and CODApfam datasets, which power the CodaCathService and CodaPfamService predictors respectively.
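The core idea of domain fusion analysis can be illustrated with a minimal Python sketch. It is not the CODA implementation and omits CODA's scoring and filtering entirely. It assumes each protein is represented as a list of domain identifiers (its MDA); the function names, protein names and domain labels ("A", "B", "C") are hypothetical and chosen only for illustration. Two proteins in the target genome are reported when a domain found only in the first and a domain found only in the second occur together in a single protein of another genome.

```python
from itertools import combinations


def fused_domain_pairs(other_mdas):
    """Collect unordered domain pairs that co-occur in one protein
    of another genome (hypothetical helper, not part of CODA)."""
    pairs = set()
    for mda in other_mdas.values():
        # sorted() gives each pair a canonical (a, b) order with a < b
        for a, b in combinations(sorted(set(mda)), 2):
            pairs.add((a, b))
    return pairs


def predict_associations(target_mdas, other_mdas):
    """Return (protein1, protein2, supporting_domain_pairs) for every pair
    of target proteins whose distinct domains are fused elsewhere."""
    fused = fused_domain_pairs(other_mdas)
    predictions = []
    for (p1, mda1), (p2, mda2) in combinations(sorted(target_mdas.items()), 2):
        d1, d2 = set(mda1), set(mda2)
        # Only consider domains unique to each protein, so that a shared
        # domain alone never counts as evidence of fusion.
        support = sorted(
            tuple(sorted((a, b)))
            for a in d1 - d2
            for b in d2 - d1
            if tuple(sorted((a, b))) in fused
        )
        if support:
            predictions.append((p1, p2, support))
    return predictions


# Toy data: domains A and B are separate in the target genome
# but fused in protein Q1 of another genome.
target = {"P1": ["A"], "P2": ["B"], "P3": ["C"]}
other = {"Q1": ["A", "B"], "Q2": ["C"]}
result = predict_associations(target, other)
```

In this toy example only P1 and P2 are linked, because A and B co-occur in Q1, while C never shares a protein with either of them.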
More information: Reid et al., CODA: Accurate Detection of Functional Associations between Proteins in Eukaryotic Genomes Using Domain Fusion
