Detection of homologous proteins by an intermediate sequence search
Authors
Abstract
We developed a variant of the intermediate sequence search method (ISSnew) for detection and alignment of weakly similar pairs of protein sequences. ISSnew relates two query sequences through an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by using a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E‐value larger than 0.001); 2050 of these sequences had a related structure in the set. ISSnew performed significantly better than both PSI‐BLAST and a previously described intermediate sequence search method. PSI‐BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISSnew assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remaining sequences. We estimate that ISSnew may be able to assign the folds of domains in ∼29,000 of the ∼500,000 sequences unassigned by PSI‐BLAST, with 90% specificity (1 − false‐positive fraction). In addition, we show that the 15 alignments with the most significant BLAST E‐values include alignments that are nearly the best constructed by ISSnew.
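The general idea of linking two queries through a shared intermediate can be illustrated with a minimal sketch. It is not the ISSnew scoring scheme: the abstract does not define the overlap score, so plain overlap length on the intermediate stands in for it here. The `Hit` record, the `link_queries` function, and both thresholds (`max_evalue`, `min_overlap`) are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One query-to-intermediate match (hypothetical record layout)."""
    query: str         # query sequence identifier
    intermediate: str  # intermediate sequence identifier
    start: int         # 1-based inclusive start of the match on the intermediate
    end: int           # inclusive end of the match on the intermediate
    evalue: float      # E-value reported for the match


def overlap_length(a: Hit, b: Hit) -> int:
    """Number of intermediate residues covered by both matches (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def link_queries(hits_a, hits_b, max_evalue=1e-3, min_overlap=50):
    """Return (overlap, intermediate) pairs that connect the two queries.

    Assumption: a simple overlap length replaces the paper's overlap score,
    and both thresholds are illustrative, not values from the paper.
    """
    links = []
    for ha in hits_a:
        if ha.evalue > max_evalue:
            continue
        for hb in hits_b:
            if hb.evalue > max_evalue or hb.intermediate != ha.intermediate:
                continue
            ov = overlap_length(ha, hb)
            if ov >= min_overlap:
                links.append((ov, ha.intermediate))
    # Largest overlap first, so the strongest candidate link is at index 0.
    return sorted(links, reverse=True)


hits_a = [Hit("A", "I1", 10, 120, 1e-5), Hit("A", "I2", 1, 40, 1e-6)]
hits_b = [Hit("B", "I1", 60, 200, 1e-4), Hit("B", "I2", 100, 150, 1e-8)]
links = link_queries(hits_a, hits_b)
```

In this toy input, both queries match intermediate `I1` over a shared region, so `I1` links them; their matches to `I2` fall on disjoint regions and are rejected, which is the kind of spurious link (for example, through different domains of a multidomain intermediate) that an overlap requirement is meant to filter out.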
Digital Object Identifier (DOI)
10.1110/ps.03335004



