Genome annotation in evolutionarily divergent
organisms is challenging because the lack of
sequence homology limits the ability to infer the function
of putative coding regions. To provide an alternative to annotation
by sequence homology, we developed a method that takes advantage of
unusual trypanosomatid biology and skews in nucleotide composition
between coding regions and upstream regions to rank putative open
reading frames by their likelihood of being coding. The method is 93%
accurate when tested on known genes. We have applied our method to
the full complement of open reading frames on Chromosome I of
Trypanosoma brucei, and we predict with high confidence that 226
putative coding regions are functional. Methods such as
the one described here for discriminating true coding regions are
critical for genome annotation when other sources of evidence for
function are limited.
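
As a rough illustration of ranking open reading frames by a composition skew between a candidate coding region and its upstream region, the following Python sketch scores each candidate by the L1 distance between the two regions' base frequencies. The statistic, the function names (base_frequencies, composition_skew_score, rank_orfs), and the input layout are assumptions made for illustration only; the summary above does not specify the statistic the method actually uses, and this sketch does not model any trypanosomatid-specific biology.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seq):
    """Return the fraction of A, C, G and T in seq; other symbols are ignored."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in BASES}
    return {b: counts[b] / total for b in BASES}


def composition_skew_score(orf_seq, upstream_seq):
    """Hypothetical score: L1 distance between the base frequencies of a
    candidate ORF and those of its upstream region (range 0.0 to 2.0)."""
    orf = base_frequencies(orf_seq)
    upstream = base_frequencies(upstream_seq)
    return sum(abs(orf[b] - upstream[b]) for b in BASES)


def rank_orfs(candidates):
    """Rank candidates, given as (name, orf_seq, upstream_seq) tuples,
    from the largest composition skew to the smallest."""
    scored = [
        (name, composition_skew_score(orf_seq, upstream_seq))
        for name, orf_seq, upstream_seq in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy sequences, not real data: "a" differs sharply from its upstream
# region, while "b" has identical composition in both regions.
ranked = rank_orfs([
    ("b", "ACGTACGT", "ACGTACGT"),
    ("a", "GCGCGCGC", "ATATATAT"),
])
```

In practice, a score of this kind would have to be calibrated against known genes before any cutoff is chosen.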