An empirical analysis of training protocols for probabilistic gene finders
Background: Generalized hidden Markov models (GHMMs) appear to be approaching acceptance as a de facto standard for state-of-the-art ab initio gene finding, as evidenced by the recent proliferation of GHMM implementations. While prevailing methods for modeling and parsing genes using GHMMs have been described in the literature, little attention has been paid as of yet to their proper training. The few hints available in the literature together with anecdotal observations suggest that most practitioners perform maximum likelihood parameter estimation only at the local submodel level, and then attend to the optimization of global parameter structure using some form of ad hoc manual tuning of individual parameters.

Results: We decided to investigate the utility of applying a more systematic optimization approach to the tuning of global parameter structure by implementing a global discriminative training procedure for our GHMM-based gene finder. Our results show that significant improvement in prediction accuracy can be achieved by this method.

Conclusions: We conclude that training of GHMM-based gene finders is best performed using some form of discriminative training rather than simple maximum likelihood estimation at the submodel level, and that generalized gradient ascent methods are suitable for this task. We also conclude that partitioning of training data for the twin purposes of maximum likelihood initialization and gradient ascent optimization appears to be unnecessary, but that strict segregation of test data must be enforced during final gene finder evaluation to avoid artificially inflated accuracy measurements.
Year of publication: | 2004-12-21 |
---|---|
Authors: | Majoros, William H. ; Salzberg, Steven L. |
Publisher: | BMC Bioinformatics |
Subject: | Generalized hidden Markov models (GHMMs) | ab initio gene finding | gene finder |
freely available