Building and verifying a shallow ontology for higher-quality NLP
by Eduard Hovy, Information Sciences Institute,
University of Southern California
Research in natural language processing (NLP) over the past fifteen years has produced impressive practical results using statistical methods. But there are increasing signs that continued quality improvement in language-processing applications (including QA, summarization, information extraction, and machine translation) requires deeper and richer representations, possibly even a (shallow) semantics of text meaning. Although theories of semantics, formal and informal, abound, no one has yet built a resource of semantic symbols that effectively supports NLP, is empirically based, and has been validated through human agreement scores. Can this be done?

This talk describes the construction of the Omega ontology to support various NLP applications, in the context of the OntoNotes project in DARPA's GALE program. Omega contains an Upper Model of about a hundred manually constructed and organized terms and a Middle Model of several thousand 'sense pools'. Each sense pool is a collection of word senses drawn from English, Arabic, and Chinese nouns and verbs; it includes one or more associated atomic features to support reasoning, as well as pointers to hundreds of individual sentences containing a word used in the appropriate sense.

The creation of senses, their pooling, and their integration into Omega are carried out by teams of annotators, and are subjected to cross-annotator agreement tests and other semi-automated validation procedures. To our knowledge, this is by far the most extensive ontology-building effort to involve such validation.
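To make the Middle Model's structure concrete, the following is a minimal Python sketch of a sense pool as the abstract describes it: a set of word senses from English, Arabic, and Chinese nouns and verbs, a few atomic features for reasoning, and pointers to annotated example sentences. All class and field names here are hypothetical illustrations, not Omega's actual schema.

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Hypothetical sketch of the structure the abstract describes;
    # the names below are illustrative, not Omega's actual schema.

    @dataclass
    class WordSense:
        lemma: str        # e.g. "drive"
        language: str     # "en", "ar", or "zh"
        pos: str          # "noun" or "verb"
        sense_id: str     # identifier of this particular sense

    @dataclass
    class SensePool:
        # A Middle Model node: word senses judged to share one meaning,
        # plus atomic features and pointers to annotated example sentences.
        senses: List[WordSense] = field(default_factory=list)
        features: List[str] = field(default_factory=list)
        example_sentence_ids: List[str] = field(default_factory=list)

    @dataclass
    class UpperModelTerm:
        # One of the roughly one hundred manually organized Upper Model terms.
        name: str
        parent: Optional["UpperModelTerm"] = None
        pools: List[SensePool] = field(default_factory=list)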
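The abstract does not name the statistic used in the cross-annotator agreement tests; common choices for this kind of validation are raw per-item agreement and Cohen's kappa. The sketch below, offered only as an illustration of such a measure, computes both for two annotators' sense labels over the same word occurrences.

    from collections import Counter

    def agreement_scores(labels_a, labels_b):
        # Raw agreement and Cohen's kappa for two annotators' sense
        # labels over the same items (an illustrative measure; the
        # talk does not specify which statistic was used).
        assert len(labels_a) == len(labels_b)
        n = len(labels_a)
        # Observed agreement: fraction of items tagged identically.
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected chance agreement from each annotator's label distribution.
        dist_a = Counter(labels_a)
        dist_b = Counter(labels_b)
        p_e = sum(dist_a[s] * dist_b[s] for s in dist_a) / (n * n)
        kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
        return p_o, kappa

    # Example: two annotators tag six occurrences of a word with sense ids.
    a = ["s1", "s1", "s2", "s2", "s1", "s3"]
    b = ["s1", "s1", "s2", "s1", "s1", "s3"]
    print(agreement_scores(a, b))  # -> (0.833..., 0.714...)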
This work is a collaboration of researchers at USC/ISI and the University of Colorado at Boulder.