Abstract
Many tasks related to or supporting information retrieval, such as query expansion, automated question answering, reasoning, or heterogeneous database integration, involve verifying membership in a semantic category (e.g., “coffee” is a drink and “red” is a color, while “steak” is not a drink and “big” is not a color). We present a novel framework that automatically validates membership in an arbitrary semantic category, one not trained a priori, up to a desired level of accuracy. Our approach does not rely on any manually codified knowledge; instead, it capitalizes on the diversity of topics and word usage in a large corpus (e.g., the World Wide Web). Using TREC factoid questions whose answers are expected to belong to a specific semantic category, we show that a very high level of accuracy can be reached by automatically identifying additional training seeds and training patterns when needed. We develop a specific quantitative validation model that takes uncertainty and redundancy in the training data into account. Through ablation studies, we empirically confirm that the key aspects of our model contribute to its accuracy.
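The abstract does not spell out the validation model itself. As a rough illustration of the general idea of checking category membership against corpus evidence, the Python sketch below counts sentences that match a few lexico-syntactic templates (in the style of Hearst patterns) and sets a decision threshold from labeled seed terms. The templates, the function names (`pattern_hits`, `learn_threshold`, `is_member`), the toy corpus, and the midpoint threshold are all hypothetical choices for illustration. This is not the authors' model, which additionally accounts for uncertainty and redundancy in the training data.

```python
import re
from statistics import mean

# Hypothetical Hearst-style templates (assumption, not from the paper):
# {x} is the candidate term, {c} is the singular category name.
PATTERNS = [
    r"\b{c}s such as {x}\b",
    r"\b{x} is a {c}\b",
    r"\b{x} and other {c}s\b",
]


def pattern_hits(corpus, candidate, category):
    """Count corpus sentences matching any template for (candidate, category)."""
    regexes = [
        re.compile(p.format(x=re.escape(candidate), c=re.escape(category)), re.IGNORECASE)
        for p in PATTERNS
    ]
    return sum(1 for sentence in corpus if any(r.search(sentence) for r in regexes))


def learn_threshold(corpus, category, positive_seeds, negative_seeds):
    """Place the threshold halfway between mean seed scores (an illustrative choice)."""
    pos = mean(pattern_hits(corpus, s, category) for s in positive_seeds)
    neg = mean(pattern_hits(corpus, s, category) for s in negative_seeds)
    return (pos + neg) / 2


def is_member(corpus, candidate, category, threshold):
    """Accept the candidate when its pattern evidence exceeds the learned threshold."""
    return pattern_hits(corpus, candidate, category) > threshold


# Toy corpus standing in for a large collection such as the Web (hypothetical data).
corpus = [
    "Hot drinks such as coffee and tea are popular.",
    "Coffee is a drink made from roasted beans.",
    "Tea and other drinks were served.",
    "Juice is a drink for breakfast.",
    "Steak is a dish served with fries.",
    "Bread and other foods were on the table.",
]

threshold = learn_threshold(corpus, "drink", ["tea", "juice"], ["steak", "bread"])
print(threshold)                                        # 0.5
print(is_member(corpus, "coffee", "drink", threshold))  # True
print(is_member(corpus, "steak", "drink", threshold))   # False
```

In this toy setting, "coffee" matches two templates while the negative seeds match none, so it clears the threshold. A realistic system would need far more patterns, real corpus statistics, and a principled way to weigh noisy or redundant evidence, which is the problem the abstract says its model addresses.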
Field | Value |
---|---|
Original language | English |
Title of host publication | Advances in Information Retrieval Theory |
Subtitle of host publication | Third International Conference, ICTIR 2011, Bertinoro, Italy, September 12-14, 2011. Proceedings |
Pages | 274-284 |
Number of pages | 11 |
Volume | 6931 |
DOIs | |
Publication status | Published - 2011 |
Keywords
- information retrieval
- IR
- semantic searching
- semantic category verification