This paper and the one that follows report a semiautomatic method for converting into GENSAL those parts of Derwent Publications Ltd. Documentation Abstracts that specify generic structures. The natural language processing (NLP) techniques applied in a prototype system are discussed. This paper deals with the lexical isolation and categorization of tokens from the textual descriptions of generic structures. Templates for processing the variable and multiplier expressions, which predominate, have been identified and provide the basis for further analysis. Rules for the isolation of tokens are discussed and illustrated. Some categories of tokens are identified by morphological analysis, while others are handled by dictionary lookup. The output is a list of tokens, each with a number of associated semantic features that assist the processing stage discussed in the following paper.
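The lexical stage described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the sample input string, the regular expression, the category names, the small dictionary, and the suffix rules are hypothetical and are not taken from the paper or from the prototype system. It only shows the general shape of the approach: isolate tokens, categorize some by pattern or suffix (a stand-in for morphological analysis) and others by dictionary lookup, then emit each token with a few features.

```python
import re

# Hypothetical dictionary entries; the prototype's real dictionary is not described here.
DICTIONARY = {
    "or": "CONJUNCTION",
    "and": "CONJUNCTION",
    "halo": "SUBSTITUENT",
    "phenyl": "SUBSTITUENT",
    "H": "ELEMENT",
}

# Hypothetical suffix rules standing in for morphological analysis.
SUFFIX_RULES = [("yl", "RADICAL"), ("oxy", "RADICAL")]

# Assumed token shapes: carbon ranges (1-6C), integer ranges (0-3),
# words with optional trailing digits (R1, alkyl), integers, punctuation.
TOKEN_RE = re.compile(r"\d+-\d+C|\d+-\d+|[A-Za-z]+\d*|\d+|[=;,]")

PUNCTUATION = {"=", ";", ","}


def categorize(text):
    """Return (category, features) for one isolated token."""
    if re.fullmatch(r"\d+-\d+C", text):
        low, high = text[:-1].split("-")
        return "CARBON_RANGE", {"min": int(low), "max": int(high)}
    if re.fullmatch(r"\d+-\d+", text):
        low, high = text.split("-")
        return "INT_RANGE", {"min": int(low), "max": int(high)}
    if re.fullmatch(r"\d+", text):
        return "INTEGER", {"value": int(text)}
    if text in PUNCTUATION:
        return "PUNCTUATION", {}
    if text in DICTIONARY:  # dictionary lookup
        return DICTIONARY[text], {}
    if re.fullmatch(r"[RXYZ]\d*", text):  # assumed variable names
        return "VARIABLE", {}
    if re.fullmatch(r"[mnpq]", text):  # assumed multiplier names
        return "MULTIPLIER", {}
    for suffix, category in SUFFIX_RULES:  # suffix-based analysis
        if text.lower().endswith(suffix):
            return category, {"suffix": suffix}
    return "UNKNOWN", {}


def tokenize(description):
    """Isolate tokens and attach a category and feature dictionary to each."""
    tokens = []
    for text in TOKEN_RE.findall(description):
        category, features = categorize(text)
        tokens.append({"text": text, "category": category, "features": features})
    return tokens


if __name__ == "__main__":
    # Invented example input, loosely in the style of a generic structure description.
    for token in tokenize("R1 = H, 1-6C alkyl or phenyl; n = 0-3"):
        print(token)
```

In this sketch the dictionary is consulted before the variable pattern, so that a known term such as `H` is not mistaken for a variable name; a real system would need a more careful ordering of its rules.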
| Number of pages | 5 |
| Journal | Journal of Chemical Information and Computer Sciences |
| Publication status | Published - 1 Sep 1992 |
- natural language processing
- prototype system
- isolation of tokens