Automatic interpretation of the texts of chemical patent abstracts. 1. Lexical analysis and categorization

G. G. Chowdhury, M. F. Lynch*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

A semiautomatic method for converting to GENSAL those parts of Derwent Publications Ltd. Documentation Abstracts which specify generic structures is reported in this paper and that which follows. Techniques of natural language processing (NLP) applied in a prototype system are discussed. This paper deals with the lexical isolation and categorization of tokens from the generic structure textual descriptions. Templates for processing of both the variable and multiplier expressions, which predominate, have been identified; they provide the basis for further analysis. Rules for the isolation of tokens are discussed and illustrated. Some categories of tokens are identified by morphological analysis, while others are dealt with by dictionary lookup. The output is a list of tokens along with a number of associated semantic features which help at the processing stage discussed in the following paper.
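The two categorization routes named in the abstract, dictionary lookup and morphological analysis, can be illustrated with a minimal sketch. Nothing below is taken from the paper: the sample sentence, the dictionary entries, the suffix rules, the category names, and the functions `tokenize` and `categorize` are all hypothetical, chosen only to show the general shape of such a lexical stage. The paper's output also carries several semantic features per token, which this sketch reduces to a single category label.

```python
import re

# Hypothetical closed-class dictionary; the paper's real dictionary is not reproduced here.
DICTIONARY = {
    "is": "VERB",
    "are": "VERB",
    "or": "CONJUNCTION",
    "and": "CONJUNCTION",
}

# Hypothetical morphological (suffix) rules for substituent names.
SUFFIX_RULES = [
    (re.compile(r".+yl$"), "RADICAL"),   # e.g. alkyl, phenyl
    (re.compile(r".+oxy$"), "RADICAL"),  # e.g. methoxy
]

# Assumed surface patterns for variables, multipliers, and numeric values.
VARIABLE = re.compile(r"^[A-Z][a-z]?\d*$")  # e.g. R1, X
MULTIPLIER = re.compile(r"^[mnpq]$")        # e.g. n
RANGE = re.compile(r"^\d+-\d+$")            # e.g. 0-3
INTEGER = re.compile(r"^\d+$")

# Token isolation: numeric ranges, integers, words with optional digits, punctuation.
TOKEN = re.compile(r"\d+-\d+|\d+|[A-Za-z]+\d*|[^\w\s]")


def tokenize(text):
    """Split a generic-structure description into raw tokens."""
    return TOKEN.findall(text)


def categorize(token):
    """Assign one category: dictionary lookup first, then pattern and suffix rules."""
    word = token.lower()
    if word in DICTIONARY:
        return DICTIONARY[word]
    if VARIABLE.match(token):
        return "VARIABLE"
    if MULTIPLIER.match(token):
        return "MULTIPLIER"
    if RANGE.match(token):
        return "RANGE"
    if INTEGER.match(token):
        return "INTEGER"
    for pattern, category in SUFFIX_RULES:
        if pattern.match(word):
            return category
    if not token.isalnum():
        return "PUNCTUATION"
    return "UNKNOWN"


def analyse(text):
    """Return a list of (token, category) pairs for later processing."""
    return [(tok, categorize(tok)) for tok in tokenize(text)]


if __name__ == "__main__":
    # Invented example sentence, not drawn from a Derwent abstract.
    for pair in analyse("R1 is alkyl or phenyl; n is 0-3"):
        print(pair)
```

In this sketch the dictionary is consulted before the pattern rules so that common words are never mistaken for variables; whether the prototype system orders its checks this way is not stated in the abstract.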

Original language: English
Pages (from-to): 463-467
Number of pages: 5
Journal: Journal of Chemical Information and Computer Sciences
Volume: 32
Issue number: 5
Publication status: Published - 1 Sept 1992

Keywords

  • natural language processing
  • prototype system
  • isolation of tokens
