TY - JOUR
T1 - Template mining for information extraction from digital documents
AU - Chowdhury, Gobinda G.
PY - 1999/6/30
Y1 - 1999/6/30
N2 - With the rapid growth of digital information resources, information extraction (IE) - the process of automatically extracting information from natural language texts - is becoming more important. A number of IE systems, particularly in the areas of news/fact retrieval and in domain-specific areas such as chemical and patent information retrieval, have been developed in the recent past using the template mining approach, a natural language processing (NLP) technique that extracts data directly from text when the data, the text surrounding the data, or both form recognizable patterns. When text matches a template, the system extracts data according to the instructions associated with that template. This article briefly reviews template mining research. It also shows how templates are used in Web search engines - such as Alta Vista - and in meta-search engines - such as Ask Jeeves - for helping end-users generate natural language search expressions. Some potential areas of application of template mining for extraction of different kinds of information from digital documents are highlighted, and how such applications are used is indicated. It is suggested that, in order to facilitate template mining, standardization in the presentation and layout of information within digital documents has to be ensured, and this can be done by generating various templates that authors can easily download and use while preparing digital documents.
AB - With the rapid growth of digital information resources, information extraction (IE) - the process of automatically extracting information from natural language texts - is becoming more important. A number of IE systems, particularly in the areas of news/fact retrieval and in domain-specific areas such as chemical and patent information retrieval, have been developed in the recent past using the template mining approach, a natural language processing (NLP) technique that extracts data directly from text when the data, the text surrounding the data, or both form recognizable patterns. When text matches a template, the system extracts data according to the instructions associated with that template. This article briefly reviews template mining research. It also shows how templates are used in Web search engines - such as Alta Vista - and in meta-search engines - such as Ask Jeeves - for helping end-users generate natural language search expressions. Some potential areas of application of template mining for extraction of different kinds of information from digital documents are highlighted, and how such applications are used is indicated. It is suggested that, in order to facilitate template mining, standardization in the presentation and layout of information within digital documents has to be ensured, and this can be done by generating various templates that authors can easily download and use while preparing digital documents.
KW - template mining
KW - information extraction
KW - digital documents
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=0033414325&partnerID=8YFLogxK
UR - https://www.press.jhu.edu/journals/library-trends
UR - https://core.ac.uk/download/pdf/4817594.pdf
M3 - Article
AN - SCOPUS:0033414325
SN - 1559-0682
VL - 48
SP - 182
EP - 208
JO - Library Trends
JF - Library Trends
IS - 1
ER -