SpaceLDA: topic distributions aggregation from a heterogeneous corpus for space systems

Research output: Contribution to journal › Article › peer-review


Abstract

The design of highly complex systems such as spacecraft entails large amounts of documentation. Tracking relevant information, including hundreds of requirements, throughout several design stages is a challenge. In this study, we propose a novel strategy based on Topic Modelling to facilitate the management of spacecraft design requirements. We introduce spaceLDA, a novel domain-specific semi-supervised Latent Dirichlet Allocation (LDA) model enriched with lexical priors and an optimised Weighted Sum (WS). We collect and curate the first large collection of unstructured data related to space systems, combining several sources: Wikipedia pages, books, and feasibility reports provided by the European Space Agency (ESA). We train the spaceLDA model on three subsets of our heterogeneous training corpus. To combine the resulting per-document topic distributions, we enrich our model with an aggregation method based on an optimised WS. We evaluate our model through a case study: the categorisation of spacecraft design requirements. Finally, we compare our model's performance with an unsupervised LDA model and with an aggregation method from the literature. The results demonstrate that the spaceLDA model successfully identifies the topics of requirements and that our proposed approach outperforms both a classic LDA model and the state-of-the-art aggregation method.
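
As an illustration of the aggregation step described in the abstract, the following minimal Python sketch combines per-document topic distributions from several models with a weighted sum. The abstract does not specify how the weights are optimised, so the exhaustive grid search below is only an assumed stand-in, and the function names `aggregate` and `grid_search_weights`, the `step` parameter, and the top-1 accuracy objective are all hypothetical.

```python
import itertools

import numpy as np


def aggregate(distributions, weights):
    """Weighted sum of per-document topic distributions.

    distributions: shape (n_models, n_docs, n_topics), each row sums to 1
    weights: shape (n_models,), non-negative, not all zero
    """
    distributions = np.asarray(distributions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Normalise the weights so each aggregated row is still a distribution.
    weights = weights / weights.sum()
    # Contract the model axis: result has shape (n_docs, n_topics).
    return np.tensordot(weights, distributions, axes=1)


def grid_search_weights(distributions, labels, step=0.1):
    """Assumed optimiser (not taken from the paper): try every weight
    vector on a regular grid that sums to 1 and keep the one with the
    highest top-1 topic accuracy against the labelled documents."""
    n_models = len(distributions)
    ticks = np.arange(0.0, 1.0 + step / 2, step)
    best_weights, best_accuracy = None, -1.0
    for combo in itertools.product(ticks, repeat=n_models):
        if not np.isclose(sum(combo), 1.0):
            continue
        predicted = aggregate(distributions, combo).argmax(axis=1)
        accuracy = float((predicted == labels).mean())
        if accuracy > best_accuracy:
            best_weights, best_accuracy = np.array(combo), accuracy
    return best_weights, best_accuracy
```

With three models, as in the three corpus subsets mentioned above, `distributions` would have a leading dimension of 3 and `labels` would hold the known topic index of each validation requirement.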
Original language: English
Article number: 104273
Number of pages: 11
Journal: Engineering Applications of Artificial Intelligence
Volume: 102
Early online date: 5 May 2021
Publication status: Published - 30 Jun 2021

Keywords

  • topic modelling
  • Latent Dirichlet Allocation
  • spacecraft design
  • requirements
  • aggregation
