MiraBest: a data set of morphologically classified radio galaxies for machine learning

Fiona A. M. Porter*, Anna M. M. Scaife

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

The volume of data from current and future observatories has motivated the increased development and application of automated machine learning methodologies for astronomy. However, less attention has been given to the production of standardised datasets for assessing the performance of different machine learning algorithms within astronomy and astrophysics. Here we describe in detail the MiraBest dataset, a publicly available batched dataset of 1256 radio-loud AGN from NVSS and FIRST, filtered to $0.03 < z < 0.1$, manually labelled by Miraghaei and Best (2017) according to the Fanaroff-Riley morphological classification, created for machine learning applications and compatible for use with standard deep learning libraries. We outline the principles underlying the construction of the dataset, the sample selection and pre-processing methodology, dataset structure and composition, as well as a comparison of MiraBest to other datasets used in the literature. Existing applications that utilise the MiraBest dataset are reviewed, and an extended dataset of 2100 sources is created by cross-matching MiraBest with other catalogues of radio-loud AGN that have been used more widely in the literature for machine learning applications.
Original language: English
Pages (from-to): 293-306
Number of pages: 14
Journal: RAS Techniques and Instruments
Volume: 2
Issue number: 1
Early online date: 19 Jun 2023
Publication status: Published - 24 Jun 2023

Keywords

  • astro-ph.IM
  • cs.LG
  • machine learning
  • astronomical data bases
  • radio continuum: galaxies
