OpenCrystalData: an open-access particle image database to facilitate learning, experimentation, and development of image analysis models for crystallization processes

Yash Barhate, Christopher Boyle, Hossein Salami, Wei-Lee Wu, Nina Taherimakhsousi, Charlie Rabinowitz, Andreas Bommarius, Javier Cardona, Zoltan K. Nagy, Ronald Rousseau, Martha Grover

Research output: Contribution to journal › Article › peer-review


Abstract

Imaging and image-based process analytical technologies (PAT) have revolutionized the design, development, and operation of crystallization processes, providing greater process understanding through real-time characterization of particle size, particle shape, and crystallization mechanisms. The performance of the corresponding PAT models, including machine learning/artificial intelligence (ML/AI)-based approaches, is highly reliant on the quality of the data used for training or validation. However, acquiring high-quality data is often time-consuming and is a major roadblock in developing image analysis models for crystallization processes.

To address the lack of diverse, high-quality, and publicly available particle image datasets, this paper presents an initiative to create an open-access crystallization-related image database: OpenCrystalData (OCD, at www.kaggle.com/opencrystaldata/datasets). The datasets consist of images from different crystallization systems, covering different particle sizes and shapes captured under various conditions. The initial release consists of four datasets, which address the estimation of particle size distribution from in-situ images for different categories of particles and the detection of anomalous particles for process monitoring. Images are collected using various instruments, followed by case-specific processing steps such as ground-truth labeling and particle size characterization using offline microscopy. The datasets are released on the online collaborative platform Kaggle, along with specific guidelines for each dataset. They are intended to serve as a resource that enables researchers to learn, experiment, develop, and evaluate and compare different analytical approaches and algorithms. Another goal of this initiative is to encourage researchers to contribute new datasets focusing on various systems and problem statements. Ultimately, OpenCrystalData is intended to facilitate and inspire new developments in imaging-based PAT for crystallization processes, encouraging a shift from time-consuming offline analysis towards comprehensive real-time process insights that drive product quality.
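As an illustration of what "estimation of particle size distribution" can mean in practice, the sketch below converts per-particle projected areas into equivalent circle diameters. It then summarizes them as a number-weighted distribution using percentiles (D10, D50, D90) and a binned histogram. This is a minimal sketch under assumed conventions. The equivalent-circle-diameter size measure, the linear-interpolation percentile rule, and the helper names (`equivalent_circle_diameter`, `number_percentile`, `histogram`) are hypothetical choices made for this example. They are not the OpenCrystalData evaluation protocol, and each dataset's own guidelines on Kaggle should take precedence.

```python
import math


def equivalent_circle_diameter(area_um2: float) -> float:
    """Diameter (µm) of a circle with the same projected area (µm^2).

    Assumption: particle size is summarized by this 2D equivalent diameter.
    """
    return 2.0 * math.sqrt(area_um2 / math.pi)


def number_percentile(sizes, q: float) -> float:
    """Number-weighted percentile of particle sizes, with q in [0, 100].

    Uses linear interpolation between the sorted values (assumed convention).
    """
    data = sorted(sizes)
    if not data:
        raise ValueError("no particles to summarize")
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def histogram(sizes, edges):
    """Count particles in each half-open size bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for s in sizes:
        for i in range(len(edges) - 1):
            if edges[i] <= s < edges[i + 1]:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    # Hypothetical projected areas of five circular particles with
    # diameters of 10, 20, 30, 40, and 50 µm (illustrative data only).
    areas = [math.pi * (d / 2) ** 2 for d in (10, 20, 30, 40, 50)]
    diameters = [equivalent_circle_diameter(a) for a in areas]
    d10, d50, d90 = (number_percentile(diameters, q) for q in (10, 50, 90))
    print(f"D10={d10:.1f} µm, D50={d50:.1f} µm, D90={d90:.1f} µm")
    print("counts per bin:", histogram(diameters, [0, 15, 35, 60]))
```

For these five illustrative particles the summary comes out to roughly D10 = 14 µm, D50 = 30 µm, and D90 = 46 µm. An image-based model's PSD estimate could be compared against offline-microscopy reference sizes in the same way, although volume-weighted distributions or other size descriptors may be more appropriate for a given dataset.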
Original language: English
Article number: 100150
Number of pages: 7
Journal: Digital Chemical Engineering
Volume: 11
Early online date: 9 Apr 2024
Publication status: Published 30 Jun 2024

Keywords

  • crystallization
  • process analytical technology
  • imaging
  • open-access database
  • machine learning
