Abstract
Imaging and image-based process analytical technologies (PAT) have revolutionized the design, development, and operation of crystallization processes, providing greater process understanding through real-time characterization of particle size, shape, and crystallization mechanisms. The performance of the corresponding PAT models, including machine learning/artificial intelligence (ML/AI)-based approaches, relies heavily on the quality of the data used for training or validation. However, acquiring high-quality data is often time-consuming and remains a major roadblock in developing image analysis models for crystallization processes.
To address the lack of diverse, high-quality, and publicly available particle image datasets, this paper presents an initiative to create an open-access crystallization-related image database: OpenCrystalData (OCD, at www.kaggle.com/opencrystaldata/datasets). The datasets consist of images from different crystallization systems, with different particle sizes and shapes, captured under various conditions. The initial release consists of four datasets, which address the estimation of particle size distribution from in-situ images for different categories of particles and the detection of anomalous particles for process monitoring. Images are collected using various instruments and then undergo case-specific processing steps, such as ground-truth labeling and particle size characterization using offline microscopy. The datasets are released on the online collaborative platform Kaggle, along with specific guidelines for each dataset. They are intended to serve as a resource for researchers, enabling learning, experimentation, and development, as well as the evaluation and comparison of different analytical approaches and algorithms. Another goal of this initiative is to encourage researchers to contribute new datasets focusing on various systems and problem statements. Ultimately, OpenCrystalData is intended to facilitate and inspire new developments in imaging-based PAT for crystallization processes, encouraging a shift from time-consuming offline analysis towards comprehensive real-time process insights that drive product quality.
Original language | English |
---|---|
Article number | 100150 |
Number of pages | 7 |
Journal | Digital Chemical Engineering |
Volume | 11 |
Early online date | 9 Apr 2024 |
Publication status | Published - 30 Jun 2024 |
Keywords
- crystallization
- process analytical technology
- imaging
- open-access database
- machine learning