TY - GEN
T1 - Content-based classification of construction drawings
T2 - EG-ICE 2025: International Workshop on Intelligent Computing in Engineering
AU - Carrara, Andrea
AU - Nousias, Stavros
AU - Borrmann, André
PY - 2025/7/1
Y1 - 2025/7/1
N2 - Automated classification of construction drawings is essential for improving efficiency and reducing errors in Architecture, Engineering, and Construction (AEC) workflows. While Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown success in image-based tasks, they often fall short in capturing the relational and symbolic structures inherent in technical drawings. This paper presents a comparative study of Vision Transformers and Graph Attention Networks (GATs) using a real-world dataset of 450 professional construction drawings, each labeled across four standardized categories: Project Phase, Discipline, Representation, and Level. Drawings are represented in two formats, rasterized images and structured vector-based graphs, and processed through dedicated deep learning pipelines. Experimental results reveal that Graph Neural Networks (GNNs) outperform ViTs in overall accuracy, particularly in structure-sensitive categories such as Level, by leveraging spatial and topological relationships. While pretrained ViTs demonstrate strong performance in visually distinct categories and offer faster training throughput, GNNs provide superior generalization and interpretability. The study highlights the trade-offs between accuracy, computational efficiency, and practical deployment, offering valuable insights for integrating deep learning into real-world AEC classification systems.
AB - Automated classification of construction drawings is essential for improving efficiency and reducing errors in Architecture, Engineering, and Construction (AEC) workflows. While Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have shown success in image-based tasks, they often fall short in capturing the relational and symbolic structures inherent in technical drawings. This paper presents a comparative study of Vision Transformers and Graph Attention Networks (GATs) using a real-world dataset of 450 professional construction drawings, each labeled across four standardized categories: Project Phase, Discipline, Representation, and Level. Drawings are represented in two formats, rasterized images and structured vector-based graphs, and processed through dedicated deep learning pipelines. Experimental results reveal that Graph Neural Networks (GNNs) outperform ViTs in overall accuracy, particularly in structure-sensitive categories such as Level, by leveraging spatial and topological relationships. While pretrained ViTs demonstrate strong performance in visually distinct categories and offer faster training throughput, GNNs provide superior generalization and interpretability. The study highlights the trade-offs between accuracy, computational efficiency, and practical deployment, offering valuable insights for integrating deep learning into real-world AEC classification systems.
KW - construction drawing classification
KW - graph neural network
KW - vision transformer
KW - transformer-based models
KW - comparative analysis
U2 - 10.17868/strath.00093309
DO - 10.17868/strath.00093309
M3 - Conference contribution book
SN - 9781914241826
BT - EG-ICE 2025
A2 - Moreno-Rangel, Alejandro
A2 - Kumar, Bimal
CY - Glasgow
Y2 - 1 July 2025 through 3 July 2025
ER -