Automated mathematical reasoning has interested the AI community since the last century and is widely regarded as a key step towards true artificial intelligence. The field has evolved through rule-based approaches, semantic parsing and statistical machine learning to, more recently, deep learning techniques. It has also found extensive commercial applications. Educational enterprises have begun using AI models in intelligent tutoring systems that help students with mathematical problems, and in the financial sector automated reasoning supports the analysis of complex financial reports, with firms such as JP Morgan adopting AI to enhance their analysis capabilities.
This thesis concentrates on two distinct tasks within automated mathematical reasoning: text-based numerical reasoning and automated geometry maths problem solving. Current methods struggle with complex mathematical reasoning tasks, which often require lengthy and diverse solutions. In geometry problem solving, models are also noticeably weak at interpreting geometric relationships from diagrams, which limits their effectiveness. Furthermore, the advent of large language models (LLMs) and multi-modal models (MMs) underscores the need for a standardised benchmark to evaluate their geometry problem-solving abilities.
To address these issues, we introduce the ELASTIC model for the text-based numerical reasoning task. ELASTIC separates the generation of operators from the generation of operands to minimise errors in complex reasoning chains, and it accommodates a varying number of operands per operator, which makes it broadly applicable across domains. Our experimental results show that ELASTIC significantly outperforms prior models. We then extend ELASTIC to geometry maths problems, which are inherently more complex because they include geometric diagrams and cover a broader variety of problem types.
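The operator/operand separation can be illustrated with a small program executor. This is a minimal sketch under our own assumptions, not ELASTIC's implementation: the operator names, the `#k` syntax for referring to the result of step k, and the `Step`, `resolve` and `execute` names are all hypothetical. It only shows the kind of multi-step program such a model outputs, in which each step stores its operator apart from its operand list, and an operator such as `add` may take two or more operands.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# An operand is either a number copied from the text or "#k", a reference
# to the result of step k (hypothetical syntax for this sketch).
Operand = Union[float, str]

# Hypothetical operator vocabulary; every operator takes a list of values,
# so the number of operands can vary from step to step.
OPERATORS: Dict[str, Callable[[List[float]], float]] = {
    "add": lambda xs: sum(xs),
    "subtract": lambda xs: xs[0] - sum(xs[1:]),
    "multiply": lambda xs: math.prod(xs),
    "divide": lambda xs: xs[0] / math.prod(xs[1:]),
}


@dataclass
class Step:
    operator: str            # the operator, stored separately from its operands
    operands: List[Operand]  # operand list; its length depends on the step


def resolve(operand: Operand, results: List[float]) -> float:
    """Look up "#k" references in earlier results; return numbers unchanged."""
    if isinstance(operand, str):
        return results[int(operand.lstrip("#"))]
    return float(operand)


def execute(program: List[Step]) -> float:
    """Run the steps in order and return the result of the final step."""
    results: List[float] = []
    for step in program:
        values = [resolve(o, results) for o in step.operands]
        results.append(OPERATORS[step.operator](values))
    return results[-1]


# "Revenue was 120, 150 and 180 over three years; what was the average?"
program = [
    Step("add", [120.0, 150.0, 180.0]),  # three operands
    Step("divide", ["#0", 3.0]),         # two operands, one referring to step 0
]
print(execute(program))  # 150.0
```

Keeping arity open in the data structure, rather than fixing every operator to two arguments, is what lets a single `add` step sum three values instead of chaining two binary additions.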
To handle these complexities, we propose the Geometry-Aware Problem Solver (GAPS), a model designed to solve diverse types of geometry maths problems by generating tailored solution programs. Our experiments show that GAPS outperforms existing methods. However, we observed that directly encoding geometric diagrams as vectors fails to capture the complex geometric relationships that are critical to solving geometry maths problems. To overcome this, we propose converting these relationships into natural language and integrating them with the textual problem descriptions. This approach not only improves the interpretability and effectiveness of the models but also enables LLMs to be used for generating reasoning programs.
Lastly, despite the impressive capabilities of recent LLMs and MMs, their proficiency in solving geometry problems, which require an integrated understanding of textual and visual information, remains unexplored. To fill this gap, we introduce the GeoEval benchmark. Through extensive experiments on GeoEval, we provide a comprehensive quantitative evaluation of the latest LLMs and MMs on the geometry problem-solving task. This research marks a significant step forward in assessing the capabilities of state-of-the-art AI models in geometry problem solving.
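The idea of turning diagram relationships into natural language and combining them with the problem text can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's pipeline: it assumes the relationships have already been extracted from the diagram as `(relation, arg1, arg2, ...)` tuples, and the relation names, sentence templates and the `verbalise`/`build_input` functions are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical templates mapping a relation name to an English sentence.
TEMPLATES: Dict[str, str] = {
    "parallel": "Line {0} is parallel to line {1}.",
    "perpendicular": "Line {0} is perpendicular to line {1}.",
    "on_line": "Point {0} lies on line {1}.",
    "angle_measure": "Angle {0} measures {1} degrees.",
}


def verbalise(relations: List[Tuple[str, ...]]) -> str:
    """Turn (relation, arg1, arg2, ...) tuples into one sentence each."""
    return " ".join(TEMPLATES[name].format(*args) for name, *args in relations)


def build_input(problem_text: str, relations: List[Tuple[str, ...]]) -> str:
    """Prepend the verbalised diagram relations to the problem text."""
    return f"{verbalise(relations)} {problem_text}"


relations = [("parallel", "AB", "CD"), ("angle_measure", "ABC", "40")]
print(build_input("Find the measure of angle BCD.", relations))
# Line AB is parallel to line CD. Angle ABC measures 40 degrees. Find the measure of angle BCD.
```

Because the result is plain text, it can be fed to any text-only model, including an LLM, which matches the motivation given above for representing diagram relationships in natural language.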
| Field | Value |
|---|---|
| Date of Award | 19 Jun 2024 |
| Original language | English |
| Awarding Institution | University of Strathclyde |
| Sponsors | University of Strathclyde |
| Supervisors | Yashar Moshfeghi & Crawford Revie |