The field of computational neuroscience has seen significant growth, driven by the
development of sophisticated machine learning algorithms. These advancements allow
for detailed analysis of brain signals, helping researchers to discover new properties and
phenomena within the human brain. Progress in machine learning, especially with the
introduction of the transformer architecture and Large Language Models (LLMs), has
revolutionized Natural Language Processing (NLP) by achieving unprecedented results.
The continuous evolution of these methods highlights the ongoing advancements in NLP
technologies and applications.
Beyond NLP, machine learning models such as Wav2Vec2 and Data2Vec have broadened possibilities in speech, image, and text processing, with Data2Vec applying a single learning framework across multiple data modalities.
These models improve multi-modal frameworks, enhancing the capability to interpret
various inputs, such as combining textual and visual data in question-answering systems. This integration signals a paradigm shift in machine learning, exemplified by
advancements like voice-activated assistants and Google Lens, which offer innovative
interaction methods. This thesis explores the potential of brain-computer interfaces as an alternative way of interacting with a computer system through cognitive processes.
Because the discipline is still at an early stage, substantial groundwork is required to identify and systematically resolve the challenges intrinsic to developing such interaction systems,
thereby establishing a robust foundation for future advancements in this domain.
Chapter 1 of this thesis presents the introduction. This chapter fulfils several functions: it outlines the structure of the thesis, providing readers with a coherent roadmap of its contents and the trajectory of
the forthcoming discussion. Additionally, it concisely summarizes the research achievements to date, offering a retrospective overview of the progress attained during the
investigation. The chapter also examines the foundational motivation driving the research effort, clarifying the reasoning behind the development of the proposed system.
Central to this section is the expression of the core research questions that the thesis
aims to explore, which are essential in steering the scholarly inquiry and contributing
to the broader academic dialogue.
Chapter 2 presents a comprehensive literature review covering the most prominent brain imaging
modalities, along with their recent advancements within the multidisciplinary field
of neuroscience. The review also highlights contemporary developments in machine learning that are pertinent to neuroscience. Moreover, it
presents particular tools, such as Data2Vec, which, although not originally
intended for use with neural data, could be useful in the conceptualization
and design of such a complex system.
In Chapter 3, the proposed system is introduced in detail. This chapter offers an
exhaustive description of the system, articulating all of its essential components and
anticipating potential challenges that may arise during its development. A schematic
high-level design of the system is presented. It is crucial to emphasize that the full
implementation of such an ambitious system lies beyond the scope of this thesis. Chapters 4 through 6 provide a thorough investigation into three critical components of this
complex system. Each component is meticulously examined to clarify the challenges
encountered, the recent progress made, and the subsequent improvements achieved to
integrate these components seamlessly into the comprehensive system framework.
In Chapter 4, we employ a rigorous analytical methodology to evaluate the congruence of advanced natural language processing models with empirical neuroscientific
data. This evaluation is a prerequisite for the prospective implementation of models capable of interpreting human cognitive processes. Our findings show that, although certain models do not achieve complete congruence with brain data, they demonstrate a significant level of alignment, indicating that current state-of-the-art
models acquire intricate representations.
Chapter 5 embarks on a comprehensive investigation into the domain of text generation derived from neurological data. This investigation is motivated by two primary
considerations. Firstly, text serves as one of the most prevalent means through which
individuals interact with computational systems. A variety of mechanisms have been
developed to facilitate this interaction with both precision and security. To avoid
duplicating these existing mechanisms, it was hypothesized
that a model able to generate text from neural signals could be
integrated seamlessly with existing systems. The brain’s capacity to provide precise lexical terms could, therefore, enhance search engine inputs by mitigating issues related
to query disambiguation.
Secondly, generating text from brain activity, particularly through non-invasive
neuroimaging techniques, has been a longstanding ambition
among neuroscience researchers. As presented in Chapter 5, the preliminary attempt to realize this type of generation involved using
transformer models to synthesize brain features and subsequently applying a large language model to produce text. Diverse activation functions were employed to optimize
the performance of the processing pipeline. This approach was based on the observation that conventional activation functions predominantly assume linearity within
the data at some stage of the training process, as demonstrated by their graphical
representations.
Nonetheless, empirical studies have indicated that this assumption
fails to hold for real-world data, notably meteorological datasets. The
empirical assessment of various activation functions found that polynomial functions, including those whose constants are adjusted during training, were the most effective. These results represent a significant advancement in the neural data-to-text transformation framework, adding a
novel dimension to computational and neuroscientific research.
In Chapter 6, we explore the development of a novel brain encoder. This encoder
seeks to establish a generalized modelling framework capable of
learning general features of the cognitive processes by which the human brain interprets language. The impetus for this research
trajectory stems from the findings of Chapter 5, wherein, despite surpassing existing
baseline results, there remained a significant gap in achieving authentic brain-to-text
decoding capabilities. The hypothesis informing this investigation posits that the limitation does not reside in the incorporation of specific neural features derived from the
transformer encoder model. Chapter 6 further explores how advanced methodologies
such as Data2Vec and Wav2Vec2 might be harnessed to formulate such holistic brain
‘embeddings’. Previous efforts in this domain have predominantly focused on using
generic embeddings for mental state classification, with minimal attention to their potential for generation. Consequently, Chapter 6 describes and implements
a systematic pipeline that constructs these generic embeddings and
applies them to brain-to-text conversion, thereby
offering a novel perspective and making a significant contribution to the field.
In Chapter 7, we present the design of a novel interface that offers a dual contribution to both the Neuroscience and Machine Learning communities. The primary
objective of this thesis was to develop a software system that enables users to interact
through brain activity. We have designed and implemented a chatbot interface capable of concurrently capturing brain data while functioning as a conventional chatbot.
Furthermore, this chatbot is engineered to be adaptable and highly customizable with
millisecond precision, allowing it to serve as a bridge between machine learning and
neuroscience, as well as a platform for further neuroscience-focused data collection.
In the final chapter, Chapter 8, the research findings are synthesized,
with particular emphasis on addressing each research question.
This chapter also delineates
the current limitations that hinder the development of such a system. A
thorough report is then presented, providing
practical guidelines for leveraging the insights derived from this research. Such a report
is crucial for constructing a framework upon which future research can build
to ultimately realize this system.

| Field | Value |
|---|---|
| Date of Award | 22 May 2025 |
| Original language | English |
| Awarding Institution | University of Strathclyde |
| Sponsors | University of Strathclyde |
| Supervisor | Yashar Moshfeghi (Supervisor) & George Weir (Supervisor) |