Comparing text-based and dependence-based approaches for determining the origins of bugs

Steven Davies, Marc Roper, Murray Wood

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79% precision and 40–70% recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail, identifying pragmatic strategies for combining the techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness.
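To make the text approach and the reported precision and recall figures concrete, here is a minimal Python sketch under loudly simplifying assumptions: a single file whose line numbers never shift, a history modeled as a list of hypothetical `Commit` records, and the text approach reduced to a blame-style heuristic ("the last commit that wrote each line the fix changes"). It is not the authors' simulation, and the names `Commit`, `last_writer`, `text_approach` and `precision_recall` are illustrative only. Precision and recall are scored against a manually identified set of origins.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class Commit:
    """Hypothetical commit record: maps line numbers of one file to the text it wrote."""
    cid: str
    writes: Dict[int, str] = field(default_factory=dict)


def last_writer(history: List[Commit], line: int) -> Optional[str]:
    """Return the id of the most recent commit (before the fix) that wrote `line`."""
    for commit in reversed(history):
        if line in commit.writes:
            return commit.cid
    return None


def text_approach(history: List[Commit], fixed_lines: Iterable[int]) -> Set[str]:
    """Blame-style sketch: candidate origins are the last writers of the lines the fix changed."""
    origins = {last_writer(history, line) for line in fixed_lines}
    origins.discard(None)  # lines with no prior writer contribute no origin
    return origins


def precision_recall(predicted: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision = correct / predicted, recall = correct / actual (0.0 for empty sets)."""
    correct = len(predicted & actual)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(actual) if actual else 0.0
    return precision, recall


# Toy history (assumed): c1 creates lines 1-3, c2 rewrites line 2, c3 rewrites line 3.
history = [
    Commit("c1", {1: "int x = 0;", 2: "int y = x;", 3: "return y;"}),
    Commit("c2", {2: "int y = x + 1;"}),
    Commit("c3", {3: "return y * 2;"}),
]
predicted = text_approach(history, fixed_lines=[2, 3])
print(sorted(predicted))                    # ['c2', 'c3']
print(precision_recall(predicted, {"c2"}))  # (0.5, 1.0)
```

Here the fix touches lines 2 and 3, so both c2 and c3 are reported; if manual inspection attributes the bug only to c2, the prediction scores 0.5 precision and 1.0 recall. The dependence approach would instead examine dependence-related changes, which this sketch does not model.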
Language: English
Pages: 107-139
Number of pages: 33
Journal: Journal of Software: Evolution and Process
Volume: 26
Issue number: 1
Early online date: 4 Oct 2013
DOI: 10.1002/smr.1619
Publication status: Published, Jan 2014

Keywords

  • software maintenance
  • bug tracking systems
  • version control
  • program dependence graph
  • mining software repositories
  • bug origins

Cite this

@article{a625a3d6c0bd4f82b3db9161eb3c564b,
title = "Comparing text-based and dependence-based approaches for determining the origins of bugs",
abstract = "Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79{\%} precision and 40–70{\%} recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail and identify pragmatic strategies for combining techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness.",
keywords = "software maintenance, bug tracking systems, version control, program dependence graph, mining software repositories, bug origins",
author = "Steven Davies and Marc Roper and Murray Wood",
note = "This is the accepted version of the following article: Davies, S., Roper, M. and Wood, M. (2014), Comparing text-based and dependence-based approaches for determining the origins of bugs. J. Softw. Evol. and Proc., 26: 107–139. doi: 10.1002/smr.1619, which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1002/smr.1619/abstract",
year = "2014",
month = jan,
doi = "10.1002/smr.1619",
language = "English",
volume = "26",
pages = "107--139",
journal = "Journal of Software: Evolution and Process",
issn = "2047-7481",
number = "1",

}

Comparing text-based and dependence-based approaches for determining the origins of bugs. / Davies, Steven; Roper, Marc; Wood, Murray.

In: Journal of Software: Evolution and Process, Vol. 26, No. 1, 01.2014, p. 107-139.

TY  - JOUR
T1  - Comparing text-based and dependence-based approaches for determining the origins of bugs
AU  - Davies, Steven
AU  - Roper, Marc
AU  - Wood, Murray
N1  - This is the accepted version of the following article: Davies, S., Roper, M. and Wood, M. (2014), Comparing text-based and dependence-based approaches for determining the origins of bugs. J. Softw. Evol. and Proc., 26: 107–139. doi: 10.1002/smr.1619, which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1002/smr.1619/abstract
PY  - 2014/1
Y1  - 2014/1
N2  - Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79% precision and 40–70% recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail and identify pragmatic strategies for combining techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness.
AB  - Identifying bug origins – the point where erroneous code was introduced – is crucial for many software engineering activities, from identifying process weaknesses to gathering data to support bug detection tools. Unfortunately, this information is not usually recorded when fixing bugs, and recovering it later is challenging. Recently, the text approach and the dependence approach have been developed to tackle this problem. Respectively, they examine textual and dependence-related changes that occurred prior to a bug fix. However, only limited evaluation has been carried out, partially because of a lack of available implementations and of datasets linking bugs to origins. To address this, origins of 174 bugs in three projects were manually identified and compared to a simulation of the approaches. Both approaches were partially successful across a variety of bugs – achieving 29–79% precision and 40–70% recall. Results suggested the precise definition of program dependence could affect performance, as could whether the approaches identified a single or multiple origins. Some potential improvements are explored in detail and identify pragmatic strategies for combining techniques along with simple modifications. Even after adopting these improvements, there remain many challenges: large commits, unrelated changes and long periods between origins and fixes all reduce effectiveness.
KW  - software maintenance
KW  - bug tracking systems
KW  - version control
KW  - program dependence graph
KW  - mining software repositories
KW  - bug origins
UR  - http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2047-7481
U2  - 10.1002/smr.1619
DO  - 10.1002/smr.1619
M3  - Article
VL  - 26
SP  - 107
EP  - 139
JO  - Journal of Software: Evolution and Process
T2  - Journal of Software: Evolution and Process
JF  - Journal of Software: Evolution and Process
SN  - 2047-7481
IS  - 1
ER  - 