Identification of MIR-Flickr near-duplicate images: a benchmark collection for near-duplicate detection

Richard Connor, Stewart MacKenzie-Leigh, Franco Alberto Cardillo, Robert Moss

Research output: Contribution to conference › Paper

2 Citations (Scopus)

Abstract

There are many contexts where the automated detection of near-duplicate images is important, for example the detection of copyright infringement or images of child abuse. There are many published methods for the detection of similar and near-duplicate images; however, it is still uncommon for methods to be objectively compared with each other, probably because of a lack of any good framework in which to do so. Published sets of near-duplicate images exist, but are typically small, specialist, or generated. Here, we give a new test set based on a large, serendipitously selected collection of high-quality images. Having observed that the MIR-Flickr 1M image set contains a significant number of near-duplicate images, we have discovered the majority of these. We disclose a set of 1,958 near-duplicate clusters from within the set, and show that this is very likely to contain almost all of the near-duplicate pairs that exist. The main contribution of this publication is the identification of these images, which may then be used by other authors to make comparisons as they see fit. In particular, however, near-duplicate classification functions may now be accurately tested for sensitivity and specificity over a general collection of images.
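As an illustration of the kind of test the abstract describes, the following is a minimal sketch of how sensitivity and specificity might be computed for a near-duplicate detector against a set of ground-truth clusters. It assumes the ground truth is available as groups of image identifiers and that the detector reports predicted near-duplicate pairs; the function names, the data layout, and the toy data are hypothetical and are not taken from the paper or from the published benchmark.

```python
from itertools import combinations


def pairs_from_clusters(clusters):
    """Expand each cluster into its set of unordered (a, b) pairs with a < b."""
    pairs = set()
    for cluster in clusters:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs


def sensitivity_specificity(image_ids, clusters, predicted_pairs):
    """Score predicted near-duplicate pairs against ground-truth clusters.

    Every unordered pair of distinct images counts as one decision.
    Assumes all predicted pairs are drawn from image_ids.
    """
    truth = pairs_from_clusters(clusters)
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    n = len(set(image_ids))
    total_pairs = n * (n - 1) // 2

    tp = len(truth & predicted)   # true near-duplicates that were found
    fn = len(truth - predicted)   # true near-duplicates that were missed
    fp = len(predicted - truth)   # pairs wrongly flagged as near-duplicates
    tn = total_pairs - tp - fn - fp

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data (hypothetical): six images, two ground-truth clusters.
image_ids = range(6)
clusters = [{0, 1, 2}, {3, 4}]
predicted_pairs = [(0, 1), (1, 2), (4, 3), (2, 5)]

sens, spec = sensitivity_specificity(image_ids, clusters, predicted_pairs)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Counting true negatives as the remainder of all possible pairs avoids enumerating every pair explicitly, which matters for a collection of around one million images, where the number of pairs is on the order of 5 × 10^11.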

Conference

Conference: 10th International Conference on Computer Vision Theory and Applications (VISAPP 2015)
Country: Germany
City: Berlin
Period: 11/03/15 → 14/03/15

Keywords

  • near-duplicate image detection
  • benchmark
  • forensic image detection
  • image similarity function

Cite this

Connor, R., MacKenzie-Leigh, S., Cardillo, F. A., & Moss, R. (2015). Identification of MIR-Flickr near-duplicate images: a benchmark collection for near-duplicate detection. 565-571. Paper presented at 10th International Conference on Computer Vision Theory and Applications (VISAPP 2015), Berlin, Germany. https://doi.org/10.5220/0005359705650571
Connor, Richard; MacKenzie-Leigh, Stewart; Cardillo, Franco Alberto; Moss, Robert. / Identification of MIR-Flickr near-duplicate images: a benchmark collection for near-duplicate detection. Paper presented at 10th International Conference on Computer Vision Theory and Applications (VISAPP 2015), Berlin, Germany. 7 p.
@conference{f5fd13586c81461eb3f4a5b2e0cfe58d,
title = "Identification of MIR-Flickr near-duplicate images: a benchmark collection for near-duplicate detection",
keywords = "near-duplicate image detection, benchmark, forensic image detection, image similarity function",
author = "Richard Connor and Stewart MacKenzie-Leigh and Cardillo, {Franco Alberto} and Robert Moss",
year = "2015",
month = "3",
day = "14",
doi = "10.5220/0005359705650571",
language = "English",
pages = "565--571",
note = "10th International Conference on Computer Vision Theory and Applications (VISAPP 2015) ; Conference date: 11-03-2015 Through 14-03-2015",

}

Connor, R, MacKenzie-Leigh, S, Cardillo, FA & Moss, R 2015, 'Identification of MIR-Flickr near-duplicate images: a benchmark collection for near-duplicate detection' Paper presented at 10th International Conference on Computer Vision Theory and Applications (VISAPP 2015), Berlin, Germany, 11/03/15 - 14/03/15, pp. 565-571. https://doi.org/10.5220/0005359705650571

TY - CONF
T1 - Identification of MIR-Flickr near-duplicate images
T2 - a benchmark collection for near-duplicate detection
AU - Connor, Richard
AU - MacKenzie-Leigh, Stewart
AU - Cardillo, Franco Alberto
AU - Moss, Robert
PY - 2015/3/14
Y1 - 2015/3/14
KW - near-duplicate image detection
KW - benchmark
KW - forensic image detection
KW - image similarity function
UR - http://www.visapp.visigrapp.org/?y=2015
UR - http://mir-flickr-near-duplicates.appspot.com/
U2 - 10.5220/0005359705650571
DO - 10.5220/0005359705650571
M3 - Paper
SP - 565
EP - 571
ER -

Connor R, MacKenzie-Leigh S, Cardillo FA, Moss R. Identification of MIR-Flickr near-duplicate images: a benchmark collection for near-duplicate detection. 2015. Paper presented at 10th International Conference on Computer Vision Theory and Applications (VISAPP 2015), Berlin, Germany. https://doi.org/10.5220/0005359705650571