Extracting partition statistics from semistructured data

John N. Wilson, Richard Gourlay, Robert Japp, Mathias Neumüller

Research output: Contribution to conference › Paper

Abstract

The effective grouping, or partitioning, of semistructured data is of fundamental importance when providing support for queries. Partitions allow items within the data set that share common structural properties to be identified efficiently. This allows queries that make use of these properties, such as branching path expressions, to be accelerated. Here, we evaluate the effectiveness of several partitioning techniques by establishing the number of partitions that each scheme can identify over a given data set. In particular, we explore the use of parameterised indexes, based upon the notion of forward and backward bisimilarity, as a means of partitioning semistructured data, and demonstrate that even restricted instances of such indexes can be used to identify the majority of relevant partitions in the data.
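The abstract names the technique but not the algorithm. As a hedged illustration, the sketch below assumes the common signature-based formulation of k-bisimilarity used by A(k)-style structural indexes. Nodes start grouped by label. Each round then splits any group whose members differ in the groups of their parents (backward) or their children (forward). The function `partition_count`, its `k`/`forward`/`backward` parameters, and the toy `LABELS`/`EDGES` data are hypothetical. Combining both directions in a single round is one simple bounded forward-and-backward variant and may differ from the parameterised indexes evaluated in the paper.

```python
from collections import defaultdict

# Hypothetical toy document: a root with two <book> children, one of which
# also has an <author>. Node ids and labels are illustrative only.
LABELS = {0: "root", 1: "book", 2: "book", 3: "title", 4: "title", 5: "author"}
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (2, 5)]  # (parent, child) pairs


def partition_count(labels, edges, k, forward=False, backward=True):
    """Return the number of blocks after k rounds of bisimilarity refinement.

    Round 0 groups nodes by label. Each later round splits a block when its
    members differ in the set of blocks of their parents (backward) and/or
    children (forward), as computed in the previous round.
    """
    parents = defaultdict(list)
    children = defaultdict(list)
    for p, c in edges:
        parents[c].append(p)
        children[p].append(c)

    block = dict(labels)  # round 0: the label itself is the block id
    for _ in range(k):
        ids = {}
        new_block = {}
        for n in labels:
            back = frozenset(block[p] for p in parents[n]) if backward else frozenset()
            fwd = frozenset(block[c] for c in children[n]) if forward else frozenset()
            # Keeping the old block id in the signature guarantees refinement.
            sig = (block[n], back, fwd)
            # Renumber each distinct signature to a compact integer block id.
            new_block[n] = ids.setdefault(sig, len(ids))
        block = new_block
    return len(set(block.values()))


if __name__ == "__main__":
    print(partition_count(LABELS, EDGES, 0))  # 4: one block per label
    print(partition_count(LABELS, EDGES, 2))  # 4: backward only
    print(partition_count(LABELS, EDGES, 1, forward=True, backward=False))  # 5
    print(partition_count(LABELS, EDGES, 2, forward=True, backward=True))  # 6
```

In this toy data, backward refinement alone cannot tell the two `book` nodes apart because both hang off the root. Forward refinement separates them by their children. Two rounds in both directions then also separate the two `title` nodes by their (now distinct) parents.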
Original language: English
Pages: 497–506
Number of pages: 10
Publication status: Published, 4 Sep 2006
Event: 17th International Workshop on Database and Expert Systems Applications (DEXA 2006), Krakow, Poland
Duration: 4 Sep 2006 to 8 Sep 2006

Keywords

  • semistructured data
  • data management
  • partitions
  • indexes
  • statistics

Cite this

Wilson, J. N., Gourlay, R., Japp, R., & Neumüller, M. (2006). Extracting partition statistics from semistructured data (pp. 497–506). Paper presented at 17th International Workshop on Database and Expert Systems Applications (DEXA 2006), Krakow, Poland.
@conference{54ad65d9d5e44240925ddcc362947927,
title = "Extracting partition statistics from semistructured data",
abstract = "The effective grouping, or partitioning, of semistructured data is of fundamental importance when providing support for queries. Partitions allow items within the data set that share common structural properties to be identified efficiently. This allows queries that make use of these properties, such as branching path expressions, to be accelerated. Here, we evaluate the effectiveness of several partitioning techniques by establishing the number of partitions that each scheme can identify over a given data set. In particular, we explore the use of parameterised indexes, based upon the notion of forward and backward bisimilarity, as a means of partitioning semistructured data; demonstrating that even restricted instances of such indexes can be used to identify the majority of relevant partitions in the data.",
keywords = "semistructured data, data management, partitions, indexes, statistics",
author = "Wilson, {John N.} and Richard Gourlay and Robert Japp and Mathias Neum{\"u}ller",
year = "2006",
month = "9",
day = "4",
language = "English",
pages = "497--506",
note = "17th International Workshop on Database and Expert Systems Applications (DEXA 2006) ; Conference date: 04-09-2006 Through 08-09-2006",

}

TY  - CONF
T1  - Extracting partition statistics from semistructured data
AU  - Wilson, John N.
AU  - Gourlay, Richard
AU  - Japp, Robert
AU  - Neumüller, Mathias
PY  - 2006/9/4
Y1  - 2006/9/4
N2  - The effective grouping, or partitioning, of semistructured data is of fundamental importance when providing support for queries. Partitions allow items within the data set that share common structural properties to be identified efficiently. This allows queries that make use of these properties, such as branching path expressions, to be accelerated. Here, we evaluate the effectiveness of several partitioning techniques by establishing the number of partitions that each scheme can identify over a given data set. In particular, we explore the use of parameterised indexes, based upon the notion of forward and backward bisimilarity, as a means of partitioning semistructured data; demonstrating that even restricted instances of such indexes can be used to identify the majority of relevant partitions in the data.
AB  - The effective grouping, or partitioning, of semistructured data is of fundamental importance when providing support for queries. Partitions allow items within the data set that share common structural properties to be identified efficiently. This allows queries that make use of these properties, such as branching path expressions, to be accelerated. Here, we evaluate the effectiveness of several partitioning techniques by establishing the number of partitions that each scheme can identify over a given data set. In particular, we explore the use of parameterised indexes, based upon the notion of forward and backward bisimilarity, as a means of partitioning semistructured data; demonstrating that even restricted instances of such indexes can be used to identify the majority of relevant partitions in the data.
KW  - semistructured data
KW  - data management
KW  - partitions
KW  - indexes
KW  - statistics
UR  - http://www.cis.strath.ac.uk/research/publications/papers/strath_cis_publication_1545.pdf
M3  - Paper
SP  - 497
EP  - 506
ER  - 
