Data value storage for compressed semi-structured data

Brian Grieve Tripney, Isla Ross, Francis Wilson, John Wilson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Growing user expectations of anywhere, anytime access to information require new types of data representations to be considered. While semi-structured data is a common exchange format, its verbose nature makes files of this type too large to be transferred quickly, especially where only a small part of that data is required by the user. There is consequently a need to develop new models of data storage to support the sharing of small segments of semi-structured data since existing XML compressors require the transfer of the entire compressed structure as a single unit.
This paper examines the potential for bisimilarity-based partitioning (i.e. the grouping of items with similar structural patterns) to be combined with dictionary compression methods to produce a data storage model that remains directly accessible for query processing whilst facilitating the sharing of individual data segments.
Study of the effects of differing types of bisimilarity upon the storage of data values identified the use of both forwards and backwards bisimilarity as the most promising basis for a dictionary-compressed structure. A query strategy is detailed that takes advantage of the compressed structure to reduce the number of data segments that must be accessed (and therefore transferred) to answer a query. A method to remove redundancy within the data dictionaries is also described and shown to have a positive effect on memory usage.
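The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: grouping leaf values by their root-to-leaf label path is used here as a simple stand-in for structure-based partitioning, and the function names (`partition_values`, `dictionary_encode`, `build_store`, `query`) and the sample document are all hypothetical. Each partition gets its own dictionary, so a query on one path only needs to touch that partition's segment.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def partition_values(xml_text):
    """Group leaf text values by their root-to-leaf label path.

    Assumption: path-based grouping stands in for the structural
    partitioning discussed in the abstract.
    """
    root = ET.fromstring(xml_text)
    partitions = defaultdict(list)

    def walk(node, path):
        path = path + "/" + node.tag
        text = (node.text or "").strip()
        # Only leaf elements with non-empty text contribute a data value.
        if len(node) == 0 and text:
            partitions[path].append(text)
        for child in node:
            walk(child, path)

    walk(root, "")
    return dict(partitions)


def dictionary_encode(values):
    """Return (dictionary, codes): distinct values plus one integer code per value."""
    dictionary = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def build_store(xml_text):
    """Build one dictionary-encoded segment per partition."""
    return {
        path: dictionary_encode(values)
        for path, values in partition_values(xml_text).items()
    }


def query(store, path):
    """Decode only the segment for the requested path; other segments stay untouched."""
    dictionary, codes = store[path]
    return [dictionary[code] for code in codes]


# Hypothetical sample document, used only for illustration.
SAMPLE = """<library>
<book><title>A</title><lang>en</lang></book>
<book><title>B</title><lang>en</lang></book>
<book><title>C</title><lang>de</lang></book>
</library>"""

store = build_store(SAMPLE)
```

In this sketch, `query(store, "/library/book/lang")` reads a single segment, which mirrors the idea of reducing the number of segments that must be accessed and transferred to answer a query.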
Language: English
Title of host publication: Database and Expert Systems Applications
Subtitle of host publication: Proceedings of the 24th International Conference on Database and Expert Systems Applications
Editors: H Decker, Lenka Lhotská, Sebastian Link, Josef Basl, A Min Tjoa
Place of Publication: Berlin
Pages: 174-188
Number of pages: 15
Volume: 8056
DOI: 10.1007/978-3-642-40173-2_16
State: Published - 14 Aug 2013

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 8056
ISSN (Print): 0302-9743

Fingerprint

Semistructured data
Query
Nature
User expectations
Grouping
Query processing
Redundancy
Partitioning
Compression

Keywords

  • data
  • data storage
  • data value

Cite this

Tripney, B. G., Ross, I., Wilson, F., & Wilson, J. (2013). Data value storage for compressed semi-structured data. In H. Decker, L. Lhotská, S. Link, J. Basl, & A. M. Tjoa (Eds.), Database and Expert Systems Applications: Proceedings of the 24th International Conference on Database and Expert Systems Applications (Vol. 8056, pp. 174-188). (Lecture Notes in Computer Science; Vol. 8056). Berlin. DOI: 10.1007/978-3-642-40173-2_16
Tripney, Brian Grieve ; Ross, Isla ; Wilson, Francis ; Wilson, John. / Data value storage for compressed semi-structured data. Database and Expert Systems Applications: Proceedings of the 24th International Conference on Database and Expert Systems Applications. editor / H Decker ; Lenka Lhotská ; Sebastian Link ; Josef Basl ; A Min Tjoa. Vol. 8056 Berlin, 2013. pp. 174-188 (Lecture Notes in Computer Science).
@inproceedings{a9ba27135da448e69d4701ff619363f7,
title = "Data value storage for compressed semi-structured data",
abstract = "Growing user expectations of anywhere, anytime access to information require new types of data representations to be considered. While semi-structured data is a common exchange format, its verbose nature makes files of this type too large to be transferred quickly, especially where only a small part of that data is required by the user. There is consequently a need to develop new models of data storage to support the sharing of small segments of semi-structured data since existing XML compressors require the transfer of the entire compressed structure as a single unit. This paper examines the potential for bisimilarity-based partitioning (i.e. the grouping of items with similar structural patterns) to be combined with dictionary compression methods to produce a data storage model that remains directly accessible for query processing whilst facilitating the sharing of individual data segments. Study of the effects of differing types of bisimilarity upon the storage of data values identified the use of both forwards and backwards bisimilarity as the most promising basis for a dictionary-compressed structure. A query strategy is detailed that takes advantage of the compressed structure to reduce the number of data segments that must be accessed (and therefore transferred) to answer a query. A method to remove redundancy within the data dictionaries is also described and shown to have a positive effect on memory usage.",
keywords = "data, data storage, data value",
author = "Tripney, {Brian Grieve} and Isla Ross and Francis Wilson and John Wilson",
year = "2013",
month = "8",
day = "14",
doi = "10.1007/978-3-642-40173-2_16",
language = "English",
isbn = "9783642401725",
volume = "8056",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "174--188",
editor = "H Decker and Lhotsk{\'a}, Lenka and Link, Sebastian and Josef Basl and Tjoa, {A Min}",
booktitle = "Database and Expert Systems Applications",
}

Tripney, BG, Ross, I, Wilson, F & Wilson, J 2013, Data value storage for compressed semi-structured data. in H Decker, L Lhotská, S Link, J Basl & AM Tjoa (eds), Database and Expert Systems Applications: Proceedings of the 24th International Conference on Database and Expert Systems Applications. vol. 8056, Lecture Notes in Computer Science, vol. 8056, Berlin, pp. 174-188. DOI: 10.1007/978-3-642-40173-2_16


TY - GEN
T1 - Data value storage for compressed semi-structured data
AU - Tripney, Brian Grieve
AU - Ross, Isla
AU - Wilson, Francis
AU - Wilson, John
PY - 2013/8/14
Y1 - 2013/8/14
AB - Growing user expectations of anywhere, anytime access to information require new types of data representations to be considered. While semi-structured data is a common exchange format, its verbose nature makes files of this type too large to be transferred quickly, especially where only a small part of that data is required by the user. There is consequently a need to develop new models of data storage to support the sharing of small segments of semi-structured data since existing XML compressors require the transfer of the entire compressed structure as a single unit. This paper examines the potential for bisimilarity-based partitioning (i.e. the grouping of items with similar structural patterns) to be combined with dictionary compression methods to produce a data storage model that remains directly accessible for query processing whilst facilitating the sharing of individual data segments. Study of the effects of differing types of bisimilarity upon the storage of data values identified the use of both forwards and backwards bisimilarity as the most promising basis for a dictionary-compressed structure. A query strategy is detailed that takes advantage of the compressed structure to reduce the number of data segments that must be accessed (and therefore transferred) to answer a query. A method to remove redundancy within the data dictionaries is also described and shown to have a positive effect on memory usage.
KW - data
KW - data storage
KW - data value
UR - http://www.scopus.com/inward/record.url?scp=84884393855&partnerID=8YFLogxK
UR - http://www.springer.com/computer/database+management+%26+information+retrieval/book/978-3-642-40172-5
U2 - 10.1007/978-3-642-40173-2_16
DO - 10.1007/978-3-642-40173-2_16
M3 - Conference contribution
SN - 9783642401725
VL - 8056
T3 - Lecture Notes in Computer Science
SP - 174
EP - 188
BT - Database and Expert Systems Applications
CY - Berlin
ER -

Tripney BG, Ross I, Wilson F, Wilson J. Data value storage for compressed semi-structured data. In Decker H, Lhotská L, Link S, Basl J, Tjoa AM, editors, Database and Expert Systems Applications: Proceedings of the 24th International Conference on Database and Expert Systems Applications. Vol. 8056. Berlin. 2013. p. 174-188. (Lecture Notes in Computer Science). Available from, DOI: 10.1007/978-3-642-40173-2_16