Sleep apnea detection via depth video and audio feature learning

Cheng Yang, Gene Cheung, Vladimir Stankovic, Kevin Chan, Nobutaka Ono

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Obstructive sleep apnea, characterized by repetitive obstruction of the upper airway during sleep, is a common sleep disorder that can significantly compromise sleep quality and quality of life in general. The obstructive respiratory events can be detected by attended in-laboratory or unattended ambulatory sleep studies. Such studies require many attachments to a patient’s body to track respiratory and physiological changes, which can be uncomfortable and themselves compromise the patient’s sleep quality. In this paper, we propose to record depth video and audio of a patient during sleep using a Microsoft Kinect camera, and to extract relevant features that correlate with obstructive respiratory events scored manually by a scientific officer from data collected by the Philips Alice6 LDxS system commonly used in sleep clinics. Specifically, we first propose an alternating-frame video recording scheme, in which a different 8 of the 11 available bits in the captured depth images are extracted at different instants for H.264 video encoding. At the decoder, the 3 uncoded bits in each frame are recovered via a block-based search. Next, we perform temporal denoising on the decoded depth video using a motion-vector graph smoothness prior, so that undesirable flickering is removed without blurring sharp edges. Given the denoised depth video, we track the patient’s chest and abdominal movements using a dual-ellipse model. Finally, we extract ellipse-model features via a wavelet packet transform (WPT), extract audio features via non-negative matrix factorization (NMF), and feed both as input to a classifier to detect respiratory events. Experimental results show, first, that our depth video compression scheme outperforms a competitor that records only the 8 most significant bits. Second, our graph-based temporal denoising scheme reduces the flickering effect without over-smoothing. Third, using the extracted depth video and audio features, our trained classifiers detect respiratory events scored manually from the Alice6 LDxS data with high accuracy.
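
As an illustration of the alternating-frame recording idea summarized above, the sketch below is not taken from the paper: it assumes, purely for illustration, that even-indexed frames keep the 8 most significant of the 11 depth bits and odd-indexed frames keep the 8 least significant bits; the paper's actual bit-selection pattern and its block-based recovery of the 3 missing bits at the decoder are only described in comments.

```python
import numpy as np

def extract_8bit_plane(depth11, frame_idx):
    """Illustrative alternating-frame bit extraction for 11-bit Kinect depth.

    Assumption (hypothetical): even-indexed frames keep bits 10..3 (the 8
    MSBs), odd-indexed frames keep bits 7..0 (the 8 LSBs).  The returned
    8-bit image can be passed to a standard H.264 encoder; at the decoder,
    the 3 bits that were not coded in a given frame would be recovered by a
    block-based search against a neighbouring frame carrying the
    complementary bits, as outlined in the abstract.
    """
    depth11 = depth11.astype(np.uint16) & 0x7FF      # clamp to 11 bits
    if frame_idx % 2 == 0:
        return (depth11 >> 3).astype(np.uint8)       # keep bits 10..3
    return (depth11 & 0xFF).astype(np.uint8)         # keep bits 7..0
```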
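The temporal denoising step uses a graph smoothness prior built from motion vectors. A generic graph-Laplacian denoiser of that family is sketched below; the adjacency matrix W is assumed given (in the paper it is constructed from motion vectors, a construction not reproduced here), and lam is a hypothetical regularization weight.

```python
import numpy as np

def graph_smoothness_denoise(y, W, lam=1.0):
    """Denoise a signal y (e.g. one pixel's temporal trajectory) with a
    generic graph-smoothness prior:

        x* = argmin_x ||x - y||^2 + lam * x^T L x,   L = D - W,

    whose closed-form solution satisfies (I + lam * L) x* = y.
    W is a symmetric non-negative adjacency matrix over the samples of y.
    """
    L = np.diag(W.sum(axis=1)) - W                    # combinatorial Laplacian
    return np.linalg.solve(np.eye(len(y)) + lam * L, y)
```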
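For the feature-extraction stage, a minimal sketch using off-the-shelf PyWavelets, SciPy, and scikit-learn is given below. The choice of respiratory signal (per-frame ellipse area), wavelet, decomposition level, STFT settings, and NMF configuration are all assumptions for illustration, not the paper's settings.

```python
import numpy as np
import pywt
from scipy.signal import stft
from sklearn.decomposition import NMF

def wpt_features(resp_signal, wavelet="db4", level=3):
    """Energy of each wavelet-packet subband of a respiratory movement
    signal (e.g. the per-frame area of a fitted chest or abdomen ellipse)."""
    wp = pywt.WaveletPacket(resp_signal, wavelet, maxlevel=level)
    return np.array([np.sum(node.data ** 2)
                     for node in wp.get_level(level, order="freq")])

def nmf_audio_features(audio, fs, n_components=8):
    """Per-frame NMF activations of the audio magnitude spectrogram;
    each row of W serves as the audio feature vector for one time frame."""
    _, _, Z = stft(audio, fs=fs, nperseg=1024)
    mag = np.abs(Z).T                                 # (time frames, freq bins)
    model = NMF(n_components=n_components, init="nndsvda",
                max_iter=400, random_state=0)
    W = model.fit_transform(mag)                      # frame-wise activations
    return W
```

Per analysis window, the two feature sets would then be concatenated and fed to a classifier (e.g. an SVM) trained against the manually scored respiratory events, as the abstract describes.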
Language English
Pages 822-835
Number of pages 15
Journal IEEE Transactions on Multimedia
Volume 19
Issue number 4
Early online date 9 Nov 2016
DOIs 10.1109/TMM.2016.2626969
Publication status Published - 1 Apr 2017

Keywords

  • obstructive sleep apnea
  • sleep quality
  • obstructive respiratory event
  • sleep study
  • sleep monitoring

Cite this

Yang, Cheng; Cheung, Gene; Stankovic, Vladimir; Chan, Kevin; Ono, Nobutaka. Sleep apnea detection via depth video and audio feature learning. In: IEEE Transactions on Multimedia. 2017; Vol. 19, No. 4, pp. 822-835.
@article{5f5738a7cd7a495ca1493e2f26d79ae5,
title = "Sleep apnea detection via depth video and audio feature learning",
abstract = "Obstructive sleep apnea, characterized by repetitive obstruction in the upper airway during sleep, is a common sleep disorder that could significantly compromise sleep quality and quality of life in general. The obstructive respiratory events can be detected by attended in-laboratory or unattended ambulatory sleep studies. Such studies require many attachments to a patient’s body to track respiratory and physiological changes, which can be uncomfortable and compromise the patient’s sleep quality. In this paper, we propose to record depth video and audio of a patient using a Microsoft Kinect camera during his/her sleep, and extract relevant features to correlate with obstructive respiratory events scored manually by a scientific officer based on data collected by Philips system Alice6 LDxS that is commonly used in sleep clinics. Specifically, we first propose an alternating-frame video recording scheme, where different 8 of the 11 available bits in captured depth images are extracted at different instants for H.264 video encoding. At the decoder, the uncoded 3 bits in each frame can be recovered via block-based search. Next, we perform temporal denoising on the decoded depth video using a motion vector graph smoothness prior, so that undesirable flickering can be removed without blurring sharp edges. Given the denoised depth video, we track a patient’s chest and abdominal movements based on a dual-ellipse model. Finally, we extract ellipse model features via a wavelet packet transform (WPT), extract audio features via non-negative matrix factorization (NMF), and insert them as input to a classifier to detect respiratory events. Experimental results show first that our depth video compression scheme outperforms a competitor that records only the 8 most significant bits. Second, we show that our graph-based temporal denoising scheme reduces the flickering effect without over-smoothing. Third, we show that using our extracted depth video and audio features, our trained classifiers can deduce respiratory events scored manually based on data collected by system Alice6 LDxS with high accuracy.",
keywords = "obstructive sleep apnea, sleep quality, obstructive respiratory event, sleep study, sleep monitoring",
author = "Cheng Yang and Gene Cheung and Vladimir Stankovic and Kevin Chan and Nobutaka Ono",
note = "(c) 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.",
year = "2017",
month = "4",
day = "1",
doi = "10.1109/TMM.2016.2626969",
language = "English",
volume = "19",
pages = "822--835",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
number = "4",

}
