Heavy-tailed distributions: data, diagnostics, and new developments

Roger M Cooke, Daan Nieboer

Research output: Working paper; Discussion paper

Abstract

This monograph is written for the numerate nonspecialist, and hopes to serve three purposes.
First, it gathers mathematical material from the diverse but related fields of order statistics, records,
extreme value theory, majorization, regular variation and subexponentiality. All of these are
relevant for understanding fat tails, but they are not, to our knowledge, brought together in
a single source for the target readership. Proofs that give insight are included, but for fussy
calculations the reader is referred to the excellent sources referenced in the text. Multivariate
extremes are not treated. This allows us to present material spread over hundreds of pages in
specialist texts in twenty pages. Chapter 5 develops new material on heavy tail diagnostics and
gives more mathematical detail.

Second, it presents a new measure of obesity. The most popular definitions in terms of
regular variation and subexponentiality invoke putative properties that hold at infinity, and this
complicates any empirical estimate. Each definition captures some but not all of the intuitions
associated with tail heaviness. Chapter 5 studies two candidate indices of tail heaviness based
on the tendency of the mean excess plot to collapse as data are aggregated. The probability that
the largest value is more than twice the second largest has intuitive appeal but its estimator has
very poor accuracy. The Obesity index is defined for a positive random variable $X$ as
$$\mathrm{Ob}(X) = P(X_1 + X_4 > X_2 + X_3 \mid X_1 \le X_2 \le X_3 \le X_4),$$
where the $X_i$ are independent copies of $X$.
For empirical distributions, obesity is defined by bootstrapping. This index reasonably captures
intuitions of tail heaviness. Among its properties, if $\alpha > 1$ then $\mathrm{Ob}(X) < \mathrm{Ob}(X^{\alpha})$. However,
it does not completely mimic the tail index of regularly varying distributions, or the extreme
value index. A Weibull distribution with shape 1/4 is more obese than a Pareto distribution
with tail index 1, even though this Pareto has infinite mean and the Weibull’s moments are all
finite. Chapter 5 explores properties of the Obesity index.

Third and most important, we hope to convince the reader that fat tail phenomena pose
real problems; they are really out there and they seriously challenge our usual ways of thinking
about historical averages, outliers, trends, regression coefficients and confidence bounds among
many other things. Data on flood insurance claims, crop loss claims, hospital discharge bills,
precipitation, and damages and fatalities from natural catastrophes drive this point home.
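
The Obesity index defined in the abstract lends itself to a simple simulation. The following is a minimal sketch, not the authors' implementation: it estimates Ob(X) by drawing many independent quadruples, sorting each one, and counting how often the smallest plus the largest exceeds the two middle values. The function names `obesity_index` and `empirical_obesity`, the `draw` callback interface, the unit scale parameters, and the reading of "bootstrapping" as resampling quadruples with replacement from the data are all assumptions made for illustration. The exponential case is included as a check because its exact value is 3/4.

```python
import random


def obesity_index(draw, n_quads=100_000, seed=0):
    """Monte Carlo estimate of Ob(X) = P(X1 + X4 > X2 + X3 | X1 <= X2 <= X3 <= X4).

    `draw` takes a random.Random instance and returns one sample of X
    (a hypothetical interface chosen for this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_quads):
        # Sorting four independent draws realises the conditioning on the order.
        x1, x2, x3, x4 = sorted(draw(rng) for _ in range(4))
        if x1 + x4 > x2 + x3:
            hits += 1
    return hits / n_quads


def empirical_obesity(data, n_quads=100_000, seed=0):
    """Bootstrap reading (an assumption): resample quadruples with
    replacement from `data` and apply the same counting rule."""
    return obesity_index(lambda rng: rng.choice(data), n_quads, seed)


# Exponential reference case; the exact value is 3/4.
ob_exponential = obesity_index(lambda rng: rng.expovariate(1.0))
# Weibull with shape 1/4 and Pareto with tail index 1, both with scale 1.
ob_weibull = obesity_index(lambda rng: rng.weibullvariate(1.0, 0.25))
ob_pareto = obesity_index(lambda rng: rng.paretovariate(1.0))

print(f"exponential:       {ob_exponential:.3f}")
print(f"Weibull, shape 1/4: {ob_weibull:.3f}")
print(f"Pareto, index 1:    {ob_pareto:.3f}")
```

For observed data, a call such as `empirical_obesity(claims)` would apply the same rule to a list of positive values (`claims` is a hypothetical variable). Repeated draws of the same observation produce ties, which the strict inequality counts as failures; that choice is also an assumption of this sketch.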
Language: English
Place of Publication: Washington
Number of pages: 65
Publication status: Published - 2011

Keywords

  • heavy-tailed distributions
  • data
  • new developments
  • diagnostics

Cite this

Cooke, R. M., & Nieboer, D. (2011). Heavy-tailed distributions: data, diagnostics, and new developments. Washington.
URL: http://www.rff.org/documents/RFF-DP-11-19.pdf