Semiparametric regression in Stata
The boxplot is probably the most commonly used tool for representing the distribution of data and identifying atypical observations in a univariate dataset. The problem with the standard boxplot is that, as soon as asymmetry or tail heaviness appears, the percentage of values flagged as atypical becomes excessive. To cope with this issue, Hubert and Vandervieren (2008) proposed an adjusted boxplot for skewed data. Their idea is to move the whiskers of the boxplot according to the degree of asymmetry of the data. The rule for setting the whiskers of the adjusted boxplot was found by running a large number of simulations over a wide range of (moderately) skewed distributions. The aim was to find a rule guaranteeing that 0.7% of the observations would lie outside the interval delimited by the whiskers. Although their rule works satisfactorily for most commonly used distributions, it has several limitations: (i) the adjusted boxplot is not appropriate for severely skewed distributions or for distributions with heavy tails; (ii) it is tied specifically to a theoretical rejection rate of 0.7%; (iii) it is extremely sensitive to the estimated value of the asymmetry parameter; and (iv) it has a substantial computational cost, O(n log n). To tackle these drawbacks, we propose a much simpler method for finding the whiskers of the boxplot when the data are possibly skewed and heavy-tailed. We apply a simple rank-preserving transformation to the original data so that the transformed data can be fitted by a so-called Tukey g-and-h distribution. Using the quantiles of this distribution, we can easily recover the whiskers of the boxplot for the original data. The computational complexity of the proposed method is O(n), the same as that of the standard boxplot.
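To make the abstract's starting point concrete, here is a minimal Python sketch (not the author's Stata implementation) of two ingredients it mentions. The first part applies the standard boxplot rule, with whiskers at 1.5 times the interquartile range beyond the quartiles, to a simulated normal sample and a simulated lognormal sample, and compares the share of points flagged as atypical. The second part gives the textbook quantile function of a Tukey g-and-h distribution. The function names, the parameter names (`g`, `h`, `location`, `scale`), the seed, and the sample size are all illustrative assumptions. The rank-preserving transformation and the fitting step of the proposed method are not described in the abstract and are not reproduced here.

```python
import math
import random
from statistics import NormalDist, quantiles


def tukey_fences(data):
    """Standard boxplot limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def share_outside(data):
    """Fraction of observations lying outside the standard fences."""
    lower, upper = tukey_fences(data)
    return sum(x < lower or x > upper for x in data) / len(data)


def g_and_h_quantile(p, g=0.0, h=0.0, location=0.0, scale=1.0):
    """Textbook Tukey g-and-h quantile function (illustrative parameter names).

    g controls skewness and h controls tail heaviness; g = h = 0 gives
    the normal quantile function.
    """
    z = NormalDist().inv_cdf(p)
    core = z if g == 0 else (math.exp(g * z) - 1.0) / g
    return location + scale * core * math.exp(h * z * z / 2.0)


# Simulated data (assumption: seed and sample size chosen for illustration).
rng = random.Random(12345)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
skewed_sample = [math.exp(x) for x in normal_sample]  # lognormal sample

normal_share = share_outside(normal_sample)
skewed_share = share_outside(skewed_sample)

print(f"normal sample flagged:    {normal_share:.3%}")
print(f"lognormal sample flagged: {skewed_share:.3%}")
print(f"g-and-h 99% quantile (g=0.5, h=0.1): "
      f"{g_and_h_quantile(0.99, g=0.5, h=0.1):.3f}")
```

For the normal sample the flagged share should be close to the 0.7% rate quoted in the abstract, while for the lognormal sample it should be several times larger, which is the behaviour the adjusted and proposed boxplots aim to correct.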
Year of publication: | 2014-09-28 |
---|---|
Authors: | Verardi, Vincenzo |
Institutions: | Stata User Group |
freely available
Similar items by person
- Verardi, Vincenzo, (2008)
- Robust principal component analysis in Stata
  Verardi, Vincenzo, (2009)
- Verardi, Vincenzo, (2012)