Annals of Occupational Hygiene Advance Access originally published online on September 22, 2004
Annals of Occupational Hygiene 2004 48(7):617-622; doi:10.1093/annhyg/meh071
© British Occupational Hygiene Society. Published by Oxford University Press.
The 4-Parameter Lognormal (SB) Model of Human Exposure
CB7431 Rosenau Hall, Department of Environmental Sciences and Engineering, School of Public Health, University of North Carolina, Chapel Hill, NC 27599-7431, USA
* Tel: +1-919-966-3473; fax: +1-919-966-7911; e-mail: mike-flynn{at}unc.edu
Received 25 October 2003; in final form 3 June 2004
| ABSTRACT |
|---|
This paper explores the 4-parameter lognormal distribution (or Johnson SB distribution) as a model for occupational exposures to airborne contaminants. The model can incorporate extreme values when they are known a priori; alternatively, the extremes can be estimated from the data. This additional flexibility may be of value in estimating background and/or maximum exposures, as well as in improving the fitting process and the subsequent estimation of mean exposures. In addition, the model is physically consistent with the definition of concentration and provides a basis for linking stochastic and deterministic exposure modeling approaches. Estimating the mean and variance of exposure carries some additional computational burden relative to the usual 2-parameter lognormal model.
Keywords: exposure modeling; Johnson SB distribution; 4-parameter lognormal distribution
| INTRODUCTION |
|---|
The estimation and control of human exposure to toxic airborne contaminants is an important problem in occupational hygiene and environmental risk assessment. Epidemiological investigations are often constrained by the lack of suitable exposure assessment, which increases the risk of misclassification and the subsequent attenuation of dose–response relationships. Decisions regarding the selection of controls are also adversely affected by inadequate exposure characterization and by variability in the performance of the selected interventions. In each case, the underlying problem is to model quantitatively the random variable of exposure.
A vast literature exists on the use of probability distribution functions to characterize random variables in general (Johnson and Kotz, 1970) and airborne concentrations in particular (Georgopoulos and Seinfeld, 1982). The 2-parameter lognormal model seems especially useful for occupational exposure data (Rappaport, 1994), and a theoretical justification has been attempted (Esmen and Hammad, 1977). However, probability distribution functions with infinite, unbounded tails present a problem as models for concentration (exposure) data, since concentrations are always constrained to lie within a normalized range of 0 to 1. There is always a maximum and a minimum possible value for the concentration of any substance. Values for these extremes can be determined a priori from theory and information on the process (Flynn, in press). While this approach defines the absolute maximum and minimum possible, better estimates of the probable extremes may be made from the data and additional physical considerations.
As exposure models, bounded probability distributions have the advantage of incorporating the physics of the problem more readily into model development. This quasi-deterministic approach provides a method for linking exposure distributions to the parameters that govern them, in a way that can help improve control decisions and exposure estimates when measurements are unavailable. A recent paper presents a stochastic form of the dilution ventilation equation as a simple exposure model (Flynn, 2004). The solution of this equation, obtained with a closure consistent with concentration being constrained between a minimum and a maximum, is a beta distribution. Like the beta distribution, the 4-parameter lognormal is bounded. It provides flexibility for fitting data and has some of the attractive features associated with normal distributions. Estimation and inference are somewhat more involved with this model, but computers help minimize this drawback. This distribution, a member of the Johnson system (Johnson, 1949), is often called the SB distribution, indicating that it is the bounded member of this family.
| THEORY AND BACKGROUND |
|---|
Human exposure to an airborne contaminant is traditionally defined as the time-weighted average concentration over a specified interval. The concentration (c) of a gas or vapor is defined as the volume of pure contaminant (Vc) contained within a given volume of the mixture of pure air (Va) and contaminant:
$$c = \frac{V_c}{V_c + V_a} \qquad (1)$$
$$\frac{c}{1 - c} = \frac{V_c}{V_a} \qquad (2)$$
The definitive study of the 4-parameter lognormal distribution was the seminal work of Johnson (1949), who explored a series of transformations for bounded (SB) and unbounded (SU) distributions. He considered transformations on a random variable, c, of the form:
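$$y = \frac{c - \xi}{\lambda} \qquad (3)$$
where $\xi$ and $\xi + \lambda$ are the minimum and maximum values for c, respectively. Within the context of exposure, y is a normalized concentration. A unit normal variable z can be defined as
$$z = \gamma + \delta \ln\left(\frac{y}{1 - y}\right) \qquad (4)$$
so that the logit $w = \ln[y/(1 - y)]$ is normally distributed with mean $\mu$ and standard deviation $\sigma$, where
$$\gamma = -\frac{\mu}{\sigma}, \qquad \delta = \frac{1}{\sigma} \qquad (5)$$
When the extremes are known, the maximum likelihood estimates of $\gamma$ and $\delta$ from a sample $w_1, \dots, w_n$ are
$$\hat{\gamma} = -\frac{\bar{w}}{s}, \qquad \hat{\delta} = \frac{1}{s} \qquad (6)$$
where $\bar{w}$ is the sample mean and $s^2 = \frac{1}{n}\sum_{i=1}^{n}(w_i - \bar{w})^2$.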
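With both extremes fixed, fitting the SB model reduces to normalizing each concentration, taking its logit, and scaling the logits so that z = γ + δ ln[y/(1 - y)] (the standard Johnson SB form) has mean 0 and variance 1. The Python sketch below assumes every observation lies strictly between the bounds and uses the divisor-n (maximum likelihood) standard deviation; the function name `fit_sb_known_bounds` is hypothetical.

```python
import math
import statistics


def fit_sb_known_bounds(conc, c_min, c_max):
    """Return (gamma, delta) for an SB model whose bounds c_min, c_max are known.

    Assumes c_min < c < c_max for every observation.
    """
    lam = c_max - c_min                           # width of the bounded range
    w = []
    for c in conc:
        y = (c - c_min) / lam                     # normalized concentration
        w.append(math.log(y / (1.0 - y)))         # logit of y
    w_bar = statistics.fmean(w)                   # sample mean of the logits
    s = statistics.pstdev(w, mu=w_bar)            # ML standard deviation (divisor n)
    return -w_bar / s, 1.0 / s                    # gamma, delta
```

Applying z = gamma + delta * w to the fitted logits then gives values with mean 0 and population variance 1, which is a quick consistency check on any implementation.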
If the logit of the normalized concentration (y) follows a normal distribution, then concentration is distributed as a 4-parameter lognormal or (SB) variate. It is the logarithm of the ratio of contaminant volume to air volume that is normally distributed in the 4-parameter model, not the logarithm of concentration, as is the case with the 2-parameter model. This is consistent with the physics of the problem since the ratio can assume any positive value, but concentration must be less than some maximum. At low values of concentration relative to the maximum and when the minimum concentration is zero, we have
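$$\ln\left(\frac{y}{1 - y}\right) \approx \ln y = \ln c - \ln \lambda \qquad (7)$$
so that the logit is then essentially a shifted logarithm of concentration, and the 4-parameter model approaches the 2-parameter lognormal model.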
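The low-concentration approximation can be checked numerically: the exact gap between ln[y/(1 - y)] and ln y is -ln(1 - y), which is close to y itself when y is small. The sketch below uses a hypothetical reading of 10 ppm benzene against the a priori maximum of 98 684 ppm quoted in the Applications section; the function name is illustrative.

```python
import math


def logit_minus_log(y):
    """Exact gap between the SB transform ln[y/(1 - y)] and ln(y); equals -ln(1 - y)."""
    return math.log(y / (1.0 - y)) - math.log(y)


# Hypothetical reading: 10 ppm benzene against an a priori maximum of 98 684 ppm.
y = 10.0 / 98684.0
gap = logit_minus_log(y)
print(f"y = {y:.3e}, gap = {gap:.3e}")   # the gap is roughly equal to y here
```

Because the gap is of order y, it is negligible for exposures far below saturation, which is why the two models give comparable fits when the maximum is large relative to the data.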
Exposure results from contaminant generation and subsequent transport to the breathing zone by the air flow field. This process is governed by nonlinear differential equations and is subject to many uncertainties and random events. As noted above, one method of constructing a model for exposure treats it as a random process governed by a stochastic differential equation. An alternative to the stochastic differential equation for concentration (exposure) presented in Flynn (2004) is
![]() | (8) |
| ESTIMATION |
|---|
The SB distribution has been applied widely in forestry work, where it is used as a diameter distribution model for growth-and-yield forecasts (Parresol, 2003).
As noted above, when the extremes are known a priori, estimates for the mean and variance are straightforward. However, when one or more extremes are estimated from the data, the method of maximum likelihood is not recommended (see e.g. Lambert, 1970; Vroon, 1981). Tsionas (2001) has made progress with Bayesian methods for estimation problems with this distribution, but difficulties remain, particularly when both extremes are estimated from the data. Although there will be exposure assessments where it is desirable to estimate both minimum and maximum values, the examples used here are restricted to cases where the a priori minimum is zero and the maximum is either known a priori or estimated from the data.
Using the method of percentiles to estimate the maximum concentration when the lower limit is known a priori requires determining the median (C0) and the lower (C1) and upper (C2) percentile points from the data. The upper and lower percentiles are symmetric about the median (e.g. if the upper value is the 99th percentile, then the lower value is the 1st percentile; Johnson, 1949). The estimate for the maximum when the minimum is set at zero is:
$$\hat{\lambda} = \frac{C_0\left[C_0(C_1 + C_2) - 2C_1C_2\right]}{C_0^2 - C_1C_2} \qquad (9)$$
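The percentile estimate follows from the symmetry of the normal distribution. Writing λ for the maximum (minimum zero), the logits of the symmetric percentiles average to the logit of the median, ln[C1/(λ - C1)] + ln[C2/(λ - C2)] = 2 ln[C0/(λ - C0)], and solving for λ gives λ = C0[C0(C1 + C2) - 2C1C2]/(C0² - C1C2). The Python sketch below implements that closed form. The function name and the two validity checks (a positive denominator, and an estimate above C2) are illustrative assumptions, not part of the original method.

```python
def sb_max_from_percentiles(c0, c1, c2):
    """Percentile estimate of the SB maximum, with the minimum fixed at zero.

    c0 is the sample median; c1 and c2 are lower and upper percentiles placed
    symmetrically about it (for example the 1st and 99th).
    """
    denom = c0 * c0 - c1 * c2
    if denom <= 0:
        # c0**2 <= c1*c2: the percentiles imply no finite upper bound
        raise ValueError("median squared must exceed c1 * c2")
    lam = c0 * (c0 * (c1 + c2) - 2.0 * c1 * c2) / denom
    if lam <= c2:
        # an upper bound below the upper percentile is not physically usable
        raise ValueError("estimated maximum does not exceed the upper percentile")
    return lam
```

For data that are exactly 2-parameter lognormal, ln C0 lies midway between ln C1 and ln C2, so C0² = C1C2 and the estimate diverges. Near that boundary small sampling errors in the percentiles move the estimate a great deal, which is one way to see why estimating the maximum can be unstable.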
In the application illustrated below, it was found that using the 99th and 1st percentiles as C2 and C1 avoided this latter problem. Siekierski (1992) gives some additional methods of stabilizing the estimation of extreme values, including adding or subtracting arbitrary but plausible amounts from the sample extremes. Although this latter method is not employed here, it was found to be often more stable than the percentile approach for small sample sizes. The most promising method to avoid these estimation problems appears to be the Bayesian approach of Tsionas (2001).
As with the 2-parameter lognormal model, estimation and inference for the 4-parameter model are complicated by the dependence of the exposure mean on both the mean and the variance of the logarithms of the transformed data. Invariably, the question of confidence intervals for the mean exposure arises. Although several methods are potential candidates, parametric bootstraps are used here to estimate 95% confidence intervals for the mean exposure. This procedure involves estimating the population parameters from the initial sample and then taking random bootstrap samples from this fitted parametric distribution. When the maximum is estimated from the data, the bootstrap technique employed here re-estimates a new maximum for each bootstrap sample using equation (9). When a maximum value is assigned a priori, the same maximum is kept for all bootstrap samples. The mean and standard deviation of exposure are estimated using the equations in the Appendix and the maximum value for the bootstrap sample. In this way a sampling distribution for the mean is generated. The 95% confidence interval for the mean is given by the 0.025 and 0.975 quantiles of this resampling distribution.
| APPLICATIONS |
|---|
This section presents applications of the 4-parameter lognormal model to three different sets of exposure measurements. The first is a series of 33 benzene exposures from a group of petrochemical workers (Tolentino et al., 2003).
For each data set, summary statistics are calculated and the Shapiro–Wilk test for normality is used to evaluate the fit of four different statistical distributions. The Shapiro–Wilk test is performed on the appropriate transform of the data to normality; in the case of the 4-parameter model, the data tested consisted of the logit values ln[y/(1 − y)], not the raw concentration data. The distributions tested are: the 2-parameter lognormal model; the 4-parameter lognormal with a priori maximum; the 4-parameter lognormal with maximum estimated from the data; and the normal distribution. In fitting the 4-parameter models, the a priori maxima are the saturation vapor concentrations (98 684 ppm for benzene, 28 026 mg/m³ for styrene and 43 421 ppm for IPA).
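The fit comparison can be sketched with SciPy's `scipy.stats.shapiro`, applying the test to the transform each model treats as normal: the raw data for the normal model, ln c for the 2-parameter lognormal, and the logit of y = c/c_max for the 4-parameter model. The sketch assumes a minimum of zero and every observation below the supplied maximum; the function name `shapiro_pvalues` is hypothetical.

```python
import math

from scipy import stats


def shapiro_pvalues(conc, c_max):
    """Shapiro-Wilk p-values on the transform each candidate model treats as normal.

    Assumes a minimum of zero and 0 < c < c_max for every observation.
    """
    transforms = {
        "normal": list(conc),                                  # raw concentrations
        "lognormal_2p": [math.log(c) for c in conc],           # ln(c)
        "sb_4p": [math.log(c / (c_max - c)) for c in conc],    # ln[y/(1-y)], y = c/c_max
    }
    # stats.shapiro returns (statistic, p-value); index 1 is the p-value.
    return {name: float(stats.shapiro(x)[1]) for name, x in transforms.items()}
```

Calling it once with the saturation concentration and once with a maximum estimated from the data would cover both 4-parameter variants compared in the text.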
The data and results of the fit tests summarized in Table 1 suggest that the 4-parameter lognormal model with estimated maximum is a good fit for all three data sets. The normal distribution fits two of the three data sets well (IPA and styrene), while the 2-parameter lognormal and the 4-parameter lognormal model with a priori maximum fit only the benzene data well. These two models do, however, provide fits comparable to each other for all three sets of exposures. When data do not fit the 2-parameter lognormal model particularly well, the added flexibility of the 4-parameter model may be useful. Figures 1 and 2 illustrate the cumulative distributions for the IPA data with the associated 2-parameter and 4-parameter a priori fits. However, when the maximum concentration for this operation is estimated from the data, a much better fit to the 4-parameter model is observed (see Fig. 3). Some autocorrelation was present in the IPA sample, but it is ignored here for illustrative purposes.
Further analysis was performed on the IPA data. For the 4-parameter models, point estimates for the means and standard deviations of exposure were calculated using the equations in the Appendix, Part I. Estimates for γ and δ used in these equations were obtained with equation (6) using the sample data after normalization with the maximum value. This was done for both cases, i.e. when the maximum is assigned a priori and when it is estimated from the data. It is acknowledged that the desirable maximum likelihood property of the estimates is preserved only in the former case. Concentration (exposure) is recovered by multiplying the mean or standard deviation of y by the maximum value. In the case of the 2-parameter lognormal model, the minimum variance unbiased estimators (MVUEs) for the mean and variance were obtained using Finney's (1941) equations (Appendix, Part II).
Confidence intervals for the 4-parameter models were determined using a parametric bootstrap, re-sampling the corresponding fitted distribution with 100 000 samples of size n = 105. Results obtained using 200 000 samples were virtually identical. Table 2 presents the summary data from this exercise. The confidence interval reported for the normal distribution is the usual one based on the t-distribution. In the case of the 2-parameter lognormal model, the confidence limits are calculated using a table (Armstrong, 1992) based on Land's exact method.
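For the 2-parameter comparison, Finney's MVUEs can be sketched with the infinite-series form of Finney's function, g_n(t) = 1 + (n-1)t/n + Σ_{j≥2} (n-1)^(2j-1) t^j / [n^j (n+1)(n+3)···(n+2j-3) j!]. This is the series form that the Appendix avoids by evaluating a Bessel-function expression with IMSL DCBJS; here a relative-tolerance stopping rule stands in for convergence monitoring. The sketch assumes m = exp(x̄) g_n(s²/2) and v = exp(2x̄)[g_n(2s²) - g_n((n-2)s²/(n-1))], with x̄ and s² the sample mean and (n - 1)-divisor variance of ln c. Function names are hypothetical.

```python
import math
import statistics


def finney_g(n, t, tol=1e-15, max_terms=1000):
    """Finney's series g_n(t), summed until terms are negligible; tends to exp(t) as n grows."""
    term = (n - 1) * t / n                  # j = 1 term
    total = 1.0 + term
    j = 1
    while abs(term) > tol * abs(total) and j < max_terms:
        j += 1
        # ratio of the j-th to the (j-1)-th term of the series
        term *= (n - 1) ** 2 * t / (n * (n + 2 * j - 3) * j)
        total += term
    return total


def finney_mvue(conc):
    """MVUEs of the arithmetic mean and variance for a 2-parameter lognormal sample."""
    x = [math.log(c) for c in conc]
    n = len(x)
    xbar = statistics.fmean(x)
    s2 = statistics.variance(x, xbar)       # divisor n - 1
    m = math.exp(xbar) * finney_g(n, s2 / 2.0)
    v = math.exp(2.0 * xbar) * (finney_g(n, 2.0 * s2)
                                - finney_g(n, (n - 2) * s2 / (n - 1)))
    return m, v
```

For small samples the MVUE of the mean lies between the geometric mean and the naive estimate exp(x̄ + s²/2), which it approaches as n grows.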
Table 2 and Figs 1
| CONCLUDING REMARKS |
|---|
The 4-parameter lognormal distribution is a physically consistent representation of exposure that can incorporate the minimum and maximum values that bound it. It should provide a fit comparable to that of the 2-parameter model for most exposure data when the a priori maximum is large relative to the sample values. The additional flexibility to estimate extremes may prove valuable in estimating background concentrations, maximum likely exposures and mean exposures. This estimation procedure can be informed by knowledge of the process and chemical and by any existing data. Compliance sampling, or worst case sampling, may be particularly useful here in estimating maximums. Estimation of the mean and variance is more involved here than with the 2-parameter lognormal model, but can be readily programmed or implemented in a spreadsheet.
Choosing between the 4-parameter lognormal model and an unbounded distribution, such as the 2-parameter lognormal, for inference involves a trade-off. With the unbounded model, one accepts non-zero probabilities of impossible exposures and hopes these errors have a negligible effect on the subsequent inference. With the bounded model, one hopes that errors in estimating the probable extremes do not unduly influence the results. The major difficulty observed here in using the 4-parameter lognormal model was estimating the extremes from the data and ensuring that the estimates were reasonable. Further research is needed on this issue, and the Bayesian method appears promising.
| APPENDIX |
|---|
Part I: 4-parameter lognormal equations
The mean of y is (Johnson, 1949):
![]() | (A1) |
![]() | (A2) |
![]() | (A3) |
![]() | (A4) |
![]() | (A5) |
![]() | (A6) |
where $\mu_2'$ is the second moment of y about the origin and
![]() | (A7) |
![]() | (A8) |
![]() | (A9) |
![]() | (A10) |
![]() | (A11) |
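Johnson's series expressions are not the only route to these moments. Since z = γ + δ ln[y/(1 - y)] is unit normal, y = 1/{1 + exp[-(z - γ)/δ]}, and the mean and variance of y can be obtained by direct numerical integration over z. The Python sketch below takes that swapped-in approach (a midpoint rule on [-8, 8], an arbitrary choice) rather than implementing the Appendix equations; the function name is hypothetical. With the minimum at zero, multiplying the results by the maximum gives concentration units, as described in the text.

```python
import math
import statistics


def _logistic(w):
    """Numerically stable inverse logit, 1 / (1 + exp(-w))."""
    if w >= 0:
        return 1.0 / (1.0 + math.exp(-w))
    e = math.exp(w)
    return e / (1.0 + e)


def sb_moments(gamma, delta, m=800, zmax=8.0):
    """Mean and SD of y = 1/(1 + exp(-(z - gamma)/delta)) with z ~ N(0, 1).

    Midpoint-rule quadrature over [-zmax, zmax]; not Johnson's (1949) series.
    """
    phi = statistics.NormalDist()
    h = 2.0 * zmax / m
    mu1 = mu2 = 0.0                          # first and second moments about the origin
    for i in range(m):
        z = -zmax + (i + 0.5) * h
        y = _logistic((z - gamma) / delta)
        wgt = phi.pdf(z) * h
        mu1 += wgt * y
        mu2 += wgt * y * y
    var = max(mu2 - mu1 * mu1, 0.0)          # guard against tiny negative round-off
    return mu1, math.sqrt(var)
```

The quadrature is a design convenience: it needs no convergence monitoring of an infinite series, at the cost of a fixed number of function evaluations per call.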
Part II: Finney's equations
According to Finney (1941), the MVUEs for the mean (m) and variance (v) from a sample of size n are
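$$m = \exp(\bar{x})\, g_n\!\left(\frac{s^2}{2}\right) \qquad (A12)$$
$$v = \exp(2\bar{x})\left[g_n(2s^2) - g_n\!\left(\frac{(n-2)s^2}{n-1}\right)\right] \qquad (A13)$$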
![]() | (A14) |
where $\bar{x}$ and $s^2$ are the sample mean and variance of the natural logarithms of concentration, respectively. The complex Bessel function (equation A14) is evaluated with the IMSL subroutine DCBJS, which eliminates the need to monitor convergence of the infinite-series form often used for Finney's method.
| ACKNOWLEDGEMENTS |
|---|
The author wishes to thank Dr Norman L. Johnson and the reviewers for their extremely helpful comments.
| REFERENCES |
|---|
Armstrong BG. (1992) Confidence intervals for arithmetic means of lognormally distributed exposures. Am Ind Hyg Assoc J; 53(8): 481–5.
Bukac J. (1972) Fitting SB curves using symmetrical percentile points. Biometrika; 59: 688–90.
Esmen NA, Hammad YY. (1977) Log-normality of environmental sampling data. J Environ Sci Health; A12(1&2): 29–41.
Finney DJ. (1941) On the distribution of a variate whose logarithm is normally distributed. J R Stat Soc Suppl; 7(2): 155–61.
Flynn MR. (in press) The beta distribution: a physically consistent model for human exposure to airborne contaminants. Stochastic Environ Res Risk Assess.
Flynn MR. (2004) A stochastic differential equation for exposure yields a beta distribution. Ann Occup Hyg; 48: 491–7.
George DK, Flynn MR, Harris RH. (1995) Autocorrelation of interday exposures at an automobile assembly plant. Am Ind Hyg Assoc J; 56: 1187–94.
Georgopoulos PG, Seinfeld JH. (1982) Statistical distributions of air pollutant concentrations. Environ Sci Technol; 16(7): 401A–416A.
Johnson NL. (1949) Systems of frequency curves generated by methods of translation. Biometrika; 36: 149–76.
Johnson NL, Kotz S. (1970) Continuous univariate distributions, Vol. 2. Boston, MA: Houghton Mifflin Company.
Kiester AR, Barakat R. (1974) Exact solutions to certain stochastic differential equation models of population growth. Theor Popul Biol; 6: 199–216.
Lambert JA. (1970) Estimation of parameters in the four-parameter lognormal distribution. Aust J Stat; 12(1): 33–43.
Lefante JJ, Shah AK. (2002) Robustness properties of lognormal confidence intervals for lognormal and gamma distributed data. Commun Stat Theory Meth; 31(11): 1939–57.
Mage DT. (1980) An explicit solution for SB parameters using four percentile points. Technometrics; 22(2): 247–51.
Parresol BR. (2003) Recovering parameters of Johnson's SB distribution. U.S. Department of Agriculture, Forest Service, Southern Research Station, Paper SRS-31.
Rappaport SM. (1994) Interpreting levels of exposures to chemical agents. In Harris RL, Cralley LJ, Cralley LV, editors. Patty's industrial hygiene and toxicology. Vol. III, Part A, Chapter 8, pp. 395. New York: John Wiley & Sons.
Siekierski K. (1992) Comparison and evaluation of three methods of estimation of the Johnson SB distribution. Biom J; 34(7): 879–95.
Slifker JF, Shapiro SS. (1980) The Johnson system: selection and parameter estimation. Technometrics; 22(2): 239–46.
Tolentino D, Zenari E, Dall'Olio M et al. (2003) Application of statistical models to estimate the correlation between urinary benzene as a biological indicator of exposure and air concentrations determined by personal monitoring. Am Ind Hyg Assoc J; 64: 625–9.
Tuckwell HC. (1974) A study of some diffusion models of population growth. Theor Popul Biol; 6: 199–216.
Tsionas EG. (2001) Likelihood and posterior shapes in Johnson's SB system. Sankhya Ser B; 63(1): 3–9.
Vroon WJ. (1981) A class of variate transformation causing unbounded likelihood. J Austral Stat; 76(375): 709–12.