Annals of Occupational Hygiene Advance Access originally published online on September 22, 2004
Annals of Occupational Hygiene 2004 48(7):617-622; doi:10.1093/annhyg/meh071

© British Occupational Hygiene Society. Published by Oxford University Press.

The 4-Parameter Lognormal (SB) Model of Human Exposure

MICHAEL R. FLYNN*

CB7431 Rosenau Hall, Department of Environmental Sciences and Engineering, School of Public Health, University of North Carolina, Chapel Hill, NC 27599-7431, USA

* Tel: +1-919-966-3473; fax: +1-919-966-7911; e-mail: mike-flynn{at}unc.edu

Received 25 October 2003; in final form 3 June 2004


    ABSTRACT
 
This paper explores the 4-parameter lognormal distribution (or Johnson SB distribution) as a model for occupational exposures to airborne contaminants. This model can incorporate extreme values when they are known a priori, or alternatively, they can be estimated from the data. This additional flexibility may be of value in estimating background and/or maximum exposures, as well as improving the fitting process and subsequent estimation of mean exposures. In addition, the model is physically consistent with the definition of concentration and provides a basis for linking stochastic and deterministic exposure modeling approaches. There is some additional computational burden in estimating the mean and variance of exposure relative to the usual 2-parameter lognormal model.

Keywords: exposure modeling • Johnson SB distribution • 4-parameter lognormal distribution


    INTRODUCTION
 
The estimation and control of human exposure to toxic airborne contaminants is an important problem in occupational hygiene and environmental risk assessment. Epidemiological investigations are often constrained by the lack of suitable exposure assessment, thus increasing the risk of misclassification, and subsequent attenuation of dose–response relationships. Decisions regarding the selection of controls are also adversely impacted by inadequate exposure characterization, and by variability in performance of the selected interventions. In each case, the underlying problem is to model quantitatively the random variable of exposure.

A vast literature exists on the use of probability distribution functions to characterize random variables in general (Johnson and Kotz, 1970) and airborne concentrations in particular (Georgopoulos and Seinfeld, 1982). The 2-parameter lognormal model seems especially useful for occupational exposure data (Rappaport, 1994), and a theoretical justification has been attempted (Esmen and Hammad, 1977). However, probability distribution functions with infinite, unbounded tails present a problem as models for concentration (exposure) data, since concentrations are always constrained to lie within a normalized range of 0–1. There is always a maximum and minimum value possible for the concentration of any substance. Values for these extremes can be determined, a priori, from theory and information on the process (Flynn, in press). While this approach defines the absolute maximum and minimum possible, better estimates of the probable extremes may be made from the data and additional physical considerations.

As exposure models, bounded probability distributions have the advantage of being better able to incorporate the physics of the problem into the model development. This quasi-deterministic approach provides a method for linking exposure distributions to the parameters that govern them, in a way that can help improve control decisions and exposure estimates when measurements are unavailable. A recent paper presents a stochastic form of the dilution ventilation equation as a simple exposure model (Flynn, 2004). The solution for this equation, obtained with a closure consistent with concentration being constrained between a minimum and a maximum, is a beta distribution. Like the beta distribution, the 4-parameter lognormal is also bounded. It provides flexibility for fitting data, and has some of the attractive features associated with normal distributions. Estimation and inference are a bit more involved with this model, but computers help minimize this drawback. This distribution, one in the Johnson system (Johnson, 1949), is often called the SB distribution, indicating it as the bounded member of this family.


    THEORY AND BACKGROUND
 
Human exposure to an airborne contaminant is traditionally defined as the time-weighted average concentration over a specified interval. The concentration (c) of a gas or vapor is defined as the volume of pure contaminant (Vc) contained within a given volume of the mixture of pure air (Va) and contaminant:

$$c = \frac{V_c}{V_c + V_a} \qquad (1)$$

Equivalently,

$$\frac{c}{1 - c} = \frac{V_c}{V_a} \qquad (2)$$

A similar volume fraction for particles and/or mixed phases is also possible (see e.g. Flynn, 2004). From equation (1), we note that concentration is restricted to the range [0, 1] and maximum values may be considerably lower (e.g. saturation vapor pressure limitations).

The definitive study of the 4-parameter lognormal distribution is that of Johnson (1949), who explored a series of transformations for bounded (SB) and unbounded (SU) distributions. He considered transformations on a random variable, c, of the form:

$$y = \frac{c - \xi}{\lambda} \qquad (3)$$

where ξ and ξ + λ are the minimum and maximum values for c, respectively. Within the context of exposure, y is a normalized concentration. A unit normal variable z can be defined as

$$z = \gamma + \delta \ln\left[\frac{y}{1 - y}\right] \qquad (4)$$

The mean and variance of ln[y/(1 – y)] are related directly to γ and δ by

$$\mathrm{E}\{\ln[y/(1 - y)]\} = -\frac{\gamma}{\delta}, \qquad \mathrm{Var}\{\ln[y/(1 - y)]\} = \frac{1}{\delta^2} \qquad (5)$$

The mean and variance of y are a bit more involved and are given by equations (A1) and (A8) in the Appendix. When extreme values are known a priori, maximum likelihood estimates for γ and δ are

$$\hat{\gamma} = -\frac{\bar{f}}{s_f}, \qquad \hat{\delta} = \frac{1}{s_f} \qquad (6)$$

where f = ln[y/(1 – y)], $\bar{f}$ is the sample mean of f and $s_f^2 = \frac{1}{n}\sum_{i=1}^{n}(f_i - \bar{f})^2$.
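
The fit with known bounds can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual Johnson (1949) closed forms for equation (6), namely δ = 1/s_f and γ = −f̄/s_f with divisor n in the variance of f. The function names and the five concentrations are hypothetical and do not come from the paper.

```python
import math

def sb_fit_known_bounds(conc, xi, lam):
    """Estimate gamma and delta when the minimum xi and the range lam
    (maximum = xi + lam) are known a priori."""
    # Equation (3): normalize each concentration to the open interval (0, 1).
    y = [(c - xi) / lam for c in conc]
    if not all(0.0 < v < 1.0 for v in y):
        raise ValueError("observations must lie strictly between xi and xi + lam")
    # f is the logit of the normalized concentration.
    f = [math.log(v / (1.0 - v)) for v in y]
    n = len(f)
    f_bar = sum(f) / n
    # Divisor n (not n - 1) gives the maximum likelihood form.
    s_f = math.sqrt(sum((v - f_bar) ** 2 for v in f) / n)
    return -f_bar / s_f, 1.0 / s_f  # (gamma, delta)

def sb_z(c, xi, lam, gamma, delta):
    """Equation (4): map a concentration to its unit normal score."""
    y = (c - xi) / lam
    return gamma + delta * math.log(y / (1.0 - y))

# Hypothetical concentrations with assumed bounds 0 and 100.
conc = [2.0, 5.0, 10.0, 20.0, 40.0]
gamma, delta = sb_fit_known_bounds(conc, xi=0.0, lam=100.0)
print(gamma, delta)
```

By construction, the z scores of the fitted sample have mean 0 and mean square 1, which is a quick check that the estimates are internally consistent.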

If the logit of the normalized concentration (y) follows a normal distribution, then concentration is distributed as a 4-parameter lognormal or (SB) variate. It is the logarithm of the ratio of contaminant volume to air volume that is normally distributed in the 4-parameter model, not the logarithm of concentration, as is the case with the 2-parameter model. This is consistent with the physics of the problem since the ratio can assume any positive value, but concentration must be less than some maximum. At low values of concentration relative to the maximum and when the minimum concentration is zero, we have

$$\ln\left[\frac{y}{1 - y}\right] \approx \ln y = \ln c - \ln \lambda \qquad (7)$$
In this case, which is often approximately true for occupational exposures, one would expect 2- and 4-parameter lognormal models to provide very comparable fits for the same data.
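
A short numerical check shows how the approximation behaves. The sketch below assumes a zero minimum, so that y = c/c_max, and uses a hypothetical exposure of 100 ppm together with the two IPA maxima quoted in the figure captions (43 421 ppm a priori and 124.9 ppm estimated). The gap between the logit and the logarithm equals −ln(1 − y), which is negligible only when y is small.

```python
import math

def logit_minus_log(c, c_max):
    """Return ln[y/(1 - y)] - ln(y) for y = c / c_max (minimum assumed zero)."""
    y = c / c_max
    return math.log(y / (1.0 - y)) - math.log(y)

c = 100.0  # hypothetical exposure in ppm, not a value from the data sets
print(logit_minus_log(c, 43421.0))  # a priori maximum: the gap is small
print(logit_minus_log(c, 124.9))    # estimated maximum: the gap is large
```

With the a priori maximum the two transforms are nearly identical, which is why the 2-parameter and a priori 4-parameter fits are expected to agree. With a maximum close to the data they differ substantially.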

Exposure results from contaminant generation and subsequent transport to the breathing zone by the air flowfield. This process is governed by nonlinear differential equations and is subject to many uncertainties and random events. As noted above, one method of constructing a model for exposure relies on treating it as a random process, governed by a stochastic differential equation. An alternative stochastic differential equation for concentration (exposure) to the one presented in Flynn (2004) is

$$\frac{dy}{dt} = r(t)\, y\, (1 - y) \qquad (8)$$

Here r(t) is a Gaussian process with nonzero mean, and concentration has been normalized by the maximum. This is the logistic growth equation (Pearl–Verhulst equation), which has been studied in the context of population growth. The solution is a 4-parameter lognormal distribution with time-dependent mean and variance (Kiester and Barakat, 1974; Tuckwell, 1974). A steady-state distribution is apparently unattainable for this formulation of the problem. Further research is needed to specify the explicit form of r(t) relevant to a given exposure scenario.
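
A short derivation indicates why the solution is of the SB type. It assumes that equation (8) has the standard logistic form dy/dt = r(t) y (1 − y) and that the rules of ordinary calculus apply (a Stratonovich-type reading, which is an assumption here and not a statement from the cited papers).

```latex
\frac{d}{dt}\ln\!\left[\frac{y}{1-y}\right]
  = \frac{1}{y(1-y)}\,\frac{dy}{dt}
  = r(t)
\quad\Longrightarrow\quad
\ln\!\left[\frac{y(t)}{1-y(t)}\right]
  = \ln\!\left[\frac{y(0)}{1-y(0)}\right] + \int_{0}^{t} r(s)\,ds
```

Since the integral of a Gaussian process is Gaussian, the logit of y(t) is normally distributed at each time, which is the defining property of the SB distribution. If r(t) has a constant nonzero mean, the mean of the logit changes linearly with time, which is consistent with the absence of a steady-state distribution.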


    ESTIMATION
 
The SB distribution has been applied widely in forestry work, where it is used as a diameter distribution model for growth-and-yield forecasts (Parresol, 2003). Within this context, a comparison of point estimation methods was done by Siekierski (1992). He concluded that the method of maximum likelihood could produce ‘preposterous estimates’ of the extremes and recommended a method based on moments. Mage (1980) appears to have been the first to investigate the application of the SB distribution to environmental airborne concentration measurements (carbon monoxide). He used an approach for parameter estimation related to work by Bukac (1972), and similar to Slifker and Shapiro (1980).

As noted above, when the extremes are known a priori, estimates for the mean and variance are straightforward. However, when estimating one or more extremes from the data, the method of maximum likelihood is not recommended (see e.g. Lambert, 1970; Vroon, 1981). Tsionas (2001) has made progress with Bayesian methods for estimation problems with this distribution, but difficulties still exist, in particular when both extremes are estimated from the data. Although there will be exposure assessments where it is desirable to estimate both minimum and maximum values, the examples used here are restricted to cases where the a priori minimum is zero, and the maximum is either known a priori or estimated from the data.

Using the method of percentiles to estimate the maximum concentration when the lower limit is known a priori requires determination of the median (C0), lower (C1) and upper (C2) percentile points from the data. The upper and lower percentile values are symmetric about the median (e.g. if the upper value is the 99th percentile, then the lower value is the 1st percentile; Johnson, 1949). The estimate for the maximum when the minimum is set at zero is:

$$\hat{\lambda} = \frac{C_0\left[C_0 (C_1 + C_2) - 2 C_1 C_2\right]}{C_0^2 - C_1 C_2} \qquad (9)$$
This methodology will result in different estimates for the maximum value, depending upon which percentiles are selected, and in some cases, the maximum determined in this way may be less than the largest observed sample value.
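
The percentile estimate can be sketched as follows. The closed form used here follows from requiring the logits of C1 and C2 to be equidistant from the logit of C0 when the minimum is zero; treating it as the form of equation (9) is an assumption. The helper sb_quantile and the parameter values (maximum 100, γ = 1, δ = 1.5) are hypothetical and serve only to generate exact percentiles for a check.

```python
import math
from statistics import NormalDist

def sb_max_from_percentiles(c0, c1, c2):
    """Estimate the maximum from the median c0 and symmetric lower/upper
    percentile points c1 and c2, with the minimum fixed at zero."""
    denom = c0 * c0 - c1 * c2
    # A non-positive denominator means the data look lognormal or more
    # right-skewed, so no finite positive maximum is implied.
    if denom <= 0.0:
        raise ValueError("c0**2 must exceed c1*c2 for a finite positive maximum")
    return c0 * (c0 * (c1 + c2) - 2.0 * c1 * c2) / denom

def sb_quantile(p, c_max, gamma, delta):
    """Exact quantile of an S_B variate with minimum zero (for checking only)."""
    z = NormalDist().inv_cdf(p)
    return c_max / (1.0 + math.exp(-(z - gamma) / delta))

# Exact 1st, 50th and 99th percentiles of a hypothetical S_B distribution.
c1, c0, c2 = (sb_quantile(p, 100.0, 1.0, 1.5) for p in (0.01, 0.50, 0.99))
print(sb_max_from_percentiles(c0, c1, c2))  # recovers the assumed maximum closely
```

With sample percentiles in place of exact ones the estimate varies with the percentile pair chosen, and the guard on the denominator flags samples for which the method gives no usable maximum.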

In the application illustrated below, it was found that by using the 99th and 1st percentiles as C2 and C1, this latter problem was avoided. Siekierski (1992) gives some additional methods of stabilizing the estimation of extreme values, including adding or subtracting arbitrary, but plausible, amounts from the sample extremes. Although this latter method is not employed here, it was found that for small sample sizes, this approach was often more stable than the percentile approach. The most promising method to avoid these estimation problems appears to be the Bayesian approach of Tsionas (2001).

As with the 2-parameter lognormal model, estimation and inference for the 4-parameter model are complicated by dependence of the exposure mean on both the mean and the variance of the logarithms of the transformed data. Invariably, the question arises as to confidence intervals for the mean exposure. Although several methods are potential candidates, parametric bootstraps are used here to estimate ~95% confidence intervals for the mean exposure. This procedure involves estimating the population parameters from the initial sample and then subsequently taking random, bootstrap samples from this fitted parametric distribution. When the maximum is estimated from the data, the bootstrap technique employed here re-estimates a new maximum for each bootstrap sample using equation (9). When a maximum value is assigned a priori, the same maximum is kept for all bootstrap samples. The mean and standard deviation of exposure are estimated using the equations in the Appendix and the maximum value for the bootstrap sample. In this way a sampling distribution for the mean is generated. The 95% confidence interval for the mean is determined by selecting the 0.975 and 0.025 quantiles of this re-sampling distribution of the mean.
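
The bootstrap procedure can be sketched for the simpler case of an a priori maximum and zero minimum. Several choices here are assumptions and not the author's implementation: the mean of y is obtained by Simpson's rule in place of the Appendix series, the resampling is done on the logit scale (equivalent to resampling concentrations when the bounds are fixed), and the data, the maximum of 150 and all function names are hypothetical. Re-estimating the maximum for each bootstrap sample is not shown.

```python
import math
import random

def expit(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sb_mean_y(gamma, delta, half_width=10.0, steps=400):
    """E[y] by Simpson's rule over the unit normal variable z."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        w = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        total += w * expit((z - gamma) / delta) * math.exp(-0.5 * z * z)
    return total * h / (3.0 * math.sqrt(2.0 * math.pi))

def fit_logits(f):
    """Estimates of (gamma, delta) from logits f, using divisor n."""
    n = len(f)
    f_bar = sum(f) / n
    s_f = math.sqrt(sum((v - f_bar) ** 2 for v in f) / n)
    return -f_bar / s_f, 1.0 / s_f

def bootstrap_ci_mean(conc, c_max, n_boot=300, seed=0):
    """Point estimate and approximate 95% interval for the mean exposure."""
    rng = random.Random(seed)
    gamma, delta = fit_logits([math.log(c / (c_max - c)) for c in conc])
    n = len(conc)
    means = []
    for _ in range(n_boot):
        # Draw logits from the fitted model: f = (z - gamma) / delta.
        f_star = [(rng.gauss(0.0, 1.0) - gamma) / delta for _ in range(n)]
        g_star, d_star = fit_logits(f_star)
        means.append(c_max * sb_mean_y(g_star, d_star))
    means.sort()
    lo = means[int(0.025 * (n_boot - 1))]
    hi = means[math.ceil(0.975 * (n_boot - 1))]
    return c_max * sb_mean_y(gamma, delta), lo, hi

# Hypothetical exposures (ppm) and a hypothetical a priori maximum.
conc = [12.0, 18.5, 22.1, 25.0, 27.3, 30.2, 33.8, 35.1, 38.4, 40.0,
        42.7, 45.5, 48.9, 51.2, 55.0, 58.3, 62.7, 66.0, 71.4, 80.2]
print(bootstrap_ci_mean(conc, c_max=150.0))
```

The number of bootstrap samples is kept small so the sketch runs quickly; the analysis reported here used far more.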


    APPLICATIONS
 
This section presents applications of the 4-parameter lognormal model using three different sets of exposure measurements. The first is a series of 33 benzene exposures from a group of petrochemical workers (Tolentino et al., 2003). The second data set is taken from Rappaport (1994) and consists of mean styrene exposures for 18 workers with each mean value based upon three replicates per individual. The final set contains 105 exposure measurements of isopropyl alcohol (IPA) taken on a single worker performing a manual wipe-down operation of automobile bodies prior to priming at an assembly plant (George et al., 1995).

For each data set, summary statistics are calculated and the Shapiro–Wilk test for normality is used to evaluate the fit of four different statistical distributions. The Shapiro–Wilk test is performed on the appropriate data transform to normality, i.e. in the case of the 4-parameter model, the data tested consisted of the ln[y/(1 – y)] values, not the raw concentration data. The distributions tested are: the 2-parameter lognormal model; the 4-parameter lognormal with a priori maximum; the 4-parameter with maximum estimated from the data; and finally the normal distribution. In fitting the 4-parameter models, the a priori maximums are the saturation vapor concentrations (98 684 ppm for benzene, 28 026 mg/m³ for styrene and 43 421 ppm for IPA).

The data and results of the fit tests summarized in Table 1 suggest that the 4-parameter lognormal model with estimated maximum is a good fit for all three data sets. The normal distribution fits two of the three data sets well (IPA and styrene), while the 2-parameter lognormal and the 4-parameter lognormal model with a priori maximum fit only the benzene data well. They do provide comparable fits to one another for all three sets of exposures. When data do not fit the 2-parameter lognormal model particularly well, the added flexibility of the 4-parameter model may be useful. Figures 1 and 2 illustrate the cumulative distributions for the IPA data with the associated 2- and 4-parameter a priori fits. However, when the maximum concentration for this operation is estimated from the data, a much better fit to the 4-parameter model is observed (see Fig. 3). There was a bit of autocorrelation in the IPA sample, but it is ignored here for illustrative purposes.


Table 1. Summary statistics for the benzene and styrene exposure data

 


Fig. 1. Comparison of cumulative distributions for IPA data, raw data and fitted 2-parameter lognormal distribution.

 


Fig. 2. Comparison of cumulative distributions for IPA data, raw data and fitted 4-parameter lognormal distribution with a priori values of the minimum (0) and maximum (43 421 ppm).

 


Fig. 3. Comparison of cumulative distributions for IPA data, raw data and fitted 4-parameter lognormal distribution with an a priori minimum (0) and an estimated maximum (124.9 ppm).

 
Further analysis was performed on the IPA data. For the 4-parameter models, point estimates for the means and standard deviations of exposure were calculated using the equations in the Appendix, Part I. Estimates for γ and δ used in these equations were obtained with equation (6) using the sample data after normalization with the maximum value. This was done for both cases, i.e. when the maximum is assigned a priori and when it is estimated from the data. It is acknowledged that only in the former case is the desirable maximum likelihood property preserved for the estimates. Concentration (exposure) is recovered by multiplying the mean or standard deviation of y by the maximum value. In the case of the 2-parameter lognormal model, the minimum variance unbiased estimators (MVUEs) for the mean and variance were obtained by using Finney's (1941) approach (details in the Appendix, Part II).
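
The step from (γ, δ) to the mean and standard deviation of exposure can be illustrated numerically. The sketch below does not use the Appendix series; it assumes instead that direct Simpson integration over the unit normal variable is an acceptable substitute, and it assumes a zero minimum so that exposure is the maximum times y. Function names and parameter values are hypothetical.

```python
import math

def expit(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def normal_expectation(g, half_width=10.0, steps=400):
    """Approximate E[g(z)] for a unit normal z by Simpson's rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        w = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        total += w * g(z) * math.exp(-0.5 * z * z)
    return total * h / (3.0 * math.sqrt(2.0 * math.pi))

def sb_exposure_mean_sd(gamma, delta, c_max):
    """Mean and standard deviation of exposure for minimum 0 and maximum c_max."""
    def y(z):
        # Inverse of z = gamma + delta * ln[y / (1 - y)].
        return expit((z - gamma) / delta)
    mean_y = normal_expectation(y)
    # Integrate the squared deviation directly to avoid cancellation.
    var_y = normal_expectation(lambda z: (y(z) - mean_y) ** 2)
    # Rescale from normalized concentration to exposure.
    return c_max * mean_y, c_max * math.sqrt(var_y)

print(sb_exposure_mean_sd(gamma=1.0, delta=1.5, c_max=100.0))
```

When γ = 0 the distribution of y is symmetric about 0.5, so the mean exposure is half the maximum; this gives a simple check of the quadrature.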

Confidence intervals for the 4-parameter models were determined using a parametric bootstrap by re-sampling the corresponding fitted distribution with 100 000 samples of size n = 105. Results obtained using 200 000 samples were virtually identical. Table 2 presents the summary data from this exercise. The confidence interval reported for the normal distribution is the usual one that is based on the t-distribution. In the case of the 2-parameter lognormal model, the confidence limits are calculated using a table (Armstrong, 1992) based on Land's exact method.


Table 2. Estimated means, standard deviations and 95% CI for the mean of the IPA exposures using various statistical distributions

 
Table 2 and Figs 1–3 suggest that the 4-parameter lognormal model, with maximum exposure estimated from the data, provides the best fit to the IPA exposures. The normal distribution is a close second. The estimated mean, standard deviation and confidence interval for the mean calculated with the normal distribution are in very good agreement with the estimates of the 4-parameter lognormal model when the maximum is estimated from the data. These estimates of the mean and standard deviation are less than the corresponding estimates generated with the 2-parameter lognormal model and the 4-parameter model with a priori maximum. As expected, the confidence intervals for the mean exposure are narrower with the better fitting distributions.


    CONCLUDING REMARKS
 
The 4-parameter lognormal distribution is a physically consistent representation of exposure that can incorporate the minimum and maximum values that bound it. It should provide a fit comparable to that of the 2-parameter model for most exposure data when the a priori maximum is large relative to the sample values. The additional flexibility to estimate extremes may prove valuable in estimating background concentrations, maximum likely exposures and mean exposures. This estimation procedure can be informed by knowledge of the process and chemical and by any existing data. Compliance sampling, or ‘worst case’ sampling, may be particularly useful here in estimating maximums. Estimation of the mean and variance is more involved here than with the 2-parameter lognormal model, but can be readily programmed or implemented in a spreadsheet.

A decision to use the 4-parameter lognormal model versus an unbounded distribution like the 2-parameter, for inference, involves a trade-off. In the latter case, one accepts finite probabilities of impossible exposures and hopes these errors have negligible effects on the subsequent inference. In the former case, one hopes that the errors in estimating the probable extremes do not unduly influence the results. The major difficulty in using the 4-parameter lognormal model observed here was in estimating the extremes from the data and assuring reasonable estimates. Further research is needed to explore this issue, and the use of the Bayesian method appears promising here.


    APPENDIX
 
Part I: 4-parameter lognormal equations
The mean of y is (Johnson, 1949)

(A1)
and

(A2)

(A3)

(A4)

(A5)

The variance of y is

$$\sigma_y^2 = \mu'_2 - (\mu'_1)^2 \qquad (A6)$$

where $\mu'_1$ is the mean of y from equation (A1), $\mu'_2$ is the second moment of y about the origin and

(A7)
Application of the chain rule to the expression (A7) and some algebraic simplification yields the following explicit expression for the variance of y:

(A8)
The partial derivatives are

(A9)

(A10)

(A11)

Part II: Finney's equations
The MVUEs for the mean (m) and variance (v) from a sample of size (n) according to Finney (1941) are

(A12)

(A13)
where

(A14)
Within the exposure context, the sample mean and the sample variance (s²) in these expressions are those of the natural logarithms of concentration. The complex Bessel function (equation A14) is evaluated with the IMSL subroutine DCBJS. This eliminates the need to monitor convergence for the infinite series form that is often used for Finney's method.
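
For readers without access to IMSL, the infinite series form mentioned above can be sketched directly. This is not the Bessel-function route used here; it assumes the series and the estimator expressions as they are commonly quoted from Finney (1941), with s² computed using divisor n − 1. The names finney_psi and finney_mvue and the sample values are hypothetical.

```python
import math

def finney_psi(n, t, tol=1e-15, max_terms=500):
    """Series form commonly quoted for Finney's function:
    1 + (n-1)t/n + (n-1)^3 t^2 / (n^2 (n+1) 2!) + ..."""
    term = (n - 1.0) * t / n
    total = 1.0 + term
    k = 1
    # Each term follows from the previous one by a simple ratio, so the
    # loop stops once the terms are negligible relative to the running sum.
    while abs(term) > tol * abs(total) and k < max_terms:
        term *= (n - 1.0) ** 2 * t / (n * (n + 2.0 * k - 1.0) * (k + 1.0))
        total += term
        k += 1
    return total

def finney_mvue(conc):
    """MVUEs of the mean and variance under a 2-parameter lognormal model."""
    n = len(conc)
    logs = [math.log(c) for c in conc]
    x_bar = sum(logs) / n
    s2 = sum((v - x_bar) ** 2 for v in logs) / (n - 1)
    m = math.exp(x_bar) * finney_psi(n, 0.5 * s2)
    v = math.exp(2.0 * x_bar) * (
        finney_psi(n, 2.0 * s2) - finney_psi(n, (n - 2.0) * s2 / (n - 1.0))
    )
    return m, v

# Hypothetical concentrations, for illustration only.
print(finney_mvue([1.0, 2.0, 4.0, 8.0]))
```

For large n the series approaches exp(t), so the mean estimate tends to the familiar exp(x̄ + s²/2).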


    ACKNOWLEDGEMENTS
 
The author wishes to thank Dr Norman L. Johnson and the reviewers for their extremely helpful comments.


    REFERENCES

Armstrong BG. (1992) Confidence intervals for arithmetic means of lognormally distributed exposures. Am Ind Hyg Assoc J; 53(8): 481–5.

Bukac J. (1972) Fitting SB curves using symmetrical percentile points. Biometrika; 59: 688–90.

Esmen NA, Hammad YY. (1977) Log-normality of environmental sampling data. J Environ Sci Health; A12(1&2): 29–41.

Finney DJ. (1941) On the distribution of a variate whose logarithm is normally distributed. J R Stat Soc Suppl; 7(2): 155–61.

Flynn MR. (in press) The beta distribution—a physically consistent model for human exposure to airborne contaminants. Stochastic Environ Res Risk Assess.

Flynn MR. (2004) A stochastic differential equation for exposure yields a beta distribution. Ann Occup Hyg; 48: 491–7.

George DK, Flynn MR, Harris RH. (1995) Autocorrelation of interday exposures at an automobile assembly plant. Am Ind Hyg Assoc J; 56: 1187–94.

Georgopoulos PG, Seinfeld JH. (1982) Statistical distributions of air pollutant concentrations. Environ Sci Technol; 16(7): 401A–416A.

Johnson NL. (1949) Systems of frequency curves generated by methods of translation. Biometrika; 36: 149–76.

Johnson NL, Kotz S. (1970) Continuous univariate distributions—2. Boston, MA: Houghton Mifflin Company.

Kiester AR, Barakat R. (1974) Exact solutions to certain stochastic differential equation models of population growth. Theor Popul Biol; 6: 199–216.

Lambert JA. (1970) Estimation of parameters in the four-parameter lognormal distribution. Aust J Stat; 12(1): 33–43.

Lefante JJ, Shah AK. (2002) Robustness properties of lognormal confidence intervals for lognormal and gamma distributed data. Commun Stat Theory Meth; 31(11): 1939–57.

Mage DT. (1980) An explicit solution for SB parameters using four percentile points. Technometrics; 22(2): 247–51.

Parresol BR. (2003) Recovering parameters of Johnson's SB distribution. U.S. Department of Agriculture, Forest Service, Southern Research Station, Paper SRS-31.

Rappaport SM. (1994) Interpreting levels of exposures to chemical agents. In Harris, RL, Cralley, LJ and Cralley, LV, editors. Patty's industrial hygiene and toxicology. Vol. III. Part A, Chapter 8, pp. 395. New York: John Wiley & Sons.

Siekierski K. (1992) Comparison and evaluation of three methods of estimation of the Johnson SB distribution. Biom J; 34(7): 879–95.

Slifker JF, Shapiro SS. (1980) The Johnson system: Selection and parameter estimation. Technometrics; 22(2): 239–46.

Tolentino D, Zenari E, Dall'Olio M et al. (2003) Application of statistical models to estimate the correlation between urinary benzene as a biological indicator of exposure and air concentrations determined by personal monitoring. Am Ind Hyg Assoc J; 64: 625–9.

Tuckwell HC. (1974) A study of some diffusion models of population growth. Theor Popul Biol; 6: 199–216.

Tsionas EG. (2001) Likelihood and posterior shapes in Johnson's SB system. Sankhya. Ser B; 63(1): 3–9.

Vroon WJ. (1981) A class of variate transformation causing unbounded likelihood. J Austral Stat; 76(375): 709–12.

