Annals of Occupational Hygiene Advance Access originally published online on February 8, 2006
Annals of Occupational Hygiene 2006 50(4):371-377; doi:10.1093/annhyg/mei078


© 2006 British Occupational Hygiene Society Published by Oxford University Press


Original Article

Optimizing Occupational Exposure Measurement Strategies When Estimating the Log-Scale Arithmetic Mean Value—An Example from the Reinforced Plastics Industry

ERIK G. LAMPA1,*, LEIF NILSSON2, INGRID E. LILJELIND1 and INGVAR A. BERGDAHL1

1 Occupational Medicine, Department of Public Health and Clinical Medicine, Umeå University, 901 87 Umeå, Sweden; 2 Department of Mathematical Statistics, Umeå University, 901 87 Umeå, Sweden

* Author to whom correspondence should be addressed. E-mail: erik.lampa{at}envmed.umu.se


    ABSTRACT
 
When assessing occupational exposures, repeated measurements are in most cases required. Repeated measurements are more resource intensive than a single measurement, so careful planning of the measurement strategy is necessary to ensure that resources are spent wisely. The optimal strategy depends on the objectives of the measurements. Here, two different models of random effects analysis of variance (ANOVA) are proposed for the optimization of measurement strategies by the minimization of the variance of the estimated log-transformed arithmetic mean value of a worker group, i.e. the strategies are optimized for precise estimation of that value. The first model is a one-way random effects ANOVA model. For that model it is shown that the best precision in the estimated mean value is always obtained by including as many workers as possible in the sample while restricting the number of replicates to two or at most three, regardless of the size of the variance components. The second model introduces the ‘shared temporal variation’, which accounts for those random temporal fluctuations of the exposure that the workers have in common. It is shown for that model that the optimal sample allocation depends on the relative sizes of the between-worker component and the shared temporal component, so that if the between-worker component is larger than the shared temporal component more workers should be included in the sample, and vice versa. The results are illustrated graphically with an example from the reinforced plastics industry. If there exists a shared temporal variation at a workplace, that variability needs to be accounted for in the sampling design and the more complex model is recommended.

Keywords: measurement strategy • variance components • exposure assessment


    INTRODUCTION
 
Measurement strategies for hazard control have to be efficient and effective (Kromhout, 2002): efficient in the sense that as much information about the exposure as possible is obtained with minimal spending of resources, and effective in the sense that they give valid information.

Kromhout et al. (1993) published the first comprehensive evaluation of temporal and personal variability for a large number of chemicals and workplaces and stated that exposures within the same industry tend to vary considerably, both between individuals and over time. Rappaport et al. (1995) published a theoretical framework for exposure assessment which accounted for these variances and introduced a Wald-type test (described in detail in Lyles et al., 1997) for testing whether or not the workers are likely to be overexposed (Tornero-Velez et al., 1997) in the long run.

An important aspect of the statistical methodology presented in Rappaport et al. (1995) is the need for repeated measurements (replicates) on the same worker over time. Of course, repeated measurements require more resources than single measurements, so how can a hygienist decide which combination of workers and replicates is best? The answer to that question depends on the objective of the measurements. Rappaport et al. (1995) proposed an expression for determining the optimal sample size derived from the power of the Wald-type test. However, the Wald-type test is based on large-sample properties and may be unreliable for the small samples that are common today.

The aim of this study is to determine how to prioritize between spending resources on including many workers or many sampling days when striving to obtain the most precise estimate of the arithmetic mean (AM) value for a worker group. Elementary sampling theory states that the best estimate of a mean value is always obtained when one measurement is obtained from as many workers as possible (Cochran, 1977). That is, however, only valid when exposures are normally distributed, which is generally not the case as occupational exposures tend to be log-normally distributed. To explore this problem we investigate two models: the one-way random effects model previously used for assessing occupational exposures (see e.g. Kromhout et al., 1993 and Rappaport et al., 1995) and a two-way random effects model which introduces the concept of a temporal variability of the exposure common to all workers.

Here we derive the necessary equations and illustrate the application of the models by an example of exposure to styrene from the reinforced plastics industry.


    METHODS
 
The one-way random effects model
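Let $\{x_{ij}\}$, $i = 1, \ldots, k$ workers, $j = 1, \ldots, n$ replicates, be shift-long measurements on the random continuous exposure variable $X$. By assuming $Y = \ln(X) \sim N(\mu_y, \sigma_y^2)$, $X$ is then assumed to be log-normally distributed with mean $\mu_x = \exp(\mu_y + \sigma_y^2/2)$ and variance $\sigma_x^2 = \mu_x^2[\exp(\sigma_y^2) - 1]$. The log-scale mean value, $\mu_y$, can be estimated by $\bar{Y} = \frac{1}{kn}\sum_{i=1}^{k}\sum_{j=1}^{n}\ln(x_{ij})$, which is an estimator for the log-scale geometric mean value (GM). The GM of $\{x_{ij}\}$ is defined as $\mathrm{GM} = \left(\prod_{i=1}^{k}\prod_{j=1}^{n} x_{ij}\right)^{1/(kn)}$. Taking logarithms yields $\ln(\mathrm{GM}) = \frac{1}{kn}\sum_{i=1}^{k}\sum_{j=1}^{n}\ln(x_{ij})$, which is the estimator $\bar{Y}$ as stated above.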
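The relations between the log-scale parameters, the GM and the AM can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the four measurement values are hypothetical and serve only as an illustration.

```python
import math


def lognormal_am(mu_y, sigma2_y):
    """Arithmetic mean of X when ln(X) ~ N(mu_y, sigma2_y)."""
    return math.exp(mu_y + 0.5 * sigma2_y)


def geometric_mean(xs):
    """Geometric mean, computed as exp of the mean of the logarithms."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Hypothetical shift-long measurements (arbitrary units), illustration only.
x = [12.0, 30.0, 7.5, 48.0]
log_x = [math.log(v) for v in x]

# Sample mean on the log scale: the estimate of mu_y, equal to ln(GM).
y_bar = sum(log_x) / len(log_x)

print("log-scale mean:", y_bar)
print("ln(GM):        ", math.log(geometric_mean(x)))
```

The two printed values agree up to floating-point rounding, which is the identity stated in the text.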

The one-way random effects analysis of variance (ANOVA) model can be written as

$$Y_{ij} = \ln(x_{ij}) = \mu_y + \beta_i + \varepsilon_{ij} \tag{1}$$
where $Y_{ij}$ is the $j$th log-transformed measurement $x_{ij}$ on worker $i$, $\mu_y$ is the true mean of the log-transformed exposure, $\beta_i$ is the random effect of worker $i$ and $\varepsilon_{ij}$ is the residual term. It is assumed in the model that $\beta_i \sim N(0, \sigma_B^2)$ and that $\varepsilon_{ij} \sim N(0, \sigma_W^2)$, and that they are mutually independent. $\sigma_B^2$ and $\sigma_W^2$ are called the between- and within-worker variance components, respectively, and $\sigma_y^2 = \sigma_B^2 + \sigma_W^2$. We see that for Model 1 $E(Y_{ij}) = \mu_y$, i.e. equation (1) implies a stationary behaviour of the mean value, which means that there should be no systematic changes in the exposure during the time period in which the samples are collected. The variance components can be estimated by several methods (Searle et al., 1992), but our method applies to variance components estimated by the ANOVA estimators. Methods in statistical software, e.g. PROC MIXED in SAS, estimate the variance components by restricted maximum likelihood (REML) by default, but in the case of completely balanced designs, as we focus on here, the estimators are identical (Searle et al., 1992).
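As an illustration of the ANOVA estimators for a balanced one-way design, the sketch below computes the between- and within-worker mean squares and the resulting variance component estimates. The function name and the small data set are hypothetical; the estimators themselves are the standard method-of-moments ones, written here under the assumption of a complete, balanced data table.

```python
def one_way_anova_components(y):
    """ANOVA estimates for a balanced one-way random effects design.

    y is a list of k rows (workers), each holding n log-transformed
    measurements. Returns (grand_mean, sigma2_between, sigma2_within).
    """
    k = len(y)
    n = len(y[0])
    if k < 2 or n < 2 or any(len(row) != n for row in y):
        raise ValueError("need a balanced design with k >= 2 and n >= 2")

    worker_means = [sum(row) / n for row in y]
    grand_mean = sum(worker_means) / k

    # Between-worker mean square, k - 1 degrees of freedom.
    ms_between = n * sum((m - grand_mean) ** 2 for m in worker_means) / (k - 1)
    # Within-worker mean square, k * (n - 1) degrees of freedom.
    ss_within = sum((v - m) ** 2 for row, m in zip(y, worker_means) for v in row)
    ms_within = ss_within / (k * (n - 1))

    sigma2_within = ms_within
    # This moment estimator can turn out negative in small samples.
    sigma2_between = (ms_between - ms_within) / n
    return grand_mean, sigma2_between, sigma2_within


# Hypothetical log-scale data: 2 workers, 2 replicates each.
example = [[1.0, 3.0], [5.0, 7.0]]
print(one_way_anova_components(example))
```

The sum of the two returned variance components is the estimate of the total log-scale variance used in the AM estimator.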

The AM value is estimated by $\exp(\bar{Y} + \hat{\sigma}_y^2/2)$, where $\bar{Y}$ is the estimate of $\mu_y$ and the total variance is estimated as the sum of the estimated variance components, i.e. $\hat{\sigma}_y^2 = \hat{\sigma}_B^2 + \hat{\sigma}_W^2$. To keep the calculations relatively easy, we focus on the log-transformed AM. Maximizing the precision is equivalent to minimizing the variance, so our problem is to find a set of $k$ and $n$ which minimizes $\mathrm{Var}(\bar{Y} + \hat{\sigma}_y^2/2)$, which can be written as

$$\mathrm{Var}\left(\bar{Y} + \tfrac{1}{2}\hat{\sigma}_y^2\right) = \mathrm{Var}(\bar{Y}) + \tfrac{1}{4}\mathrm{Var}(\hat{\sigma}_y^2) + \mathrm{Cov}(\bar{Y}, \hat{\sigma}_y^2) \tag{2}$$
It can be shown that $\bar{Y}$ and the variance component estimators are statistically independent (Searle et al., 1992), so the last term in equation (2) is equal to zero. We then have, according to Cochran (1977) and Searle and Fawcett (1970),

Formula 2

Here K is the worker population total and N is the number of replicates theoretically available. We can think of N as a very large number and therefore approximate Formula 2 and Formula 2. In the derivation, the worker effects are assumed to be a random sample from an infinite population and although the worker population, unless very large, is finite we approximate the first pair of parentheses in Formula 2 to unity. The variance of the variance estimator can be expanded to $\mathrm{Var}(\hat{\sigma}_y^2) = \mathrm{Var}(\hat{\sigma}_B^2) + \mathrm{Var}(\hat{\sigma}_W^2) + 2\,\mathrm{Cov}(\hat{\sigma}_B^2, \hat{\sigma}_W^2)$. Searle et al. (1992) give explicit expressions for the sampling variances of the variance components and we can write equation (2) as

Formula 3(3)
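As a sketch of how an expression of this kind can be assembled (a reconstruction under stated assumptions, not necessarily the printed form of equation (3)): assume a balanced design, infinite worker and replicate populations, normally distributed effects, and the ANOVA estimators $\hat{\sigma}_W^2 = MS_W$ and $\hat{\sigma}_B^2 = (MS_B - MS_W)/n$. Here $\sigma_B^2$ and $\sigma_W^2$ are the between- and within-worker variance components, and $MS_B$ and $MS_W$ (labels introduced for this sketch) are the between- and within-worker mean squares with $k-1$ and $k(n-1)$ degrees of freedom. The mean squares are independent with $\mathrm{Var}(MS) = 2[E(MS)]^2/f$, which gives:

```latex
\begin{align*}
\hat{\sigma}_y^2 &= \hat{\sigma}_B^2 + \hat{\sigma}_W^2
  = \frac{MS_B + (n-1)\,MS_W}{n}, \\
\mathrm{Var}(\bar{Y}) &= \frac{\sigma_B^2}{k} + \frac{\sigma_W^2}{kn}
  = \frac{n\sigma_B^2 + \sigma_W^2}{kn}, \\
\mathrm{Var}(\hat{\sigma}_y^2) &= \frac{2}{n^2}\left[
  \frac{(n\sigma_B^2 + \sigma_W^2)^2}{k-1}
  + \frac{(n-1)\,\sigma_W^4}{k}\right], \\
\mathrm{Var}\!\left(\bar{Y} + \tfrac{1}{2}\hat{\sigma}_y^2\right)
  &= \frac{n\sigma_B^2 + \sigma_W^2}{kn}
  + \frac{1}{2n^2}\left[
  \frac{(n\sigma_B^2 + \sigma_W^2)^2}{k-1}
  + \frac{(n-1)\,\sigma_W^4}{k}\right].
\end{align*}
```

The last line combines the second and third lines through equation (2) with the covariance term set to zero.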

The two-way random effects model
For the one-way model $E[Y_{ij} \mid j] = \mu_y$ regardless of the sampling occasion. If we assume that the choice of sampling day might bias the mean estimate, i.e. $E[Y_{ij} \mid j] = \mu_y + \tau_j$ for sampling day $j$, we must account for that in the sampling design. Since the within-worker variability is the sum of all variability in the data not explained by the between-worker variance component, we can partition the within-worker variance $\sigma_W^2$ into a temporal variance $\sigma_T^2$ assumed common to all workers and the residual variance $\sigma_\varepsilon^2$, i.e. $\sigma_W^2 = \sigma_T^2 + \sigma_\varepsilon^2$. Again assuming shift-long measurements and no interaction between factors, the resulting model can be written as

$$Y_{ij} = \ln(x_{ij}) = \mu_y + \beta_i + \tau_j + \varepsilon_{ij} \tag{4}$$
where $Y_{ij}$ is the logged measurement on the $i$th worker on the $j$th day of sampling. It should be noted that in equation (1) $j$ represents the measurement occasion, whereas here in equation (4) $j$ represents the $j$th day. We assume here that $\tau_j \sim N(0, \sigma_T^2)$, where $\sigma_T^2$ represents the shared temporal variability, and that $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$. The derivation for the variance of the estimated log-scale mean value for the two-way model is in Appendix A.
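To make the partition into between-worker, shared temporal and residual variance concrete, the sketch below computes method-of-moments (ANOVA) estimates for a balanced two-way crossed layout with one measurement per worker and day. The function name and the data are hypothetical, and this moment-based calculation is only an assumption about how the components could be estimated; REML, as mentioned for the statistical software, may give different values in practice.

```python
def two_way_anova_components(y):
    """ANOVA estimates for a balanced two-way random effects layout.

    y[i][j] is the logged measurement for worker i on day j
    (k workers, n days, one measurement per worker and day).
    Returns (grand_mean, sigma2_between, sigma2_temporal, sigma2_residual).
    """
    k = len(y)
    n = len(y[0])
    if k < 2 or n < 2 or any(len(row) != n for row in y):
        raise ValueError("need a complete k x n table with k >= 2 and n >= 2")

    worker_means = [sum(row) / n for row in y]
    day_means = [sum(y[i][j] for i in range(k)) / k for j in range(n)]
    grand_mean = sum(worker_means) / k

    # Mean squares for workers, days and the residual.
    ms_b = n * sum((m - grand_mean) ** 2 for m in worker_means) / (k - 1)
    ms_t = k * sum((m - grand_mean) ** 2 for m in day_means) / (n - 1)
    ss_e = sum(
        (y[i][j] - worker_means[i] - day_means[j] + grand_mean) ** 2
        for i in range(k)
        for j in range(n)
    )
    ms_e = ss_e / ((k - 1) * (n - 1))

    # Moment estimators; the first two can be negative in small samples.
    return grand_mean, (ms_b - ms_e) / n, (ms_t - ms_e) / k, ms_e


# Hypothetical log-scale data: 2 workers measured on the same 2 days.
print(two_way_anova_components([[1.0, 2.0], [3.0, 6.0]]))
```

Note that the layout requires all workers to be measured on the same days; otherwise the shared day effect cannot be separated from the residual in this simple way.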

A practical example
To illustrate the use of the two models, styrene exposure data (described in Liljelind et al., 2001) from five industries were used. The sixth industry in the original data was not used due to the small number of workers included (K = 3). Data were analysed with the one-way and the two-way random effects model, respectively, and the variance components were estimated. Distinct variance components were assumed for each industry. The estimates were then used to graphically illustrate the effect of the sample allocation on the precision of the estimated mean value. The plots were constructed using MATLAB 6.5 (The MathWorks, Inc.). We have not taken into account that sampling costs are often different for the inclusion of more workers as compared to more replicates.


    RESULTS
 
Table 1 shows the results of the analyses. We see that in the first three industries, the between-worker variance is larger than the within-worker variance. For those three industries, the shared temporal variance $\sigma_T^2$ is only slightly larger than zero, indicating that only a minor part of the variability is a day-to-day variation that the workers have in common. For the other two industries the situation is reversed. Here the within-worker component is larger than the between-worker component and the shared temporal component is a large part of the total variability. As expected, the shared temporal variance reduces the within-worker/residual variances. In industry number 4 we also note that the between-worker variance decreased and that there was a relatively large residual variance. Such effects are not uncommon when using REML as the estimation procedure. In all cases the distributional assumptions of the random effects in the models could not be rejected at the 5% level using the Shapiro–Wilk test on the estimated random effects. For further graphical illustrations we chose industries 1 and 5 since they reflect two different conditions.


 
Table 1. Variance component estimates for the two random effects models

 
In order to obtain the most precise estimate of the mean value in a group of workers, using the simpler model (Model 1), as many subjects as possible should be included in the sample, regardless of the sizes of the between-worker and the within-worker variances. This is illustrated in Figs 1 and 2 where the solid lines indicate equivariance lines. The numbers on the lines indicate the variance of the estimated log-transformed mean value. The lower the variance is, the more precise is the estimate of the mean. The broken lines indicate possible alternatives for distribution of 8, 16 and 32 measurements. We have here chosen to show continuous lines, though in practice only integers are possible on the Subjects as well as the Replicates axes. By following the broken lines it is obvious that the lowest variance, and thus the best precision in the estimate of the mean, is obtained when as many subjects as possible are studied.


 
Fig. 1. Contour plot of the variance of the estimated log scale arithmetic mean (AM) value for the one-way model in the ranges 2 ≤ k ≤ 10 and 2 ≤ n ≤ 10. The x- and y-axes are represented by the number of workers and the number of replicates. The solid lines represent equivariance lines and the broken lines indicate possible distributions for 8, 16 and 32 measurements. The values of the between- and within-worker variance components were set to 1.53 and 0.35, respectively, as was the case in Industry 1.

 

 
Fig. 2. Contour plot of the variance of the estimated log scale AM value for the one-way model. The ranges are the same as in Fig. 1 with the between- and within-worker variances set to 0.02 and 0.27, respectively, as was the case in Industry 5.
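The comparison along the broken lines in Figs 1 and 2 can also be tabulated. The sketch below evaluates the variance of the estimated log-scale AM for every integer allocation of 8, 16 and 32 measurements, using the variance components quoted in the two figure captions. The closed-form expression is a reconstruction from standard balanced one-way ANOVA results (infinite populations, normal effects, ANOVA estimators) and is an assumption of this sketch, not a transcription of equation (3); the function names are hypothetical.

```python
def var_log_am_one_way(k, n, sigma2_b, sigma2_w):
    """Var(Ybar + 0.5 * sigma2_y_hat) for a balanced one-way design.

    Reconstructed for illustration; assumes infinite populations,
    normal effects and ANOVA estimators of the variance components.
    """
    if k < 2 or n < 2:
        raise ValueError("need k >= 2 workers and n >= 2 replicates")
    a = n * sigma2_b + sigma2_w          # expected between-worker mean square
    var_mean = a / (k * n)               # Var(Ybar)
    var_total = (2.0 / n**2) * (a**2 / (k - 1) + (n - 1) * sigma2_w**2 / k)
    return var_mean + 0.25 * var_total


def allocations(total):
    """All (k, n) with k * n == total, k >= 2 and n >= 2."""
    return [(total // n, n) for n in range(2, total // 2 + 1) if total % n == 0]


# Between- and within-worker variances from the captions of Figs 1 and 2.
industries = {"Industry 1": (1.53, 0.35), "Industry 5": (0.02, 0.27)}

for name, (s2_b, s2_w) in industries.items():
    for total in (8, 16, 32):
        ranked = sorted(
            allocations(total),
            key=lambda kn: var_log_am_one_way(kn[0], kn[1], s2_b, s2_w),
        )
        k_best, n_best = ranked[0]
        print(f"{name}, {total} measurements: best (k, n) = ({k_best}, {n_best})")
```

Under these assumptions the allocation with two replicates per worker gives the lowest variance for both industries and all three totals, in line with the pattern described for the one-way model.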

 
However, when the workers' exposure on a particular calendar day to some extent covaries, e.g. when the workers are in the same hall with a single source of exposure in common or when exposure is influenced by day-to-day variability in production, we need to block for that effect in the sampling design and use the two-way model. In this case it is intuitively expected that if the shared temporal variance (i.e. the covarying part of the within-worker variance) is larger than the between-worker variance, then increasing the number of replicates contributes more to the precision than does increasing the number of subjects, as is the case in Fig. 3.


 
Fig. 3. Contour plot of the variance of the estimated AM value in log-scale for the two-way model. The between-worker, shared temporal and residual variances were set to 0.04, 0.18 and 0.10, respectively, as was the case in Industry 5.

 
In Fig. 4 the shared temporal variance is estimated as almost zero, and we then see a pattern similar to that of Fig. 1. This is also easy to see from the model formulation in equation (4); if the shared temporal variance $\sigma_T^2 = 0$, the two-way model collapses into the one-way model.


 
Fig. 4. Contour plot of the variance of the estimated AM value in log-scale for the two-way model. The between-worker, shared temporal and residual variances were set to 1.55, 0.04 and 0.32, respectively, as was the case in Industry 1.
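The contrast between Figs 3 and 4 can be reproduced in tabular form. The sketch below evaluates a reconstructed variance expression for the two-way model over all integer allocations of 16 measurements, using the variance components quoted in the two captions. The expression assumes a complete worker-by-day table, infinite populations, normal effects and ANOVA estimators; it is an assumption of this sketch rather than the formula printed in Appendix A, and the function names are hypothetical.

```python
def var_log_am_two_way(k, n, s2_b, s2_t, s2_e):
    """Var(Ybar + 0.5 * sigma2_y_hat) for a balanced two-way layout.

    k workers, n shared sampling days, one measurement per worker and day.
    Reconstructed for illustration under the assumptions in the lead-in.
    """
    if k < 2 or n < 2:
        raise ValueError("need k >= 2 workers and n >= 2 days")
    var_mean = s2_b / k + s2_t / n + s2_e / (k * n)
    # Expected mean squares for workers, days and the residual.
    ems_b = n * s2_b + s2_e
    ems_t = k * s2_t + s2_e
    ems_e = s2_e
    # Weight of the residual mean square in the total variance estimate.
    w_e = 1.0 - 1.0 / n - 1.0 / k
    var_total = (
        2.0 * ems_b**2 / (k - 1) / n**2
        + 2.0 * ems_t**2 / (n - 1) / k**2
        + w_e**2 * 2.0 * ems_e**2 / ((k - 1) * (n - 1))
    )
    return var_mean + 0.25 * var_total


def allocations(total):
    """All (k, n) with k * n == total, k >= 2 and n >= 2."""
    return [(total // n, n) for n in range(2, total // 2 + 1) if total % n == 0]


# (between-worker, shared temporal, residual) from the captions of Figs 3 and 4.
cases = {"Industry 5": (0.04, 0.18, 0.10), "Industry 1": (1.55, 0.04, 0.32)}

for name, comps in cases.items():
    for k, n in allocations(16):
        v = var_log_am_two_way(k, n, *comps)
        print(f"{name}: k = {k}, n = {n}, variance = {v:.4f}")
```

With the Industry 5 values the variance is lowest for few workers and many days, whereas with the Industry 1 values it is lowest for many workers and few days, matching the qualitative patterns described for Figs 3 and 4.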

 
From the figures we see the impact of the between-worker and the shared temporal variances on the precision of the estimated mean value. The residual variance determines the spacing between the solid lines and their curvature. A large residual variance will result in equivariance lines that are farther apart and less curved than would a smaller residual variance.

A wide range of values for the variance components was applied and three-dimensional plots were examined; the results were in line with the expectations (data not shown).


    DISCUSSION
 
In this study we have described two models for optimization of a measurement strategy by minimizing the variance of the estimated AM value, based on the different variance components (the one-way and the two-way random effects models). In order to do this we have further developed the one-way model by separating the shared temporal variability from the variability within workers. Thus, we end up with three variances: the between-worker variance, the shared temporal variance and the residual variance. The main finding is that the optimal sample allocation depends on the relative sizes of the between-worker component and the shared temporal component.

How do these variances affect the optimization? In order to answer this, we let $\varphi = \sigma_B^2/\sigma_T^2$ denote the quotient of the between-worker variance and the shared temporal variance. A $\varphi > 1$ thus means that measurements on many subjects should be obtained, whereas a $\varphi < 1$ means that efforts should be focused on obtaining many replicates from each worker. A $\varphi = 1$ means that equal efforts should be put into obtaining samples from workers and replicates from them. If the residual variance is very large, the optimum will be drawn towards a more equal weight for replicates and subjects. The proposed two-way random effects model and the concept of a shared temporal variation separated from the within-worker variance have, to the authors' knowledge, not previously been described in the literature.

This paper proposes a tool for determining the optimal sample size if the study objective is to determine the mean value as precisely as possible given certain restrictions and is not meant to be used in the analysis of determinants of exposure or for testing if workers are overexposed or not. The expressions for the variance of the mean value and the resulting graphs can be implemented in, for example, MATLAB and serve as a graphical tool for sample size determination.

We have assumed that the values of the estimated variance components reflect the true variances. A prerequisite for this is a large, well-designed pilot study, which may not be available. We have chosen not to investigate the effect of poorly estimated variance components on the optimal sample allocation. The minimal sample size required to estimate all the variance components from the two-way model is two workers measured twice, with both workers measured on the same day at least once, but we do support the recommendation of Rappaport et al. (1995) that a minimal sample size for a pilot study should be at least five workers measured twice.

For the hygienist our results imply that even when the within-worker variance is large, there are certain situations when the optimal number of replicates for a precise estimation of the mean value may not be large. For example, if the shared temporal variance is zero or small compared to the other variance components, the two-way model collapses into the one-way model and then increasing the number of replicates does very little to contribute to the precision.

It should be mentioned that our estimates of the variance components are somewhat different from those in Liljelind et al. (2001), as we have not assumed that industries could be pooled due to similar variance component characteristics.


    CONCLUSIONS
 
The optimal numbers of workers and replicates for a sampling strategy are closely linked to the variance components. If no shared temporal variance exists, the best precision in the mean value is always obtained by including the maximum number of workers available in the sample. If there is a shared temporal variance, the optimal design is determined by the sizes of the variance components. If the between-worker component is larger than the shared temporal component, more workers should be included in the sample, and vice versa. However, this is only valid when the costs for inclusion of more workers are the same as those for the inclusion of more replicates. Of course, more advanced cost models could be applied to the equations.
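One way such a cost model might be attached is sketched below: a brute-force search over designs that fit within a budget, minimizing a reconstructed two-way variance expression. Everything about the cost structure (a fixed cost per worker, per sampling day and per sample, the budget, and the search limit) is hypothetical, and the variance expression assumes a complete worker-by-day table, infinite populations, normal effects and ANOVA estimators; it is not taken from the paper's equations.

```python
def var_log_am_two_way(k, n, s2_b, s2_t, s2_e):
    """Reconstructed Var(Ybar + 0.5 * sigma2_y_hat), balanced two-way layout."""
    var_mean = s2_b / k + s2_t / n + s2_e / (k * n)
    ems_b = n * s2_b + s2_e              # expected mean square, workers
    ems_t = k * s2_t + s2_e              # expected mean square, days
    w_e = 1.0 - 1.0 / n - 1.0 / k        # weight of the residual mean square
    var_total = (
        2.0 * ems_b**2 / (k - 1) / n**2
        + 2.0 * ems_t**2 / (n - 1) / k**2
        + w_e**2 * 2.0 * s2_e**2 / ((k - 1) * (n - 1))
    )
    return var_mean + 0.25 * var_total


def best_design(budget, cost_worker, cost_day, cost_sample,
                s2_b, s2_t, s2_e, max_size=40):
    """Search k, n in 2..max_size for the lowest variance within budget.

    Hypothetical linear cost: k * cost_worker + n * cost_day
    + k * n * cost_sample. Returns (variance, k, n, cost) or None.
    """
    best = None
    for k in range(2, max_size + 1):
        for n in range(2, max_size + 1):
            cost = k * cost_worker + n * cost_day + k * n * cost_sample
            if cost > budget:
                continue
            v = var_log_am_two_way(k, n, s2_b, s2_t, s2_e)
            if best is None or v < best[0]:
                best = (v, k, n, cost)
    return best


# Hypothetical costs; variance components as quoted in the caption of Fig. 3.
design = best_design(budget=3000.0, cost_worker=100.0, cost_day=200.0,
                     cost_sample=50.0, s2_b=0.04, s2_t=0.18, s2_e=0.10)
print(design)
```

Changing the relative costs shifts the preferred design, which is the point of the caveat above: the allocation rules derived from the variance components alone presume equal costs for additional workers and additional replicates.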


    APPENDIX A—THE DERIVATION FOR THE VARIANCE OF THE ESTIMATED LOG-SCALE MEAN VALUE FOR THE TWO-WAY MODEL
 
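Let $\mathbf{m}$ be a vector of mean squares and let $\boldsymbol{\sigma}^2$ be a vector of variance components of the same order as $\mathbf{m}$. Suppose a matrix $\mathbf{P}$ is such that $E(\mathbf{m}) = \mathbf{P}\boldsymbol{\sigma}^2$. The ANOVA estimator of $\boldsymbol{\sigma}^2$ is then $\hat{\boldsymbol{\sigma}}^2 = \mathbf{P}^{-1}\mathbf{m}$. It follows that $\mathrm{Var}(\hat{\boldsymbol{\sigma}}^2) = \mathbf{P}^{-1}\,\mathrm{Var}(\mathbf{m})\,(\mathbf{P}^{-1})^{\mathsf{T}}$.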

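Searle et al. (1992) show that under the normality assumption, for a mean square $M_i$ with $f_i$ degrees of freedom, $\mathrm{Var}(M_i) = 2[E(M_i)]^2/f_i$, for which an unbiased estimator is $2M_i^2/(f_i + 2)$. If we define $\mathbf{D} = \mathrm{Var}(\mathbf{m})$ as a diagonal matrix with $2[E(M_i)]^2/f_i$ as the elements, then $\mathrm{Var}(\hat{\boldsymbol{\sigma}}^2) = \mathbf{P}^{-1}\mathbf{D}\,(\mathbf{P}^{-1})^{\mathsf{T}}$.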

For the two-way model, we can write

Formula 4

Formula 4
and

Formula 4

For unbiased estimation, $\mathbf{D}$ is replaced by

$$\hat{\mathbf{D}} = \mathrm{diag}\left\{\frac{2M_i^2}{f_i + 2}\right\}$$

For the two-way model Formula 4 and Formula 4. An expression for the variance of the estimated log-scale mean value can then be obtained by noting that the covariance matrix Formula 4 has the form

Formula 4
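For orientation, the quantities in this appendix can be written out for a balanced two-way layout with $k$ workers measured on the same $n$ days. The following is a sketch under assumptions, not a transcription of the printed formulas: the ordering of the components, the labels $MS_B$, $MS_T$ and $MS_E$ for the worker, day and residual mean squares, and the symbols $\sigma_B^2$, $\sigma_T^2$ and $\sigma_\varepsilon^2$ for the between-worker, shared temporal and residual variances are choices made here.

```latex
\begin{align*}
\mathbf{m} &= \begin{pmatrix} MS_B \\ MS_T \\ MS_E \end{pmatrix}, \quad
\boldsymbol{\sigma}^2 = \begin{pmatrix} \sigma_B^2 \\ \sigma_T^2 \\ \sigma_\varepsilon^2 \end{pmatrix}, \quad
\mathbf{P} = \begin{pmatrix} n & 0 & 1 \\ 0 & k & 1 \\ 0 & 0 & 1 \end{pmatrix}, \\
(f_B, f_T, f_E) &= \bigl(k-1,\; n-1,\; (k-1)(n-1)\bigr), \\
\mathrm{Var}(\bar{Y}) &= \frac{\sigma_B^2}{k} + \frac{\sigma_T^2}{n}
  + \frac{\sigma_\varepsilon^2}{kn}, \\
\hat{\sigma}_y^2 &= \mathbf{1}^{\mathsf{T}}\mathbf{P}^{-1}\mathbf{m}
  = \frac{MS_B}{n} + \frac{MS_T}{k}
  + \left(1 - \frac{1}{n} - \frac{1}{k}\right) MS_E, \\
\mathrm{Var}(\hat{\sigma}_y^2) &= \mathbf{1}^{\mathsf{T}}\mathbf{P}^{-1}\mathbf{D}\,
  (\mathbf{P}^{-1})^{\mathsf{T}}\mathbf{1} \\
  &= \frac{2(n\sigma_B^2 + \sigma_\varepsilon^2)^2}{n^2 (k-1)}
  + \frac{2(k\sigma_T^2 + \sigma_\varepsilon^2)^2}{k^2 (n-1)}
  + \left(1 - \frac{1}{n} - \frac{1}{k}\right)^{2}
    \frac{2\sigma_\varepsilon^4}{(k-1)(n-1)}, \\
\mathrm{Var}\!\left(\bar{Y} + \tfrac{1}{2}\hat{\sigma}_y^2\right)
  &= \mathrm{Var}(\bar{Y}) + \tfrac{1}{4}\,\mathrm{Var}(\hat{\sigma}_y^2).
\end{align*}
```

The last line relies on the independence of $\bar{Y}$ and the mean squares under normality, and $\mathrm{Var}(\hat{\sigma}_y^2)$ is the sum of all elements of the covariance matrix of the estimated variance components.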


    ACKNOWLEDGEMENTS
 
This study was funded by the Swedish Council for Working Life and Social Research and the Faculty of Medicine, Umeå University.

Received October 17, 2005; in final form December 8, 2005


    REFERENCES

Cochran WG. (1977) Sampling techniques. Wiley series in probability and statistics. ISBN 047116240X.

Kromhout H, Symanski E, Rappaport SM. (1993) A comprehensive evaluation of within- and between-worker components of occupational exposure to chemical agents. Ann Occup Hyg; 37: 253–70.

Kromhout H. (2002) Design of measurement strategies for workplace exposures. Occup Environ Med; 59: 349–54.

Liljelind IE, Rappaport SM, Järvholm BG et al. (2001) Comparison of self- and expert assessment of occupational exposure to chemicals. Scand J Work Environ Health; 27: 311–17.

Lyles RH, Kupper LL, Rappaport SM. (1997) Assessing regulatory compliance of occupational exposures via the balanced one-way random effects ANOVA model. J Agric Biol Environ Stat; 2: 64–86.

Rappaport SM, Lyles RH, Kupper LL. (1995) An exposure-assessment strategy accounting for within- and between-worker sources of variability. Ann Occup Hyg; 39: 469–95.

Searle SR, Fawcett RF. (1970) Expected mean squares in variance components models having finite populations. Biometrics; 26: 243–54.

Searle SR, Casella G, McCulloch CE. (1992) Variance components. New York: John Wiley & Sons, Inc. ISBN 0-471-62162-5.

Tornero-Velez R, Symanski E, Kromhout H et al. (1997) Compliance versus risk in assessing occupational exposures. Risk Anal; 17: 279–92.




