Annals of Occupational Hygiene Advance Access originally published online on January 30, 2006
Annals of Occupational Hygiene 2006 50(3):281-288; doi:10.1093/annhyg/mei076
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
© 2006 British Occupational Hygiene Society Published by Oxford University Press
Original Article |
Mixed Models and Empirical Bayes Estimation for Retrospective Exposure Assessment of Dust Exposures in Canadian Sawmills
1 School of Occupational & Environmental Hygiene, University of British Columbia, 372-2206 East Mall, Vancouver, British Columbia, Canada V6T 1Z3; 2 Center for Healthcare Innovation and Improvement, #E417-4480 Oak Street, Shy Building, Vancouver, BC, Canada V6H 3V4; 3 Department of Health Care & Epidemiology, University of British Columbia, 5804 Fairview Avenue, Vancouver, British Columbia, Canada V6T 1Z3
* Author to whom correspondence should be addressed. Tel: +1-604-822-8960; fax: +1-604-822-9588; e-mail: melissaf{at}interchange.ubc.ca
| ABSTRACT |
|---|
|
|
|---|
Objectives: Data on job histories is commonly available from study subjects and worksites, therefore jobs are often used for assigning exposures in historical epidemiological studies. Exposure estimates are often derived by offering jobs as fixed effects in statistical models. An alternative approach would be to offer job as a random effect to obtain empirical Bayes estimates of exposure. This approach is more efficient since it weights exposure estimates according to the within-job and between-job variability and the number of measurements for each job. We assess three models for predicting historical dust exposures of sawmill workers. Methods: Models were developed using 407 inhalable dust measurements collected from 58 jobs in four sawmills. The first model incorporated all variables as fixed effects; the second added a random term to account for correlation within workers; and the third offered random terms for worker, job and mill (hierarchical model). Empirical Bayes estimates were used to calculate job-specific exposures from the hierarchical model. Results: The fixed effects and random worker mixed models performed nearly identically because there was low within-worker correlation (r = 0.26). The Bayesian exposure predictions from the hierarchical model were slightly more correlated with the observed mill-job arithmetic means than those from the models where jobs were fixed effects (0.74 versus 0.70). Conclusions: While we observed no large differences in exposure estimates by treating job as a fixed or random effect, treating job as a random effect allowed for job-specific coefficients to be estimated for every job while borrowing strength in the presence of sparse data by assuming that the job means are normally distributed around the group mean. In addition, empirical Bayes job estimates can be used for a posteriori job grouping. The use of this method for retrospective exposure assessment should continue to be examined.
Keywords: empirical Bayes statistical modelling retrospective exposure assessment dust sawmills
| INTRODUCTION |
|---|
|
|
|---|
Statistical models have been used as an objective tool to predict historical exposures for epidemiological studies for over two decades (Dement et al., 1983
More recently, a number of investigators have examined the hypothesis that correlation between measurements, unaccounted for by the fixed effects in the model, may also exist at higher levels within the work setting, i.e. at the machine, job, department, building or plant level (Symanski et al., 2001a
,b
; Mikkelsen et al., 2002
). If this is the case, incorporation of multiple random effect terms can be used to account for the hierarchical structure and the potential clustering of exposures based on when and where the measurements were collected (Symanski et al., 2001a
; Diggle et al., 2002
). In a study evaluating the exposure in the Danish furniture industry, Mikkelsen et al. (2002)
found that worker, machine and factory were sources of exposure clustering that were not explained by the fixed effects offered in the models, and, thus, random terms were included for each. A random term for department was also tested but provided negligible improvement in the modelling and was not included in the final model. In an evaluation of long-term exposure trends in the nickel industry, Symanski et al. (2001b)
found that accounting for the clustering of exposures by worker, job groups, building and plant (offered in the models as random effects) significantly impacted the magnitude and even the direction of the fixed effect estimate of the time trend compared with a model with only a random worker effect and a model with no random effect terms. In a study examining the sources of variability in workers' exposure to inorganic mercury, setting job as a random or fixed effect made no apparent difference in predicted exposures from a mixed model that also incorporated a random worker term (Symanski et al., 2001a
).
Within this context, there has been some debate about whether job should be considered as a random or fixed effect in exposure models (Kromhout and Heederik, 1995
; Symanski et al., 2001b
). Typically, variables have been offered as fixed effects when the focus is on estimating the mean value for specified groups represented in the data and as random effects when the focus is controlling correlation due to lack of independence unaccounted for by other variables in the model (Symanski et al., 2001a
,b
). Though job titles are not exposure determinants per se, they are frequently one of the only types of exposure-related data available for epidemiological studies and are, therefore, often used as surrogates for unmeasured job-related determinants. Because investigators want to obtain exposure estimates for each job, jobs have normally been entered as fixed effects in determinants of exposure models.
This argument overlooks an advantage of treating job as a random effect, beyond controlling within-job correlation. In historical cohort studies there may be a very large number of jobs, and some of them may have only a few exposure measurements. In building a model to predict historical exposures for each job, there are a number of possible options. All jobs can be entered in the model; but degrees of freedom will be high (no. of jobs 1) requiring a large dataset. A second approach is to remove any job with a high P-value in a backwards regression procedure. This method assumes that any job removed from the model has a coefficient of 0, which often results in a reduction of the standard errors of the coefficient estimate for the variables remaining in the models. However, Greenland et al. (1999
, 2000a
) suggest that the smaller standard errors are artificial and do not reflect the true variability in the coefficient estimate of that variable. Another option would be to group jobs based on work tasks and job location, but grouping may obscure important differences between jobs within the same group. Additionally, the mean of the job group will be dependent on the clustering of exposures by job over time.
A more efficient approach is to offer job as a random effect in a mixed model and to use the empirical Bayes estimates for each job to calculate the predicted exposure (Greenland, 2000a
; Verbeke and Molenberghs, 2000
). Offering job as a random effect borrows strength from the whole dataset by assuming that the job means are normally distributed around the group mean with the distribution of the means equal to the between-job variance. Empirical Bayes estimation gives more weight to the group mean when there are sparse measurements for a job and when the within-job variability is large compared with the between-job variability. More weight is given to the observed job mean if the opposite is true. Using empirical Bayes estimates of the coefficients has become more common in modelling exposureresponse relationships in epidemiology (Greenland, 2000b
) but has not yet been used to predict historical exposure.
In this paper, we compare three models for predicting historical dust exposure in sawmills in British Columbia, Canada: a fixed effect model; a mixed model with worker as a random effect; and a more complex hierarchical model with worker, job and mill offered as random effects.
| METHODS |
|---|
|
|
|---|
Description of dataset and exposure determinants
The dataset consisted of 423 personal full-shift inhalable dust measurements collected at four sawmills in 1996 using a GSP sampler (Deha-Haan & Wittmer GmbH, Friolzheim, Germany). The protocol involved randomly sampling all jobs over two seasons at each mill but did not specifically attempt to capture repeated measurements on the same worker. The average number of measurements collected per worker in the same job was 1.2 (range 14). Of 327 workers measured, 68 workers were sampled two or more times. Measurements were excluded (n = 16) if the worker sampled spent <75% of the day at one job since the aim was to estimate full-shift exposures at the job level. Analyses were based on all 407 remaining measurements, representing 327 different workers.
Variables that might predict exposure levels were identified from information collected concurrently with the dust measurements and from the mills' technological histories (production reports, schematics, walk through surveys and interviews with key staff during site visits). Variables that could not be assessed with confidence for all jobs, mills and the entire work history of the sawmill cohort (19091995) were excluded from consideration in the models. A total of 58 jobs were identified (median 4 measurements per job, range 126). Each job was assigned to one of 16 process areas defined by work area and tasks.
Statistical analyses
Prior to modelling, correlations between independent variables were examined. Correlations between ordinal and indicator variables were examined using Spearman's rho. Categorical variables with no intrinsic order were evaluated for correlations using Cramer's V and contingency coefficient (Fisher and van Belle, 1993
). Among variable pairs with r > 0.7, only the determinant considered more logically related to exposure and/or more consistently and reliably available over the time-span of the study was retained for modelling. All exposure measurements were log-transformed (base e) because the data were positively skewed.
Only exposure determinants correlated with dust concentration (P-values <0.2) in simple linear regression were considered for inclusion in the predictive models (GLM procedure in SPSS version 10.1 (SPSS Inc., Chicago, IL, USA)). Several variables were nested a priori if they were only relevant for certain job tasks or if the magnitude of their effect was expected to change within specific process areas. For instance, whether a planermill was on site was only nested in process areas that could potentially receive planed lumber. Mill department, indoor job location, planermill and wood condition were nested within process groups. These nested terms were evaluated in simple linear regression analyses with their corresponding main effect (e.g. process area + process area x indoor job location) and only those with a P-value < 0.2 were offered in the models.
Hierarchical model
The structure of the more complex hierarchical model is described first. The random worker mixed model and the fixed effects model are simpler models that have specific random effect terms removed. All models were estimated using the PROC MIXED restricted maximum likelihood method in SAS version 8.02 (SAS Institute Inc., Cary, NC, USA).
The multilevel model employs a hierarchical structure with random worker, job and mill effects [equation (1)].
![]() | (1) |
The dependent variable
is the log-transformed dust concentration Yghij measured on the jth day on the ith worker in the hth job at the gth mill. µ is the intercept. Jgh(p) and Wghi are the random effects corresponding to job and worker, respectively. Note that the random job effect is nested within the pth process group (group of jobs with same work area).
mg,
jgh and
wghij are the summation of the fixed effects related to mill-specific, job-specific and worker-specific determinants, respectively. For example,
jgh = sawing process group + indoor job + job in booth for the hth job in the gth mill.
ghij is the residual. Jgh, Wghi and
ghij are assumed to be statistically independent and approximately normally distributed with a mean value of 0 and with variances
,
and
, respectively.
is the within-worker variance.
and
are the between-job and between-worker variance that are not explained by the fixed effects. Estimates of the coefficient for the job random effect were calculated using an empirical Bayes procedure where the unknown parameters are replaced by their restricted maximum likelihood estimates in the Bayesian calculations (Verbeke and Molenberghs, 2000
). All variance components were assumed equal across worker and job with a uniform variance structure (compound symmetry). We tested this assumption by comparing the model fit statistics of this model with pooled variance components against a model that allowed distinct variance components by job location (indoors versus outdoors) or by process area.
The multilevel model was developed using a multi-step process to reflect the assumption that lower level effects (i.e. worker and job) are more likely to explain exposure levels than higher level effects (i.e. mill) and to avoid saturating the model (Mikkelsen et al., 2002
). Step 1 offered all random effects plus all job-level fixed effects (except job itself) into the model. Variables with the highest P-value were removed one by one until all fixed effects had a P-value
0.10. At the end of Step 1, if the random job effect and worker effect was not significant with a P-value >0.10 they would be removed. In Step 2, all fixed effects at the mill level were offered into the model.
Random worker mixed model
For the random worker mixed model, the random job effect and its corresponding variance components from the hierarchical model were not included in the model structure [equation (2)]. The random worker effect and its variance components
and
remained. Variables were entered in the model in two stages. In Step 1, the random worker effect and all job-level and mill-level fixed effects, except job itself, were offered into the model. Variables with the highest P-value were removed one by one until all fixed effects had a P-value
0.10. In Step 2, individual jobs that had been significant in simple linear regressions were offered into the model and removed one by one until all fixed effects had a P-value
0.10. Because jobs were grouped into process groups (a categorical variable included in the models as a surrogate for work tasks and location) a job would only remain in the model if its mean was significantly different from the process group mean.
![]() | (2) |
Fixed effects model
In the fixed effects model, no random effects were included in the model structure [equation (3)]. The fixed effects model was developed in the same manner as the random worker mixed model, except that no random worker effect was included. Under this model, the log exposure was assumed normally distributed with mean µ +
mg +
jgh +
wghij and variance
![]() | (3) |
Model comparison
For the random worker mixed model and the fixed effects model, the predicted arithmetic mean exposure for each mill-job combination
was calculated from the models' predicted exposures
using equation (4).
![]() | (4) |
) where
for the mixed models and where
is calculated from the fixed effects in the model (Hornung et al., 1994
for the hth job in the gth mill includes the fixed effects (
) and the job-specific empirical Bayes estimates (Wh) [equation (5)].
![]() | (5) |
The correlations (Pearson r) between observed and predicted exposures from the three models were examined for both individual measurements and mill-job arithmetic means.
| RESULTS |
|---|
|
|
|---|
The dataset had an arithmetic mean wood dust concentration of 1.9 mg m3, geometric mean of 0.81 mg m3, geometric standard deviation 3.47 and a range of 0.0542 mg m3. The model parameters are provided in Tables 1 (fixed effects) and 2 (empirical Bayes job estimates). Lack of variability in mill-specific variables removed all but a few key fixed mill-level variables from consideration in the models. The random worker mixed model contained all but one of the variables (job: Tallyman) included in the fixed effects model and only very small differences in the magnitude of the parameter estimates were observed. The hierarchical model contained one additional nested variable (Mill Maintenance Process Group x Kiln Dried Lumber) and excluded the job in vehicle variable. The random effect term for mills was not significant in the hierarchical model, owing in part to the limited number of mills included in the dataset and the inclusion of the few mill-level fixed effects that explained key differences between the sawmills.
|
|
Compared with the fixed effects model, the standard errors were on average 2% higher for the random worker mixed model and 11% higher for the hierarchical model. The within-job variance (
) was nearly equal for the three models (Table 3). In all three models, 46% of the between-worker and within-worker variance was explained by the fixed effects when compared with the variance components of a model with only a random worker term and no fixed effects. The fixed effects in the random worker mixed model and hierarchical model preferentially reduced the between-worker variance (by 72 and 69%, respectively) over the within-worker variance (by 2 and 7%, respectively). The model fit statistics were nearly identical for the fixed effects model and random worker mixed model, not surprising given the low correlation between repeated measurements on the same worker (0.26). The hierarchical model fit statistics were 5% higher, with a low correlation between repeated measurements on the same worker (0.21) and moderate correlation between measurements within the same job (0.48). Allowing heterogeneity in the variance components across job location (indoor versus outdoor jobs) resulted in similar variance components and did not improve the model fit. Some differences in the magnitude of the variance components was observed when the model allowed for heterogeneity based on process group, but no process area-specific variance component estimates were statistically significant and the model resulted in a poorer model fit than the model that pooled variance components (data not shown).
|
The predicted exposures from the fixed effects model and the random worker mixed model were nearly identical owing to their nearly identical parameter estimates (r > 0.99). The hierarchical model's predicted exposures for individual measurements (r = 0.97) and for mill-job means (r = 0.92) were highly correlated with the predicted exposures from the other models. The predicted mean exposures of each job in each mill (for jobs with
3 measurements) from the hierarchical model were slightly more correlated with the measured arithmetic means than those of the other models (r = 0.74 versus r = 0.70). | DISCUSSION |
|---|
|
|
|---|
In this example, there was little difference between exposures predicted using the fixed effects and random worker mixed models. The hierarchical model demonstrated a stronger relationship with the observed measurements, but the improvement was slight. Differences in the models' predicted arithmetic means would be observed if the variables included in the models were different, if the magnitude of the model coefficients were different or if the remaining unexplained between-worker and within-worker variance components were different. The fixed effects model and the random worker mixed model demonstrated no substantive differences in any of these factors for the dataset evaluated here. The hierarchical model included empirical Bayes coefficients for every job in the study no matter how few measurements were available, whereas the other two models included only jobs that differed significantly from the process group means.
The hierarchical model improved the precision of the predicted exposures but at the expense of greater complexity. Although the random mill term was not significant, the random job effect explained additional variability in the dataset not accounted for by the fixed effects. The 11% increase in the standard errors of the coefficients over those of the fixed effects model was not unexpected owing to the greater complexity of the model. It may also be a better reflection of the true uncertainty in the coefficient estimates (Greenland, 1999
, 2000a
).
The models with jobs treated as a fixed effect provides greater simplicity when applying a model to predict historical exposures because jobs were included in the model only if their mean exposure significantly differed from that of their process group mean. However, this approach fails to borrow strength from the distribution of job means around the group mean and fails to account for potential clustering of exposures of jobs within a group. While more complex, including all jobs in the model by offering job as a random effect and using the empirical Bayes exposure estimates can aid in a posteriori grouping of jobs for retrospective exposure assessment (Peretz et al., 2002
).
While it is disappointing that the results from this specific dataset do not show strong differences between the modelling approaches, it is also reassuring that the models' predictions were robust to the models' specifications. Modelling job as a random effect becomes a more useful tool if heterogeneity is theorized, i.e. one could allow the between-job and within-job variance components to vary by process group or other factor. While we limited our analyses to an empirical Bayes estimate of the job effects, which are sensitive to deviations from normality assumptions (Verbeke and Molenberghs, 2000
), with the advent of software such as R and WinBugs, a full Bayesian analysis could be conducted to potentially offer even better precision for the predictions.
Differences in predicted exposures when treating job as a fixed versus a random effect may be more significant in models developed for other purposes or for datasets where the clustering of exposures by job is greater. For instance, more clustering of exposures may exist when data collected for different purposes or using different sampling strategies are combined. Our results are specific to this dataset; however, they illustrate the method, provide a starting point for comparisons, and suggest some areas where empirical Bayes estimates may be useful for retrospective exposure assessment.
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
The authors thank the management, unions and workers of the sawmills participating in the study. We thank Igor Burstyn for our discussions regarding calculating arithmetic means from mixed models and Chava Peretz for advice about presenting the empirical Bayes approach. This study was funded in part by the Canadian Institutes for Health Research and the British Columbia Lung Association. Trainee funding for MF was provided by the Michael Smith Foundation for Health Research and the Canadian Institutes for Health Research.
Received May 9, 2005; in final form October 4, 2005
| REFERENCES |
|---|
|
|
|---|
Burdorf A, van Tongeren M. (2003) Variability in workplace exposures and the design of efficient measurement and control strategies. Ann Occup Hyg; 47: 9599.
Burstyn I, Teschke K. (1999) Studying the determinants of exposure: a review of methods. Am Ind Hyg Assoc J; 60: 5772.[Web of Science][Medline]
Burstyn I, Kromhout H, Kauppinen T et al. (2000) Statistical modelling of the determinants of historical exposure to bitumen and polycyclic aromatic hydrocarbons among paving workers. Ann Occup Hyg; 44: 4356.
Dement JM, Harris RL Jr, Symons MJ et al. (1983) Exposures and mortality among chrysotile asbestos workers. Part I: Exposure estimates. Am J Ind Med; 4: 399419.[Web of Science][Medline]
Diggle PJ, Heagerty P, Liang KY et al. (2002) Analysis of longitudinal data. Oxford: Oxford University Press.
Eisen EA, Smith TJ, Wegman DH et al. (1984) Estimation of long term dust exposures in the Vermont granite sheds. Am Ind Hyg Assoc J; 45: 8994.[Medline]
Fisher LD, van Belle G. (1993) Biostatistics: a methodology for the health sciences. New York: John Wiley and Sons.
Greenland S. (1999) Multilevel modeling and model averaging. Scand J Work Environ Health; 25: 438.
Greenland S. (2000a) Principles of multilevel modelling. Int J Epidemiol; 29: 15867.
Greenland S. (2000b) When should epidemiologic regressions use random coefficients? Biometrics; 56: 91521.[CrossRef][Web of Science][Medline]
Hornung RW, Greife AL, Stayner LT et al. (1994) Statistical model for prediction of retrospective exposure to ethylene oxide in an occupational mortality study. Am J Ind Med; 25: 82536.[Medline]
Kromhout H, Heederik D. (1995) Occupational epidemiology in the rubber industry: Implications of exposure variability. Am J Ind Med; 27: 17185.[Web of Science][Medline]
Kromhout H, Swuste P, Boleij JS. (1994) Empirical modelling of chemical exposure in the rubber-manufacturing industry. Ann Occup Hyg; 38: 322.
Mikkelsen AB, Schlunssen V, Sigsgaard T et al. (2002) Determinants of wood dust exposure in the Danish furniture industry. Ann Occup Hyg; 46: 67385.
Peretz C, Goren A, Smid T et al. (2002) Application of mixed-effects models for exposure assessment. Ann Occup Hyg; 46: 6977.
Preller L, Kromhout H, Heederik D et al. (1995) Modeling long-term average exposure in occupational exposureresponse analysis. Scand J Work Environ Health; 21: 50412.[Web of Science][Medline]
Rappaport SM, Weaver M, Taylor D et al. (1999) Application of mixed models to assess exposures monitored by construction workers during hot processes. Ann Occup Hyg; 43: 45769.
Seixas NS, Checkoway H. (1995) Exposure assessment in industry specific retrospective occupational epidemiology studies. Occup Environ Med; 52: 62533.
Stewart PA, Lees PS, Francis M. (1996) Quantification of historical exposures in occupational cohort studies. Scand J Work Environ Health; 22: 40514.[Web of Science][Medline]
Symanski E, Chan W, Chang CC. (2001a) Mixed-effects models for the evaluation of long-term trends in exposure levels with an example from the nickel industry. Ann Occup Hyg; 45: 7181.
Symanski E, Sallsten G, Chan W et al. (2001b) Heterogeneity in sources of exposure variability among groups of workers exposed to inorganic mercury. Ann Occup Hyg; 45: 67787.
Symanski E, Savitz DA, Singer PC. (2004) Assessing spatial fluctuations, temporal variability, and measurement error in estimated levels of disinfection by-products in tap water: Implications for exposure assessment. Occup Environ Med; 61: 6572.
Teschke K, Demers PA, Davies HW et al. (1999) Determinants of exposure to inhalable particulate, wood dust, resin acids, and monoterpenes in a lumber mill environment. Ann Occup Hyg; 43: 24755.
Verbeke G, Molenberghs G. (2000) Linear mixed models for longitudinal data. New York: Springer-Verlag.
Yu RC, Tan W-Y, Mathew RM. (1990) A deterministic mathematical model for quantitative estimation of historical exposure. Am Ind Hyg Assoc J; 51: 194201.
This article has been cited by other articles:
![]() |
J M Dement, D Myers, D Loomis, D Richardson, and S Wolf Estimates of historical exposures by phase contrast and transmission electron microscopy in North Carolina USA asbestos textile plants Occup. Environ. Med., September 1, 2009; 66(9): 574 - 583. [Abstract] [Full Text] [PDF] |
||||
![]() |
H. Sbihi, K. Teschke, Y. C. Macnab, and H. W. Davies Determinants of Use of Hearing Protection Devices in Canadian Lumber Mill Workers Ann. Hyg., July 1, 2009; (2009) mep043v1. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. Tielemans, D. Noy, J. Schinkel, H. Heussen, D. Van Der Schaaf, J. West, and W. Fransman Stoffenmanager Exposure Model: Development of a Quantitative Algorithm Ann. Hyg., August 1, 2008; 52(6): 443 - 454. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||






