Annals of Occupational Hygiene Advance Access originally published online on December 21, 2005
Annals of Occupational Hygiene 2006 50(3):271-279; doi:10.1093/annhyg/mei073

© 2005 British Occupational Hygiene Society Published by Oxford University Press


Original Article

Metamodels of bias in Cox proportional-hazards and logistic regressions with heteroscedastic measurement error under group-level exposure assessment

I. BURSTYN*, H-M. KIM, N. CHERRY and Y. YASUI

Department of Public Health Sciences, The University of Alberta, Edmonton, Canada

* Author to whom correspondence should be addressed. E-mail: Igor.Burstyn{at}ualberta.ca


    ABSTRACT
In occupational epidemiology, group-based exposure assessment entails estimating the average exposure level in a group of workers and assigning the average to all members of the group. The assigned exposure values can be used in epidemiological analyses and have been shown to produce virtually unbiased relative-risk estimates in many situations. Although group-based exposure assessment continues to be used widely, it is unclear whether it produces unbiased relative-risk estimates in all circumstances, specifically in Cox proportional-hazards and logistic regressions when between-worker variance is not constant but proportional to the true group mean. This question is important because (i) between-worker variance has been shown to differ among exposure groups in occupational epidemiological studies and (ii) recent theoretical work has suggested that bias may exist in such situations. We conducted computer simulations of occupational epidemiological studies to address this question and analysed simulation results using ‘metamodelling’. The results indicate that small-to-negligible bias can be expected to result from heteroscedastic between-worker variance. Cox proportional-hazards models can produce attenuated risk estimates, while logistic regression may result in overestimation of the risk gradient. Bias caused by ignoring the heteroscedastic measurement error is unlikely to be large enough to alter the conclusion about the direction of exposure-disease association in occupational epidemiology.

Keywords: occupational epidemiology • ecological variable • log–log exposure-response model • variance components • computer simulation


    INTRODUCTION
Accurate exposure assessment has been shown to be important for the elucidation of exposure-disease relations in occupational epidemiology and is essential to valid risk assessment (Kromhout et al., 1999; Burstyn et al., 2003). The key to selecting an optimal exposure assessment strategy is an understanding of the measurement error structure of the exposure variable (Armstrong, 1990, 1996, 1998). Group-based exposure assessment has been practiced in occupational epidemiology under the assumption that grouping leads to Berkson-type measurement error (Berkson, 1950), and it aims to minimize attenuation in estimating the exposure-disease relationship. In group-based exposure assessment, the exposure level for a group of workers is approximated by the average exposure among randomly sampled members of that group (Preller et al., 1995; Kromhout et al., 1996; Burstyn and Kromhout, 2002). This results in a semi-ecological design, because exposures are assessed at the ecological (group) level, while health outcome (disease) information is available for each individual (Richardson and Best, 2003). It has been demonstrated that, for the simple case of continuous exposure variables, group-based exposure assessment leads to a virtually unbiased estimate of the association parameter when some measurements are available from all subjects (Tielemans et al., 1998). For dichotomous measures of health outcomes, it is now recognized that grouping of exposure measurements/assessments can produce virtually non-attenuated exposure–response relations, at the expense of loss of power, when the disease under study is rare (Armstrong, 1998).
For more common diseases, both logistic and Cox proportional-hazards regression can produce attenuated estimates of exposure–disease relations when group-based exposure assessment is used (even though approximate Berkson-type error is induced), and the attenuation is negligible only in logistic regression (Kim et al., 2005). These conclusions were obtained under the assumptions that measurement error is homoscedastic across groups and the risk of disease is related to the logarithmic transformation of lognormally distributed exposure levels (e.g. measured exposure for chemicals inhaled and/or deposited on skin that are known to fit lognormal distributions). It is unclear whether these conclusions hold when measurement error variance changes according to the exposure levels of various occupational groups.

The examination of this heteroscedasticity issue is especially pertinent because there is growing evidence that, within some occupational cohorts, different groups of workers have distinct exposure variances and, at least in some instances, these variances are proportional to true group means of exposure. Rappaport et al. (1999) were the first to observe that different jobs (boilermakers, ironworkers, pipe-fitters and welder-fitters) within the construction industry in the US can have different between-worker variances even though the day-to-day variance component could be considered constant across jobs. Heteroscedasticity of styrene exposure has also been shown for various occupational groups within the Danish boat manufacturing industry (Kolstad et al., 2005) and in the European Carbon-Black Industry (van Tongeren et al., 2005). It was shown that biomarkers of exposure to mercury at a chloralkali plant in the US had distinct within-worker and between-worker variance components across four occupational groups (Symanski et al., 2001). An examination of the relation between group mean and variance components in the largest database of occupational exposure measurements ever assembled for research purposes (Kromhout et al., 1993) revealed a positive correlation between the logarithmic mean and the between-worker standard deviation of logarithms of inhalable dusts (ρ = 0.34; P = 0.008; 61 groups) (Burstyn and Kromhout, 2004).

We will restrict this paper to the consideration of heteroscedastic between-worker exposure variance and its implications for epidemiological analyses with Cox proportional-hazards or logistic regression. This restriction is motivated, in part, by results indicating that attenuation due to measurement error is primarily related to between-worker variance when group-based exposure assessment is used, a large number of workers are sampled per group, and the within-worker and between-worker variances do not differ by group (Kim et al., 2005). More specifically, we will consider a situation where between-worker variance is proportional to true group means. This is the simplest, yet plausible, scenario in which heterogeneity of between-worker variance among exposure groups may arise, and is similar (though not identical) to the heteroscedastic additive measurement error problem considered by Deddens and Hornung (1994). Deddens and Hornung (1994) and, subsequently, Steenland et al. (2000) examined a situation with Berkson-type error, the variance of which is proportional to the observed exposure level, without considering between-worker and within-worker variances. Because of challenges encountered in developing a theoretical solution of the problem, we will, for now, study it through simulations and metamodels.

Metamodels are used in engineering to empirically explore the behaviour of computer-simulated systems, assess their sensitivity to input values, and make predictions in the absence of a theoretically-driven model (Kleijnen, 2005). This can be a useful approach in applied statistics when an exact mathematical solution to a problem is either intractable or difficult to obtain, but computer simulations can be conducted. It operates within a conceptual framework that enables an investigator to analyse results of computer simulations as they would model and interpret experimental data in the absence of theory. In metamodelling, simulation parameters are viewed as experimental conditions set by the investigator and the outcome of each simulation run (simulation realization) is considered to be equivalent to an experimentally obtained observation. Each simulation realization within a simulation can be viewed as a replication of an experimental condition. We apply a metamodel approach to study the impact of between-worker variances that are proportional to true group means of exposure.

Exposure (measurement error) model
The following exposure model for lognormal exposure measurements is commonly employed in occupational epidemiology. A log-transformed exposure measurement (Ygij) is assumed to satisfy the following model:

$$Y_{gij} = \mu_g + \gamma_{i(g)} + \varepsilon_{gij} \qquad (1)$$
where g denotes groups (1, ... , G), i indicates workers within each group (1, ... , Kg) and j specifies repeated exposure measures of each worker (1, ... , Ngi). In this model, µg is the fixed parameter representing group mean exposure, γi(g) are the random effects for the ith worker nested in the gth group that are normally distributed with zero means and variances proportional to the group mean, σ²B(g) = cµg (c is a constant referred to as a ‘heteroscedasticity parameter’ below; c ≥ 0 and µg ≥ 0 to ensure that σ²B(g) ≥ 0), and εgij is the random error term for the jth day measurement on the ith worker from the gth group that is normally distributed with mean zero and variance σ²W. It is assumed that γi(g) and εgij are mutually independent.
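
As an illustration only, the exposure model above can be sketched in Python with NumPy. The function name `simulate_log_exposures`, its arguments and the seed are hypothetical choices made for this sketch; the original simulations were run in SAS, so this is not the authors' code.

```python
import numpy as np

def simulate_log_exposures(group_means, c, sigma_w, n_workers, n_repeats, rng):
    """Draw Y_gij = mu_g + gamma_i(g) + eps_gij for every group.

    Returns an array of shape (G, n_workers, n_repeats).
    """
    mu = np.asarray(group_means, dtype=float)      # mu_g, one value per group
    sd_between = np.sqrt(c * mu)                   # between-worker SD, sqrt(c * mu_g)
    # Worker random effects gamma_i(g) ~ N(0, c * mu_g)
    gamma = rng.standard_normal((mu.size, n_workers)) * sd_between[:, None]
    # Day-to-day errors eps_gij ~ N(0, sigma_w ** 2)
    eps = rng.standard_normal((mu.size, n_workers, n_repeats)) * sigma_w
    return mu[:, None, None] + gamma[:, :, None] + eps

rng = np.random.default_rng(2005)  # arbitrary seed, chosen for reproducibility
y = simulate_log_exposures([1.1, 2.1, 3.1, 4.1, 5.1], c=0.3, sigma_w=1.5,
                           n_workers=100, n_repeats=2, rng=rng)
# Group-based assessment uses the mean of all sampled measurements per group
group_mean_estimates = y.mean(axis=(1, 2))
```

Setting `c = 0` and `sigma_w = 0` makes every simulated value equal to its group mean, which is a convenient sanity check on the variance structure.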

Cox proportional-hazards and logistic models
A hazard function depends on both the survival time and a set of covariates. The proportional-hazards model (without time-dependent covariate) separates these components by specifying the hazard rate at survival time t of ith subject as

$$h_{gi}(t) = h_0(t)\,\exp(\alpha\,\mu_{gi}) \qquad (2)$$
where h0(t) is the baseline hazard function, µgi is the true exposure of the subject and α is the association parameter of interest.

Logistic regression takes the following form:

$$\operatorname{logit}\{\Pr(Z_{gi} = 1)\} = \beta_0 + \beta\,\mu_{gi} \qquad (3)$$
where Zgi is a binary health outcome and logit(w) = log{w/(1 – w)}. The baseline log-odds of health outcome are represented by the intercept parameter β0. Note that the association parameter in logistic regression (β) is not necessarily the same as the association parameter (α) in the Cox proportional-hazards model (2).

Simulations: methods and results
In order to study bias in logistic and Cox proportional-hazards models with group-based exposure assessment when between-worker variance is directly proportional to true group means, the following simulation studies were performed. We considered a hypothetical occupational cohort study with time invariant exposure that has five exposure groups, each group containing 1000 subjects and 100 exposed time periods (e.g. days, months) for each subject. We further assumed that disease risk depends only on exposure intensity, not duration of exposure. Finally, we assumed that measurements of exposure for a sample of k workers in each group are obtainable (k = 10, 50, 100) and that measurements from two different time periods will be available for each sampled worker. This is typical for an occupational exposure database that one might be able to obtain or construct for a given study.

The distribution of exposures was assumed to be lognormal with means (of log-transformed exposures) 1.1, 2.1, ... , 5.1 for the 1st to the 5th groups, respectively. The heteroscedasticity parameter took on values of 0, 0.1, 0.3, 0.5, 0.7 and 1.0, which produces between-worker variances with values between 0 and 5.1, and the within-worker standard deviation (σW) took on values of 0, 0.5, 1.5 and 3. We chose the variance components on the basis of papers by Kromhout and co-workers (Kromhout et al., 1993; Kromhout and Vermeulen, 2001), which summarized the between-worker and within-worker variance components generally observed for occupational exposures.

The true association parameter was set to 0.2, 0.4 or 0.6 for both logistic (β) and Cox proportional-hazards (α) regression models, and –4 was used as the intercept parameter in the logistic model (β0). In the Cox proportional-hazards model, we set the constant baseline hazard [h0(t)] at 0.01 and assumed that there is no time-dependence with exposures in order to make the survival functions in the logistic and Cox models approximately equivalent.

For each simulation with a fixed set of input parameters, 1000 simulation realizations were obtained. A Bernoulli random variable was generated using the RANBIN function in SAS software (version 8) for the disease status in the logistic regression model, with the true exposure covariates (µgi) determining the probability of the disease for each subject (either 1, diseased, or 0, healthy). In the case of Cox proportional-hazards models, the survival times were generated using the RANEXP function in SAS software (version 8) as exponential random variables with subject-specific constant hazards 0.01 × exp(αµgi); no censoring was permitted. The association parameters were estimated using the PROC LOGISTIC (for the logistic model: β*) and PROC PHREG (for the Cox proportional-hazards model: α*) procedures of SAS software (version 8), respectively, using the group-based exposure assessment procedure (i.e. assigning to all members of the group the observed group-mean exposure, estimated from a sample of k workers whose exposures were measured on two randomly chosen occasions).
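
The data-generating steps of one simulation realization can be sketched as follows. This is a hedged Python/NumPy analogue, not the SAS code used in the study: the variable names, the seed and the choice to treat the first k workers of each group as the measured sample are assumptions of the sketch, and the regression fits (PROC LOGISTIC, PROC PHREG) are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
group_means = np.array([1.1, 2.1, 3.1, 4.1, 5.1])
n_subjects, k, c, sigma_w = 1000, 50, 0.3, 1.5
alpha = beta = 0.4          # true association parameters
beta0, h0 = -4.0, 0.01      # logistic intercept and constant baseline hazard

# True subject-specific exposure: mu_gi = mu_g + gamma_i(g), gamma ~ N(0, c * mu_g)
sd_between = np.sqrt(c * group_means)[:, None]
mu_gi = group_means[:, None] + sd_between * rng.standard_normal((5, n_subjects))

# Logistic outcome: Bernoulli draw with P(disease) = expit(beta0 + beta * mu_gi)
p_disease = 1.0 / (1.0 + np.exp(-(beta0 + beta * mu_gi)))
disease = rng.binomial(1, p_disease)

# Cox outcome: exponential survival times with hazard h0 * exp(alpha * mu_gi), no censoring
hazard = h0 * np.exp(alpha * mu_gi)
survival_time = rng.exponential(scale=1.0 / hazard)

# Group-based assessment: k workers per group measured on two occasions
# (workers are exchangeable here, so the first k serve as a random sample)
measured = mu_gi[:, :k, None] + sigma_w * rng.standard_normal((5, k, 2))
observed_group_mean = measured.mean(axis=(1, 2))
# Every member of a group is assigned that group's observed mean exposure
assigned = np.repeat(observed_group_mean[:, None], n_subjects, axis=1)
```

The arrays `disease`, `survival_time` and `assigned` would then be passed to a logistic or Cox regression routine to obtain the estimates β* and α*.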

Bias was estimated as the ratio of the association parameter estimate (α*, β*) obtained with the group-based exposure assessment strategy to the true parameter value. The ratios of the estimates based on the group-based exposure assessment strategy to those based on the true subject-specific exposures were calculated (not shown in the paper): they showed very similar behaviour to the ratios to the true parameter values.

Selected simulation results indicate that the extent of bias in estimates of association parameters depends both on the heteroscedasticity parameter and on true risk (Fig. 1). These associations have different trends for logistic and Cox proportional-hazards models. For Cox proportional-hazards models, it would appear that the extent of attenuation increases as the heteroscedasticity of between-worker variance increases and that the effect is more pronounced for larger true associations. However, for logistic regression, we observed positive bias in association-parameter estimates, especially when true associations are weak; the extent of bias increases with the degree of heteroscedasticity in between-worker variance. When the number of sampled workers per group is large, the observed association parameter is closer to the true value in Cox proportional-hazards regression, but the reverse holds for logistic regression (Fig. 2): paradoxically, a smaller sample size in logistic regression leads to less biased estimates of association parameters. When the number of sampled workers in each group is large, within-worker variability has no effect on observed bias. However, when sample size is small, within-worker variability leads to increased attenuation in the association-parameter estimates (Fig. 3). Overall, Cox proportional-hazards regression appears to be more sensitive to the heteroscedasticity parameter and shows a tendency towards underestimation of the true association. Bias in logistic regression can be in either direction, but remains small in magnitude. As expected, when there is no between-worker variability in exposure groups (c = σ²B(g) = 0), logistic and Cox proportional-hazards models yield almost identical unbiased results.


Fig. 1. The impact of the heteroscedasticity parameter (c) and the true association parameter [0.2 (solid), 0.4 (dot-dash) and 0.6 (dash)] on the observed bias when the within-worker standard deviation is moderate (σW = 1.5) and the number of sampled subjects per group is large (k = 100) for logistic and Cox proportional-hazards models; results of simulation studies (see text for details).

 

Fig. 2. The impact of the heteroscedasticity parameter (c) and different numbers of sampled workers per group [k = 100 (solid), 50 (dot-dash) and 10 (dash)] on the observed bias with moderate within-worker standard deviation (σW = 1.5), given that the true association parameter is 0.4, for logistic and Cox proportional-hazards models; results of simulation studies (see text for details).

 

Fig. 3. The impact of the heteroscedasticity parameter (c) and the within-worker standard deviation [σW = 0.5 (solid), 1.5 (dot-dash) and 3 (dash)] on the observed bias when 10 subjects were sampled per group (k = 10), given that the true association parameter is 0.4, for logistic and Cox proportional-hazards models; results of simulation studies (see text for details).

 
Constructing and validating metamodels: methods
For Cox proportional-hazards and logistic regressions separately, we fit the following metamodel to simulation realizations:

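$$\text{bias} = b_0 + b_{\beta}\,\beta + b_{\sigma}\,\sigma_W + b_k\,k + b_{c1}\,c + b_{c2}\,c^2 + b_{c3}\,c^3 + \mathbf{b}_I^{\top}\mathbf{I} + \varepsilon_{bs} + \varepsilon_{ws} \qquad (4)$$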

That is, we tested whether bias (β*/β or α*/α) is affected by (fixed effects of):

  • association parameter (β or α);
  • within-worker standard deviation (σW);
  • number of sampled workers from each group (k);
  • cubic polynomials of heteroscedasticity parameter (c); and
  • interaction(s) of β (or α), σW, k and c (vector I),
with a random variation between (εbs) and within (εws) simulations. Random terms are assumed to be mutually independent, and have normal distributions with zero means. All possible interactions among fixed effects were considered.

Linear mixed effects models were used to evaluate parameters of metamodels using maximum likelihood method in PROC MIXED (SAS version 8). First, we considered a model with all possible fixed effects including interactions and retained only those with P < 0.05 in the final model by backward elimination. Assumptions of the final model were assessed using plots of the residuals.

Cross-validation was used to assess the fit and validity of the final metamodels, as recommended by Kleijnen (2005). A set of simulations (corresponding to one of the 207 unique sets of simulated conditions) was removed one at a time and parameters of the final metamodel re-estimated. The metamodel constructed on the reduced data-set was used to generate predictions for the removed simulation. These predictions were obtained using only estimates of fixed effects in the model. The procedure was repeated 207 times (i.e. for each set of simulations) and enabled us to examine the agreement between metamodels' predictions and simulation realizations (and respective means).
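
The leave-one-out logic can be sketched in a few lines of Python. For brevity the sketch swaps in ordinary least squares on one response per condition in place of the linear mixed model fitted with PROC MIXED, and it uses a made-up, noiseless toy response on a small placeholder grid rather than the 207 simulated conditions; the function name and coefficients are hypothetical.

```python
import numpy as np

def leave_one_out_predictions(X, y):
    """Refit by least squares with each row removed in turn; predict the removed row."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for s in range(len(y)):
        keep = np.arange(len(y)) != s              # drop condition s
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[s] = X[s] @ coef                     # fixed-effects-only prediction
    return preds

# Placeholder design: intercept, heteroscedasticity parameter c, association parameter
c_vals, a_vals = np.meshgrid([0.0, 0.1, 0.3, 0.5, 0.7, 1.0], [0.2, 0.4, 0.6])
X = np.column_stack([np.ones(c_vals.size), c_vals.ravel(), a_vals.ravel()])
y = X @ np.array([1.0, -0.3, -0.1])                # toy coefficients, not from the paper
preds = leave_one_out_predictions(X, y)

# Agreement between held-out predictions and observed values
slope = np.polyfit(preds, y, 1)[0]
correlation = np.corrcoef(preds, y)[0, 1]
```

With real simulation output, `slope` and `correlation` play the role of the agreement statistics reported for the cross-validation.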

Constructing and validating metamodels: results
We were successful in developing a valid metamodel for predicting bias in Cox proportional-hazards models in our simulations: as can be seen from Fig. 4, there was an excellent agreement between simulation realizations and metamodel predictions generated during cross-validation (slope = 0.998; correlation = 0.853). The metamodel explained 98% [(0.0221–0.0005)/0.0221] of between-simulation variance of bias. The metamodel for bias in Cox proportional-hazards regression seems to account for the observed patterns in simulation.
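
The proportion of between-simulation variance explained can be written out as a worked step. The subscripts "null" and "final" are labels introduced here, on the assumption that 0.0221 is the between-simulation variance from the metamodel with only random effects and 0.0005 is the residual between-simulation variance of the final metamodel.

```latex
\[
\frac{\hat\sigma^2_{bs,\mathrm{null}} - \hat\sigma^2_{bs,\mathrm{final}}}{\hat\sigma^2_{bs,\mathrm{null}}}
= \frac{0.0221 - 0.0005}{0.0221}
= \frac{0.0216}{0.0221}
\approx 0.977 \approx 98\%
\]
```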


Fig. 4. Cross-validation of the metamodel of bias of association parameter in Cox proportional-hazards regression (207 simulations each with 1000 realizations).

 
The final metamodel parameters for Cox proportional-hazards regression are presented in Table 1. The cubic polynomials of the heteroscedasticity parameter were retained in the final model because they produced an improvement in model fit compared with having only a quadratic term of the heteroscedasticity parameter. The relation between the heteroscedasticity parameter and bias is complex, but generally, greater heteroscedasticity produces larger attenuation. The final metamodel suggests that attenuation in Cox proportional-hazards regression increases as the association parameter becomes larger, but this effect depends on the degree of heteroscedasticity in between-worker variability. There was also an interaction between the number of sampled workers and within-worker standard deviation. It would appear to make sense for attenuation to increase as the within-worker variability gets larger and for this effect to be counter-acted by larger samples of monitored workers. To apply the metamodel to predict bias in Cox proportional-hazards regression, when (for example) α = 0.4, c = 0.1, k = 50 and σW = 0.5, we simply plug values into the regression equation described in Table 1: α*/α = 1.043 – 0.142 × 0.4 – 0.316 × 0.1 + 0.383 × 0.1² – 0.125 × 0.1³ + 0.0002 × 50 – 0.025 × 0.5 – 0.702 × 0.4 × 0.1 + 0.0002 × 50 × 0.5 = 0.933. If the heteroscedasticity parameter were to increase to 0.5 (5-fold from 0.1), the attenuation predicted by the metamodel would also increase, with the predicted ratio falling to 0.770 (i.e. greater underestimation of the true association parameter). This illustrates that, under the conditions considered in the previous calculation, greater heteroscedasticity leads to greater attenuation.
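
The worked example can be checked with a small Python function. It uses only the rounded coefficients quoted in the text, with the quadratic and cubic terms in c implied by the cubic polynomial; the function name is a hypothetical choice, and results computed from rounded coefficients may differ slightly from values based on the unrounded estimates in Table 1.

```python
def predict_cox_bias_ratio(alpha, c, k, sigma_w):
    """Predicted alpha*/alpha from the rounded metamodel coefficients quoted in the text."""
    return (1.043
            - 0.142 * alpha                              # association parameter
            - 0.316 * c + 0.383 * c**2 - 0.125 * c**3    # cubic polynomial in c
            + 0.0002 * k                                 # sampled workers per group
            - 0.025 * sigma_w                            # within-worker SD
            - 0.702 * alpha * c                          # alpha-by-c interaction
            + 0.0002 * k * sigma_w)                      # k-by-sigma_w interaction

ratio_c01 = predict_cox_bias_ratio(alpha=0.4, c=0.1, k=50, sigma_w=0.5)
ratio_c05 = predict_cox_bias_ratio(alpha=0.4, c=0.5, k=50, sigma_w=0.5)
print(f"{ratio_c01:.3f} {ratio_c05:.3f}")  # prints "0.933 0.770"
```

Both values agree with the predictions stated in the text for c = 0.1 and c = 0.5.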


Table 1. Equation for predicting the ratio of observed to true association parameter in Cox proportional-hazards regression (α*/α) in a semi-ecological study when between-worker (within-group) variance is directly proportional to group mean (σ²B = cµg), but within-worker variance (σW) is homoscedastic; 207 simulations each repeated 1000 times (n = 207 000); five group means {1.1, 2.1, 3.1, 4.1, 5.1}, two repeats per subject; residual between-simulation variance: 0.0005; residual within-simulation variance: 0.0074

 
It was more difficult to develop a valid metamodel for predicting bias in logistic regression in our simulations. In cross-validation, the correlation between simulation realizations and predicted mean bias was poor (~0.2), but improved dramatically when we considered only means of simulation realizations: slope = 1.000 and correlation = 0.822 (Fig. 5). For a metamodel to be valid it is sufficient that there exists a good agreement between means of predicted and observed values (Kleijnen, 2005). Part of the challenge may come from the fact that there was much more within-simulation variability in logistic compared with Cox proportional-hazards models: 0.07 versus 0.03 (from metamodels with only random effects). Conversely, there was much less between-simulation variability in logistic (0.002) than in Cox proportional-hazards models (0.022), making it easier to explain between-simulation differences in the metamodel of bias in the Cox proportional-hazards model. Nonetheless, the metamodel of bias in logistic regression explained 65% [(0.002 – 0.0007)/0.002] of between-simulation variance. This information, combined with the result of cross-validation (Fig. 5), lends credibility to the metamodel of bias in logistic regression. The metamodel for bias in logistic regression seems to account for the observed patterns in simulation.


Fig. 5. Cross-validation of the metamodel of bias of association parameter in logistic regression: points represent observed and predicted means of 207 simulations each with 1000 realizations.

 
The final metamodel parameters for logistic regression are shown in Table 2. They include the same interaction terms as the metamodel for bias in Cox proportional-hazards regression and similar effects of sample size and within-worker variability. The main striking difference is that bias appears to be independent of the association parameter except through its interaction with the heteroscedasticity parameter. (NB: The estimate of the effect of the association parameter per se is not shown in Table 2, because it was not significant in the final metamodel for logistic regression, bβ = 0.007, standard error (bβ) = 0.02, P = 0.7, while other estimates were the same as shown in Table 2.) In the metamodel of bias in logistic regression, the heteroscedasticity parameter itself has the opposite effect to that in Cox proportional-hazards models: it creates positive bias. However, this is counter-balanced by the negative interaction between heteroscedasticity and association parameters. To apply the metamodel to predict bias in logistic regression, when (for example) β = 0.4, c = 0.1, k = 50 and σW = 0.5, we simply plug values into the regression equation described in Table 2: β*/β = 0.995 + 0.128 × 0.1 + 0.0002 × 50 – 0.028 × 0.5 – 0.248 × 0.4 × 0.1 + 0.0003 × 50 × 0.5 = 1.001. If the heteroscedasticity parameter were to increase to 0.5 (5-fold from 0.1), the positive bias predicted by the metamodel would also increase, to 1.013 (i.e. greater overestimation of the true association parameter). This illustrates that, under the conditions considered in the previous calculation, greater heteroscedasticity leads to greater positive bias.
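
The logistic worked example can be checked in the same way. The sketch below uses only the rounded coefficients quoted in the text and a hypothetical function name; the main effect of the association parameter is omitted because the text reports it as non-significant and absent from the final metamodel.

```python
def predict_logistic_bias_ratio(beta, c, k, sigma_w):
    """Predicted beta*/beta from the rounded metamodel coefficients quoted in the text."""
    return (0.995
            + 0.128 * c                  # heteroscedasticity parameter
            + 0.0002 * k                 # sampled workers per group
            - 0.028 * sigma_w            # within-worker SD
            - 0.248 * beta * c           # beta-by-c interaction
            + 0.0003 * k * sigma_w)      # k-by-sigma_w interaction

ratio_c01 = predict_logistic_bias_ratio(beta=0.4, c=0.1, k=50, sigma_w=0.5)
ratio_c05 = predict_logistic_bias_ratio(beta=0.4, c=0.5, k=50, sigma_w=0.5)
print(f"{ratio_c01:.3f} {ratio_c05:.3f}")  # prints "1.001 1.013"
```

Both values exceed 1, consistent with the small positive bias described for logistic regression.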


Table 2. Equation for predicting the ratio of observed to true association parameter in logistic regression (β*/β) in a semi-ecological study when between-worker (within-group) variance is directly proportional to group mean (σ²B = cµg), but within-worker variance (σW) is homoscedastic; 207 simulations each repeated 1000 times (n = 207 000); five group means {1.1, 2.1, 3.1, 4.1, 5.1}, two repeats per subject; residual between-simulation variance: 0.0007; residual within-simulation variance: 0.0266

 
The effects of sample size (k) and within-worker variability (σW) per se on bias are the same in both Cox proportional-hazards and logistic regression: to decrease the value of bias (ratio of estimate to true parameter, which is not the same as making the estimate less biased) (Tables 1 and 2). However, these factors act against the background of heteroscedasticity that has opposite effects in the two models. Therefore, in the case of Cox proportional-hazards regression, which is attenuated owing to heteroscedasticity, these factors further increase attenuation, while in the case of positively biased logistic regression they move the observed estimate closer to the true value.

Application of metamodels
Metamodels similar to the ones we developed can be constructed as part of the sensitivity analysis of an epidemiological study. We applied the resulting metamodels for bias in Cox proportional-hazards and logistic regressions to a situation observed by Burstyn and Kromhout (2004) in a large database of occupational dust exposures, in which the mean ratio of between-worker logarithmic variance to group logarithmic mean was ~0.3. In addition, we imagined that in a hypothetical epidemiological study there was a linear relation between the logarithm of dust exposure and health risk, five exposure groups, and that the true association parameter (e.g. log-odds per unit of exposure) was in the range of 0.4–0.6. In practice, one may wish to use estimates of the association parameter obtained from the study data or from previous results (e.g. from a literature review) to guess the neighbourhood within which the true association parameter may lie.

The predicted biases in Cox proportional-hazards regression for different plausible scenarios are illustrated in Table 3. It is apparent that the metamodel predicts negligible attenuation when true risk is low, but the attenuation becomes more pronounced when the association parameter is large and the number of sampled workers small. Thus, we can expect to observe an association parameter of only 0.435 (= 0.6 × 0.725) when the true association parameter is 0.6 and only 10 workers were sampled per group. The measurement error model reflected in the metamodel would result in an attenuated association parameter of 0.472 (= 0.6 × 0.787) even if 10 times more workers were monitored. We predicted neither overestimation nor the reversal of the direction of association: consider the upper confidence limits (<1) and the sign (positive) of the predicted bias.


Table 3. Predictions of metamodel for Cox proportional-hazards regression assuming five exposure groups, logarithmic within-worker standard deviation of 2, and a ratio of logarithmic between-worker variance to logarithmic group mean of 0.3; k = number of workers sampled on two random days per group

 

Table 4. Predictions of the metamodel for logistic regression assuming five exposure groups, logarithmic within-worker standard deviation of 2, and a ratio of logarithmic between-worker variance to logarithmic group mean of 0.3; k = number of workers sampled on two random days per group

 
The predicted biases in logistic regression for different plausible scenarios are illustrated in Table 4. It is apparent that the metamodel predicts negligible attenuation when the number of workers sampled in each group is small. However, when the number of sampled workers increases, so does the likelihood that the association parameter will be overestimated. The effect is more pronounced when the true association parameter is small. In this typical example, it is predicted that the estimated association parameter will correctly assess the direction of the association.


    DISCUSSION
This paper focused on investigating the amount and direction of bias in Cox proportional-hazards and logistic regressions that arises in semi-ecological studies (exposures assessed on group-level, but health outcome determined individually) in occupational epidemiology when the magnitude of between-worker variance is proportional to true mean exposure intensity in each exposure group. In simulations, we studied the dependence of bias on the extent of heteroscedasticity of between-worker variance, the magnitude of within-worker variance, the number of workers sampled from each group and the value of true association parameter. The simulation parameters we considered can be used to predict bias in the association parameters. We are encouraged by this finding and will use it to develop theoretical understanding of the observed patterns. Metamodels provide guidance to the general features that the theoretical solution to the problem may have, e.g. interaction between heteroscedasticity and true association parameters.

Limitations inherent in our work are typical of simulation studies conducted in the absence of theory. Although we attempted to conduct simulations over a wide range of possible parameters, particular study conditions not considered in our investigation may exist and may lead to outcomes not anticipated by the conclusions drawn from our simulations. For example, we rather optimistically assumed that exposure in each group would be assessed on the basis of samples of equal size. Furthermore, in every practical situation, adjustment for confounding and effect modification must be considered, and if such covariates are measured with error, this may alter our conclusions. These considerations limit the generalizability of our findings.

The lack of theoretical grounding for the observed patterns makes the results more difficult to interpret. It is still possible that some of the observed results are due to chance, although the large number of realizations obtained for each simulation and the good cross-validation results should limit this possibility. Statistical significance of the parameters in metamodels depends on the size of the simulation study. By conducting a large simulation study, we tried to ensure that we did not miss any important associations by relying too heavily on significance testing; none of the metamodels' parameters were excluded from the final model because they were ‘borderline significant’.

These metamodels focused on the ratio of the estimate of the association parameter to its true value. Consequently, applying our metamodels to adjust naïve association-parameter estimates requires knowledge of the true association parameter. We adopted this approach because our goal was to study the extent and direction of bias rather than to develop procedures to adjust for it.
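The circularity can be written schematically as follows. The symbols $R$, $f$ and $\theta$ are notation introduced here for illustration only, and no particular functional form of $f$ is implied: $\hat{\beta}$ is the naïve estimate, $\beta$ the true association parameter, and $\theta$ collects the remaining simulation parameters (extent of heteroscedasticity, within-worker variance, number of workers sampled per group).

```latex
R \;=\; \frac{\hat{\beta}}{\beta} \;=\; f(\beta, \theta)
\quad\Longrightarrow\quad
\beta \;=\; \frac{\hat{\beta}}{f(\beta, \theta)}
```

Because the unknown $\beta$ appears on both sides of the second expression, the metamodel by itself does not provide a correction for an observed $\hat{\beta}$.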

Despite these limitations, we believe that our study has practical implications. The findings indicate that, in realistic situations where a log–log exposure-response model is applicable in occupational epidemiology (Steenland and Deddens, 2004) and between-worker variance is proportional to the true group mean, small-to-negligible bias can be expected to result from the application of group-based exposure assessment with naïve epidemiological analysis. Cox proportional-hazards regression produces attenuated association-parameter estimates, while logistic regression can result in overestimation of the parameter. It appears unlikely that conclusions about the direction of association will be erroneous when the heteroscedastic measurement error that we have studied is ignored in the analysis.


    ACKNOWLEDGEMENTS
We wish to thank Drs Ben Armstrong and James A. Deddens for their helpful comments. The financial support by a grant from the Canadian Cancer Etiology Research Network (http://www.ccern.org/) is gratefully acknowledged.

Received August 10, 2005; in final form November 1, 2005


    REFERENCES

Armstrong BG. (1990) The effects of measurement errors on relative risk regression. Am J Epidemiol; 132: 1176–84.

Armstrong BG. (1996) Optimizing power in allocating resources to exposure assessment in an epidemiologic study. Am J Epidemiol; 144: 192–7.

Armstrong BG. (1998) Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med; 55: 651–6.

Berkson J. (1950) Are there two regressions? J Am Stat Assoc; 45: 164–80.

Burstyn I, Kromhout H. (2002) The Babel of multicenter exposure assessment. Ann Occup Hyg; 46: 649–52.

Burstyn I, Kromhout H. (2004) Is there a correlation between variability and level of exposure that should be taken into account in occupational epidemiology? Het Tijdschrift voor Toegepaste Arbowetenschap; suppl. 2: 57.

Burstyn I, Boffetta P, Kauppinen T et al. (2003) Performance of different exposure assessment approaches in a study of bitumen fume exposure and lung cancer mortality. Am J Ind Med; 43: 40–8.

Deddens J, Hornung RW. (1994) Quantitative examples of continuous exposure measurement errors that bias risk estimates away from the null. In Smith C, Christiani D, Kelsey K, editors. Chemical risk assessment of occupational health, Auburn, London. pp. 77–85.

Kim H-M, Yasui Y, Cherry N et al. (2005) Attenuation in risk estimates in logistic and Cox proportional-hazards models due to group-based exposure assessment strategy. SER-CSEB meeting, June 27–30, 2005, Toronto, Canada.

Kleijnen JPC. (2005) An overview of the design and analysis of simulation experiments for sensitivity analysis. Eur J Oper Res; 164: 287–300.

Kolstad HA, Sonderskov J, Burstyn I. (2005) Company-level, semi-quantitative assessment of occupational styrene exposure when individual data are not available. Ann Occup Hyg; 49: 155–65.

Kromhout H, Vermeulen R. (2001) Temporal, personal and spatial variability in dermal exposure. Ann Occup Hyg; 45: 257–73.

Kromhout H, Symanski E, Rappaport SM. (1993) A comprehensive evaluation of within- and between-worker components of occupational exposure to chemical agents. Ann Occup Hyg; 37: 253–70.

Kromhout H, Tielemans E, Preller E et al. (1996) Estimates of individual dose from current measurements of exposure. Occup Hyg; 3: 23–39.

Kromhout H, Loomis D, Kleckner RC. (1999) Uncertainty in the relation between exposure to magnetic fields and brain cancer due to assessment and assignment of exposure and analytical methods in dose–response modeling. Ann NY Acad Sci; 895: 141–55.

Preller L, Kromhout H, Heederik D et al. (1995) Modeling long-term average exposure in occupational exposure–response analysis. Scand J Work Environ Health; 12: 504–12.

Rappaport SM, Weaver M, Taylor D et al. (1999) Application of mixed models to assess exposures monitored by construction workers during hot processes. Ann Occup Hyg; 43: 457–69.

Richardson S, Best N. (2003) Bayesian hierarchical models in ecological studies of health-environment effects. Environmetrics; 14: 129–47.

Steenland K, Deddens JA. (2004) A practical guide to dose–response analyses and risk assessment in occupational epidemiology. Epidemiology; 15: 63–70.

Steenland K, Deddens J, Zhao S. (2000) Biases in estimating the effect of cumulative exposure in log-linear models when estimated exposure levels are assigned. Scand J Work Environ Health; 26: 37–43.

Symanski E, Sallsten G, Chan W et al. (2001) Heterogeneity in sources of exposure variability among groups of workers exposed to inorganic mercury. Ann Occup Hyg; 45: 677–87.

Tielemans E, Kupper LL, Kromhout H et al. (1998) Individual-based and group-based occupational exposure assessment: Some equations to evaluate different strategies. Ann Occup Hyg; 42: 115–9.

van Tongeren M, Burstyn I, Kromhout H et al. (2005) Are variance components of exposure heterogeneous between time periods and factories in the European Carbon Black Industry? Ann Occup Hyg. Advance Access published August 26, 2005, doi:10.1093/annhyg/mei041.





