Annals of Occupational Hygiene Advance Access originally published online on May 2, 2006
Annals of Occupational Hygiene 2006 50(6):623-635; doi:10.1093/annhyg/mel021

© The Author 2006. Published by Oxford University Press on behalf of the British Occupational Hygiene Society

Attenuation in Risk Estimates in Logistic and Cox Proportional-Hazards Models due to Group-Based Exposure Assessment Strategy

HYANG-MI KIM, YUTAKA YASUI and IGOR BURSTYN*

Department of Public Health Sciences, The University of Alberta Canada

*Author to whom correspondence should be addressed. Tel: +1-780-492-3240; fax: +1-780-492-9677; e-mail: igor.Burstyn@ualberta.ca


    ABSTRACT
 
In occupational epidemiology, it is often possible to obtain repeated measurements of exposure from a sample of subjects (workers) who belong to exposure groups associated with different levels of exposure. Average exposures from a sample of workers can be assigned to all members of that group, including those who are not sampled, leading to a group-based exposure assessment. We discuss how this group-based exposure assessment leads to an approximate Berkson error model when the number of subjects with exposure measurements in each group is large, and how the error variance approximates the between-worker variability. Under the normality assumption of exposures and with a moderately large number of workers in each group, there is attenuation in the estimate of the association parameter, the magnitude of which depends on the sizes of the between-worker variability and the true association parameter. Approximate equations for attenuation have been derived in logistic and Cox proportional-hazards models. These equations show that the attenuation in Cox proportional-hazards models is generally more severe than in logistic regression. Furthermore, our simulation study found that the approximate equation performs poorly for the Cox proportional-hazards model when the between-worker variability is large. If the number of subjects with exposure measurements is small, the approximation does not hold for either model.

Keywords: Berkson type error structure • between and within-worker variability • bias • ecological variable • epidemiology • homogenous error


    INTRODUCTION
 
Appreciation for the effect of errors in exposure variables on the interpretation of epidemiological studies lies at the heart of modern epidemiology. In assessing occupational exposures, a group-based exposure assessment strategy is commonly used. In this strategy, subgroups of workers are constructed based on their job-titles, tasks or other features of the work environment; a sample of workers is measured within these groups on repeated days; and the group means are used as an estimate of exposure for all members of the corresponding groups. It is often necessary to employ a group-based strategy because study populations are typically large and it is impossible or not feasible to take measurements from individual workers whose health status is being studied. An early example of this strategy was a case–control study of brain cancer and leukemia in electric utility workers in which investigators collected repeated measurements on workers from groups expected to have different exposure levels (Kromhout et al., 1995).

It is recognized that non-differential errors in individual exposure variables lead to attenuation of exposure–response associations and that the use of grouped exposure data substantially eliminates the attenuation. The group-based strategy is conventionally deemed advantageous even though it results in larger standard errors than the individual-based strategy (Armstrong, 1990, 1998; Tielemans et al., 1998), because gains in accuracy (i.e. small bias) often offset losses due to imprecision (Seixas and Sheppard, 1996). This has been convincingly demonstrated where both exposure and outcome can be assumed to have normal distributions, and thus are amenable to ordinary least squares regression analysis (Tielemans et al., 1998). Consequently, it is often supposed that the association parameter estimates are unbiased in logistic and Cox proportional-hazards models when group-based exposure assessment is used.

However, it has been suggested that there may be attenuation in estimating association parameters with the group-based strategy in logistic and Cox proportional-hazards models even when the measurement error variance is homogeneous across groups. The same magnitude of attenuation for both models has been shown when the measurement error variance is heterogeneous (i.e. the variance of error depends on the observed value) (Deddens and Hornung, 1994).

Burr (1988), Reeves et al. (1998), and Heid et al. (2002) showed that probit-regression association parameter estimates, which approximate logistic-regression estimates, are asymptotically biased downward from the true association parameter when error structure is of the Berkson type. This indicates that even though the group means are known exactly, which is a true Berkson error, there is attenuation in the estimation of parameters for the logistic model. However, the attenuation is considered to be negligible if the error variance is small. It remains to be determined whether the group-based strategy gives attenuation in estimating association parameters in logistic and Cox proportional-hazards models with a homogenous error structure.

This article provides an approximate description of the behavior of association estimates with the group-based exposure assessment strategy in the logistic and Cox proportional-hazards models. First, we show the circumstances under which this strategy leads to the Berkson error structure and demonstrate that the exposure measurement error variance approximates the between-worker variability. Second, we confirm that there is attenuation in association estimates in logistic and Cox proportional-hazards models when the group-based exposure assessment is used. We then derive an approximate relationship for the attenuation between the association parameters, based on true exposures and observed exposures, as a function of the between-worker variance and the true association parameters in the two regression models. These results and their limitations are explored in simulations. The overall objective of the article is to better understand how group-based exposure assessment influences association estimates in a typical case–control or cohort study in occupational epidemiology.


    Methods and results
 
Exposure model
In occupational epidemiology, the log-transformed exposure level ($Y_{gij}$) is commonly assumed to satisfy the model

$$Y_{gij} = \mu_g + \gamma_{gi} + \varepsilon_{gij} \qquad (1)$$
where $g$ denotes groups ($1, \ldots, G$), $i$ denotes workers ($1, \ldots, K_g$) and $j$ denotes repeated measures ($1, \ldots, N_{gi}$). In this model, $\mu_g$ is a fixed parameter representing the true group mean exposure, $\gamma_{gi}$ is the random effect for the $i$-th worker in the $g$-th group that is normally distributed with mean zero and between-worker variance $\sigma_B^2$, and $\varepsilon_{gij}$ is the random error term for the $j$-th (day) level on the $i$-th worker in the $g$-th group that is normally distributed with mean zero and within-worker variance $\sigma_W^2$. The error variance of the repeated observations provides the degree of measurement error in the classical error model. It is assumed that $\gamma_{gi}$ and $\varepsilon_{gij}$ are mutually independent, and by letting $\mu_{gi} = \mu_g + \gamma_{gi}$, we have the model as

$$Y_{gij} = \mu_{gi} + \varepsilon_{gij}$$

Derivation of a Berkson error model
In the group-based exposure assessment strategy, an average exposure ($\bar{Y}_g$) for a group $g$ is taken to apply to all workers in the group (e.g. with the same job title), where $\bar{Y}_g = \frac{1}{k}\sum_{i=1}^{k}\bar{Y}_{gi}$ and $\bar{Y}_{gi} = \frac{1}{n}\sum_{j=1}^{n}Y_{gij}$, $K_g = k$ is the number of workers, and $N_{gi} = n$ is the number of repeated measurements on each worker in a randomly selected sample. For each worker, this group mean $\bar{Y}_g$ is an approximation of his/her true exposure ($\mu_{gi}$).
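The assignment of a sampled group mean can be sketched in a few lines of Python. This is only an illustration under model (1), not the authors' code: the function name `simulate_group_mean`, the seed and the parameter values are assumptions chosen for the example.

```python
import random
from statistics import mean

def simulate_group_mean(mu_g, sigma_b, sigma_w, k, n, rng):
    """Draw k workers with n repeated log-exposure measurements each under
    Y_gij = mu_g + gamma_gi + eps_gij; return (group mean, true worker means)."""
    true_means = []    # mu_gi = mu_g + gamma_gi
    worker_means = []  # average of the n measurements on each worker
    for _ in range(k):
        mu_gi = mu_g + rng.gauss(0.0, sigma_b)
        measurements = [mu_gi + rng.gauss(0.0, sigma_w) for _ in range(n)]
        true_means.append(mu_gi)
        worker_means.append(mean(measurements))
    return mean(worker_means), true_means

rng = random.Random(2006)  # arbitrary seed for reproducibility
group_mean, true_means = simulate_group_mean(
    mu_g=2.1, sigma_b=0.5, sigma_w=1.5, k=100, n=2, rng=rng)
print(f"assigned group mean:           {group_mean:.3f}")
print(f"mean of true worker exposures: {mean(true_means):.3f}")
```

Every worker in the group would then be assigned `group_mean`, whatever his or her own entry in `true_means`.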

The conditional expectation of true exposures given observed exposures is

$$E(\mu_{gi} \mid \bar{Y}_g) = \mu_g + \rho\,(\bar{Y}_g - \mu_g) \qquad (2)$$
where $\rho = \sigma_B^2/(\sigma_B^2 + \sigma_W^2/n)$. The derivation is made under the assumption of the classical measurement error model (Appendix 1). If the number of workers in each group is sufficient for the true mean and the estimated group mean to be close in value, that is, $\bar{Y}_g \approx \mu_g$, then we have

$$E(\mu_{gi} \mid \bar{Y}_g) \approx \bar{Y}_g \qquad (3)$$
This approximation may depend on the sample selected, but only a moderately large sample size is needed to obtain this property, where the true exposure of each worker varies randomly about the group mean $\bar{Y}_g$ and its mean is approximately the group mean. The situation is analogous to the Berkson error model, where the true exposure given the observed exposure has an expected value equal to the observed exposure.

The Berkson error model was originally proposed for experimental situations, in which the experimenter attempts to set a variable at a target quantity, but because of imprecise control its true value may be randomly distributed around the target value, which is the observed value of the variable (Berkson, 1950). If the experiment is replicated many times with the same target value, the true value will be randomly distributed with an estimated mean value approaching the target value, and the errors are assumed to be independent of the target value. Having shown the approximate property $E(\mu_{gi} \mid \bar{Y}_g) \approx \bar{Y}_g$, we postulate a Berkson type error model with the assigned group mean ($\bar{Y}_g$) and true exposures ($\mu_{gi}$), i.e.

$$\mu_{gi} = \bar{Y}_g + e_{gi} \qquad (4)$$
where one can derive that $\mathrm{cov}(\bar{Y}_g, e_{gi}) = 0$ and $\mathrm{cov}(\mu_{gi}, e_{gi}) = V(\mu_{gi} \mid \bar{Y}_g) \neq 0$ (Appendix 2). We note that the condition of a sufficiently large sample size in the classical model leads to a conditional expectation of the error given the group mean that is close to 0 in the Berkson error model. Also, model (4) is not a true Berkson error model since $\mathrm{cov}(\bar{Y}_g, e_{gi}) = 0$ does not imply that the observed value and the error are independent, which is required for a true Berkson error model. We re-postulate the model as Berkson type by selecting a moderately large sample of workers.

The error variance under the Berkson error model
We next demonstrate a relationship between the error variance and the between-worker variance under the Berkson error model, equation (4). The variance of the true exposure conditioned on the observed value is the variance of the error term $e_{gi}$, $V(\mu_{gi} \mid \bar{Y}_g) = V(e_{gi})$ ($= \sigma_e^2$), and the conditional variance is given by

$$V(\mu_{gi} \mid \bar{Y}_g) = V(\mu_{gi} - \bar{Y}_g) = V(\mu_{gi}) - V(\bar{Y}_g) = \sigma_B^2 - \left(\frac{\sigma_B^2}{k} + \frac{\sigma_W^2}{kn}\right) \approx \sigma_B^2 \qquad (5)$$
using the property that $\mathrm{cov}(\bar{Y}_g, \mu_{gi}) = V(\bar{Y}_g)$ (Appendix 2). This equation implies that the error variance is approximately equal to the between-worker variance when the number of sampled workers ($k$) is sufficiently large.
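A small Monte Carlo check can make the limiting behaviour concrete. The sketch below is not taken from the article. It assumes the worker who receives the group mean was not among the k sampled workers, in which case the exact variance of the error is σ_B² + σ_B²/k + σ_W²/(kn), which also tends to σ_B² as k grows. The function name `error_variance_mc` and all parameter values are illustrative.

```python
import random

def error_variance_mc(sigma_b, sigma_w, k, n, reps=2000, seed=1):
    """Monte Carlo estimate of Var(mu_gi - Ybar_g) for a worker who was NOT
    among the k sampled workers (an assumption of this sketch); mu_g = 0."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        # Group mean from k sampled workers with n measurements each.
        total = 0.0
        for _ in range(k):
            gamma = rng.gauss(0.0, sigma_b)
            total += sum(gamma + rng.gauss(0.0, sigma_w) for _ in range(n)) / n
        group_mean = total / k
        # True exposure of an unsampled worker who is assigned the group mean.
        mu_gi = rng.gauss(0.0, sigma_b)
        errors.append(mu_gi - group_mean)
    centre = sum(errors) / reps
    return sum((e - centre) ** 2 for e in errors) / reps

for k in (10, 100):
    v = error_variance_mc(sigma_b=1.0, sigma_w=1.5, k=k, n=2)
    print(f"k={k:3d}: simulated error variance {v:.3f} (sigma_B^2 = 1.000)")
```

With few sampled workers the within-worker component still contributes noticeably; with k = 100 the simulated error variance should sit close to the between-worker variance.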

The effect of group-based exposure assessment on logistic and Cox regression: approximate equations and simulations
Attenuation in logistic model
Logistic regression takes the following form,

$$P(Z_{gi} = 1 \mid \mu_{gi}) = \Lambda(\beta_0 + \beta_1\mu_{gi}) \qquad (6)$$
where $Z_{gi}$ is a binary variable for health outcome and $\Lambda(w) = 1/[1 + \exp(-w)]$. The logistic and probit regression models agree closely over a range of values of the predictor variable ($\mu_{gi}$) with an adjusting value $c$:

$$\Lambda(\beta_0 + \beta_1\mu_{gi}) \approx \Phi[c(\beta_0 + \beta_1\mu_{gi})]$$
where the probability of disease satisfies $0.1 < p < 0.9$, $c = 0.588$ and $\Phi(t)$ is the cumulative distribution function of the standard normal distribution (McCullagh and Nelder, 1983). When we have a Berkson error structure with normality in the exposure variable [i.e. the conditional (exposure) variable $\mu_{gi} \mid \bar{Y}_g$ is normally distributed with mean $\bar{Y}_g$ and variance $\sigma_e^2$], using the probit regression model, it can be shown that

$$P(Z_{gi} = 1 \mid \bar{Y}_g) \approx \Lambda(\beta_0^* + \beta_1^*\bar{Y}_g) \qquad (7)$$
where $\beta_0^*$ and $\beta_1^*$ denote the intercept and the association parameters, respectively, based on the group exposure values. These are functions of the error variance in the exposure variable, $\sigma_e^2$, i.e.

$$\beta_0^* = \frac{\beta_0}{\sqrt{1 + c^2\beta_1^2\sigma_e^2}}, \qquad \beta_1^* = \frac{\beta_1}{\sqrt{1 + c^2\beta_1^2\sigma_e^2}}$$
See Appendix 3 and Burr (1988), Reeves et al. (1998) and Heid et al. (2002) for details.
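The closeness of the logistic and probit curves over the stated probability range can be checked numerically. The sketch below is an illustration, not part of the article: it scans linear predictors whose logistic probability lies between 0.1 and 0.9 and reports the largest gap between Λ(w) and Φ(cw) with c = 0.588. The helper names are assumptions.

```python
import math

C = 0.588  # adjusting value quoted in the text

def logistic_cdf(w):
    return 1.0 / (1.0 + math.exp(-w))

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def max_abs_gap(p_low=0.1, p_high=0.9, steps=1000):
    """Largest |Lambda(w) - Phi(C*w)| over linear predictors w whose
    logistic probability lies in [p_low, p_high]."""
    w_low = math.log(p_low / (1.0 - p_low))
    w_high = math.log(p_high / (1.0 - p_high))
    gap = 0.0
    for s in range(steps + 1):
        w = w_low + (w_high - w_low) * s / steps
        gap = max(gap, abs(logistic_cdf(w) - normal_cdf(C * w)))
    return gap

print(f"max gap for 0.1 <= p <= 0.9: {max_abs_gap():.4f}")
```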

It follows that when the measurement error variance is constant, even in the presence of a Berkson error structure, the logistic regression association parameter is biased downward from the true value $\beta_1$. With the group-based strategy, the approximate relationship between the association parameters based on the true values and the observed values is given by

$$\beta_1^* \approx \frac{\beta_1}{\sqrt{1 + c^2\beta_1^2\sigma_B^2}} \qquad (8)$$
and since the error variance ($\sigma_e^2$) approximates the between-worker variance ($\sigma_B^2$), the association parameter ($\beta_1^*$) is also biased downward.
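As a worked illustration, the attenuation ratio can be tabulated for a few between-worker standard deviations. The sketch assumes that equation (8) takes the usual probit-approximation form β1* ≈ β1/√(1 + c²β1²σ_B²) reported for Berkson error (Burr, 1988). The function name `logistic_attenuation` is hypothetical.

```python
import math

C = 0.588  # logistic-to-probit adjusting value quoted in the text

def logistic_attenuation(beta1, sigma_b):
    """Approximate ratio beta1*/beta1 under the assumed form of equation (8)."""
    return 1.0 / math.sqrt(1.0 + (C * beta1 * sigma_b) ** 2)

for sigma_b in (0.0, 0.5, 1.0, 2.0):
    ratio = logistic_attenuation(0.6, sigma_b)
    print(f"sigma_B={sigma_b:.1f}: beta1*/beta1 ~ {ratio:.3f} "
          f"({100 * (1 - ratio):.1f}% attenuation)")
```

Under this assumed form the predicted attenuation for β1 = 0.6 is roughly 6% at σ_B = 1 and 18% at σ_B = 2, and it shrinks as β1 or σ_B decreases.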

Attenuation in Cox proportional-hazards model
In general, the hazard function depends on both time and a set of covariates. The proportional-hazards model separates these components by specifying that the hazard rate for survival time of a subject is

$$h(t \mid \mu_{gi}) = h_0(t)\exp(\alpha\mu_{gi}) \qquad (9)$$
where $h_0(t)$ is the baseline hazard function when the true covariate value $\mu_{gi} = 0$. The association parameter, $\alpha$, is not necessarily the same as the association parameter ($\beta_1$) in the logistic model (Green and Symons, 1983). When the baseline hazard is a constant, $\lambda$, then

Formula 9
holds (Prentice, 1982). Under the normality of the conditional distribution $\mu_{gi} \mid \bar{Y}_g$ with mean $\bar{Y}_g$ (the property of the Berkson error structure) and variance $\sigma_e^2$, the expected Cox proportional-hazards function is given (Deddens and Hornung, 1994) as

Formula 10(10)
One approach to finding the relationship between association parameters based on the true and observed exposures is to use the approximate equivalence of survival functions between the logistic and Cox proportional-hazards models (Appendix 4).

However, we need to be careful with this approach because two approximations are used: one linking the logistic and probit models, which requires that the probability of disease lie between 0.1 and 0.9, and the other linking the logistic and Cox proportional-hazards models, which requires that neither the cumulative incidence over the follow-up period nor the combined effects of the covariate be too large: Formula 10 and $\beta_0 = \ln(\lambda T)$. Therefore, the association parameter in the logistic model will be approximately equal to that of the Cox proportional-hazards model when the baseline hazard is constant with a low risk factor and when the follow-up period is not too long, so that the probability of disease is neither too rare nor too common.

Taylor series expansion is a power expansion up to order n of a function at a point and is used to approximate a function by a series. The parameters of each survival function are functions of the error variance, i.e. $\beta_1^*(\sigma_e^2)$ = function($\sigma_e^2$) and $\alpha^*(\sigma_e^2)$ = function($\sigma_e^2$). If there is no measurement error in exposure, the error variance is 0, and the parameter estimated with the observed values is equal to the true parameter. We expand both parameters at the point $\sigma_e^2 = 0$ to the first degree of the Taylor series and then set the coefficients of the series in the two models equal to find a relationship between the parameters given the true exposures and the observed values with measurement error.

That is, by using (i) differentiation with respect to the observed exposure when it is zero and (ii) the Taylor series expansion of both survival functions [based on the equations (7) and (10)], around the error variance of zero, a relationship between the two parameters in a certain range of Formula 10 is given by

Formula 11(11)
where $\alpha^*$ indicates the association parameter based on the observed values in the Cox proportional-hazards model, which depends on the measurement error variance, i.e. the between-worker variance with the group-based strategy (Appendix 4). Therefore, the estimate of the association parameter in the Cox proportional-hazards model for a given observed group mean exposure ($\bar{Y}_g$) is approximately equivalent to the estimate with the true exposures divided by the attenuation factor of Formula 11. It can be expected that this approximation will be poor when the between-worker variance is large and disease is either very rare or very common.

Simulations
Simulations were performed to examine attenuation in association parameter estimates in logistic and Cox proportional-hazards models with group-based exposure assessment. We considered an occupational cohort study with time-invariant exposure that has five exposure groups, each group containing 1000 subjects and 100 exposed time periods (e.g. days, months) for each subject. We further assumed that disease risk depended only on exposure intensity, not on the duration of exposure. Lastly, we assumed that measurements of exposure for a sample of k workers in each group are obtainable (k = 10, 50, 100) and that measurements from n different time periods (e.g. days, n = 2) will be available for each sampled worker. Repeated measurements on each sampled worker are needed to construct a measurement error model of exposure assessment, since the true exposures are assumed to vary randomly among the sampled workers with the between-worker variance. This is typical for an occupational exposure database that one might be able to obtain or construct for a given study.

The log of exposures was assumed to be normally distributed with means 1.1, 2.1, ... , 5.1 for the first to fifth group, respectively, with the between-worker standard deviation ($\sigma_B$) taking on values of 0, 0.1, 0.5, 1 and 1.5, and the within-worker standard deviation ($\sigma_W$) taking on values of 0, 0.5, 1.5 and 3. We chose the variance components on the basis of the paper by Kromhout et al. (1993), which summarized the between- and within-worker variance components generally observed for airborne contaminants in workplaces.

The true association parameter was set to 0.2, 0.4 and 0.6 for both logistic and Cox proportional-hazards regression models, and –4 was used as the intercept parameter in the logistic model, and we set the constant baseline hazard at 0.01 in the Cox proportional-hazards model in order to make the survival functions in the logistic and Cox proportional-hazards models approximately equivalent.

For each case, 1000 simulations were performed. The estimated mean exposure for each group, formulated in accordance with equation (1), was assigned to all workers in a given group. A Bernoulli random variable was used for the status of disease in the logistic regression model with the true exposure covariates ($\mu_{gi}$). On the basis of the specific probability, each subject was assigned either 1 or 0 disease status using the RANBIN function in SAS software (version 8), and in the case of the Cox proportional-hazards model, the survival times were generated using an exponential random variable with true hazard $0.01\exp(\alpha\mu_{gi})$; we assumed that there was no censoring. The association parameters were estimated using the LOGISTIC (for the logistic model, $\hat{\beta}_1^*$) and PHREG (for the Cox proportional-hazards model, $\hat{\alpha}^*$) procedures of SAS software (version 8). Attenuation (for graphical presentation of the results) was estimated as the ratio of the association parameter estimate obtained with the group mean to the true parameter value, i.e. $\hat{\beta}_1^*/\beta_1$ for the logistic model and $\hat{\alpha}^*/\alpha$ for the Cox proportional-hazards model. The ratios of the estimates based on group means to those based on true exposures ($\hat{\beta}_1^*/\hat{\beta}_1$ and $\hat{\alpha}^*/\hat{\alpha}$) were also calculated (not shown in the article): they showed very similar behavior to the ratio with the true parameter.
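A single replicate of the logistic part of this design can be sketched in plain Python. This is an illustration only: the article used SAS (RANBIN and PROC LOGISTIC), whereas the sketch substitutes a hand-written Newton-Raphson fit. It also assumes the k measured workers are drawn separately from the 1000 cohort members of each group. The names `fit_logistic` and `simulate_logistic` and the seed are hypothetical.

```python
import math
import random

def fit_logistic(x, z, iterations=25):
    """Newton-Raphson maximum likelihood for P(z=1|x) = 1/(1+exp(-(b0+b1*x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, zi in zip(x, z):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += zi - p           # score for the intercept
            g1 += (zi - p) * xi    # score for the slope
            h00 += w               # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def simulate_logistic(sigma_b, sigma_w, k=100, n=2, beta0=-4.0, beta1=0.6,
                      group_means=(1.1, 2.1, 3.1, 4.1, 5.1), size=1000, seed=1):
    """Return (slope fitted on true exposures, slope fitted on group means)."""
    rng = random.Random(seed)
    x_true, x_group, z = [], [], []
    for mu_g in group_means:
        # Group mean estimated from k measured workers, n measurements each.
        total = 0.0
        for _ in range(k):
            mu_gi = mu_g + rng.gauss(0.0, sigma_b)
            total += sum(mu_gi + rng.gauss(0.0, sigma_w) for _ in range(n)) / n
        ybar_g = total / k
        # Cohort members: disease status depends on the true exposure mu_gi.
        for _ in range(size):
            mu_gi = mu_g + rng.gauss(0.0, sigma_b)
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * mu_gi)))
            x_true.append(mu_gi)
            x_group.append(ybar_g)
            z.append(1 if rng.random() < p else 0)
    return fit_logistic(x_true, z)[1], fit_logistic(x_group, z)[1]

b_true, b_group = simulate_logistic(sigma_b=1.0, sigma_w=0.5)
print(f"slope with true exposures: {b_true:.3f}")
print(f"slope with group means:    {b_group:.3f} (ratio to 0.6: {b_group / 0.6:.2f})")
```

One replicate is noisy; averaging the ratio over many seeds would be needed to approximate the attenuation reported in the tables.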

The true logistic disease model is given by $P(Z_{gi} = 1 \mid \mu_{gi}) = \Lambda(\beta_0 + \beta_1\mu_{gi})$, while the fitted disease model given the observed value, $\bar{Y}_g$, is $P(Z_{gi} = 1 \mid \bar{Y}_g) = \Lambda(\beta_0^* + \beta_1^*\bar{Y}_g)$. The true Cox proportional-hazards model given the true exposure is $h(t \mid \mu_{gi}) = \lambda\exp(\alpha\mu_{gi})$ and the fitted proportional-hazards model given the observed value is $h(t \mid \bar{Y}_g) = \lambda^*\exp(\alpha^*\bar{Y}_g)$. In Tables 1 and 2 we report association parameter estimates, absolute attenuation and mean square error (MSE) of the simulation based on the group exposure mean ($\bar{Y}_g$) in the logistic and Cox proportional-hazards models with $\beta_1 = \alpha = 0.6$, varying within-worker variability ($\sigma_W$ = 0, 0.5, 1.5 and 3) and the sampled number of workers (k = 10 and 100). The attenuation in association parameter estimates in the logistic model, when using the group exposure mean for each worker in the group, does not appear to be sensitive to the magnitude of the within-worker variability when the number of measured workers is sufficiently large. The attenuation is severe only when the between-worker variability is large and the number of measured workers is small. For example, if 100 workers are sampled per group, the true association parameter is 0.6 ($\beta_1 = \alpha = 0.6$) and the within-worker standard deviation is 0.5, the attenuation is 4% if the between-worker standard deviation ($\sigma_B$) is equal to 1 and 17% if it is twice as large ($\sigma_B$ = 2). However, the change in attenuation is much larger, from 6 to 23%, if only 10 workers are sampled per exposure group. Also, if the number of sampled workers per group is small, the attenuation depends on both between- and within-worker variability (i.e. measurement error assumes a structure that is more of classical than Berkson type). For the Cox proportional-hazards model, the tables show a similar pattern of results as in the logistic model, but the attenuation is more severe. For example, when the within-worker standard deviation is 0.5 and the between-worker standard deviation is 1, the attenuation in the Cox model (19%) is about 5 times greater than that in the logistic model (4%). MSEs of the association parameter estimates tend to increase with the between-worker variability. Similar patterns were observed for smaller true association parameters (0.4 and 0.2).
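The Cox comparison can be sketched for one replicate in the same way. Again this is not the authors' SAS code: PROC PHREG is replaced by a hand-written Newton-Raphson maximisation of the partial likelihood for a single covariate (no censoring, no tied times), and the measured workers are assumed to be drawn separately from the cohort members. The names `fit_cox` and `simulate_cox` are hypothetical.

```python
import math
import random

def fit_cox(x, t, iterations=25):
    """Newton-Raphson on the Cox partial likelihood for one covariate,
    assuming every subject has an event and there are no tied times."""
    order = sorted(range(len(t)), key=lambda i: t[i], reverse=True)
    alpha = 0.0
    for _ in range(iterations):
        s0 = s1 = s2 = 0.0
        score = info = 0.0
        for i in order:  # the risk set grows as we move to earlier event times
            r = math.exp(alpha * x[i])
            s0 += r
            s1 += r * x[i]
            s2 += r * x[i] * x[i]
            xbar = s1 / s0          # risk-set weighted mean of the covariate
            score += x[i] - xbar
            info += s2 / s0 - xbar * xbar
        alpha += score / info
    return alpha

def simulate_cox(sigma_b, sigma_w, k=100, n=2, alpha=0.6, base_hazard=0.01,
                 group_means=(1.1, 2.1, 3.1, 4.1, 5.1), size=1000, seed=1):
    """Return (estimate using true exposures, estimate using group means)."""
    rng = random.Random(seed)
    x_true, x_group, times = [], [], []
    for mu_g in group_means:
        # Group mean estimated from k measured workers, n measurements each.
        total = 0.0
        for _ in range(k):
            mu_gi = mu_g + rng.gauss(0.0, sigma_b)
            total += sum(mu_gi + rng.gauss(0.0, sigma_w) for _ in range(n)) / n
        ybar_g = total / k
        # Cohort members: exponential survival time with hazard 0.01*exp(alpha*mu_gi).
        for _ in range(size):
            mu_gi = mu_g + rng.gauss(0.0, sigma_b)
            times.append(rng.expovariate(base_hazard * math.exp(alpha * mu_gi)))
            x_true.append(mu_gi)
            x_group.append(ybar_g)
    return fit_cox(x_true, times), fit_cox(x_group, times)

a_true, a_group = simulate_cox(sigma_b=1.0, sigma_w=0.5)
print(f"alpha with true exposures: {a_true:.3f}")
print(f"alpha with group means:    {a_group:.3f} (ratio to 0.6: {a_group / 0.6:.2f})")
```

Fitting both covariates to the same survival times makes the attenuation directly visible within one replicate; repeating over many seeds would be needed to compare with Tables 1 and 2.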


 
Table 1. Estimates of association parameters in logistic ($\hat{\beta}_1^*$) and Cox ($\hat{\alpha}^*$) regressions with different values of between- ($\sigma_B$) and within-worker ($\sigma_W$) standard deviations and 10 sampled workers per group, given that the true association parameter is 0.6 ($\beta_1 = \alpha$).

 

 
Table 2. Estimates of association parameters in logistic ($\hat{\beta}_1^*$) and Cox ($\hat{\alpha}^*$) regressions with different values of between- ($\sigma_B$) and within-worker ($\sigma_W$) standard deviations and 100 sampled workers per group, given that the true association parameter is 0.6 ($\beta_1 = \alpha$).

 
Figures 1–4 further illustrate the patterns seen in Tables 1 and 2 and compare them with the theoretical predictions of equation (8) for the logistic model and equation (11) for the Cox proportional-hazards model. As expected, the trend of attenuation is close to the theoretically predicted one if between-worker variability is small, and also for larger values of between-worker variability if the number of measured workers is large. Figure 1 indicates how vulnerable our theoretical results are to small sample size (k = 10). It would appear that attenuation is related mostly to an increase in $\sigma_B$, as with large sample sizes. However, when $\sigma_W > \sigma_B$, even at small $\sigma_B$ we observe some attenuation that is greater than predicted by theory. For large sample size, the equation for the logistic model appears to be quite appropriate in the range of between-worker standard deviation <2, but the equation for attenuation in the Cox proportional-hazards model provides a good approximate solution in a narrower range: with between-worker standard deviation <1 (Fig. 2). Figures 3 and 4 show that the equation for logistic regression predicts the attenuation well when the number of measured workers is large (≥50) regardless of the magnitude of within-worker variability ($\sigma_W$ = 0.5 and $\sigma_W$ = 1.5). The situation is more complex with the Cox proportional-hazards model. The theoretically predicted and observed attenuation in the Cox model are in close agreement when between-worker variability is small and the number of measured workers is large (≥50) regardless of the magnitude of within-worker variability (as with the logistic regression). However, as the between-worker variability increases, predictions of attenuation become progressively worse regardless of the number of measured workers, with the theory overestimating the actual attenuation. If the number of sampled workers is small, the within-worker variability affects the attenuation.


 
Fig. 1. Observed and expected (solid) attenuation with sample size k = 10 when the within-worker standard deviation $\sigma_W$ is 0.5 (dot-dashed), 1.5 (dashed) and 3 (dotted) for logistic and Cox models, given that the true parameter is 0.6.

 

 
Fig. 2. Observed and expected (solid) attenuation with sample size k = 100 when the within-worker standard deviation $\sigma_W$ is 0.5 (dot-dashed), 1.5 (dashed) and 3 (dotted) for logistic and Cox models, given that the true parameter is 0.6.

 

 
Fig. 3. Observed and expected (solid) attenuation with within-worker standard deviation $\sigma_W$ = 0.5 when the sample size k is 100 (dot-dashed), 50 (dashed) and 10 (dotted) for logistic and Cox models, given that the true parameter is 0.6.

 

 
Fig. 4. Observed and expected (solid) attenuation with within-worker standard deviation $\sigma_W$ = 1.5 when the sample size k is 100 (dot-dashed), 50 (dashed) and 10 (dotted) for logistic and Cox models, given that the true parameter is 0.6.

 
Figure 5 shows the attenuation for various association parameter values (0.2, 0.4 and 0.6) at constant values of the within-worker variability ($\sigma_W$ = 0.5) and the number of sampled workers (k = 100). In each case, the pattern of attenuation is similar, but smaller association parameters give smaller attenuation. Similar patterns were observed for large values of within-worker variability.


 
Fig. 5. Observed attenuation with different parameter values for logistic and Cox models when the parameter is 0.2 (dashed), 0.4 (dot-dashed) and 0.6 (solid), given that the within-worker standard deviation $\sigma_W$ = 0.5 and the sample size k = 100.

 

    DISCUSSION
 
We have demonstrated that the group-based strategy leads to a Berkson type error structure when there are large numbers of subjects with exposure measurements in each group, and the error variance approximates the between-worker variance. It should be noted that a pure Berkson error structure is not achieved through grouping in observational studies. We developed approximate equations for the attenuation in association parameters that results from the application of a group-based exposure assessment. The approximation for the equations holds quite well in a small range of between-worker (subject) variances for both logistic and Cox proportional-hazards models. The association parameter estimates in Cox proportional-hazards models are more severely attenuated than those in logistic models.

There is a belief (Armstrong, 1990) that there is no attenuation in the estimation of the association parameters in the logistic and Cox proportional-hazards models if the error variance is constant for all observations and exposure is assessed on a group level, since the group-based strategy leads to the Berkson error model. However, when a random sample from a population is used for exposure assessment, it produces a classical error model, not a true Berkson error model. Classical error structures arise when we wish to ascertain the value of some unknown quantity. We measure repeatedly from a population and the observed value is considered to be measured with unbiased error. When errors are independent of the true quantity, we have a classical error model. Classical measurement errors produce attenuation in the association parameter estimates in linear regression models (Berkson, 1950) and in logistic regression models when the measurement error variance is the same for all subgroups of subjects (Carroll et al., 1984; Reeves et al., 1998; Spiegelman and Valanis, 1998; Heid et al., 2002), and it appears possible to adjust for bias in Cox proportional-hazards models (Nakamura, 1992; Wang et al., 1997; Kong, 1999; Augustin, 2004; Li and Ryan, 2004).

A true Berkson error structure arises when a nominal measured amount is used instead of the unknown true exposure, so that the assigned value is fixed by design and the true value varies due to errors (Carroll et al., 1995). Therefore, an exposure model that reflects the group-based exposure assessment strategy cannot lead to a true Berkson error model, but it gives a property of the Berkson error model [$E(\mu_{gi} \mid \bar{Y}_g) \approx \bar{Y}_g$] when there are enough workers with exposure measurements in each group. With this property, the situation is approximately analogous to the Berkson type error model.

We found that exposure error variances in group-based exposure assessments (with a sufficient number of measured workers per group) can be approximated by the between-worker variance. This allows us to bridge the gap between the literature on group-based exposure assessment and measurement error papers that have not traditionally reflected the measurement error model that has emerged in occupational epidemiology. Usually, the parameters are unidentifiable in measurement error models, so that the error variance must be known a priori, or estimated from a separate validation dataset. Our results allow estimation of error variances in the group-based exposure assessment without resorting to additional assumptions or conducting supplementary experiments. Furthermore, we have derived an approximate equation for attenuation in Cox proportional-hazards model with Berkson type error structure. The equation for the attenuation enabled us to examine the approximate behavior of the association parameter estimates in populations with different characteristics using a group-based exposure assessment strategy. This may enable us in the future to adjust for attenuation in association parameter estimates obtained in such studies. Finally, we showed, for the first time, that the logistic model and the Cox proportional-hazards model could not be used equivalently with measurement errors in exposure.

However, if the number of sampled workers is small, the attenuation becomes severe when both between- and within-worker variabilities get large: since the properties of the Berkson error structure cannot be satisfied, this leads to the classical error model in which the attenuation can be expected to depend on both between- and within-worker variabilities. Also, if the probability of disease is rare for all groups (p < 0.02), the approximation relating the logistic model to the probit model is poor.

We note that the probability of disease should satisfy the approximation conditions (McCullagh and Nelder, 1983). The approximation from logistic to probit functions holds for 0.1 < p < 0.9, and the quantity Formula 11 should fall in the range (0, 1) for the Taylor approximation from logistic to Cox proportional-hazards models. In the simulations, the expected true probability of disease [according to P(Zgi = 1|µg) = Λ(ß0 + ß1µg)] among groups was between 0.03 and 0.28 for ß1 = 0.6, between 0.02 and 0.12 for ß1 = 0.4 and between 0.02 and 0.05 for ß1 = 0.2 (the attenuation is still close to the expected line, which is not shown in Fig. 5) with the same amount of true exposures.
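The range restriction on p can be checked numerically. The following sketch compares the logistic function with the scaled probit function, using the constant c = 0.588 quoted in the Appendix; the function names and the grid of evaluation points are assumptions made for this illustration.

```python
import math

C = 0.588  # scaling constant quoted in the Appendix


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def logit(p):
    return math.log(p / (1.0 - p))


def max_abs_gap(p_low, p_high, steps=2001):
    """Largest absolute gap between logistic(t) and Phi(C*t) for p in [p_low, p_high]."""
    t_low, t_high = logit(p_low), logit(p_high)
    gap = 0.0
    for i in range(steps):
        t = t_low + (t_high - t_low) * i / (steps - 1)
        gap = max(gap, abs(logistic(t) - std_normal_cdf(C * t)))
    return gap


def relative_gap(p):
    """Relative error of the probit approximation at logistic probability p."""
    return abs(std_normal_cdf(C * logit(p)) - p) / p


print(f"max absolute gap for 0.1 < p < 0.9: {max_abs_gap(0.1, 0.9):.4f}")
for p in (0.10, 0.05, 0.02):
    print(f"p = {p:.2f}: relative gap = {relative_gap(p):.2f}")
```

The absolute gap stays small across the central range, but the relative error grows quickly as p falls towards 0.02, which is consistent with the caution about rare disease.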

We have demonstrated that the group-based strategy in logistic models works well, leading to little bias in estimating the association parameter of the model. The strategy is simple to implement with standard software using a sufficient statistic of the true common mean. For the Cox proportional-hazards models, the group-based strategy does not seem to address the problem effectively under the assumptions we considered, namely fixed baseline hazard, short length of follow-up, time-independent exposures and no censoring.

It is also noted that the group-based strategy does not lead to the common calibration method for individual-based main study/validation study design (Rosner et al., 1989; Spiegelman et al., 1997, 2001), because in our case only a sample of subjects has any measurements, unlike the classical calibration method in which all subjects have one measurement, but a subset also has repeated and/or gold standard measurements. The common calibration method considers neither sampling nor grouping.

Our results have the following important limitations. First, our results apply to situations where logarithms of exposure are reasonable proxies for biologically relevant doses, i.e. with a log–log model, log(rate) = ß0 + ß1 × log(exposure). However, toxicological arguments indicate that log-linear models, log(rate) = ß0 + ß1 × (exposure), may be more plausible for many exposure-disease associations (Rappaport, 1991) and therefore they should lead to more realistic inferences (e.g. risk assessments) (Steenland and Deddens, 2004). Application of a group-based strategy when the true exposure-disease association is log-linear is a subject of ongoing research. Second, the group means are treated as fixed rather than random effects. Third, we consider constant within- and between-worker variability. Fourth, although the conditions of equal numbers of workers per group and measurements per worker are rather restrictive, the purpose of this study is to provide evidence of the impact of using the group-based strategy in logistic and Cox proportional-hazards models for further development of exposure assessment methodology. Fifth, we considered the case of homogeneous variance components, but Burstyn et al. (2006) studied heteroscedastic measurement error under group-level exposure assessment. Their results indicate that small-to-negligible bias can be expected to result from heteroscedastic between-worker variances: Cox proportional-hazards models can produce attenuated risk estimates, while logistic regression may result in an overestimation of the risk gradient. When only a small number of workers is observed in each group, more repeated measurements of each worker are needed for the group-based assessment to lead to a Berkson error structure. If the number of workers in each group differs in such a case, the situation leads to heteroscedastic variance components [suggested by equation (5)].
Sixth, the fundamental assumption about exposure assessment that we consider is that it is based on a sufficient number of measurements from each group to satisfy the condition under which approximate Berkson type error emerges (3). We do not address the question of how many measurements (and of what type, more workers or more repeats) are needed to meet this condition. Our simulations suggest that if we are able to monitor 50 workers on 2 days from each group, then the sample size is sufficient. A quantitative answer to this query awaits further research.
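The trade-off between more workers and more repeats can be explored with a closed-form calculation. The sketch below assumes a one-way random-effects model in which the group sample mean has variance σ_B²/k + σ_W²/(kn) for k sampled workers with n repeats each; the function names and the unit variance components are hypothetical choices for illustration and do not answer the open question raised above.

```python
def group_mean_variance(sigma_b2, sigma_w2, k, n):
    """Variance of the group sample mean from k workers with n repeats each."""
    return sigma_b2 / k + sigma_w2 / (k * n)


def error_variance_ratio(sigma_b2, sigma_w2, k, n):
    """Error variance for an unsampled worker, relative to the between-worker variance."""
    return (sigma_b2 + group_mean_variance(sigma_b2, sigma_w2, k, n)) / sigma_b2


# Illustrative variance components (assumed, not taken from the paper).
SIGMA_B2, SIGMA_W2 = 1.0, 1.0

# Each design below uses 100 measurements per group.
for k, n in ((50, 2), (25, 4), (10, 10), (5, 20)):
    ratio = error_variance_ratio(SIGMA_B2, SIGMA_W2, k, n)
    print(f"k = {k:2d} workers, n = {n:2d} repeats: ratio = {ratio:.3f}")
```

With a fixed total number of measurements, the ratio is closest to 1 when the measurements are spread over more workers, which agrees with the recommendation to maximize the number of measured workers per group.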

In summary, our results provide several new insights into the estimation of association parameters in the logistic and Cox proportional-hazards model when exposures are assessed with the group-based strategy. The understanding of the approximate behavior of the association parameter estimate in the models can help design efficient grouping strategy in practical situations. In general, it would seem that maximizing the number of workers with exposure measurements in each group is desirable. In collecting measurements for exposure assessment, the need to have a repeated measures design is apparent to enable the investigators to study measurement error structure.


    APPENDIX
1. Proof of Properties under Classical Error Structure
The joint distribution of (µgi, Formula 11) is given by


Formula

where

Formula

2. Proof of Properties under Berkson Error Structure

Formula

3. Attenuation for Logistic Model
Formula 11, where Formula 11, c = 0.588 and 0.1 < p < 0.9. Φ(t) is the cumulative distribution function of the standard Normal distribution.

Formula
where

Formula
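
For orientation, the textbook probit-approximation argument for attenuation under Berkson error can be written out as follows. This is a sketch of a standard result and not a reconstruction of the formulas missing above: the outcome symbol Y, the error variance σ² and the starred coefficient are notation assumed here, and the original derivation may differ in detail.

```latex
\begin{align*}
P(Y = 1 \mid Z) &= \Lambda(\beta_0 + \beta_1 Z)
  \approx \Phi\{c(\beta_0 + \beta_1 Z)\}, \qquad c = 0.588, \\
Z \mid X &\sim N(X, \sigma^2) \quad \text{(Berkson error)}, \\
P(Y = 1 \mid X) &\approx E\left[\Phi\{c(\beta_0 + \beta_1 Z)\} \mid X\right]
  = \Phi\left\{\frac{c(\beta_0 + \beta_1 X)}{\sqrt{1 + c^2 \beta_1^2 \sigma^2}}\right\}, \\
\beta_1^{*} &\approx \frac{\beta_1}{\sqrt{1 + c^2 \beta_1^2 \sigma^2}}.
\end{align*}
```

The third line uses the identity E[Φ(a + bZ)] = Φ{(a + bm)/√(1 + b²s²)} for Z ~ N(m, s²). The attenuation factor therefore depends on both the true slope and the error variance, as stated in the Abstract.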

4. Attenuation for Cox Proportional-Hazards Model
The logistic regression function with true risk factor Z with coefficients ß0 and ß1 is given as

Formula (A1)
The model suggested by Cox (1972) is based on a hazard at positive time t having the form

Formula (A2)
A relationship between a logistic model and the special case of the Cox model with h0(t) = λ is that the coefficients of the risk factor Z in the logistic regression will be approximately equal to the coefficient for the Cox model. This result holds when exp(ß0 + ß1Z) is small, implying that the sum of ß0 = ln(λT) and the linear combination of risk factors is small (Cox, 1972).
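
This approximate equivalence can be checked numerically. The sketch below compares the logistic risk with the cumulative risk over follow-up time T under a constant baseline hazard λ and no censoring, setting ß0 = ln(λT) and using the same slope in both models. The numerical values of λ, T and the slope, and the function names, are assumptions chosen for illustration.

```python
import math


def logistic_risk(beta0, beta1, z):
    """Disease probability under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * z)))


def cox_risk(lam, follow_up, alpha, z):
    """Cumulative risk by time follow_up under constant baseline hazard lam, no censoring."""
    return 1.0 - math.exp(-lam * follow_up * math.exp(alpha * z))


# Illustrative values (assumed): rare disease at baseline, moderate slope.
LAM, T, SLOPE = 0.001, 10.0, 0.4
BETA0 = math.log(LAM * T)


def relative_gap(z):
    cox = cox_risk(LAM, T, SLOPE, z)
    return abs(logistic_risk(BETA0, SLOPE, z) - cox) / cox


for z in (0.0, 2.0, 4.0, 8.0):
    print(f"Z = {z:.0f}: exp(b0 + b1*Z) = {math.exp(BETA0 + SLOPE * z):.3f}, "
          f"relative gap = {relative_gap(z):.3f}")
```

The two risks agree closely while exp(ß0 + ß1Z) is small and drift apart as it grows, which is the condition stated above.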

When the covariates Z are not available, but X are observable with error in the standard Cox proportional-hazards model, it has been shown that there exists a solution of the maximum partial Cox likelihood function and that the naive estimate of α (based on X) is a consistent estimator of the solution, Formula 11, and Formula 11 is a function of Formula 11, which reflects the average magnitude of the measurement error (Nakamura, 1992; Li and Ryan, 2004; Heid et al., 2002). We consider a range of Formula 11, where Δ ≥ 0 and [ß1 and Formula 11] and [α and Formula 11] are approximately equivalent. Then, an approach to finding the relationship between estimates of α based on the true exposure and those of α* based on observed exposures is to use the approximate equivalence of the survival functions of the two models (logistic and Cox) given the observed exposures (X) when the disease risk is low in this range.

Suppose that the true exposure conditioned on the observed exposure, Z|X, is normally distributed with mean E(Z|X) and variance Formula 11. Then the expected probability of disease given the observed value X in the logistic regression model (10) is approximately

Formula

where


Formula

When the conditional probability function is concentrated (Prentice, 1982) the expected hazard in Cox regression model is

Formula

The logistic survival function and Cox model survival function will be approximately equivalent when

Formula (A3)
with constant underlying hazard, λ.

To relate the slope parameters of the two models when exposures are measured correctly, we need the condition that the baseline risk in the logistic model is equivalent to the logarithm of the baseline risk times the duration of follow-up: Formula 11. Since we consider a small range of the error variance, where the parameter given the observed value is approximately equivalent to the parameter given the true value Formula 11, the equivalence of the two survival functions holds when


Formula (A4)

Then, the equivalence (A4) with the Berkson error model, E(Z | X) = X and Formula 11 Formula 11, can be rewritten as

Formula (A5)

where Formula 11

Taking the derivative of equation (A5) with respect to X and evaluating it at X = 0, we obtain


Formula (A6)

By the Taylor series expansion at Formula 11 on the right side of (A6), the approximation

Formula (A7)
holds.

Therefore, an approximate expression of the relationship between the parameters α and Formula 11 in a range Formula 11 is

Formula (A8)
Together with (A8) and Formula 11 for a large number of workers in each group, equation (11) holds.


    ACKNOWLEDGEMENTS
The authors wish to thank Drs Nicola Cherry, Ben Armstrong and James A. Deddens for their helpful comments. This work was funded by a grant from the Canadian Cancer Etiology Research Network (http://www.ccern.org/).

Received November 4, 2005; in final form March 5, 2006


    REFERENCES

Armstrong B. (1990) The effect of measurement errors on relative risk regressions. Am J Epidemiol 132:1176–84.

Armstrong B. (1998) Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med 55:651–6.

Augustin T. (2004) An exact corrected log-likelihood function for Cox's proportional hazards model under measurement error and some extensions. Scand J Stat 31:43–50.

Berkson J. (1950) Are there two regressions? J Am Stat Assoc 45:164–80.

Burr D. (1988) On error-in-variables in binary regression—Berkson case. J Am Stat Assoc 83:739–43.

Burstyn I, Kim H-M, Cherry N, et al. (2006) Metamodels of bias in Cox proportional-hazards and logistic regressions with heteroscedastic measurement error under group-level exposure assessment. Ann Occup Hyg (in press).

Carroll RJ, Spiegelman CH, Lan KKG, et al. (1984) On errors-in-variables for binary regression models. Biometrika 71:19–25.

Carroll RJ, Ruppert D, Stefanski LA. (1995) Measurement error in nonlinear models. (Chapman and Hall, London, UK).

Cox DR. (1972) Regression models and life-tables. J R Stat Soc [Ser B] 34:187–220.

Deddens JA and Hornung RW. (1994) Quantitative examples of continuous exposure measurement errors that bias risk estimates away from the null. In Smith C, Christiani D, Kelsey K (Eds.). Chemical risk assessment of occupational health. (Auburn, London) pp. 77–85.

Heid IM, Küchenhoff H, Wellmann J, et al. (2002) On the potential of measurement error to induce differential bias on odds ratio estimates: an example from radon epidemiology. Stat Med 21:3261–78.

Green MS and Symons MJ. (1983) A comparison of the logistic risk function and the proportional hazards model in prospective epidemiology. J Chronic Dis 36:715–24.

Kong FH. (1999) Adjusting regression attenuation in the Cox proportional hazard model. J Stat Plan Inference 79:31–44.

Kromhout H, Loomis D, Mihlan GJ, et al. (1995) Assessment and grouping of occupational magnetic field exposure in five electric utility companies. Scand J Work Environ Health 21:43–50.

Kromhout H, Symanski E, Rappaport SM. (1993) A comprehensive evaluation of within and between-worker components of occupational exposure to chemical agents. Ann Occup Hyg 37:253–70.

Li Y and Ryan L. (2004) Survival analysis with heterogeneous covariate measurement error. J Am Stat Assoc: Theory Methods 99:724–35.

McCullagh P and Nelder JA. (1983) Generalized linear models. (Chapman and Hall, London, UK).

Nakamura T. (1992) Proportional hazards model with covariate subject to measurement error. Biometrics 48:829–38.

Prentice RL. (1982) Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 69:331–42.

Rappaport SM. (1991) Assessment of long-term exposures to toxic substances in air. Ann Occup Hyg 35:61–121.

Reeves GK, Cox DR, Darby C, et al. (1998) Some aspects of measurement error in explanatory variables for continuous and binary regression model. Stat Med 17:2157–77.

Rosner B, Willett WC, Spiegelman D. (1989) Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med 8:1051–69.

Seixas NS and Sheppard L. (1996) Maximizing accuracy and precision using individual and grouped exposure assessments. Scand J Work Environ Health 22:94–101.

Spiegelman D, Schneeweiss S, McDermott A. (1997) Measurement error correction for logistic regression models with an "alloyed gold standard". Am J Epidemiol 145:184–96.

Spiegelman D, Carroll RJ, Kipnis V. (2001) Efficient regression calibration for logistic regression in main study/internal validation study designs with an imperfect reference instrument. Stat Med 20:139–60.

Spiegelman D and Valanis B. (1998) Correcting for bias in relative risk estimates due to exposure measurement error: A case study of occupational exposure to antineoplastics in pharmacists. Am J Public Health 88:406.

Steenland K and Deddens JA. (2004) A practical guide to dose-response analyses and risk assessment in occupational epidemiology. Epidemiology 15:63–70.

Tielemans E, Kupper L, Kromhout H, et al. (1998) Individual-based and group-based occupational exposure assessment: some equations to evaluate different strategies. Ann Occup Hyg 42:115–19.

Wang CY, Hsu L, Feng ZD, et al. (1997) Regression calibration in failure time regression. Biometrics 53:131–45.





