Annals of Occupational Hygiene Advance Access originally published online on October 17, 2006
Annals of Occupational Hygiene 2007 51(2):161-172; doi:10.1093/annhyg/mel068
Monte Carlo Simulation to Reconstruct Formaldehyde Exposure Levels from Summary Parameters Reported in the Literature
Groupe de recherche interdisciplinaire en santé (GRIS), Département de santé environnementale et santé au travail, Faculté de médecine, Université de Montréal PO Box 6128, Main Station, Montreal, QC, Canada H3C 3J7
*Author to whom correspondence should be addressed. E-mail: michel.gerin{at}umontreal.ca
| ABSTRACT |
|---|
Objectives: This study presents a procedure allowing the numerical synthesis of exposure data reported in different ways in the literature, including summary parameters and single measurements. The procedure was applied to literature regarding formaldehyde exposure in the reconstituted wood panels industry, including oriented-strand board (OSB), medium density fibre board (MDF) and particle board (PB).
Methods: For each publication providing summary parameters we estimated geometric means (GM) and geometric standard deviations (GSD) by assuming lognormality of exposure levels. Monte Carlo simulation was performed to re-create datasets from the sample sizes and estimated GMs and GSDs, allowing their subsequent formatting together with the single measurements. The precision and bias of the methods used to estimate GMs and GSDs were evaluated.
Results: Altogether, the 13 articles included in our study yielded a final database of 874 data, of which 732 were simulated. For both area and personal data, exposures corresponding to MDF and PB were similar while OSB levels were lower. The most recent available personal levels (1985–1994) were highest in PB for jobs performed in the vicinity of the press (GM = 0.63 mg m⁻³). Corresponding area levels were highest for PB in the main production zone (GM = 0.43 mg m⁻³). Mixed-effects models fitted to area PB data explained 38% of the total variability. A 6-fold decrease in exposures from 1965 to 1995 was estimated. Replication of the simulation process yielded relative standard deviations of the calculated GMs and GSDs between 10 and 20%. The relative biases of the methods used to estimate GMs and GSDs varied across methods and decreased with higher sample sizes (from 15% for n = 5 to less than 5% for n = 30, in absolute value). The precision also varied across methods and improved with higher sample sizes (from 30% for n = 5 to 10% for n = 30).
Discussion: This methodology constitutes a new meta-analysis tool that should improve the interpretation of industrial hygiene literature data, but needs to be further validated.
Keywords: lognormal model; medium density fibre board; mixed-effects models; occupational exposure assessment; oriented-strand board; particle board
| INTRODUCTION |
|---|
In many exposure assessment situations, either because of a lack of resources for prospective sampling or because of the need to characterize past exposures, literature is the main source of information. In addition to information describing exposure generating processes and tasks, quantitative exposure measurements are reported in a number of studies. Several authors have underlined the limits of the use of published data in the scope of risk analysis (Caldwell et al., 2001; Marquart et al., 2001; Money and Margary, 2002; Tielemans et al., 2002). Hence, some studies lack details about determinants of exposure, about characteristics of the study population or about statistical parameters. It then becomes difficult to integrate all the available numerical data, especially when numerous studies are available and different statistical parameters are reported. Thus, most literature analyses are reported in the form of tables presenting the results of each individual study and expert opinion is used to make a global assessment. On the other hand, quantitative assessments of exposure, particularly for occupational epidemiology, become increasingly preferred over qualitative or semiquantitative assessments (Ahrens and Stewart, 2003).
Recently, Caldwell et al. (2000), in a review on solvent exposure, calculated averages of the reported arithmetic means weighted by the associated sample sizes. The authors had to exclude numerous studies that did not report results as arithmetic means. Tielemans et al. (2002), and Money and Margary (2002) have proposed theoretical frameworks for the use of exposure data available in the published literature, mainly by presenting quality criteria regarding the internal and external validity of the studies.
The main objective of our study was to develop a method to summarize exposure data reported in different ways in the literature by estimating common statistical exposure parameters from different types of exposure metrics and by simulating exposure data from the estimated parameters to allow their combined analysis with single measurements. This paper presents the proposed procedure and its application to formaldehyde exposure data in the reconstituted wood panels industry. This study should not be regarded as a literature review of formaldehyde exposure in this sector, which would include information of a much wider scope than the summary of exposure levels presented here.
The reconstituted wood panels industry is part of the larger Veneer, Plywood and Engineered Wood Product Manufacturing group of the North American Industry Classification System (NAICS). It includes several processes that can be classified as either plywood products or composition boards. The processes included in this study are limited to particle board (PB), medium density fibre board (MDF) and oriented strand board (OSB), which all belong to the composition board category. Occupational exposure to formaldehyde in this industry comes mainly from the degradation of the resins during and after the pressing operation.
| METHODS |
|---|
An exhaustive review of the literature on formaldehyde exposure levels in the reconstituted wood panels industry, covering publications up to and including 2001, was conducted. The study was limited to PB, MDF and OSB. All publications reporting formaldehyde levels measured in workplaces in this industrial sector were retained for further analysis. The exposure levels were formatted into a relational database (the initial database, see Fig. 1) and allocated to specific jobs/work zones based on the work by Lavoué et al. (2005). The personal measurements were also classified in exposure groups identified in the same study: group 1 'Office' includes administration and foremen; group 2 'Maintenance–quality control' includes laboratory technicians, maintenance workers and cleaners; group 3 'Production' includes press operators, assistant press operators, finishers and shippers; and group 4 'Floater' includes floaters and press-miscellaneous tasks. The area measurements were classified in seven zones: 'Raw materials receiving–chip preparation', 'Resin production–storage', 'Main production–press', 'Finishing', 'Storage–shipping', 'Operator booth' and 'other departments' (non-production areas).
Step 1 of the procedure: calculation of common statistical parameters
The initial database contained two types of exposure data: single exposure concentrations, each representing one measurement (SM record, see Fig. 1), and sets of summary parameters, each set summarizing a number of measurements (SS record, see Fig. 1). Each record of this database was associated with values for different variables such as work zone, job group, source article. The SS records also contained some of the following parameters: sample size (N), arithmetic mean (AM), arithmetic standard deviation (ASD), geometric mean (GM), geometric standard deviation (GSD), range (with minimum a and maximum b), median or an empirical percentile of the distribution of the measurements (
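Estimation of GM from GSD and AM.
$$\mathrm{GM} = \mathrm{AM}\,\exp\left[-\tfrac{1}{2}\left(\ln \mathrm{GSD}\right)^{2}\right] \qquad (1)$$
Estimation of GSD from AM and GM.
$$\mathrm{GSD} = \exp\left[\sqrt{2\,\ln\left(\mathrm{AM}/\mathrm{GM}\right)}\right] \qquad (2)$$
Estimation of GSD from AM and ASD.
$$\mathrm{GSD} = \exp\left[\sqrt{\ln\left(1 + \mathrm{ASD}^{2}/\mathrm{AM}^{2}\right)}\right] \qquad (3)$$
Estimation of GM from AM and ASD.
$$\mathrm{GM} = \frac{\mathrm{AM}}{\sqrt{1 + \mathrm{ASD}^{2}/\mathrm{AM}^{2}}} \qquad (4)$$
Estimation of GM from [a,b].
$$\mathrm{GM} = \sqrt{a\,b} \qquad (5)$$
Estimation of GSD from GM and an empirical percentile.
$$\mathrm{GSD} = \exp\left[\frac{\ln X_{p} - \ln \mathrm{GM}}{z_{p}}\right] \qquad (6)$$
where $X_{p}$ is the reported $p$th percentile and $z_{p}$ is the corresponding quantile of the standard normal distribution.
Estimation of GSD from [a,b]. (see Appendix for details)
$$\mathrm{GSD} = \exp\left[\frac{\ln b - \ln a}{R_{N}}\right] \qquad (7)$$
where $R_{N}$ is the theoretical median of the standardized range for a sample of size N.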
Equations 1–4 are based on the theoretical correspondence between the different parameters characterizing the lognormal distribution (Rappaport, 2000). Equation 5 is based on the fact that the expected values of the maximum and minimum of a sample from a normal distribution are symmetrical around the mean of the distribution (Zwillinger and Kokoska, 2000). Equation 6 is based on the fact that the xth percentile of a sample from a normal distribution is a good approximation of the xth percentile of the parent distribution (see Appendix). The assumptions linked to equation 7 are detailed in the Appendix.
If GM was not available but the sample median was, the latter was used as an estimate of GM. If neither GM nor the median was reported, GM was estimated from AM and GSD (equation 1) or from AM and ASD (equation 4). GM was estimated from the range (equation 5) only when it was the only available parameter.
GSD, if not available, was determined from AM and ASD (equation 3), from AM and GM (equation 2) or from GM and an empirical percentile (equation 6). When none of the previous methods could be used, GSD was estimated from the theoretical median of the standardized range (equation 7, see Appendix). When the estimation of GSD was not possible, the median of the GSDs estimated for the other sets of measurements was used.
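The estimation rules of step 1 can be sketched in Python as follows. This is a minimal illustration under the lognormal assumption, not the authors' S-plus code: all function names are hypothetical, the percentile symbol `x_p` is an assumed notation, and the median of the standardized range is approximated by simulation as a stand-in for the theoretical value detailed in the Appendix.

```python
import math
import random
import statistics
from statistics import NormalDist


def gm_from_am_gsd(am, gsd):
    """Equation 1: GM from AM and GSD (lognormal mean identity)."""
    return am / math.exp(0.5 * math.log(gsd) ** 2)


def gsd_from_am_gm(am, gm):
    """Equation 2: GSD from AM and GM; requires AM >= GM."""
    return math.exp(math.sqrt(2.0 * math.log(am / gm)))


def gsd_from_am_asd(am, asd):
    """Equation 3: GSD from AM and ASD via the coefficient of variation."""
    return math.exp(math.sqrt(math.log(1.0 + (asd / am) ** 2)))


def gm_from_am_asd(am, asd):
    """Equation 4: GM from AM and ASD."""
    return am / math.sqrt(1.0 + (asd / am) ** 2)


def gm_from_range(a, b):
    """Equation 5: GM as the midpoint of the range on the log scale."""
    return math.sqrt(a * b)


def gsd_from_gm_percentile(gm, x_p, p):
    """Equation 6: GSD from GM and the p-th percentile x_p (0.5 < p < 1)."""
    z_p = NormalDist().inv_cdf(p)
    return math.exp((math.log(x_p) - math.log(gm)) / z_p)


def median_standardized_range(n, n_sim=20000, seed=1):
    """Monte Carlo approximation (assumption) of the median range of n standard normal draws."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(n_sim):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(draws) - min(draws))
    return statistics.median(ranges)


def gsd_from_range(a, b, n):
    """Equation 7: GSD from the range [a, b] and the sample size n."""
    return math.exp(math.log(b / a) / median_standardized_range(n))
```

For a given SS record, one would call whichever function matches the reported parameters, following the order of preference described above (median or reported GM first, range-based estimates last).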
Steps 2 and 3 of the procedure: simulation of exposure data and creation of the measurement database
In step 2 of the methodology, each SS record was replaced with a number of concentrations equal to the reported sample size, drawn at random from a lognormal distribution with the GM and GSD estimated for that particular SS record. In step 3 of the methodology, a new measurement database was created by combining the original single measurements and the simulated measurements. The procedure was repeated 1000 times, creating 1000 replicas of the measurement database. Hence, while the values simulated from SS records differed for each replica, the SM values did not change. Personal and area measurements were separated before further analysis.
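Steps 2 and 3 can be illustrated with a short Python sketch. It assumes each SS record is reduced to a hypothetical `(n, gm, gsd)` tuple; the function names and the example inputs are illustrative only and are not values or code from the study.

```python
import math
import random


def simulate_ss_record(n, gm, gsd, rng):
    """Draw n concentrations from a lognormal distribution with the given GM and GSD."""
    mu, sigma = math.log(gm), math.log(gsd)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def build_replicas(sm_values, ss_records, n_replicas=1000, seed=42):
    """Return n_replicas measurement databases.

    sm_values: single measurements, reused unchanged in every replica.
    ss_records: (n, gm, gsd) tuples, re-simulated in every replica.
    """
    rng = random.Random(seed)
    replicas = []
    for _ in range(n_replicas):
        simulated = []
        for n, gm, gsd in ss_records:
            simulated.extend(simulate_ss_record(n, gm, gsd, rng))
        replicas.append(list(sm_values) + simulated)
    return replicas


# Hypothetical inputs, for illustration only (not values from the study).
sm_values = [0.12, 0.45, 0.80]
ss_records = [(22, 0.5, 2.8), (5, 1.2, 2.0)]
replicas = build_replicas(sm_values, ss_records, n_replicas=100)
```

Each replica holds the same single measurements followed by a fresh set of simulated values, which mirrors the statement that only the values simulated from SS records change between replicas.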
Analysis of the measurement database
For personal and area concentrations, GMs and GSDs stratified by process and by period of time were calculated for each of the 1000 replicas of the measurement database. For each stratum, the median of the 1000 resulting GMs was used as the exposure metric and the relative standard deviation (RSD) of the 1000 GMs was computed as a measure of the variability caused by the simulation.
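The summary across replicas can be sketched as below. The RSD is taken here as the standard deviation of the replica GMs divided by their mean, which is an assumption since the text does not spell out the formula; the function names are hypothetical.

```python
import math
import statistics


def geometric_mean(values):
    """GM: exponential of the mean of the log-transformed values."""
    return math.exp(statistics.fmean([math.log(v) for v in values]))


def geometric_sd(values):
    """GSD: exponential of the standard deviation of the log-transformed values."""
    return math.exp(statistics.stdev([math.log(v) for v in values]))


def summarize_replicas(replicas):
    """Return the median of the replica GMs and their RSD in percent."""
    gms = [geometric_mean(r) for r in replicas]
    median_gm = statistics.median(gms)
    rsd_percent = 100.0 * statistics.stdev(gms) / statistics.fmean(gms)
    return median_gm, rsd_percent
```

Applying `summarize_replicas` separately to the values of each stratum (process, time period) in every replica would give the stratified medians and RSDs described above.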
For the personal measurements, 90% of the data were in the PB process. Moreover, 70% of the PB data corresponded to the 'unknown' category of the job classification, and the data for which the job was known were almost entirely (87 out of 96 data) in the most recent time period category (1984–1995). Because of this severely unbalanced structure, we restricted the analysis of personal data to the calculation of stratified GMs for the most populated categories of process, job group and time period. For the area measurements, PB also represented 90% of the data, but within the PB process the data were approximately evenly distributed across categories of work zone and time period. Therefore, statistical modelling was used to analyse the area PB data.
A total of 200 replicas of the area PB data were randomly selected among the 1000 initially created and linear mixed-effects models were fitted to the log-transformed concentrations. The model was constructed in a manual forward stepwise procedure similar to that described in Lavoué et al. (2005), using the Bayesian information criterion (BIC) as a discrimination tool. Backward stepwise selection was also performed to check the consistency of the resulting model. At each step of the procedure, the model was fitted to the 200 datasets and median values of the BIC were used to decide whether a variable would be included. The fixed effects tested included time period (1965–1974, 1975–1984, 1985–1994), work zone and sampling duration (<1 h, 1–6 h, >6 h, unknown). The source of the data (i.e. article) was tested as a random effect to evaluate the extent of the difference between sources after taking into account the fixed effects. Since the measurements simulated from SS records were generated with differing variances, a different residual variance for each record was modelled. In addition, the variance of single measurements was modelled as different across sources. The parameters of the final model were computed as the mean of the 200 restricted maximum likelihood (REML) estimates resulting from fitting the common model to the 200 replicas of the area PB data. RSDs were also computed to assess the variability of the results. Internal validation of the models was conducted by graphically assessing normality and independence assumptions for the residuals and estimates of the random effects in a random subsample of 10 models.
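The model structure described above can be written compactly as follows. The notation is an assumption introduced here for illustration (it is not taken from the original article): $i$ indexes the source article, $g$ the variance group (an SS record, or a source for single measurements) and $k$ the individual measurement.

```latex
\ln C_{igk} = \beta_0 + \beta_{\mathrm{period}(igk)} + \beta_{\mathrm{zone}(igk)} + b_i + \varepsilon_{igk},
\qquad b_i \sim N\left(0, \sigma_B^2\right),
\qquad \varepsilon_{igk} \sim N\left(0, \sigma_{W,g}^2\right)
```

Under this sketch, the between-source variance $\sigma_B^2$ captures the random effect of the article, the group-specific residual variances $\sigma_{W,g}^2$ capture the differing variability of the measurement sets, and a candidate variable such as sampling duration would enter as an additional $\beta$ term when tested.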
During the creation of the measurement database, personal and area single measurements reported as below a limit of detection were replaced with the limit of detection divided, respectively, by √2 and 2, based on the GSDs of the non-censored values (Hornung and Reed, 1990). Ranges including a limit of detection (e.g. <0.2–3) were used for estimation of GM and GSD only when the range was the only available parameter, because of the potential for important bias. In this case, the lower limit of the range was replaced by the limit of detection divided by √2 for personal measurements or 2 for area measurements in equations 5 and 7 (Hornung and Reed, 1990).
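The substitution rule can be sketched as a small Python helper. The GSD cut-off of 3 is the commonly quoted threshold for the Hornung and Reed (1990) rule and is an assumption here, since the text only says the divisor was chosen from the GSDs of the non-censored values; the function name is hypothetical.

```python
import math


def substitute_nondetect(lod, gsd_noncensored):
    """Replace a non-detect by LOD/sqrt(2) when GSD < 3 (assumed cut-off), else by LOD/2."""
    if gsd_noncensored < 3.0:
        return lod / math.sqrt(2.0)
    return lod / 2.0
```

With the GSDs reported in this study (2.80 for personal and 4.36 for area measurements), this rule gives the divisor √2 for personal data and 2 for area data.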
Partial validation of the equations used to estimate GMs and GSDs
In order to provide some insight on the validity of the estimation tools presented above (equations 1–7), a limited simulation study was conducted. A total of 500 samples of sizes 5, 10 and 30 were generated from a lognormal distribution with GM = 1 and GSD = 2.5. For each generated sample, the sample GM and GSD were calculated, as well as all the parameters used in equations 1–7. From these parameters, four estimations of the GMs and GSDs were, in turn, determined using the estimators in equations 1–7. For each estimator, the relative bias was calculated as the average relative difference between the estimate and the sample GM or GSD, and the relative precision as the RSD of the bias. The relative bias and precision were stratified by sample size.
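A minimal Python sketch of this validation design is given below for one estimator (GM from AM and ASD, equation 4). The precision is computed as the standard deviation of the relative differences, which is one reading of "the RSD of the bias" and is an assumption; function names and the seed are hypothetical.

```python
import math
import random
import statistics


def validate(estimator, target, n, n_samples=500, gm=1.0, gsd=2.5, seed=0):
    """Relative bias and precision (both in %) of an estimator against a sample statistic.

    estimator: function of a list of concentrations, returning an estimate.
    target: function returning the sample GM or GSD used as the reference.
    """
    rng = random.Random(seed)
    rel_diffs = []
    for _ in range(n_samples):
        sample = [rng.lognormvariate(math.log(gm), math.log(gsd)) for _ in range(n)]
        reference = target(sample)
        rel_diffs.append(100.0 * (estimator(sample) - reference) / reference)
    return statistics.fmean(rel_diffs), statistics.stdev(rel_diffs)


def sample_gm(x):
    """Sample geometric mean."""
    return math.exp(statistics.fmean([math.log(v) for v in x]))


def gm_from_am_asd(x):
    """Equation 4 applied to the sample AM and ASD."""
    am, asd = statistics.fmean(x), statistics.stdev(x)
    return am / math.sqrt(1.0 + (asd / am) ** 2)


for n in (5, 10, 30):
    bias, precision = validate(gm_from_am_asd, sample_gm, n)
    print(f"n={n}: relative bias {bias:.1f}%, precision {precision:.1f}%")
```

The same `validate` function could be reused for the other estimators by passing a different `estimator` and, for GSD estimators, a sample GSD function as `target`.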
All calculations presented in this paper, including the Monte Carlo simulations, were performed with the statistical package S-plus© 6.1 for Windows professional edition (Insightful Corp.).
| RESULTS |
|---|
Simulation procedure
A total of 22 publications were abstracted into the relational database. Nine publications were excluded from the analysis: one only reported results from semiquantitative colorimetric tubes (Herbert et al., 1995); one described a plant located in a tropical area (Malaka and Kodama, 1990); the sample size could not be estimated in two publications (Edling et al., 1985; Edling et al., 1988); one was excluded because the facilities surveyed did not use a formaldehyde-based resin to bond the wood-particles (Daniels et al., 1988); in another the results summarized a mix of area and personal measurements, which could not be distinguished (Mortimer, 1982b). Finally, three publications were excluded because they used the same dataset presented in a paper of wider scope (Niemelä et al., 1980; Niemelä and Vainio, 1981; Niemelä, 1986); this left 13 publications for further analysis.
Table 1 shows, for each of the 13 publications included in the analysis, the number of single concentrations (SM records) and of summarized sets of measurements (SS records) provided. Among the journal articles retained for analysis, three presented exposure data taken by different organizations in several facilities in a country (Kauppinen and Niemelä, 1985; Triebig et al., 1989; Niemelä et al., 1997). Three were epidemiological studies, mainly on the respiratory effects of formaldehyde exposure (Horvath et al., 1988; Imbus and Tochilin, 1988; Herbert et al., 1994). One article reported the summary of formaldehyde levels in the French occupational exposure database COLCHIC (Carton, 1995) while another reported results of different sampling methods for formaldehyde in a facility manufacturing wood panels (Wentrup et al., 1986).
Neither GM nor GSD were reported in any of the analysed papers. AM was available in 44 of the 49 measurement sets while ASD appeared in 1. The median was available in 26 sets. The range was reported in 42 sets and one publication, representing 2 measurement sets, reported the sample's 95th percentile. This resulted in 53% of the estimated GMs determined from the median, 41% from AM and the range (GSD estimated from equation 7 then GM estimated from equation 1), 4% from AM and GSD set to the median of the other GSDs (equation 1), and 2% from AM and ASD (equation 4). The GSDs were estimated from the different parameters in the following proportions: range for 59% (equation 7), AM and median for 26% (equation 2), median of the other GSDs for 8%, AM and
Ten of the 49 measurement sets represented fewer than 10 measurements each while 20 sets represented more than 10 measurements each. For 19 of the sets, the sample size was not provided explicitly in the publication but could be estimated from other information provided (e.g. sample size of other similar measurement sets in the source article).
Regarding the final datasets, including simulated and actual measurements, the analytical and sampling methods and a crude description of the sampling strategy were reported in 10 and 7 of the 13 analysed publications, respectively. Most of the reported sampling strategies were of the type 'placed in areas representative of normal exposure conditions' or 'presumed maximal exposure jobs'. For 33% of the measurements, information about the associated work zone or job was not provided. The sample duration was not reported for 20% of the measurements. Twenty percent of measurement durations were below 1 h, 52% were between 1 and 6 h, and 8% were greater than 6 h.
Analysis of the measurement database
Personal measurements. The final database contains 376 personal measurements, of which 320 were simulated from 11 SS records. The measurement sets comprised a median of 22 measurements (range 5–109). The personal database represents data from eight publications. None of the 56 single measurements were reported under a limit of detection. The personal measurements had a median GM of 0.53 mg m⁻³ (RSD = 3%) and a median GSD of 2.80 (RSD = 3%). Table 2 shows the median GMs, 90th percentiles, and their respective RSDs for personal measurements after stratification by job group, time period and process, for strata with more than five values.
Area measurements. The final database contains 498 area measurements, of which 412 were simulated from 38 SS records, which comprised a median of 21 measurements (range 4–61). The area database represents data from 10 publications. Twelve of the 86 single measurements were reported under a limit of detection. The area measurements had a median GM of 0.79 mg m⁻³ (RSD = 3%) and a median GSD of 4.36 (RSD = 3%). The median GMs after stratification by process were: MDF 0.22 mg m⁻³ (N = 32, RSD = 10%, data from 1975 to 1994), PB 0.99 mg m⁻³ (N = 443, RSD = 4%, data from 1965 to 1994) and OSB 0.05 mg m⁻³ (N = 23, 23 original single measurements, data from 1985 to 1994). The median GMs after stratification by time period were: 1965–1974: 2.52 mg m⁻³ (N = 136, RSD = 5%), 1975–1984: 0.68 mg m⁻³ (N = 295, RSD = 4%) and 1985–1994: 0.14 mg m⁻³ (N = 67, RSD = 10%).
There were 443 area PB data available for modelling. The fixed effects of the final model explained an average of 38% of the total variability when fitted to the 200 datasets. Using backward instead of forward stepwise selection yielded the same model structure. A summary of the mean parameters of the model, along with their RSDs for the 200 replications, is presented in Table 3. Table 4 presents the GMs for the different time periods and work zones predicted from the models by using the average coefficients presented in Table 3. A significant within-source correlation was found in the area data, with an average intraclass correlation coefficient of 0.23. The variable modelling a different variability for the different sets of measurements was refined in a post hoc manner by aggregating categories with estimates less than 10% different from each other. This aggregation was performed because the large initial number of categories yielded a non-positive-definite approximate variance-covariance matrix in the REML optimization. The refined variable (from 42 to 9 categories) improved the model fit significantly in terms of BIC. This corresponds to within-source GSDs varying from 1.36 to 5.79 (RSDs not shown). Graphical assessment of 10 randomly chosen models yielded satisfactory conformity to the model assumptions.
Partial validation of the equations used to estimate GMs and GSDs
Table 5 shows the relative bias and precision of the estimates of GMs and GSDs obtained with equations 1–7 compared to the sample GMs and GSDs, for sample sizes 5, 10 and 30.
| DISCUSSION |
|---|
Information available in the literature
The analysis of the literature database revealed several limitations, in the context of exposure assessment, of the information reported in the publications, similar to those described by Money and Margary (2002) and Caldwell et al. (2001). Hence, although only looking systematically at crude information (statistical summary parameters, job/work zone sampled, sample size, crude sampling strategies and analytical methods), our results show high percentages of data for which this information was lacking or not adequate.
Most available area exposure data came from the main-production and finishing work zones (74% of the data for which this information was available). Likewise, only 3 job strata had more than 10 measurements available (press operator, finisher and chipboard pasting, representing 71% of the documented data). These results underline a potential for misinterpretation of exposure levels for which no information on the job/workplace is provided.
Statistical modelling
Our simulation procedure yielded area and personal datasets different from exposure datasets commonly described in the literature only in that they contain multiple concentration data columns due to the replication of the simulation process. Although these datasets could have been interpreted using several numerical analysis methods, we initially had planned to use linear mixed-effect models, which are increasingly considered as the state of the art analysis tool in occupational hygiene (Burdorf and Van Tongeren, 2003). However, personal data were so severely unbalanced that the available variables were almost perfectly correlated, which prevented any modelling effort to disentangle their independent influence on the response. We therefore performed a stratified analysis of these measurements in categories with enough data. Mixed-effects models were used to analyse the area PB data, in which data were spread approximately evenly between categories of the different variables.
The fixed effects of the model constructed for the area data explained an important fraction of the variability of the simulated data (38%), which indicates the usefulness of such an analysis even with crude information on potential determinants of exposure. The model identified a clear time trend in the data, with an estimated 5- to 6-fold reduction in exposure levels between the periods 1965–1974 and 1985–1994. The main production area was also shown as the highest exposure zone, with the zones corresponding to post- or pre-pressing operations associated with lower exposure levels. These results are very similar to those reported by Lavoué et al. (2005), who analysed formaldehyde exposure data measured in 12 plants manufacturing wood panels in Quebec. The 'unknown-other' category in our study corresponds to low exposure levels compared to the other zones. This is because a significant proportion of these data were classified as 'other' in the initial database and, as such, probably correspond to very low exposure locations such as administration or the exterior of the facility.
Moderate correlation was found between concentrations coming from the same publication for area measurements when simple random-effect models were fitted to the data. This suggests that there were differences among the various sources due to undocumented factors. Thus, significant bias could be present in any assessment based on a small fraction of the available publications if sufficient information is not provided.
Exposure levels estimated/predicted in our study
Personal measurements for the most recent available period (1985–1994) showed formaldehyde levels around 0.60 mg m⁻³ for the most exposed jobs (i.e. job group 'Production') in the PB process and slightly lower levels in the MDF process (0.40 mg m⁻³). Other job groups corresponded to lower exposures, by a factor of 1.5–2. The decrease of exposure over time was less clear in the case of personal measurements than in the area model. There were only five values for 1965–1974, and Table 2 shows that although exposures in the 'unknown' job category for PB decreased from 1975–1984 to 1985–1994, they increased in the case of job group 'Production'. Non-stratified GMs nevertheless show a trend similar to that observed in the area models. The GM corresponding to the 'unknown' category is higher than those of both the 'Maintenance–quality control' and 'Production' groups, which suggests that most of the unknown data come from high exposure jobs, most probably a mix of groups 'Production' and 'Floater' (group 'Floater' corresponded to the highest exposures in the study of Lavoué et al.). This confirms the risk of exposure misclassification when interpreting data with little ancillary information.
Area measurements were markedly lower in OSB (GM = 0.05 mg m⁻³) than in the two other processes. This is not unexpected and is due to the fact that phenol–formaldehyde resins used in OSB are more resistant to hydrolysis than urea–formaldehyde and melamine–urea–formaldehyde resins, which are used in MDF and PB. MDF concentrations (GM = 0.22 mg m⁻³) also seemed lower than PB levels, but did not differ from PB levels after correction by time period and zone (most of the MDF levels correspond to 1984–1995 and the 'Other-unknown' zone). PB concentrations, predicted for the most recent period, show formaldehyde GMs at 0.42 mg m⁻³ in the main production zone, and between 0.12 mg m⁻³ and 0.23 mg m⁻³ in the other zones.
The personal and area exposure levels reported above are very similar to predictions made from the models presented in the study of Lavoué et al. in the same industrial sector in Quebec for the year 1990 and for what the authors labelled 'governmental data' as opposed to 'research data' (calculations not shown). These predictions were generally less than 20% different from the GMs presented in Tables 2 and 5. In their study, Lavoué et al. found that levels taken by governmental hygienists were consistently higher than those measured by a research team in the same facilities after correction for other determinants of exposure. They concluded that governmental data probably corresponded to worst-case sampling strategies. Our observations would therefore imply that such bias also exists in the data we extracted from the literature. Unfortunately, only three of the seven publications reporting a crude description of the sampling strategy stated explicitly that it was a worst-case strategy; they represented 20% of all the measurements. Moreover, these data were not higher than the rest of the data after correction for process, time period and job/zone. Thus it remains unclear how the different and mostly unknown sampling strategies used in the literature influenced our results.
The data in our study corresponded to variable sampling times (20% <1 h, 52% 1–6 h, 8% >6 h, 20% unknown). Several authors have observed an influence of the sampling duration on exposure levels, generally a decrease in exposures when the duration increases (Raaschou-Nielsen et al., 2002; Kolstad et al., 2005; Lavoué et al., 2005). Including a nominal variable corresponding to the sampling time did not improve the fit of the area models. In the case of personal data, median GMs stratified only by the sample time category showed increasing exposures with increasing sample time: <1 h (0.36 mg m⁻³), 1–6 h (0.52 mg m⁻³), >6 h (0.77 mg m⁻³), unknown (0.76 mg m⁻³). However, this trend was reversed when using MDF data from 1984 to 1995 stratified by job group and category of sample time. It is therefore difficult to conclude on the existence of an influence of the sample time in our data. This may be due to the classification scheme we used, which was warranted by the lack of precision in the information provided in the articles.
Validity of the simulation methodology
Monte Carlo simulations used to merge aggregated and single measurements yielded moderately variable results. Indeed, the RSDs of the summary parameters (GMs and GSDs) calculated from the 1000 replications are between 10 and 20%. These results are confirmed by the similar variability observed in the parameters of the mixed-effects model fitted to the 200 simulated datasets and point to the possibility of using fewer replications in future studies.
The first main limitation of our methodology regards the calculations used to estimate distribution parameters from very limited information. The simulation study we performed in order to evaluate the accuracy of the different estimators in equations 1–7 showed moderate bias and precision for most methods, with errors decreasing when the sample size increased. Indeed, the maximum bias for the estimation of GMs decreased (in absolute value) from 18% for a sample size of 5, to 5% for a sample size of 30. Table 5 also shows that the different methods used are associated with different biases and precisions. For example, GM2 in Table 5 is negatively biased whereas the other estimators are positively biased. In this particular case, the sign of the bias is easily explained by the fact that equation 1, when used to estimate the distribution arithmetic mean, provides a positively biased estimate (Hewett and Ganser, 1997). Therefore, GM estimated from GSD and an unbiased estimate of the distribution arithmetic mean (the sample arithmetic mean) will be negatively biased. Since the set of available parameters is often different between sources of data, the error introduced by the estimation methods will also vary across sources. We chose an average value of GSD (2.5) to perform the validation (Buringh and Lanting, 1991). Since the GSDs we estimated during this study were slightly lower (median values of, respectively, 1.7 and 2.2 for personal and area measurements), the actual error in our estimates should be lower than that observed in Table 5. Altogether, we believe these results are promising since the accuracy of the estimation methods seems acceptable compared to the uncertainty generally associated with occupational exposure assessment. However, more extensive simulation studies should be conducted to fully evaluate the methods presented here.
Such studies could permit the determination of a prioritizing scheme for the choice of specific equations and allow the specification of sample sizes below which some methods should not be used. Moreover, error propagation studies could help characterize the uncertainty associated with the concurrent use of several equations in the estimation (e.g. GSD estimated from an equation and then estimation of GM using the GSD estimate).
The second main limitation of our methodology regards the assumption that every set of data summarized in the analysed articles followed a lognormal distribution. This assumption is central in our methodology because it is used in the methods of estimation of GM and GSD and during the simulation process. It is now well established that airborne concentrations of contaminants in the workplace tend to follow, at least approximately, a lognormal distribution, and most methods of interpretation of exposure levels rely heavily on this assumption (Mulhausen and Damiano, 1998; Rappaport, 2000). We believe there was little risk of important departure from the assumption of lognormality in most of our data since each set of measurements came from the same occupational setting and was further characterized by process, time period and job/zone. However, we did not quantify the robustness of our simulation method to such departures, and recommend that other studies be conducted to evaluate it.
While we believe that the procedure we propose recreated exposure data representative of the data summarized in the articles we analysed, their representativeness of occupational exposure to formaldehyde in the general population cannot be assessed directly. It depends on the validity of the publications themselves. In our study, all data available in the literature were retrieved, and only very crude criteria were used to discard irrelevant data before analysis. In particular, we did not use the criteria proposed by Tielemans et al. According to those criteria, a large part of the data summarized in this study would have been excluded because of the lack of adequate summary parameters and ancillary information (Tielemans et al., 2002). We chose to include as much data as possible in order to maximize the dataset available to test the feasibility of our methodology and to assess the variability caused by the aggregation of different studies. However, it is plausible that the inclusion of multiple studies in the analysis compensated to some extent for study-specific biases. Moreover, the analysis of the simulated area datasets with statistical models yielded plausible quantitative results regarding the influence of the different variables, and the models explained a substantial percentage of the variability of the simulated data. Finally, the observations drawn from the analysis of the simulated data were similar to the results obtained in an analysis of an external source of exposure data in the same industrial sector (Lavoué et al., 2005).
| CONCLUSION |
|---|
|
|
|---|
The new method we used in this study allowed the inclusion in the analysis of data that would have been discarded if conventional methods (e.g. average of the means weighted by the sample size or variance) had been used. Moreover, single measurements could be analysed along with summarized measurements. In addition, the equations and assumptions used to simulate the exposure data are explicit and permit the computational aggregation of all available data. This ensures reproducibility of the results by researchers other than the initial assessor(s) and permits quantitative assessment of the uncertainty, as opposed to the black box of expert-only assessments. Finally, the database-like format resulting from the simulation procedure makes it possible to conduct the same kind of analyses one would perform on a standard exposure dataset, in particular the use of statistical models to explore potential exposure determinants. We would like to emphasize that such an all-numerical analysis should not be taken as a replacement for expert analysis of the literature but merely as a tool for industrial hygiene meta-analyses, available to help the exposure assessor integrate the results of several exposure studies in a consistent and transparent way. Further analysis of this methodology, through the use of quality criteria to down-weight or exclude some data, or through simulations to assess the accuracy of the estimation equations, will allow a better appraisal of its potential for exposure assessment. Similar studies in other industrial settings and for other contaminants are also warranted to assess the generalizability of our results.
| APPENDIX |
|---|
|
|
|---|
Let $Y$ be a random variable following a normal distribution with mean $\mu$ and standard deviation $\sigma$. Let $y_{(1)}, \ldots, y_{(n)}$ be the order statistics of any sample of size $n$ taken from this distribution, and $R$ the range of the sample.
Justification of equation 6
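Usually, the $x$th percentile of a sample is obtained by finding the $i$th order statistic (usually the $i$th smallest value) for which $i/n \approx x/100$. Formally, each order statistic follows a distribution whose expected value can be estimated. In particular, Blom (1958) proposed the following formula for samples drawn from a normal distribution:

$$E\left[y_{(i)}\right] \approx \mu + \sigma\,\Phi^{-1}\!\left(\frac{i - 3/8}{n + 1/4}\right) \qquad (8)$$

where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function.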
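Blom's approximation, commonly written as E[y(i)] ≈ µ + σ Φ⁻¹((i − 3/8)/(n + 1/4)), is straightforward to evaluate numerically. The Python sketch below is illustrative only. The function names are hypothetical, and the second helper assumes that µ = ln(GM) is known and that a single reported order statistic (for example a minimum or maximum) is close to its expected value, which may differ from the exact form of equation 6.

```python
from math import exp, log
from statistics import NormalDist


def blom_expected_z(i, n):
    """Blom (1958) approximation to the expected i-th standard normal order statistic."""
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= n")
    return NormalDist().inv_cdf((i - 0.375) / (n + 0.25))


def gsd_from_order_statistic(value, gm, i, n):
    """Hypothetical helper: solve ln(value) = ln(GM) + sigma * z_i, return exp(sigma)."""
    z = blom_expected_z(i, n)
    if abs(z) < 1e-12:
        raise ValueError("the middle order statistic carries no information on sigma")
    sigma = (log(value) - log(gm)) / z
    return exp(sigma)


if __name__ == "__main__":
    # Expected standardized position of the maximum of a sample of 10.
    print(blom_expected_z(10, 10))
    # GSD implied by a maximum of 2.0 when GM = 0.5 and n = 10 (illustrative values).
    print(gsd_from_order_statistic(2.0, 0.5, 10, 10))
```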
Justification of equation 7
The standardized range of a random sample from a normal distribution, defined by $W = R/\sigma = (y_{(n)} - y_{(1)})/\sigma$, follows a specific sampling distribution, described in equation form by Hartley (1942). The cumulative distribution function of this distribution can be estimated by numerical integration. This concept is widely used in R-charts in process quality control, where the ranges of sequential process samples are plotted against control limits, with excessive values indicating departure from the initial distribution and loss of control. The control limits are computed as chosen percentiles of the theoretical sampling distribution of the ranges. The equations proposed by Hartley show that the sampling distribution of the range of the standard normal depends only on the sample size; therefore the sampling distribution of the standardized range of a sample from any normal distribution depends only on the sample size. If we know the quantity $y_{(n)} - y_{(1)}$ for one sample of size $n$ from a normal distribution, and we assume that this single value is close to its theoretical median, we can then estimate $\sigma$ as follows: $\hat{\sigma} = (y_{(n)} - y_{(1)})/W_{\text{median}}$. Applied to our situation with a lognormal distribution, we estimate GSD = $\exp(\hat{\sigma})$ from a range [ln(b), ln(a)] and a given sample size.
The determination of $W_{\text{median}}$, the theoretical median of the standardized range, was performed with the function qnrange of the S-Plus 6.1 statistical software. This function solves the equations proposed by Hartley by numerical integration. As an alternative to this function, a table giving the cumulative probability of the sampling distribution for values of $W$ ranging from 0 to 7.25 and for sample sizes between 2 and 20 can be found in Zwillinger and Kokoska (2000; pp. 69–76).
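Where neither qnrange nor the published tables are at hand, the median of the standardized range can also be approximated by simulation. The Python sketch below is a swapped-in technique, not the numerical integration used in this study: it draws many samples of size n from a standard normal distribution, takes the median of their ranges, and applies GSD = exp((ln(max) − ln(min))/W_median). The function names, the number of simulations and the example values are assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_median_range(n, n_sim=40_000, seed=1):
    """Monte Carlo estimate of the median range of n standard normal draws."""
    if n < 2:
        raise ValueError("a range requires at least two observations")
    rng = random.Random(seed)
    ranges = []
    for _ in range(n_sim):
        draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ranges.append(max(draws) - min(draws))
    return statistics.median(ranges)


def gsd_from_range(low, high, n):
    """Estimate GSD from the reported minimum, maximum and sample size."""
    if not 0 < low < high:
        raise ValueError("expected 0 < low < high")
    w_median = simulate_median_range(n)
    sigma_hat = (math.log(high) - math.log(low)) / w_median
    return math.exp(sigma_hat)


if __name__ == "__main__":
    # Illustrative values: 10 measurements ranging from 0.1 to 1.2 mg m-3.
    print(gsd_from_range(0.1, 1.2, 10))
```

The simulated median carries Monte Carlo error, so the result should agree with tabulated values only to about two decimal places at this number of simulations.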
| ACKNOWLEDGEMENTS |
|---|
|
|
|---|
J.L. was supported by the Institut de Recherche Robert-Sauvé en Santé et en Sécurité du travail.
Received March 9, 2006; in final form August 9, 2006
| REFERENCES |
|---|
|
|
|---|
Ahrens W and Stewart PA. (2003) Retrospective exposure assessment. In Nieuwenhuijsen M (Ed.). Exposure assessment in occupational and environmental epidemiology. (Oxford University Press, New York, NY) pp. 103–18.
Blom G. (1958) Statistical estimates and transformed beta variables.(John Wiley, New York, NY).
Burdorf A and Van Tongeren M. (2003) Variability in workplace exposures and the design of efficient measurement and control strategies. Ann Occup Hyg; 47:95–9.
Buringh E and Lanting R. (1991) Exposure variability in the workplace: its implications for the assessment of compliance. Am Ind Hyg Assoc J; 52:6–13.
Caldwell DJ, Armstrong TW, Barone NJ, et al. (2000) Hydrocarbon solvent exposure data: compilation and analysis of the literature. Am Ind Hyg Assoc J; 61:881–94.
Caldwell DJ, Armstrong TW, Barone MJ, et al. (2001) Lessons learned while compiling a quantitative exposure database from the published literature. Appl Occup Environ Hyg; 16:174–7.
Carton B. (1995) COLCHIC chemical exposure database: information on lead and formaldehyde. Appl Occup Environ Hyg; 10:345–50.
Centaur Associates. (1986) Case studies of formaldehyde exposure control in six industries. Prepared for the Occupational Safety and Health Administration under contract with the Office of Regulatory Analysis. Washington, DC: United States Department of Labor, Occupational Safety and Health Administration (OSHA Docket No. H-225, Exhibit No. 85-116).
Daniels W, Hales T, Gunter B, et al. (1988) Health Hazard Evaluation Report No. HETA-87-099-1938: Louisiana-Pacific Corporation, Olathe, Colorado. Cincinnati, OH: United States Department of Health and Human Services, Public Health Service, Centers for Disease Control, National Institute for Occupational Safety and Health.
Edling C, Oedkvist L, Hellquist H. (1985) Formaldehyde and the nasal mucosa. Br J Ind Med; 42:570–1.
Edling C, Hellquist H, Oedkvist L. (1988) Occupational exposure to formaldehyde and histopathological changes in the nasal mucosa. Br J Ind Med; 45:761–5.
Hartley HO. (1942) The range in random samples. Biometrika; 32:334–48.
Herbert FA, Hessel PA, Melenka LS, et al. (1994) Respiratory consequences of exposure to wood dust and formaldehyde of workers manufacturing oriented strand board. Arch Env Health; 49:465–70.
Herbert FA, Hessel PA, Melenka LS, et al. (1995) Pulmonary effects of simultaneous exposures to MDI, formaldehyde and wood dust on workers in an oriented strand board plant. J Occup Env Med; 37:461–5.
Hewett P and Ganser GH. (1997) Simple procedures for calculating confidence intervals around the sample mean and exceedance fraction derived from lognormally distributed data. Appl Occup Environ Hyg; 12:132–42.
Hornung R and Reed LD. (1990) Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg; 5:46–51.
Horvath EP, Anderson H, Pierce WE. (1988) Effects of formaldehyde on the mucous membranes and lungs. A study of an industrial population. J Am Med Assoc; 259:701–7.
IARC. (1995) IARC monographs on the evaluation of carcinogenic risks to humans. Wood dust and formaldehyde. (International Agency for Research on Cancer, World Health Organization, Lyon) Vol. 62.
Imbus HR and Tochilin SJ. (1988) Acute effect upon pulmonary function of low level exposure to phenol-formaldehyde-resin-coated wood. Am Ind Hyg Assoc J; 49:434–7.
Kauppinen TP and Niemelä R. (1985) Occupational exposure to chemical agents in the particleboard industry. Scand J Work Environ Health; 11:357–63.
Kolstad HA, Sonderskov J, Burstyn I. (2005) Company-level, semi quantitative assessment of occupational styrene exposure when individual data are not available. Ann Occup Hyg; 49:155–65.
Lavoué J, Beaudry C, Goyer N, et al. (2005) Investigation of past and current exposures to formaldehyde in the reconstituted wood panels industry in Quebec. Ann Occup Hyg; 49:587–600.
Lee SA. (1988) Health Hazard Evaluation Report No. HETA-87-309-1906, Louisiana-Pacific Corporation, Missoula, Montana. Cincinnati, OH: United States Department of Health and Human Services, Public Health Service, Centers for Disease Control, National Institute for Occupational Safety and Health.
Malaka T and Kodama AM. (1990) Respiratory health of plywood workers occupationally exposed to formaldehyde. Arch Env Health; 45:288–94.
Marquart H, Van Drooge H, Groenewold M, et al. (2001) Assessing reasonable worst-case full shift exposure. Appl Occup Environ Hyg; 16:210–7.
Money CD and Margary SA. (2002) Improved use of workplace exposure data in the regulatory risk assessment of chemicals within Europe. Ann Occup Hyg; 46:279–85.
Mortimer VD. (1982a) Preliminary Survey Report No. 108-17a: Particleboard Plant, Timber Products Company, Medford, Oregon. Cincinnati, OH: United States Department of Health and Human Services, Public Health Service, Centers for Disease Control, National Institute for Occupational Safety and Health.
Mortimer VD. (1982b) Preliminary Survey Report No. 108-19a: Particleboard and Plywood Plants, Springfield Wood Products Facility, Weyerhaeuser Inc., Springfield, Oregon. Cincinnati, OH: United States Department of Health and Human Services, Public Health Service, Centers for Disease Control, National Institute for Occupational Safety and Health.
Mulhausen JR and Diamano J. (1998) A strategy for assessing and managing occupational exposures.(AIHA Press, Fairfax, VA).
Niemelä R. (1986) A tracer pulse method for the assessment of airflow patterns in a particleboard mill. Scand J Work Environ Health; 12:504–11.
Niemelä RI and Vainio H. (1981) Formaldehyde exposure in work and the general environment. Scand J Work Environ Health; 7:95–100.
Niemelä R, Riipinen H, Aatola S, et al. (1980) Ventilation in particle board, plywood factories. Occupational Health Institute Studies Report no 164. Helsinki: Occupational Health Institute.
Niemelä RI, Priha E, Heikkila P. (1997) Trends of formaldehyde exposure in industries. Occup Hyg; 4:31–46.
Raaschou-Nielsen O, Hansen J, Thomsen BL, et al. (2002) Exposure of Danish workers to trichloroethylene, 1947–1989. Appl Occup Environ Hyg; 17:693–703.
Rappaport SM. (2000) Interpreting levels of exposures to chemical agents. In Harris RL (Ed.). Patty's industrial hygiene. 5th edn. (John Wiley & Sons, Inc., New York, NY) pp. 679–745.
Sussell A. (1995) Health Hazard Evaluation Report No. HETA-91-0239-2509, Medite of New Mexico, Las Vegas, New Mexico. Cincinnati, OH: United States Department of Health and Human Services, Public Health Service, Centers for Disease Control, National Institute for Occupational Safety and Health.
Tielemans E, Marquart H, de Cock J, et al. (2002) A proposal for evaluation of exposure data. Ann Occup Hyg; 46:287–97.
Triebig G, Schaller KH, Beyer B, et al. (1989) Formaldehyde exposure at various workplaces. Sci Total Environ; 79:191–5.
Wentrup GJ, Brenk FR, Wenzel M, et al. (1986) Field measurements of formaldehyde for workplace monitoring, using various active, passive methods for personal and area sampling. In Berlin A, Brown RH, Saunders KJ (Eds.). Diffusive sampling: an alternative approach to workplace air monitoring. Luxembourg, 22–26 September 1986. (Royal Society of Chemistry, London) pp. 328–32.
Zwillinger D and Kokoska S. (2000) Standard probability and statistics tables and formulae.(Chapman & Hall/CRC, Boca Raton, FL).