breast conservation surgery, breast neoplasms, prognostic factor, radiation treatment, seer program, subgroup analyses
A Danzel. Statistical Interaction In The Survival Analysis Of Early Breast Cancer Using Registry Data. The Internet Journal of Oncology. 2004 Volume 2 Number 2.
Extreme heterogeneity of neoplastic cells from cell transformation to invasion and metastasis is well known (1,2,3,4). Cells originating from the same tumor may express entirely different behavior (5). Gene expression levels from the same gene mutation are not identical (6). Circumstances of diagnosis and socio-demographic differences (7,8) contribute to the heterogeneity. In view of this variability, both biological and non-biological, it cannot be expected that any given treatment would be equally effective in all patients (9). Identifying subsets of patients that might or might not benefit from a treatment, termed “subgroup analyses”, is a cornerstone of medical practice.
Typically, a subgroup analysis systematically evaluates treatment effects in multiple subsets of the patients enrolled, e.g., in a clinical trial (10,11). Such an analysis is subject to several problems related to the risk of false-positive and false-negative findings (12).
A false-positive finding is a 'statistically significant' result, which is due to chance rather than to a non-zero treatment effect. The probability of such a finding increases with the number of comparisons performed using a particular dataset (13,14). For instance, even if treatment has no effect, when 100 independent subgroup tests are performed, each at the 0.05 significance level, the probability of finding at least one statistically significant (and thus false-positive) result is 0.994. We are thus almost certain to get at least one ‘positive’ finding. In fact, the expected number of false-positive results will be 100 × 0.05 = 5 (15).
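The two figures quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes the 100 tests are independent, which is what the value 0.994 implies; the function names are illustrative only.

```python
def familywise_error(n_tests, alpha):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha):
    """Expected number of false positives when no true effect exists."""
    return n_tests * alpha


print(round(familywise_error(100, 0.05), 3))  # 0.994
print(expected_false_positives(100, 0.05))    # 5.0
```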
A false-negative finding, on the other hand, is the 'statistically non-significant result', or failure to detect a treatment effect, when the effect in fact exists. The probability of such a finding increases with a decreasing sample size. Subgroup testing requires the data to be split and these smaller datasets will therefore have a reduced power to detect a treatment effect.
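The loss of power from splitting the data can be illustrated with a rough calculation. The sketch below uses Schoenfeld's normal approximation for the power of a two-sided test of a log hazard ratio, which is an assumption introduced here for illustration and not a method used in this paper; the event counts, hazard ratio, and treated fraction are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(events, hazard_ratio, treated_fraction=0.5, alpha=0.05):
    """Approximate power to detect a hazard ratio, given the number of events.

    Normal approximation; the negligible opposite tail is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    signal = sqrt(events * treated_fraction * (1 - treated_fraction)) * abs(log(hazard_ratio))
    return NormalDist().cdf(signal - z_alpha)


# Hypothetical example: the same true effect, in a full dataset and in a
# subgroup holding one tenth of the events.
print(approx_power(2000, 0.8))
print(approx_power(200, 0.8))
```

Under these assumed inputs the subgroup with one tenth of the events has far less power than the full dataset, which is the mechanism behind false-negative subgroup findings.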
To address these issues, several recommendations have been formulated. In essence, they suggest the use of a limited number of pre-planned comparisons within pre-specified groups; adjustment of significance levels for multiplicity of comparisons; adjustment of trial size; and inference based on formal tests of interaction in appropriate regression analyses (12,13,15,16,17,18).
While most of the recommendations are intuitively easily understood, the use of tests of interaction presents a practical problem. Most likely, this can be attributed to the somewhat more complex nature of the technique. From this point of view it is unfortunate that major textbooks sometimes do not list interaction as a keyword (19), incorporate interaction terms in models without explanatory guidance (20), or treat the more difficult issues with sentences like: “Interactions among more than two variables can be exceedingly complex” (21). There is still a clear need for a careful explanation and exemplary applications of the technique, which could encourage its more extensive use.
The purpose of this paper is twofold: clinical and technical. Using conventional survival analysis models applied to a large population dataset, we previously found that radiotherapy was associated with a survival advantage (22). In the present study we use the analysis of interaction to investigate whether the effect of post-operative radiotherapy depends on type of surgery, stage of breast cancer, nodal status, and extent of node examination. The clinical investigation is justified in view of the continuing debate on the role of radiotherapy in different subgroups of patients (23). However, somewhat more attention is given here to the presentation of the analytical steps, to give more insight into the use of tests of interaction.
Patients And Methods
Patients were selected from the SEER (Surveillance, Epidemiology, and End Results) nine-registries database (24) according to the following criteria: women diagnosed 1988-1998, aged 40-69, primary non-inflammatory histologically confirmed invasive carcinoma confined to breast, tumor largest diameter
In the SEER data collection prior to 1998, receipt of radiotherapy was defined as radiation delivered within 4 months of the initial treatment. Information on systemic treatment was not available for use as a variable. Hormone receptor status was available for 1990 and later.
The effect of treatments and patient characteristics on survival time was examined using the proportional hazards model (25). The following variables were included in the model: patient registration area, age at diagnosis, race, marital status, multiple primaries indicator, histology, grade, estrogen receptor status, tumor size (largest diameter), number of nodes examined, number of positive nodes, type of surgery, and delivery of post-surgery radiation. Quantitative variables were used in the models as untransformed continuous variables. Qualitative data were recoded using binary variables.
Two variables are said to interact in their effect on the response if the effect of the first variable is different at the different levels of the second variable (26). Thus, the effect of the first variable (e.g., treatment) is different in the subgroups defined by the levels of the second variable (e.g., tumor size). In modelling, a simple way to describe the interaction is to include in the model the product of the two individual variables. The product is termed a “first order interaction”, as it is the lowest level of interaction that can be encountered; it necessarily involves two variables, since there can be no interaction with fewer than two. A test of statistical significance of the coefficient for the product is the test of interaction, i.e., the test of the assumption that the effect of one variable is constant across the levels of the other. Analogously, one can define, model, and test interactions of higher orders, i.e., involving three or more variables.
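As an illustration of the definition above, a proportional hazards model with two binary variables and their first order interaction can be written as follows. The symbols $x_1$, $x_2$, $\beta_1$, $\beta_2$, $\beta_{12}$ and $h_0(t)$ are notation introduced here for the sketch and are not taken from the tables of this paper.

```latex
\begin{align*}
h(t \mid x_1, x_2) &= h_0(t)\,\exp\left(\beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2\right) \\
\text{HR for } x_1 \text{ when } x_2 = 0 &: \quad \exp(\beta_1) \\
\text{HR for } x_1 \text{ when } x_2 = 1 &: \quad \exp(\beta_1 + \beta_{12}) = \exp(\beta_1)\,\exp(\beta_{12}) \\
\text{Test of interaction} &: \quad H_0\colon \beta_{12} = 0
\end{align*}
```

Under $H_0$ the two hazard ratios coincide, which is the assumption of a constant effect of $x_1$ across the levels of $x_2$.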
For the analyzed data, tests of interaction were performed by entering into the proportional hazards models selected multiplicative interaction terms between the following binary variables: the use of radiotherapy (RT) vs. no RT; the use of breast conserving surgery (BCS) vs. total mastectomy (TM); T2 vs. T1 stage; extensive (>10 nodes examined) vs. limited (≤ 10 nodes) node examination; extensive (≥ 4 nodes involved) vs. limited (< 4 nodes) nodal involvement (27). In analogy with stepwise regression, we examined different orders of interaction, simplifying the model by excluding non-significant terms.
All computations were performed using SAS v8.01 (SAS Institute Inc, Cary, NC, USA). Tests of significance and confidence intervals were computed using the Wald test (20). A two-sided 0.05 significance level was used to assess the results of significance tests. A description of the SEER data regarding surgery and radiotherapy and of the verification of proportional hazards assumptions was provided in a previous analysis (22).
There were 60,349 patient records matching the selection criteria. Follow-up cutoff date was December 31, 1998. Median follow-up of patients alive was 56 months (range 0-131 months). The total number of events (death from any cause) was 7,090.
The distribution of selected characteristics by tumor and by type of treatment is shown in Table 1. BCS patients tended to be 2-3 years younger than TM patients, and their tumors tended to be smaller. BCS patients who did not receive RT tended to be 1 year younger than BCS patients who received RT. The extent of node examination was comparable among the different subgroups. The number of nodes involved was particularly large in N+ patients who had TM and post-surgery RT (Table 1). Other details have been presented elsewhere (22).
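For readers unfamiliar with the Wald procedure, the following minimal sketch shows how a coefficient and its standard error yield a hazard ratio, a two-sided p-value, and a confidence interval. It is written in Python rather than SAS, and the coefficient and standard error are hypothetical values chosen for illustration, not estimates from this study.

```python
from math import exp, log
from statistics import NormalDist


def wald_summary(beta, se, level=0.95):
    """Wald z statistic, two-sided p-value, and confidence interval for exp(beta)."""
    z = beta / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (exp(beta - z_crit * se), exp(beta + z_crit * se))
    return {"hr": exp(beta), "z": z, "p": p_value, "ci": ci}


# Hypothetical inputs: a log hazard ratio of log(0.80) with standard error 0.05.
summary = wald_summary(beta=log(0.80), se=0.05)
print(summary)
```

A coefficient is declared significant at the two-sided 0.05 level when the p-value is below 0.05, or equivalently when the 95% confidence interval for the hazard ratio excludes 1.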
Note that all selected patients had an axillary dissection (ALND). When BCS or TM is mentioned, it means BCS with ALND or TM with ALND, respectively.
The result of the proportional hazards model without interactions is reported in Table 2. RT was associated with a statistically significant 10% reduction in mortality hazard (hazard ratio, HR=0.904) relative to no-RT. BCS was associated with a statistically significant 13% reduction of the hazard (HR=0.874), relative to TM. Note that, since the model does not include interactions, the effect of RT is assumed to be the same irrespective of whether the patient was treated with BCS or TM. Thus, for both types of surgery, an important beneficial effect of RT might be inferred.
For illustrative purposes, we will consider another model, which includes the first order interaction (BCRT) between type of surgery and radiotherapy (Table 3). The interaction is expressed as the product of the binary variables BCS and RT. It assumes value 1 only if both BCS and RT were used, and 0 otherwise. Consequently, the term needs to be taken into account only in the computations for the subgroup of patients treated with both BCS and RT. The hazard ratio associated with the interaction term is estimated to be equal to 0.757 and is statistically significant. This result can be interpreted as follows: in the absence of RT, the mortality hazard for patients treated with BCS was equal to 1.071 of the hazard of patients not treated with BCS (TM patients). However, if RT was applied, the hazard for BCS-treated patients was equal to 1.071 × 0.757 = 0.811 of the hazard of TM patients. Thus, the statistically significant interaction term implies that the effect of RT is substantially modified by the type of surgery: RT is beneficial for BCS-treated patients, but has no effect for patients treated with TM.
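The arithmetic of this interpretation can be sketched as follows, using only the two hazard ratios quoted above for the Table 3 model. The variable and function names are illustrative.

```python
# Hazard ratios quoted in the text for the model of Table 3.
hr_bcs = 1.071   # BCS vs. TM main effect (applies when RT = 0)
hr_bcrt = 0.757  # BCS x RT interaction term (applies only when BCS = 1 and RT = 1)


def hr_bcs_vs_tm(rt):
    """Hazard ratio of BCS relative to TM, given RT status (0 or 1)."""
    return hr_bcs * (hr_bcrt if rt else 1.0)


print(round(hr_bcs_vs_tm(0), 3))  # 1.071
print(round(hr_bcs_vs_tm(1), 3))  # 0.811
```

Because the interaction variable equals 1 only when both BCS and RT are present, its hazard ratio enters the product only in that subgroup.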
It is worth noting here that, to investigate the possibility of the influence of BCS on the effect of RT using a subgroup analysis, two separate analyses of data would have to be performed: in BCS and TM patients. The two resulting estimates of the effect of RT would then have to be compared. An additional complication would be the need to adjust the estimates for the effect of other covariates. The model presented in Table 3 addresses all these issues simultaneously.
To investigate the possibility of a differential effect of RT in more detail, a series of models containing interactions between RT, BCS, T-stage, number of positive nodes (0, 1-3, 4+ nodes), and extent of node examination (less than 10 or 10+ examined nodes) was considered. First, a model containing fourth-order interaction terms involving all the factors mentioned above, as well as all lower-order interactions, was fitted. Since it was found that the fourth-order terms were not significant, the model was simplified by deleting them and testing the joint significance of all third-order terms (for quadruples of factors). These terms were also found not to be significant. Hence, the model with only second-order (for triplets of factors) and lower-order interactions was considered. After deleting non-significant second-order terms (and, if possible, first-order interactions), the model presented in Table 4 was obtained. It was found that neither a small number of positive nodes (1 to 3) nor a small number of examined nodes (less than 10) changed the effect of treatments. The model suggests that T2 modifies the effect of both RT and BCS (as suggested by the presence of the T2•RT•BCS interaction), while a large number of positive nodes (4+) influences mainly RT (T2•(4+nodes)•RT interaction).
To facilitate interpretation, a summary that presents the factorial layout derived from the model of Table 4 was computed (Table 5). The factorial table shows hazard ratios for RT and BCS for different combinations of T-stage and category of node positivity. The details of the computation procedure needed to derive the table from the model are shown in Table 6.
Example of calculations for RT and BCS in T1 with 4+ positive nodes: the relevant main effect parameters from Table 4 are bcs, rt and n2 (T1 is the reference category for T2); the relevant interaction effects are RT•BCS (rtbc) and 4+nodes•RT (n2rt). Consequently, the hazard ratio rt*n2*bcs*rtbc*n2rt is obtained. Table 5 shows the corresponding numerical value calculated from the estimates given in Table 4: 1.121 × 2.454 × 1.106 × 0.627 × 1.008 = 1.92.
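The hierarchy of interaction orders described above can be made concrete by enumerating the candidate terms. The sketch below works at the level of the five factors only; the factor labels are illustrative, and in the actual model a factor with three categories (the number of positive nodes) is represented by more than one binary variable, so the number of fitted coefficients is larger than the counts shown.

```python
from itertools import combinations

# Illustrative labels for the five factors named in the text.
factors = ["RT", "BCS", "T2", "positive_nodes", "nodes_examined"]


def interaction_terms(factors, order):
    """All interactions of a given order; an order-k term involves k + 1 factors."""
    return ["*".join(combo) for combo in combinations(factors, order + 1)]


terms_by_order = {k: interaction_terms(factors, k) for k in range(1, len(factors))}

# Simplification proceeds from the highest order downwards.
for order in sorted(terms_by_order, reverse=True):
    print(order, len(terms_by_order[order]), terms_by_order[order][:2])
```

With five factors there is a single fourth-order term, five third-order terms (quadruples), ten second-order terms (triplets), and ten first-order terms (pairs).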
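The same multiplication rule can be applied to every T1 cell of the factorial layout. The sketch below uses only the five hazard ratios quoted in the worked example (rt, n2, bcs, rtbc, n2rt); the T2 cells are omitted because the corresponding estimates are not quoted in the text. The function name and data structure are illustrative.

```python
from math import prod

# Hazard ratios quoted in the worked example (estimates from Table 4).
hr = {"rt": 1.121, "n2": 2.454, "bcs": 1.106, "rtbc": 0.627, "n2rt": 1.008}


def t1_cell_hazard_ratio(rt, bcs, n2):
    """Hazard ratio of a T1 cell relative to T1, 0-3 positive nodes, TM, no RT."""
    active = {
        "rt": rt,
        "bcs": bcs,
        "n2": n2,
        "rtbc": rt and bcs,  # interaction enters only if both indicators are 1
        "n2rt": n2 and rt,
    }
    return prod(hr[name] for name, on in active.items() if on)


for n2 in (0, 1):
    for bcs in (0, 1):
        for rt in (0, 1):
            print(n2, bcs, rt, round(t1_cell_hazard_ratio(rt, bcs, n2), 2))
```

An interaction term contributes to a cell only when all of its component indicators equal 1, so the reference cell (no indicators set) has a hazard ratio of 1 by construction.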
Table 5 might be read in different ways depending on the center of interest. We will examine it from the perspective of treatment combinations: BCS vs. BCS+RT, TM vs. TM+RT, BCS vs. TM.
Among patients treated with breast conserving surgery, radiotherapy was associated with a reduced mortality hazard ratio, as compared to patients who did not receive radiotherapy. This reduction was consistent for all combinations of T stage and category of node positivity: the hazard ratios for RT vs. no RT (relative to T1 patients with 0-3 positive nodes treated with TM and no RT) in Table 5 were equal to 0.78 vs. 1.11 for T1 and 0-3 positive nodes, 1.92 vs. 2.71 for T1 and 4+ nodes, 1.19 vs. 1.33 for T2 and 0-3 nodes, and 2.40 vs. 3.44 for T2 and 4+ nodes. From Table 6 it follows that, using the estimated hazard ratios for the model presented in Table 4, the ratio 0.78 vs. 1.11 can be expressed as rt*rtbc, 1.92 vs. 2.71 as rt*rtbc*n2rt, 1.19 vs. 1.33 as rt*rtbc*t2rt*t2rtbc and 2.40 vs. 3.44 as rt*rtbc*t2rt*t2rtbc*n2rt*t2n2rt. Based on the model one can construct tests of whether the products are equal to 1 (i.e., whether there is no difference in the mortality hazard ratio). The tests yield the following
Among patients treated with total mastectomy, radiotherapy, as compared to no radiotherapy, seemed to be associated with an increased mortality hazard ratio, except for T2 stage with 4+ positive nodes (Table 5, columns TM, rows RT vs. no-RT: hazard ratios of 1.12 vs. 1 for T1 and 0-3 positive nodes, 2.77 vs. 2.45 for T1 and 4+ nodes, 1.38 vs. 1.31 for T2 and 0-3 nodes, and 2.79 vs. 3.41 for T2 and 4+ nodes). The significance of these differences can be assessed by testing whether the following ratios are equal to 1: rt, rt*n2rt, rt*t2rt and rt*t2rt*n2rt*t2n2rt (see Table 6). The tests yield the following
Breast conserving surgery in the absence of radiotherapy did not seem to have any effect as compared to total mastectomy (Table 5, rows No RT, columns BCS vs. TM: hazard ratios of 1.11 vs. 1 for T1 and 0-3 positive nodes, 2.71 vs. 2.45 for T1 and 4+ nodes, 1.33 vs. 1.31 for T2 and 0-3 nodes, and 3.44 vs. 3.41 for T2 and 4+ nodes). In this case the significance of the differences can be verified by testing whether bcs=1 (
As Harrell commented (21), the complexity and the difficulty of interpretation of interactions increase steeply when they involve more than two variables. Raw results such as those shown in Table 4 may be difficult to understand at first sight and are prone to erroneous interpretation. Consider for example the non-significant variables labeled
When answering the above question one needs to consider the form of the model presented in Table 4. In particular, the presence of interaction terms involving
The above example shows that the estimated effects of variables depend on the structure of the model, and in particular, on the presence of interactions involving them. It also indicates that interpretability is not a trivial issue. To that end, we find that the construction of a factorial summary table, like Table 5, is useful, if not almost necessary. It enhances the presentation of results and might facilitate the comprehension of how statistical interaction may relate to subgroup analyses.
Significant interactions detected in Table 4 indicate that the simpler models presented in Tables 2 and 3 are inappropriate. The results of interaction tests might be summarized as follows:
The use of radiotherapy was in general beneficial for patients treated with breast conserving surgery with axillary dissection, but the magnitude of the positive effect depended on T stage and the number of positive nodes.
For patients treated with total mastectomy the application of radiotherapy in general did not seem to have a favorable effect, except for T2 patients with 4+ positive nodes.
Breast conserving surgery without radiotherapy appeared equivalent to mastectomy without radiotherapy, although one cannot rule out the possibility of a false-negative finding.
A few remarks on the validity of the aforementioned conclusions are in order. First, they were obtained from an analysis that was exploratory in nature. That is, it did not start with a pre-specified set of hypotheses to be confirmed, as might be the case in, e.g., a clinical trial. For this reason, no adjustment for multiple testing was applied. Admittedly, while this increases the statistical power of the analysis, it also increases the risk of false-positive findings. Therefore, the conclusions based on the statistically significant results of the performed tests should be treated as statements that require independent verification.
Secondly, since the conclusions are based on an analysis of observational data, one needs to consider the possibility of bias. The imbalance in the distribution of different patient characteristics between the treatment groups (Table 1) indicates that the data may be subject to some selection processes. The imbalances observed in Table 1 are not the main concern, since their effect can be removed by using models like those presented in Tables 2-4. However, a hidden bias might still exist if treatment was systematically allocated to some categories of patients for undocumented reasons related to the patient’s prognosis. Because the presence of a hidden bias can never be excluded, results based on observational studies like the present one cannot be accepted without consideration of randomized trials or other collateral evidence.
Regarding clinical plausibility, the present results for breast conserving surgery confirm previous findings (22), and are concordant with a recent meta-analysis of breast-conserving surgery and radiation that found a survival advantage with radiation (28). For patients treated with mastectomy, the results suggest a favorable effect of radiotherapy in higher risk patients (T2 stage with more nodes involved), but no significant effect in lower risk patients. This appears to be at odds with clinical trials of post-mastectomy radiotherapy that found no significant interaction with tumor size or with the number of positive lymph nodes (29), or a significant interaction between radiotherapy and number of nodes removed only in patients who survived less than 4 years (30). We currently have no explanation for this discrepancy. A tentative hypothesis is confounding by the variability of lymph node dissection, which would require more complex modeling (31), but that investigation is beyond the purpose of the present analysis.
Statistical guidelines recommend the use of formal tests of interactions instead of subgroup analyses. However, examples of application of the procedure are rare. We applied systematic stepwise tests of interactions within proportional hazards models to the study of a large registry database of breast cancer patients, in order to investigate the changes in the effect of radiotherapy related to the use of breast conserving surgery and other clinical factors. The major finding is that in the absence of radiotherapy, breast conserving surgery patients presented no survival disadvantage as compared with mastectomy patients, while with the addition of radiotherapy, a substantial survival advantage was observed in all subgroups of patients.
Vincent Vinh-Hung (Oncology Center, Academic Hospital, Vrije Universiteit Brussel, Belgium) and Tomasz Burzykowski (Limburgs Universitair Centrum, Center for Statistics, Diepenbeek, Belgium) were the original authors. The paper underwent several revisions. Reviewers were not sure whether it addressed a methodological or a clinical audience. For statisticians, the topic is not new. For clinicians, it is too difficult. The original authors could not pursue this work further; nevertheless, they gave full permission for the present submission.