Surgical Oncology
Volume 19, Issue 2, Pages 55-58, June 2010

Survival analysis in clinical trials: Old tools or new techniques

Service de Biostatistique, Institut Curie, 26 rue d'Ulm, Paris 75005, France

Accepted 21 January 2010.

1. Introduction 

Almost 40 years have passed since D.R. Cox published, in 1972, the most widely used method of life table analysis [1]. The same year, R. Peto and J. Peto published the logrank test statistic [2]. These approaches are now recognized as the two cornerstones of survival data analysis. In the meantime, numerous statisticians have developed or updated tools for survival analysis. In this journal, Maetani and Gamel [3] question the importance and applicability of these classical approaches and ask how useful some newer techniques can be in clinical research. In particular, are the classical approaches still relevant when a fraction of the population is cured, a situation that is hopefully becoming more and more frequent with advances in surgery and systemic treatments?

We surely do not have the definitive answer. In this communication, we give a brief overview of the strengths and limitations of the ‘classical’ tools. We argue that the logrank test and the Cox model provide reliable results even in the presence of a non-negligible cure rate. However, they depend on the proportional hazards assumption. For situations where this assumption does not hold, we comment on how the cure models reported by Maetani and Gamel [3] can help in analysing survival data.

2. Survival data: a difficult summary 

Survival data essentially measure the time to some event of interest. The event can be just about anything, such as death, relapse, negative dosage, etc., as long as it is clearly defined. However, the whole story is complicated by censoring. At the time the researcher analyses the data, some patients may be lost to follow-up or may simply be alive without having experienced the event. These patients are known to have been alive until their last assessment, but nothing can be extrapolated beyond that time point. Survival data therefore carry two pieces of information: the status (alive or not, event/no event, etc.) and the time to the event, or to the last assessment if no event was recorded. At least two types of summary statistics are possible: the delay to the event (median time to event, 25th percentile, etc.) and the percentage of patients who have not experienced the event before a given time point (the 2-year survival rate, for instance). Both measures have limitations. For the survival rate to be useful, the time point should be of clinical interest, but consensus is not easy to reach. Recently, de Gramont and colleagues [4] advocated that the benefit of a new treatment in colon cancer should be evaluated over a period longer than the usual 5 years. Moreover, the percentage of events may be biased, since patients dying just after the time point are counted as not having had an event. Conversely, the median survival time can be very unstable [5]. For instance, in gastric cancer, adjuvant chemotherapy improved median overall survival by 3 years [6], which is apparently not in line with the moderate risk reduction of 6.3% at 5 years obtained from the same data. Therefore, the complete survival curves are often much more informative.
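
As an illustration of how censoring enters these summaries, the sketch below computes a Kaplan–Meier curve, a survival rate at a chosen time point and a median survival time. It is a minimal version written for this commentary, not the code of any cited analysis; the function names and the follow-up times are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time of each patient
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # patients leaving the risk set (event or censoring)
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Conditional probability of surviving t, given at risk just before t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve

def survival_at(curve, t):
    """Estimated probability of being event-free at time t."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
    return s

def median_survival(curve):
    """Smallest event time with S(t) <= 0.5, or None if the curve stays above 0.5."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None

# Hypothetical follow-up in months; 0 marks a censored patient.
times = [2, 3, 3, 5, 8, 8, 12, 15, 20, 24]
events = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
print("S(10 months):", round(survival_at(curve, 10), 3))
print("median:", median_survival(curve))
```

Censored patients contribute to the number at risk up to their last assessment and are then removed without lowering the curve, which is how the estimator uses their partial information. When the curve never falls to 0.5, as can happen with a large cured fraction, the median is simply not estimable and `median_survival` returns `None`.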

Hazard curves plotted over time are another representation of survival data, and are increasingly reported. In randomized trials, the hazard rate is similar to the incidence rate at different time points for the population at risk. This conveys important information as to whether and when the events occur. When two therapies are investigated, the hazard ratio (HR), so familiar in the medical literature, compares the hazard or risk of an event in patients receiving a given treatment with the hazard in patients receiving the control. In the gastric example, HR = 0.83 corresponds to a 17% risk reduction. No direct conversion between the HR and median survival exists, except in the particular case of exponential survival. One cannot simply apply the risk reduction to calculate the extra median survival.
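
The exponential special case can be sketched as follows. With a constant hazard, the median is ln(2) divided by the hazard, so under proportional hazards the treated median equals the control median divided by the HR. The control median of 24 months is a hypothetical value chosen for illustration; only the HR of 0.83 comes from the text.

```python
import math

def exponential_median(rate):
    """Median of an exponential survival time with constant hazard `rate`."""
    return math.log(2) / rate

def treated_median(control_median, hazard_ratio):
    """Median in the treated arm when both arms follow an exponential model."""
    control_rate = math.log(2) / control_median
    return exponential_median(hazard_ratio * control_rate)

control_median = 24.0  # months, hypothetical value
hazard_ratio = 0.83    # HR quoted in the text for the gastric example

exact = treated_median(control_median, hazard_ratio)  # equals control_median / HR
naive = control_median * (1 + (1 - hazard_ratio))     # "add 17%" shortcut
print(f"exponential model: {exact:.1f} months, naive shortcut: {naive:.1f} months")
```

Even in this favourable case the median is multiplied by 1/HR, not by one plus the risk reduction, and outside the exponential model no such closed-form conversion is available.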

3. Classic tools: Kaplan–Meier curves, logrank test and Cox model 

3.1. In theory 

As recalled by Maetani and Gamel, each technique is based on some hypotheses that may or may not be verified. The first one is that all patients will eventually die or experience the event. The goal of any survival analysis with these tools is then to show a modification in the time to death (or to some event), and not to measure the event rate at a given time point. That is why the analysis of the 6-MP data in the 1972 paper [1] is perfectly correct. The HR of 0.2 only indicates that the instantaneous rate is reduced; it does not allow any inference about the cure rate. In other words, the velocity of death is reduced but all patients are still on the fatal road. More recently, the cost-benefit of sorafenib in advanced hepatocellular carcinoma (Child A) has been questioned in Great Britain [7]. The reduction of the HR by 25% observed in the phase III trial [8] translated into a 3-month increase in median survival, but no patients were cured. If we follow up patients for a sufficiently long time, both survival curves come to 0. Only the speed is different.

3.2. In practice 

Does this mean that tests and survival curves are flawed when a large fraction of the patients are cured or censored? No. The logrank test and the semi-parametric Cox model have been shown to be quite robust to censoring. Analyses with incomplete follow-up are unbiased provided that censored patients have the same risk as uncensored patients. If 80% of the patients are cured, the comparison of survival curves with the logrank test is still valid. Likewise, the Cox model provides reliable estimates of the treatment effect adjusted for covariates, even in an adjuvant setting with curative resections, as long as the proportional hazards assumption is verified.
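
To make explicit how the logrank test handles censored patients, here is a minimal sketch of the two-group statistic. At each event time it compares the observed number of events in one arm with the number expected if both arms shared the same hazard, using only the patients still at risk. The function name and the data are hypothetical, and the sketch assumes that both arms are at risk for at least one event time, so that the variance is positive.

```python
import math

def logrank(times_a, events_a, times_b, events_b):
    """Two-group logrank chi-square statistic (1 df) and its p-value."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)               # at risk, both arms
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk, arm A
        d = sum(1 for u, e, _ in data if u == t and e == 1)    # events, both arms
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in arm A
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p_value

# Hypothetical follow-up in months; 0 marks a censored patient.
control_times, control_events = [3, 5, 7, 9, 12], [1, 1, 1, 0, 1]
treated_times, treated_events = [6, 10, 14, 18, 20], [1, 0, 1, 0, 0]

chi2, p_value = logrank(control_times, control_events, treated_times, treated_events)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

A censored patient counts in the risk set until the time of censoring and never as an event, which is why a high proportion of censored or cured patients reduces the number of informative event times without invalidating the comparison.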

3.3. Proportional hazards assumption 

This is one of the main limitations of the classical tools. When the difference between two survival curves is summarized by one number (the HR), we assume it to be constant over time. In other words, the treatment effect is maintained throughout follow-up. Even if all patients eventually die, the death rate is lower in one arm than in the other by a constant factor. For instance, in gastric cancer, this means that at any point in time patients have a 17% reduction in the risk of dying.
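
A standard consequence of a constant HR is that the two survival curves are linked by S1(t) = S0(t)^HR at every time point. The short sketch below applies this relation to hypothetical control-arm survival probabilities; only the HR of 0.83 comes from the text.

```python
def treated_survival(control_survival, hazard_ratio):
    """Survival in the treated arm under proportional hazards: S1 = S0 ** HR."""
    return control_survival ** hazard_ratio

hazard_ratio = 0.83  # HR quoted in the text for the gastric example

# Hypothetical control-arm survival probabilities at three time points.
for s0 in (0.9, 0.5, 0.1):
    s1 = treated_survival(s0, hazard_ratio)
    print(f"control {s0:.2f} -> treated {s1:.3f} (absolute gain {s1 - s0:.3f})")
```

The same relative effect thus produces different absolute gains depending on where the control curve stands, which is one reason why a single HR and a survival rate at a fixed time point can give different impressions of the same data.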

This assumption is not necessary for the computation of the logrank statistic, but it is central to the interpretation of the results of the test. A statistically significant difference can be obtained when the curves cross, i.e., when the treatment is better than the control for a given period of time and much worse later on. Even if the two curves do not cross, this test can detect a difference that may only correspond to a transient benefit of limited interest. A good example was provided by the NSABP C-08 study presented at ASCO 2009 [9].

For the semi-parametric Cox model, this assumption is crucial. If it is not verified and if the censoring rate is high, conclusions may be both misleading and uninterpretable. Consider, for instance, an old trial comparing the outcome of marrow transplantation with that of continued chemotherapy for adults with acute non-lymphoblastic leukaemia who achieve a first remission [10]. Compared with the chemotherapy group, patients undergoing transplantation had a higher risk of dying during the first 6 months after remission induction but a lower risk of dying thereafter. We could imagine that trials comparing a surgical approach with a medical one would yield the same type of results, since chemotherapy alone will probably delay time to progression but is unlikely to be curative. As a result, the survival curves cross: the chemotherapy curve lies above the surgery curve until some time point, after which they cross and the surgery curve should reach a plateau. Yet the Cox model would show two parallel curves!

How should we compare different interventions that may have different effects according to the time point (surgery vs chemotherapy, for instance)? Even though further developments have been proposed to address the issue of non-proportionality (stratification and time-varying parameters, among others), they do not distinguish between a curative treatment and a treatment that merely postpones the event. This is the second limitation of the classical tools. Are parametric models incorporating a cure fraction an adequate solution, as suggested by Maetani and Gamel [3]?

4. Parametric models with cure rate fractions 

When one is interested in modelling the time to an event in the investigated arms, and not only the relative difference between the two survival curves, one should rely on parametric models. The semi-parametric Cox model is no longer appropriate. Among the large family of parametric models for survival data, some allow for a fraction of patients being cured.

Cure rate models assume that there is a mixture of two groups of patients: cured and not cured. Patients who are not cured have a survival distribution that can be modelled using parametric models. The main advantage of cure rate models is the possibility of estimating both the percentage of failures and the time to failure, resulting in a more accurate picture of the clinical benefit.
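
The mixture just described can be written as S(t) = p + (1 - p) Su(t), where p is the cured fraction and Su the survival function of the patients who are not cured. The sketch below uses a Weibull distribution for Su, which is only one possible choice and an assumption of this example; all parameter values are hypothetical.

```python
import math

def mixture_cure_survival(t, cure_fraction, rate, shape=1.0):
    """Overall survival when cured patients never fail and the others follow a Weibull."""
    uncured = math.exp(-((rate * t) ** shape))
    return cure_fraction + (1.0 - cure_fraction) * uncured

# Three hypothetical arms: a control, a treatment that cures more patients,
# and a treatment that only delays failure among the patients who are not cured.
for t in (0, 2, 5, 10, 20):
    control = mixture_cure_survival(t, cure_fraction=0.30, rate=0.50)
    more_cures = mixture_cure_survival(t, cure_fraction=0.45, rate=0.50)
    delay_only = mixture_cure_survival(t, cure_fraction=0.30, rate=0.25)
    print(f"t={t:>2}: control {control:.3f}, more cures {more_cures:.3f}, "
          f"delay only {delay_only:.3f}")
```

In this parameterization the plateau of the curve is the cure fraction, so a treatment that only delays failure eventually rejoins the control plateau while a curative treatment raises it. The two kinds of benefit are carried by separate parameters.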

These models fit the data much better when a plateau occurs on the right-hand side of the survival curve. As described by Maetani and Gamel [3] and Frankel and Longmate [11], they may also be more powerful for detecting a treatment effect when the treatment only delays time to relapse but does not modify a non-null cure rate. Last but not least, these models allow covariates to be related either to the fraction of cured patients or to the time to event, which brings extra flexibility.

These models appear attractive in the field of oncology, where the presence of growing tumour cells and their possible eradication seems appropriately modelled by failure models with cure. For instance, Yakovlev proposed a model based on clonogenic cells in the adjuvant setting [12]. Each clonogen is a tumour cell that has the property to develop and to produce relapse or metastases. Some patients do not have any clonogen after complete resection and are then cured. Others will relapse.

The main advantages of parametric models (with or without cure rate) have been summarized by Cox and Oakes [13], who concluded that, asymptotically, well-fitting parametric models should yield more efficient parameter estimates than Cox regression if any one of the following conditions applies: (i) there is a strong effect, (ii) follow-up depends on covariates, (iii) there is a strong time trend in covariates. One can notice that in randomized clinical trials, conditions (ii) and (iii) do not hold and the treatment effect is only exceptionally strong.

Two important limitations of this modelling approach are often raised. First, to estimate the time to failure, a model for the survival distributions must be specified. A large choice is available (lognormal, exponential, accelerated, Gompertz, etc.), each corresponding to different hypotheses. Yet the survival distribution is often a complicated and unknown mixture that is not accurately fitted by the above-mentioned models. The robustness of the conclusions under model misspecification is unclear and, despite diagnostic tools [14], selecting a correct and valid model is often challenging with limited sample sizes. This is one of the reasons for the success of the semi-parametric Cox model: inference does not depend on the shape of the survival function.

The second concern is the number of patients required to estimate the different parameters. Parametric survival models for the non-cured patients can have only one parameter, but this leads to an over-simplification that is not acceptable in many situations. At least two parameters are therefore usually used, and a third is necessary to estimate the cure fraction. A more complex model necessarily brings an extra estimation burden. Although computational resources now make it easy to fit models such as the Boag model, developed more than 50 years ago [15], the required number of patients (or events) is larger than with the Cox model. In addition, when several parameters capturing different aspects of the treatment effect are estimated, several tests can be performed, which increases the risk of false positive results. To control this false positive rate (the type I error rate, usually set to 0.05), more stringent tests are required, and hence larger sample sizes.
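
The inflation of the type I error can be illustrated with a short calculation. The sketch assumes independent tests, which is a simplification since tests on the cure fraction and on the time to failure are computed on the same patients. It uses the Bonferroni correction as one simple example of a more stringent threshold; this correction is our choice for illustration and is not prescribed by the cited references.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level keeping the overall error at or below alpha."""
    return alpha / n_tests

alpha = 0.05
for n_tests in (1, 2, 3):
    print(f"{n_tests} test(s): overall error {familywise_error(alpha, n_tests):.4f}, "
          f"Bonferroni per-test level {bonferroni_threshold(alpha, n_tests):.4f}")
```

Testing both the cure fraction and the time to failure at the 0.05 level therefore roughly doubles the chance of a false positive under this assumption, and lowering the per-test level to compensate is what drives the larger sample size.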

5. Mimic the reality or inference on the treatment effect? 

We saw that the main interest of cure rate models is to be able to evaluate the treatment effect on both the cure rate and the time to failure. In particular, this is very promising when comparing curative treatments with non-curative ones, or when covariates are known to have a specific impact on the time to event or on the cure rate. Apart from these situations, however, how useful is this extra information? For instance, in a setting where a fraction of patients are cured, how interesting is a treatment that delays the events but does not modify the cure rate? Likewise, the palliative setting may not be the best field for the application of cure rate models, since prognosis is very poor and the cure rate is close to 0.

More generally, we would like to question the interest of sophisticated modelling for detecting the treatment effect. Typically, in clinical research or in epidemiology, most statistical models have a very poor goodness of fit. As the saying goes, they can hardly predict the past. However, “basic” survival models and logrank tests have shown their ability to detect the treatment effect when the effect lasts over some period. If we are interested in a more subtle impact, much information can be drawn from non-parametric tools such as the simple Kaplan–Meier curves. Survival rates at specific time points and life expectancy can be calculated. Within the framework of the Cox model, the survival curves (and not only the ratio) can be estimated using the Breslow or Aalen estimators, for instance. Finally, in randomized clinical trials, covariates are expected to be balanced between the investigated arms, and simple tests can detect the treatment effect on any predefined aspect without relying on modelling.

In our opinion, parametric models with cure rate fraction are very useful statistical tools in observational cohort studies when no randomization can be done or in randomized trials comparing interventions with different types of expected benefit. They allow for capturing mechanistic aspects of the underlying disease and of the different types of interventions. Their interest in other randomized trials is more doubtful.

Conflict of interest statement 

None declared.

Funding sources 

None declared.

Authorship statement 

Xavier Paoletti and Bernard Asselain are responsible for the content of the communication entitled “Survival analysis in clinical trials: old tools or new techniques”.

Acknowledgements 

The authors are indebted to Dr. Andrew Kramar for fruitful discussions and valuable advice.

References 

  1. Cox DR. Regression models and life tables (with discussion). J R Stat Soc B. 1972;34:187
  2. Peto R, Peto J. Asymptotically efficient rank invariant test procedures. J R Stat Soc A. 1972;135:185
  3. Maetani S, Gamel JW. Evolution of cancer survival analysis. Surg Oncol. 2010;
  4. De Gramont A, Hubbard J, Shi Q, O'Connell MJ, Buyse M, Benedetti J, et al. Association between disease-free survival and overall survival when survival is prolonged after recurrence in patients receiving cytotoxic adjuvant therapy for colon cancer: simulations based on the 20,800 patient ACCENT Data Set. J Clin Oncol 2009 Dec 14. (epub ahead of print).
  5. Michiels S, Piedbois P, Burdett S, Syz N, Stewart L, Pignon JP. Meta-analysis when only the median survival times are known: a comparison with individual patient data results. Int J Technol Assess Health Care. 2005;21(1):119–125
  6. Buyse ME, Pignon J, on behalf of the GASTRIC group. Meta-analysis of randomized trials assessing the interest of postoperative adjuvant chemotherapy and prognostic factors in gastric cancer. J Clin Oncol. 2009;27(Suppl.):15;(Abstr 4539)
  7. NICE website. Available at: <http://www.nice.org.uk/>
  8. Llovet JM, Ricci S, Mazzaferro V, Hilgard P, Gane E, Blanc JF, et al. Sorafenib in advanced hepatocellular carcinoma. N Engl J Med. 2008;359(4):378–390
  9. Wolmark N, Yothers G, O'Connell MJ, Sharif S, Atkins JN, Seay TE, et al. A phase III trial comparing mFOLFOX6 to mFOLFOX6 plus bevacizumab in stage II or III carcinoma of the colon: results of NSABP Protocol C-08. J Clin Oncol. 2009;27(Suppl.):18;(abstr LBA4
  10. Appelbaum FR, Dahlberg S, Thomas ED, Buckner CD, Cheever MA, Clift RA, et al. Bone marrow transplantation or chemotherapy after remission induction for adults with acute nonlymphoblastic leukemia. A prospective comparison. Ann Intern Med. 1984;101(5):581–588
  11. Frankel P, Longmate J. Parametric models for accelerated and long-term survival: a comment on proportional hazards. Stat Med. 2002;21(21):3279–3289
  12. Tsodikov AD, Asselain B, Fourque A, Hoang T, Yakovlev A. Discrete strategies of cancer post-treatment surveillance. Estimation and optimization problems. Biometrics. 1995;51(2):437–447
  13. Cox DR, Oakes D. Analysis of survival data. London: Chapman & Hall; 1984
  14. Nardi A, Schemper M. Comparing Cox and parametric models in clinical studies. Stat Med. 2003;22:3597–3610
  15. Boag JW. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J R Stat Soc B. 1949;11:15–53

PII: S0960-7404(10)00022-8

doi:10.1016/j.suronc.2010.01.004
