Because not all circumstances can be covered in this chapter, the PI should seek advice from a biostatistician if he or she has questions regarding the design of a clinical trial or the elements described in this chapter.
The objectives of a protocol need to be specified. Ordinarily, the primary hypothesis is the basis for evaluating the success or failure of the protocol. For example, can drug combination abc produce noticeable disease improvement in at least 20% of the patients? Secondary hypotheses also should be explicitly specified. A list of study objectives should be included in every protocol, and the objectives should be ordered, with the most important first. Primary and secondary objectives can be separated.
The PI should specify how the objective is to be measured. If there is a preferred or required technique for obtaining a measurement, the technique should be stated. Are there intervals of time during which measurements should be obtained? With what frequency are measurements performed? Who will do the measuring? The greater the degree to which the parameters can be specified, the more likely a result will be meaningful because each subject in the protocol will be measured the same way.
The PI should clearly define the patient population from which inferences will be made. Simply stating "patients with xyz disease" may not be adequate unless enough patients can be enrolled to represent fairly the full range of the disease. It is more common to restrict the patient population to individuals with specific stages or forms of the disease, or to have separate accrual requirements specified for each of several stages. The protocol should state clearly the eligibility requirements for entry into the study. However, remember that the more specific the requirements for entry, the less generalizable are the results. For that reason, it is best to include only the eligibility and exclusion criteria absolutely necessary for defining the population of interest and to provide for the greatest possible reduction of risk.
"Study design" means the specific manner in which the protocol will be conducted. Will the protocol require that all patients receive the same therapy, or will half undergo randomization to a standard therapy? Is the protocol attempting to escalate doses of a new agent to identify the maximum tolerated dose, or to evaluate the efficacy of a drug for which the maximum tolerated dose has already been determined? The specific design of the protocol may require a great deal of thought. Among other factors, the design depends heavily on the degree to which the therapies being evaluated have been developed and tested previously.
Testing a particular drug, agent, or technique for the first time in humans necessitates a small study. In certain disciplines, small quantities of a drug are administered to a handful of patients (e.g., three), and then the dose is systematically increased by a constant amount or proportion until unacceptable toxicity is observed. The protocol must state explicitly the number of patients needed for evaluation at each therapeutic level, the exact description of the intervention at each level, and the conditions under which fewer or more patients would be treated. For example, toxicity may lead to treating fewer patients, but a noticeable improvement may suggest that evaluation of additional patients would be worthwhile. In trials where efficacy is measured, the PI should establish the dose or other measure of the quantity or type of treatment to be tested and administer equivalent therapy to all patients in the protocol.
The next step is to estimate what the magnitude of effect is likely to be. Determining the sample size for a trial evaluating a single intervention depends on the magnitude of the effect anticipated and the magnitude of errors the PI is willing to accept. Two types of errors exist, often called type I (represented by "alpha") and type II (represented by "beta"). In the single intervention study, a type I error is made when the PI concludes that an intervention is effective above a particular rate (e.g., 20%), when it actually is below that level of efficacy. A type II error has been committed when the PI falsely concludes that an intervention has less than a particular level of effectiveness when, in fact, it has greater than that particular level. For this type of study, (1-beta) x 100% is known as the power of the test of the efficacy hypothesis at a particular level. Normally, in a protocol describing a study with a "one-arm" design, the PI should allow for a type I error of no greater than 5%, and a type II error of no greater than 20%, or else 10% for each type of error.
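The relationship among sample size, type I error, and power in a one-arm study can be illustrated with a short calculation. The following Python sketch is only an illustration, not a substitute for consulting a biostatistician: it assumes a single-stage design analyzed with exact binomial probabilities, and the function names, the "uninteresting" response rate p0 = 20%, and the hoped-for rate p1 = 40% are hypothetical choices made for the example.

```python
from math import comb


def binom_tail(n: int, r: int, p: float) -> float:
    """Probability of observing r or more responses among n patients
    when each patient responds independently with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def single_arm_design(p0, p1, alpha=0.05, beta=0.20, max_n=200):
    """Search for the smallest n and cutoff r such that declaring the
    intervention effective when responses >= r gives
    type I error <= alpha (true rate p0) and power >= 1 - beta (true rate p1).
    Returns (n, r, actual_type_I_error, actual_power), or None if not found."""
    for n in range(1, max_n + 1):
        for r in range(n + 1):
            type_one = binom_tail(n, r, p0)
            if type_one <= alpha:
                # The smallest cutoff that satisfies alpha gives the best power for this n.
                power = binom_tail(n, r, p1)
                if power >= 1 - beta:
                    return n, r, type_one, power
                break
    return None


# Hypothetical example: rule out a 20% response rate in favor of 40%.
design = single_arm_design(p0=0.20, p1=0.40)
if design is not None:
    n, r, type_one, power = design
    print(f"n={n}, declare effective if responses >= {r}, "
          f"type I error={type_one:.3f}, power={power:.3f}")
```

Tightening either error rate (for example, 10% for each) or narrowing the gap between p0 and p1 increases the number of patients the search returns.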
The PI should consider formally permitting early termination of a simple efficacy study if early indications are that the intervention appears to be doing far worse than expected. An example uses actual data from a trial of a newly developed drug to treat a particular form of cancer. None of the first 14 patients treated had an adequate objective response to therapy. The protocol stated that observing zero responses in 14 treated patients was adequate to conclude that the therapy had less than 20% efficacy. This was sufficient reason to stop and not treat the 16 additional patients for whom original approval had been granted. Recently, improved designs have been developed that have a higher probability of permitting early termination of accrual if the true response rate is low. The PI can discuss with a biostatistician the use of the so-called optimal designs, which are now considered a standard approach in this setting.
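The arithmetic behind the "zero responses in 14 patients" rule can be checked directly. The sketch below assumes that patients respond independently and with the same probability; the function names are hypothetical. If the true response rate were 20%, each patient would fail to respond with probability 0.80, so the chance that all 14 fail is 0.80 raised to the 14th power.

```python
def prob_no_responses(n_patients: int, true_rate: float) -> float:
    """Probability of seeing zero responses among n_patients
    if each responds independently with probability true_rate."""
    return (1 - true_rate) ** n_patients


def smallest_n_to_rule_out(true_rate: float, max_error: float) -> int:
    """Smallest number of consecutive non-responders needed so that the
    chance of seeing them, if the true rate were true_rate, is below max_error."""
    n = 1
    while prob_no_responses(n, true_rate) >= max_error:
        n += 1
    return n


p = prob_no_responses(14, 0.20)
print(f"P(0 of 14 respond | true rate 20%) = {p:.3f}")  # about 0.044
print(smallest_n_to_rule_out(0.20, 0.05))               # 14
```

Because this probability is below 5%, observing no responses in 14 patients is unlikely if the therapy truly has 20% efficacy or more, which is the basis for stopping early.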
The most complicated type of protocol design compares a treatment with "something else" to determine effectiveness. The something else can be a placebo, in which case the PI often tries to keep both the physician and patient from knowing which agent has been administered until all patients have been treated and evaluated. Alternatively, the new treatment may be compared with a therapy considered standard at the time the trial is conducted, or against a different form of the experimental therapy. Regardless of the actual treatments (or control) used in the two (or more) study arms, it is important that the two (or more) sets of patients have similar characteristics, except for the manner in which their condition is being treated. This helps ensure that any difference in outcome between the groups can be attributed to the treatments.
Randomization is often used to let probability provide unbiased and approximately equivalent patients on the study arms. It is highly recommended when two interventions are compared. Differences between the two randomized groups of patients should then be due only to treatment effects rather than to important prognostic characteristics. If the groups cannot be randomly selected – especially when the treatment administered can result in long-term physical effects or other changes – alternative approaches can be considered. The means by which patients will be assigned to therapies must be clearly stated in the protocol.
PIs often use stratification in randomized trials to ensure that patients with good and bad prognoses are evenly distributed on the study arms. Stratification means the division of patients into groups that are similar with respect to a particular characteristic. A PI who wishes to stratify by sex would randomize males separately from females, ensuring that approximately half the males receive one therapy and half the other, and likewise for the females. By stratifying, the PI increases the likelihood that a difference in outcome between the two arms of the trial is due to an actual difference between the treatments rather than to an imbalance in distribution of the prognostic factors.
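One common way to carry out stratified randomization is to randomize within each stratum using small shuffled blocks, so that the arms stay balanced as patients accrue. The Python sketch below illustrates that idea only; the permuted-block method, the block size of 4, the arm labels, and all names are assumptions made for the example and are not prescribed by this chapter.

```python
import random


def stratified_assignments(patients, arms=("A", "B"), block_size=4, seed=0):
    """Assign each patient to an arm, randomizing separately within each stratum.

    patients: list of (patient_id, stratum) tuples in order of enrollment.
    block_size must be a multiple of the number of arms.
    Returns a dict mapping patient_id to arm."""
    rng = random.Random(seed)
    pending = {}      # stratum -> assignments remaining in the current block
    assignments = {}
    for patient_id, stratum in patients:
        block = pending.get(stratum)
        if not block:
            # Start a new block with equal numbers of each arm, in random order.
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[stratum] = block
        assignments[patient_id] = block.pop()
    return assignments


# Hypothetical enrollment: 8 males and 8 females, stratified by sex.
patients = [(f"M{i}", "male") for i in range(8)] + [(f"F{i}", "female") for i in range(8)]
result = stratified_assignments(patients)
for stratum_prefix in ("M", "F"):
    counts = {arm: 0 for arm in ("A", "B")}
    for pid, arm in result.items():
        if pid.startswith(stratum_prefix):
            counts[arm] += 1
    print(stratum_prefix, counts)
```

In this example each sex contributes exactly two complete blocks, so each arm receives four males and four females.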
Choose only a small number of factors on which to stratify, and select only those known (or likely) to be associated with the prognosis. In protocols conducted at the CC, it is rare to accrue enough patients to warrant stratifying for more than two, or occasionally three, factors with two levels each. Because each additional stratification variable subdivides the patients into ever smaller groups, the number of patients in a subgroup can quickly become so small that chance imbalances become likely in patients randomized to a particular therapy.
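The speed with which subgroups shrink can be seen with simple arithmetic. The sketch below uses a hypothetical trial of 60 patients on two arms and assumes, for illustration only, that patients are spread evenly across strata; the function names are not from any library.

```python
from math import prod


def strata_count(levels_per_factor):
    """Number of strata formed by crossing all stratification factors."""
    return prod(levels_per_factor)


def patients_per_cell(n_patients, levels_per_factor, n_arms=2):
    """Average number of patients in each stratum-by-arm cell,
    assuming an even spread across strata and arms."""
    return n_patients / (strata_count(levels_per_factor) * n_arms)


# Hypothetical trial of 60 patients with 1, 2, 3, and 4 two-level factors.
for n_factors in range(1, 5):
    levels = [2] * n_factors
    print(n_factors, "factor(s):", strata_count(levels), "strata,",
          patients_per_cell(60, levels), "patients per stratum per arm")
```

With three two-level factors, the 60 patients are divided among 8 strata, leaving fewer than 4 patients per stratum on each arm on average.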
When a comparative clinical trial is being considered, the PI must take several preliminary steps to determine the sample size. The first step is to revalidate that the primary objective previously set is reasonable for a comparative study. The objective should have one clearly definable, readily quantified end point. To compare differences in quantities, the PI should identify how large a difference, in either absolute or relative terms, is of interest if a difference really exists between the interventions being compared. To identify reasonable differences in interventions, the PI can carefully review existing literature on the treatments to obtain the best estimates for the patient population. Conducting a small, randomized pilot study to investigate the magnitude of the parameters also may be warranted. The PI should not assume, for example, that standard therapy will do worse than it is likely to do, or that the new therapy will do better than it is likely to do. The PI must be conservative in identifying the difference because the number of patients to be included in the protocol is determined in large part by the magnitude of the difference that the PI wants to detect.
The next step is to decide if the difference between groups is to be evaluated by a one-sided or a two-sided hypothesis test. In a one-sided test, the objective is to determine if a is greater than b (or if b is greater than a), rather than to determine if a and b differ. A one-sided test is appropriate only if the new treatment will not have toxicity and associated morbidity and mortality, which could actually make it inferior to the standard therapy. The standard approach in many fields, such as oncology, is to select a two-sided hypothesis test in order to allow for the possibility that observed differences, if any, may be opposite to those expected.
To determine the sample size, the PI must also select the maximum probability of errors that can be made. As with a trial of a single intervention, two types of errors exist – but with slightly different interpretations than before because two treatments are being compared. The errors, type I and type II, are described below:
- alpha = type I error = Probability that the PI will decide the interventions are different when they are not different.
- beta = type II error = Probability that the PI will decide the interventions are not different, when, in fact, they are.
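The possible outcomes can be summarized as follows:
- If the interventions truly have the same results, deciding that they have the same results is correct; deciding that one is better than the other is a type I error.
- If one intervention truly is better than the other, deciding that one is better is correct; deciding that they have the same results is a type II error.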
The PI makes an error whenever he or she makes an incorrect decision about the relative effect of the two interventions. Just as for a study evaluating a single intervention, it is reasonable and customary to allow a maximum type I error probability of 5% and a maximum type II error probability of 20%.
For a comparative clinical trial, the power of the study is the probability of correctly identifying a difference of a specified magnitude; a study is considered to be powerful if it has a high probability of detecting an important treatment difference. In the framework presented above, the power of the study is 1-beta. For a fixed type I error probability, increasing the sample size will increase the power of the study. The ability to detect a clinically relevant effect, or power, can be augmented by determining the correct design and correct sample size for the study and by carefully considering the principal end point of interest, the magnitude of effects that would be of clinical importance, and the acceptable probabilities of making an error.
Determining the size of the sample depends on the nature of the end points and particular characteristics of protocol design pertinent to an individual field, which is well beyond the scope of this chapter. Numerous books and computer programs can provide fairly precise estimates of the numbers of patients needed once the end point, magnitudes of difference, accrual rates, acceptable error rates, and types of hypothesis (one-sided or two-sided) have been specified.
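As a simple illustration of how these inputs combine, the Python sketch below estimates the number of patients per arm needed to compare two response proportions. It assumes a widely used normal-approximation formula with equal allocation to the two arms, which is only one of many possible methods; the function name and the example response rates of 30% and 50% are hypothetical, and the result is no substitute for a calculation tailored to the actual end point and design.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_new, alpha=0.05, power=0.80, two_sided=True):
    """Approximate patients per arm to detect p_control vs. p_new.

    Uses n = (z_alpha + z_beta)^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2,
    a normal approximation assuming equal allocation to both arms."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # power = 1 - beta
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_new) ** 2)


# Hypothetical example: standard therapy 30% response, new therapy 50%.
print("two-sided:", n_per_arm(0.30, 0.50))
print("one-sided:", n_per_arm(0.30, 0.50, two_sided=False))
# A smaller difference of interest requires many more patients.
print("30% vs 40%:", n_per_arm(0.30, 0.40))
```

The comparison shows why the PI must be conservative about the difference to be detected: halving the difference roughly quadruples the required sample size under this formula.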
Once the PI has estimated the initial sample size, he or she should think about the feasibility of conducting a trial of that size. If accrual will be too slow, can other institutions be enlisted to participate? Can a slightly larger error rate be tolerated? Are the differences in which the PI is interested realistic?
An adequate rate of accrual into the trial ensures that the trial will not be continued so long as to make the results uninterpretable or no longer scientifically important. If the PI can enter only 10 patients a year in the protocol, but a total of 100 patients are required, the PI probably needs to reconsider some aspects of the protocol design. In general, comparative trials requiring more than 5 years to complete the stated accrual can be difficult to interpret and their findings difficult to use – unless the interventions being compared remain completely consistent throughout the years and nothing that is potentially more beneficial to patients has been developed in the interim.
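The feasibility check described above is simple arithmetic, sketched here in Python. The function names are hypothetical, and the calculation assumes a constant accrual rate with no dropouts.

```python
def years_to_accrue(total_patients: int, patients_per_year: float) -> float:
    """Years needed to reach the accrual target at a constant rate."""
    return total_patients / patients_per_year


def required_rate(total_patients: int, max_years: float) -> float:
    """Patients per year needed to finish accrual within max_years."""
    return total_patients / max_years


print(years_to_accrue(100, 10))  # 10.0 years at 10 patients per year
print(required_rate(100, 5))     # 20.0 patients per year to finish in 5 years
```

At 10 patients a year, a 100-patient trial would take twice as long as the 5-year guideline, so the PI would need to double accrual, for example by enlisting other institutions, or revisit the design.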
Regardless of the type of clinical trial being conducted, it should be determined if the results are of statistical significance. The way in which statistical significance is determined depends on the design of the trial and the manner in which the trial questions are specified. Statistical significance can be achieved reliably only if the size of the sample is adequate for identifying an effect of the magnitude that is of interest. Also, statistical significance at a particular level is more likely to be observed by chance alone if accumulated study data are repeatedly tested for significance. Evaluating data repeatedly therefore requires even greater care in considering the level of significance to be used at each evaluation; if repeated evaluation is planned, it should be specified in the research protocol. Likewise, examining multiple end points, or results in multiple subgroups, is improper unless the PI adjusts the statistical tests in accordance with the actual number of comparisons being made.
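The effect of making several comparisons can be illustrated numerically. The Python sketch below assumes, for simplicity, that the tests are independent, which is not true of repeated looks at accumulating data from the same patients; it also shows the Bonferroni adjustment, one simple and conservative way to adjust, which is named here as an example rather than as the method this chapter recommends. The function names are hypothetical.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false-positive result among n_tests
    independent tests, each performed at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the overall
    false-positive chance at or below alpha (Bonferroni adjustment)."""
    return alpha / n_tests


for k in (1, 5, 10):
    print(k, "tests:",
          f"unadjusted overall error = {familywise_error(0.05, k):.3f},",
          f"adjusted per-test level = {bonferroni_alpha(0.05, k):.4f}")
```

With five unadjusted tests at the 5% level, the chance of at least one spurious "significant" finding is roughly 23% under the independence assumption.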
A major product of a trial is data, often in very large quantities. Before conducting the trial, the PI should determine the methods and personnel that will be used to manage the data. Sometimes only a set of three or four basic forms is needed for data entry; the data from the forms are entered into a personal computer's database package and transferred to a statistical software package for analysis. For larger trials, or for a series of several trials, the PI could have a database developed for use on his or her personal computer. Programs can be prepared to check for improper codes, invalid or inconsistent dates, or other types of faulty information. By carefully collecting and storing data from the beginning, the PI will be able to examine them easily later. The type of data management to be used, including descriptions or examples of forms, should be included in the protocol.
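A checking program of the kind mentioned above might look like the following Python sketch. The field names ("sex", "enrolled", "treated"), the allowed codes, and the date format are all hypothetical and would be replaced by whatever the protocol's own forms define.

```python
from datetime import date

VALID_SEX_CODES = {"M", "F"}  # hypothetical coding scheme


def check_record(record: dict) -> list:
    """Return a list of problems found in one patient record
    (an empty list means no problems were detected)."""
    problems = []

    # Improper codes
    if record.get("sex") not in VALID_SEX_CODES:
        problems.append("improper sex code")

    # Invalid dates (expects ISO format, e.g. 2023-01-31)
    parsed = {}
    for field in ("enrolled", "treated"):
        try:
            parsed[field] = date.fromisoformat(record.get(field, ""))
        except (TypeError, ValueError):
            problems.append(f"invalid {field} date")

    # Inconsistent dates
    if len(parsed) == 2 and parsed["treated"] < parsed["enrolled"]:
        problems.append("treatment date precedes enrollment date")

    return problems


good = {"sex": "F", "enrolled": "2023-01-10", "treated": "2023-01-12"}
bad = {"sex": "X", "enrolled": "2023-02-30", "treated": "2023-01-12"}
late = {"sex": "M", "enrolled": "2023-03-01", "treated": "2023-02-01"}
print(check_record(good))  # []
print(check_record(bad))
print(check_record(late))
```

Running such checks as data are entered, rather than at the time of analysis, makes it easier to correct faulty information while the source documents are still at hand.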
The PI must specify clearly the statistical considerations relating to the trial's design and analysis. The principal study objectives, the design that will be used to address the objectives, the assumptions regarding expected outcomes, and the sample size required should all be placed in the body of the protocol in the statistical considerations section. The section often will describe the specific statistical techniques that will be used to evaluate the results. If consulted about a protocol, a biostatistician will often volunteer to write the first draft of this section for the PI to ensure that it correctly conveys the proper information.
Paying attention to details, such as specifying the objectives concretely and properly designing a trial to measure them, is worthwhile because it will give the PI more confidence in the results. This will be accomplished with the fewest possible subjects put at risk and with a minimum expenditure of limited resources.
Seth M. Steinberg, Ph.D.
National Cancer Institute