1. Is difference in proportion appropriate measure to compare performance of a drug over another one?

    Is difference in proportion appropriate measure to compare performance of a drug over another one?

    Introduction

    For past few weeks, a question lingered in my mind that “Is the traditional approach of assessing difference in proportion (both in ways of arithmetic difference and ratio) between intervention A and intervention B as a way to ascertain the performance of intervention A and intervention B appropiate?”. The question came in my mind after observing practitioners dogmatically approving a particular intervention over another one based on so called statistically significant difference (p << 0.05 or miniscule confidence interval not including value of no effect, usually 0 for difference or 1 for ratio) in proportion of success (say patients surviving at a certain fixed time point) between the two groups.

    I will be discussing about the actual meaning of p value and confidence intervals of difference in proportions. I will also try to highlight ill effects and limitations of using confidence intervals in deciding which of the two interventions is better in a properly conducted experimental design (no fallacy in study design and conduct).

    I will be using simulation techniques to explain my point.

    A clinical scenario

    Let us assume that we have discovered a new drug which is more effective than the standard of care drug A and interested in proving that drug B is more effective than drug A. We assume that our outcome of interest is “proportion of patients dead at the end of 6 months of starting the treatment”. Let us assume that patients on Drug A have 6 months mortality of 60% and patients on Drug B have 6 months mortality of 40% (reduction of mortality rate by 20%)(population parameters, usually unknown to us).

    We take 2n number of patients and randomly assign Drug A and Drug B to n patients each and observe them for 6 months (we assume appropiate randomisation, complete follow up and independence of all patients) to observe proportion of death in each group.

    We will use 95% CI for all the measures throughout this post.

    library(plyr)
    library(ggplot2)
    
    ## Package SparseM (0.99) loaded.
    ##     To cite, see citation("SparseM")
    
    library(Hmisc, quietly = T)
    
    ## Loading required package: splines
    ## 
    ## Attaching package: 'Hmisc'
    ## 
    ## The following objects are masked from 'package:plyr':
    ## 
    ##     is.discrete, summarize
    ## 
    ## The following objects are masked from 'package:base':
    ## 
    ##     format.pval, round.POSIXt, trunc.POSIXt, units
    
    N <- list(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 1000, 2000, 10000)
    
    res.ci <- function(n, seed, prob.a1, prob.b1) {
    
      samp.n <- function(n, prob.a1, prob.b1) {
      a <- sample(c(0,1), size = n, replace = T, prob = c(1 - prob.a1, prob.a1))
      b <- sample(c(0,1), size = n, replace = T, prob = c(1 - prob.b1, prob.b1))
      df.wide <- data.frame(deathA = a, deathB = b)
      df.narrow <- data.frame(treat = c(rep("a", length = n), rep("b", length = n)), death = c(a, b))
      return(list(df.wide = df.wide, df.narrow = df.narrow))
    }
    
      samp <- samp.n(n, prob.a1, prob.b1)
    
      # calculate ci of propA and propB separately
    
      bt <- function(x) {
    
        r <- binom.test(table(factor(x, levels = c(1,0))))$conf.int
        return(c(lcl = r[1], ucl = r[2]))
      }
    
      ciA <- bt(samp$df.wide$deathA)
      ciB <- bt(samp$df.wide$deathB)
    
      # compute ci of difference in proportion and p value
    
      matAB <- laply(.data = samp$df.wide, .fun = function(x) return(c(`1` = sum(x), `0` = sum(x == 0))))
    
      propDiff <- prop.test(matAB)
    
      # Get number of patients who died in A but not in B
    
      # logistic regression analysis: OR and their CIs
    
      l.mod <- glm(death ~ treat, data = samp$df.narrow, family = binomial(link = "logit"), x = T, y = T)
    
      or.b <- exp(coef(l.mod)[2])
      conf.int.b <- exp(confint(l.mod)[2,])
    
      # Prediction of l.mod for "a" and "b" with CI
    
      nd <- data.frame(treat = c("a", "b"))
    
      pred.lp <- predict(l.mod, nd, se.fit = T, type = "link")
    
      pred.prop <- plogis(pred.lp$fit)
      pred.ucl.prop <- plogis(pred.lp$fit + 1.96*pred.lp$se.fit)
      pred.lcl.prop <- plogis(pred.lp$fit - 1.96*pred.lp$se.fit)
    
      res.df <- data.frame(num = nrow(samp$df.wide), lclA = ciA[1], uclA = ciA[2], lclB = ciB[1], uclB = ciB[2], lclDiff = propDiff$conf.int[1], uclDiff = propDiff$conf.int[2], pDiff = propDiff$p.value, orB = or.b, lclOR = conf.int.b[1], uclOR = conf.int.b[2], pred.propA = pred.prop[1], pred.lcl.propA = pred.lcl.prop[1], pred.ucl.propA = pred.ucl.prop[1], pred.propB = pred.prop[2], pred.lcl.propB = pred.lcl.prop[2], pred.ucl.propB = pred.ucl.prop[2])
    
      names(res.df) <- Cs(num, lclA, uclA, lclB, uclB, lclDiff, uclDiff, pDiff, orB, lclOR, uclOR, pred.propA, pred.lcl.propA, pred.ucl.propA, pred.propB, pred.lcl.propB, pred.ucl.propB)
    
      return(res.df)
    
    }
    
    pa1 <- 0.6
    pb1 <- 0.4
    s <- 3333
    set.seed(s)
    sim.res <- ldply(.data = N, .fun = res.ci, prob.a1 = pa1, prob.b1 = pb1)
    

    What do confidence intervals and p values depict?

    Below, we have plot graphs depicting different measures of difference in proportion and their confidence intervals with respect to sample size.

    library(gridExtra)
    
    p <- ggplot(aes(x = factor(num)), data = sim.res) + xlab("Sample size")  + theme_bw()
    
    p1 <-  p + geom_linerange(aes(ymin = lclA, ymax = uclA), alpha = 0.4, lwd = 1.5) + geom_linerange(aes(ymin = lclB, ymax = uclB), color = "red", alpha = 0.4, lwd = 1.5) + geom_hline(yintercept = c(pa1,pb1), color = c("black", "red"), linetype = "dashed") +  ylab("Proportion dead")
    
    p2 <- p + geom_linerange(aes(ymin = lclDiff, ymax = uclDiff), lwd = 1.5, colour = "blue") + geom_hline(yintercept = c(0, pa1 - pb1), color = c("red", "blue"), linetype = "dashed") +  ylab("Difference in proportion")
    
    p3 <- p + geom_point(aes(y = pDiff), pch = "x", size = 5) + geom_hline(yintercept = 0.05, color = "red", linetype = "dashed") +  ylab("p value of difference in Proportion") + scale_y_log10()
    
    p4 <- p + geom_crossbar(aes(y = pred.propA, ymin = pred.lcl.propA, ymax = pred.ucl.propA), alpha = 0.4, lwd = 0.2, fill = "gray") + geom_crossbar(aes(y = pred.propB, ymin = pred.lcl.propB, ymax = pred.ucl.propB), fill = "red", alpha = 0.4, lwd = 0.2) + geom_hline(yintercept = c(pa1,pb1), color = c("black", "red"), linetype = "dashed") +  ylab("Proportion dead \n as predicted from \n logistic regression model")
    
    grid.arrange(p1, p2, p3, p4, nrow = 4)
    

    plot of chunk unnamed-chunk-2

    Depiction of each of the figures

    X axis of each of the figures depict sample size (as factor) from 10 to 10000.

    1. Topmost figure: It depicts 95% confidence intervals of proportion dead for drug A (black) and drug B (red). The dashed horizontal lines depict actual proportion of deads for each of the groups.

    2. Second figure: It depicts the difference between the proportions of deads for patients receiving drug A and drug B. The blue dashed line depicts the actual difference (0.2) and red dashed line depicts the line of no difference (difference of 0).

    3. Third figure: It depicts the p value (in log scale) for difference between proportions dead for the two groups (drug A and drug B). Red dashed line depicts level of significance (p = 0.05).

    4. Lowermost figure: It depicts the predicted proportion deads for drug A and drug B and their confidence intervals from fitted logistic regression model. This figure is almost identical to the topmost figure.

    As can be seen from the above figures, confidence intervals depict how sure are we that the population proportion (or their difference) lie within a given range. CI becomes narrower with increasing sample size. So, with increasing sample size we just become more and more sure of the population parameters. Figures 2 and 3 depicts the confidence interval of difference between proportions of the two groups and the correspinding p values. We see that the confidence interval is surrounding actual difference in proportion more accurately with increasing sample size and is converging on to the actual difference (0.2). In doing so, the lower limit of the interval is getting further away from the line of no difference (0) with limit to the actual difference (0.2). Now, p value depends on the distance between the lower limit of confidence interval and the line of no effect. Its value reduces with increasing distance.

    We can see, that confidence intervals can be made narrow enough and p value can be made to be less than any given arbitrary significance level by increasing sample size appropriately. A research team who has invented a new drug and want that the new drug is proved to be better can do so with appropriately selecting the sample size. Also, all of the above mentioned measures only state that the population proportion of deads are 0.6 for drug A and 0.4 for drug B and the difference is 0.2, thereby meaning drug B is better than A. Proportion is a property of group of patients. None of the above parameters tells us any thing about the performance of drug B vs drug A when compared head to head.

    How to tackle the situation?

    Another way to state the problem will be “If drug A is given to a patient and drug B is given to an identical patient, what is the probability that only patient on drug A dies, only patient on drug B dies, patients on both the drugs die and none of them die?”.

    Mathematically, if we assume that patient receiving drug A (A) and drug B (B) behave as Bernoulli Trial with probability of death as pA and pB respectively and that both of them are independent events, then \( P(A \cap B) = P(A) \times P(B) \). So, the probability of various events will be as under:

    \( P(A = 1 \cap B = 0) = 0.6 * (1-0.4) = 0.36 \) (Drug A dies but drug B survives) … 1

    \( P(A = 0 \cap B = 1) = (1-0.6) * 0.4 = 0.16 \) (Drug A survives but drug A dies) … 2

    \( P(A = 1 \cap B = 1) = 0.6 * 0.4 = 0.24 \) (Both drug A and drug B die, equivalent response) … 3

    \( P(A = 0 \cap B = 0) = (1-0.6) * (1-0.4) = 0.24 \) (Both drug A and drug B survive, equivalent response) … 4

    We can see that only 36% of times patient taking drug A dies but drug B survives (eqn 1). So, drug B is better than drug A only 36% of times. Similarly, 16% of times patient taking drug B dies but drug A survives making drug A better than drug B. 48% of times performance of drug A and drug B is equivalent (either both die or both survive). So, despite the fact that drug B is significantly better than drug A (as depicted by highly significant p value and narrow confidence interval), only in 36% of times (less than 50%) does patient taking drug B survives but drug A dies.

    Also, despite the fact that drug B is better than drug A by 20% (40% death rate vs 60% death rate), on head to head comparision, both drugs are different only 52% of times and equivalent 48% of times. The following figures depict the same.

    set.seed(3333)
    
    foo <- function(nSamp, nsim, probA, probB) {
      sampA <- replicate(n = nsim, sample(c(0,1), size = nSamp, replace = T, prob = c(1 - probA, probA)))
      sampB <- replicate(n = nsim, sample(c(0,1), size = nSamp, replace = T, prob = c(1 - probB, probB)))
    
      deathOnlyA <- (sampA > sampB)
      deathOnlyB <- (sampB > sampA)
      deathBoth <- ((sampA == 1) & (sampB == 1))
      deathNone <- ((sampA == 0) & (sampB == 0))
      deathConcordant <- (sampA == sampB)
      deathDiscordant <- (sampA != sampB)
    
      L <- list(deathOnlyA, deathOnlyB, deathBoth, deathNone, deathConcordant, deathDiscordant)
    
      props <- lapply(L, function(y) colSums(y)/nSamp)
    
      meds <- lapply(props, function(x) median(x))
      lcls <- lapply(props, function(x) quantile(x, 0.025))
      ucls <- lapply(props, function(x) quantile(x, 0.975))
    
      return(data.frame(num = nSamp, medOnlyA = meds[[1]], medOnlyB = meds[[2]], medBoth = meds[[3]], medNone = meds[[4]], medConcordant = meds[[5]], medDiscordant = meds[[6]], 
                        lclOnlyA = lcls[[1]], lclOnlyB = lcls[[2]], lclBoth = lcls[[3]], lclNone = lcls[[4]], lclConcordant = lcls[[5]], lclDiscordant = lcls[[6]],
                        uclOnlyA = ucls[[1]], uclOnlyB = ucls[[2]], uclBoth = ucls[[3]], uclNone = ucls[[4]], uclConcordant = ucls[[5]], uclDiscordant = ucls[[6]]))
    }
    
    res <- ldply(.data = N, .fun = foo, nsim = 100, probA = 0.6, probB = 0.4)
    
    p <- ggplot(data = res, aes(x = factor(num))) + xlab("Sample size") + theme_bw()
    
    p + geom_linerange(aes(ymin = lclOnlyA, ymax = uclOnlyA), alpha = 0.4, colour = "red", lwd = 2) + ylab("Proportion deaths only with drug A") + geom_hline(yintercept = c(0.36, 0.5), colour = c("red", "black"), linetype = "dashed")
    

    plot of chunk unnamed-chunk-3

    p + geom_crossbar(aes(y = medConcordant, ymin = lclConcordant, ymax = uclConcordant), fill = "red", alpha = 0.4) + geom_crossbar(aes(y = medDiscordant, ymin = lclDiscordant, ymax = uclDiscordant), fill = "green", alpha = 0.5) + ylab("Proportion concordant performance (red)\n and discordant performance (green)") + geom_hline(yintercept = 0.5, linetype = "dashed")
    

    plot of chunk unnamed-chunk-3

    Conclusion

    The traditional way to analyse performance of a given drug based on difference in proportion is not adequate and more appropriate metric for the same should be used by the researchers and checked by peer reviewers.

    I am not sure about the adequacy and correctness of the above methology. Comments and criticisms are welcome.

    Bye.

    Suman.

    Session Information

    sessionInfo()
    
    ## R version 3.0.2 (2013-09-25)
    ## Platform: x86_64-pc-linux-gnu (64-bit)
    ## 
    ## locale:
    ##  [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C              
    ##  [3] LC_TIME=en_IN.UTF-8        LC_COLLATE=en_IN.UTF-8    
    ##  [5] LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8   
    ##  [7] LC_PAPER=en_IN.UTF-8       LC_NAME=C                 
    ##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    ## [11] LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C       
    ## 
    ## attached base packages:
    ## [1] splines   grid      stats     graphics  grDevices utils     datasets 
    ## [8] methods   base     
    ## 
    ## other attached packages:
    ## [1] gridExtra_0.9.1 Hmisc_3.14-4    Formula_1.1-1   survival_2.37-7
    ## [5] lattice_0.20-24 ggplot2_1.0.0   plyr_1.8        knitr_1.6      
    ## 
    ## loaded via a namespace (and not attached):
    ##  [1] cluster_1.14.4      colorspace_1.2-4    dichromat_2.0-0    
    ##  [4] digest_0.6.4        evaluate_0.5.5      formatR_0.10       
    ##  [7] gtable_0.1.2        labeling_0.2        latticeExtra_0.6-26
    ## [10] lubridate_1.3.3     MASS_7.3-29         memoise_0.1        
    ## [13] munsell_0.4.2       proto_0.3-10        RColorBrewer_1.0-5 
    ## [16] reshape2_1.2.2      scales_0.2.3        SparseM_0.99       
    ## [19] stringr_0.6.2       tools_3.0.2
    
    4

    View comments

Subscribe
Subscribe
Blog Archive
My Blog List
My Blog List
About Me
About Me
I am a clinical hematologist practising in India with data analysis and R as my passions.
Loading
Dynamic Views theme. Powered by Blogger.