I recently read about the relationship between the Bernoulli process and the Poisson distribution, and I wrote a post explaining the process with simulation.
Click here to visit the post.
Hope you like it.
Comments are welcome.
Bye.
Dr Suman
Make a working folder (say, WORK). Make the following subfolders: fig to hold figures, html to hold final html files, html/fig which will be a copy of the fig subfolder and will be referenced by the html files, and pdf to hold final pdf files. Make a folder .pandoc/templates in the HOME folder which will hold the Pandoc templates (default.html(5) and default.latex).
Save all figures in the fig folder in png format.
Make a YAML file in the WORK folder holding all the variables to be used throughout all the documents (say, my.yaml). Any document-specific YAML can be inserted in the md file itself.
Make a CSS file (say, my.css) in the WORK/html folder, which contains all the necessary formatting codes for html output.
Save each figure in WORK/fig after giving it an appropriate name, preferably in .png format.
An existing html document can be converted into markdown with: pandoc doc1.html -o doc1.md
Copy default.html and default.latex into the home/.pandoc/templates folder as told before.
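The folder layout described in the steps above can also be created from R; here is a minimal sketch (my addition, not from the original post), with illustrative paths:
# create WORK with its subfolders and the Pandoc template folder
work <- "~/WORK"
for (d in file.path(work, c("fig", "html", "html/fig", "pdf"))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
dir.create("~/.pandoc/templates", recursive = TRUE, showWarnings = FALSE)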
Open default.html in a text editor. Following is an example of the template:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"$if(lang)$ lang="$lang$" xml:lang="$lang$"$endif$>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
$for(author-meta)$
<meta name="author" content="$author-meta$" />
$endfor$
$if(date-meta)$
<meta name="date" content="$date-meta$" />
$endif$
<title>$if(title-prefix)$$title-prefix$ - $endif$$pagetitle$</title>
<style type="text/css">code{white-space: pre;}</style>
$if(quotes)$
<style type="text/css">q { quotes: "“" "”" "‘" "’"; }</style>
$endif$
$if(highlighting-css)$
<style type="text/css">
$highlighting-css$
</style>
$endif$
$for(css)$
<link rel="stylesheet" href="$css$" $if(html5)$$else$type="text/css" $endif$/>
$endfor$
$if(math)$
$math$
$endif$
$for(header-includes)$
$header-includes$
$endfor$
</head>
<body>
$for(include-before)$
$include-before$
$endfor$
$if(title)$
<div id="$idprefix$header">
<h1 class="title">$title$</h1>
$if(subtitle)$
<h1 class="subtitle">$subtitle$</h1>
$endif$
<div class="author"><b>$author$</b></div>
<div class="affil"><i>$affiliation$</i></div>
$if(date)$
<h3 class="date">$date$</h3>
$endif$
</div>
$endif$
$if(toc)$
<div id="$idprefix$TOC">
$toc$
</div>
$endif$
$body$
$for(include-after)$
$include-after$
$endfor$
</body>
</html>
The following characteristics are seen from the above code segment:
$---$: These are the variables, the values of which are provided via a YAML document (described later). Sometimes, when a variable is a collection (like author -> name & address in YAML), the name of the author can be accessed as $author.name$ and the address of the author as $author.address$.
$if(---)$ --- $else$ --- $endif$ construct: This is the branching code for the template. One example is as below:
$if(date)$
<h3 class="date">$date$</h3>
$endif$
The above construct means that if the date variable is given in the YAML then it will be entered in the html document as an h3 with class "date" (whose formatting can be manipulated inside the css file).
$for(---)$ --- $endfor$ construct: This is the looping code for the template. One example is as below:
$for(css)$
<link rel="stylesheet" href="$css$" $if(html5)$$else$type="text/css" $endif$/>
$endfor$
The above construct checks for the css variable, which is a collection of values, and inserts the given html statement <link ---- /> once for each element of css.
$body$ construct: This variable contains all the contents of the doc1.md file after conversion into html by the Pandoc converter. We cannot change anything denoted by the $body$ variable inside the template. If we want to assign a new class (or an id) to any element inside the md file, we have to do it by inserting a raw html statement, as depicted below.
## header 2
The normal statement
<p class="myclass">Content of the special paragraph. It can **contain** markdown codes.</p>
Another normal statement
## Another header 2
For the present project, my.yaml should be as under:
---
css: my.css
---
The details of the YAML language constructs can be found here. A YAML block is enclosed between two --- fences:
---
YAML CODE
---
A variable (accessed as $variable$ in the Pandoc template) is denoted as variable, and following is the code for assigning a value to the variable:
---
variable: value
---
A collection variable is defined as under:
---
author:
  name: xxx
  address: yyy
---
The name of the author is accessed in the Pandoc template as $author.name$. Note the indentation in front of name and address: it must be made with spaces, not tabs.
---
css:
- my1.css
- my2.css
---
The variable css has two values associated with it (my1.css and my2.css).
$for(css)$
<link rel="stylesheet" href="$css$" $if(html5)$$else$type="text/css" $endif$/>
$endfor$
The above code segment in the Pandoc template will access both values of css and insert a line each for my1.css and my2.css.
The documents are then converted with the following commands:
pandoc doc1.md my.yaml -s --data-dir=/home/HOME/.pandoc -o html/doc1.html   # for html file output
pandoc doc1.md my.yaml -s --data-dir=/home/HOME/.pandoc -o pdf/doc1.pdf     # for pdf file output
If many md files are present, as in the project I was doing, then the whole process may be automated from R with the following code (note that str_sub comes from the stringr package, and the file pattern is anchored so that only .md files match):
library(stringr)
files <- as.list(list.files()[grep("\\.md$", list.files())])
foo <- function(x) {
  s.pdf <- paste0("pandoc ", x, " my.yaml -s --data-dir=/home/HOME/.pandoc -o pdf/", str_sub(x, 1L, -4L), ".pdf")
  s.htm <- paste0("pandoc ", x, " my.yaml -s --data-dir=/home/HOME/.pandoc -o html/", str_sub(x, 1L, -4L), ".html")
  system(s.pdf)
  system(s.htm)
}
lapply(files, foo)
Recently, while preparing a lecture on scales of measurement and types of statistical data, I came across the two scales of measurement in which numbers denote a quantitative variable. It took me some time to clarify the difference between the "Interval" and "Ratio" scales of measurement. I am writing down what I understand of the above mentioned scales.
First step in variable measurement is to understand the concept we want to measure, i.e., we would like to define the variable on a conceptual level. Then we need to make an operational definition of the variable, which includes the following steps:
Setting up of a domain of all the possible values the variable can assume.
Understanding the meaning of different values the variable can assume.
Checking if a real origin (“0”) exists for the variable in the particular scale. Origin (“0”) should mean absolute absence of the variable.
Designing a device which will measure the variable.
Validating the measurement from the device.
There are two prerequisites for a measurement scale to be a Ratio Scale: the scale must be uniformly spaced across its domain, and it must have a real origin ("0" denoting absolute absence of the variable).
Let us assume that we have made numerical observations \( A_{ratio} \) and \( B_{ratio} \) for a variable in ratio scale and that \( B_{ratio} > A_{ratio} \). There are two valid ways to denote the difference between A and B:
Arithmetic difference between \( A_{ratio} \) and \( B_{ratio} \): It is denoted by \( B_{ratio} - A_{ratio} \). It is a valid measure of difference because of the fact that the scale is uniformly spaced across the domain.
Ratio difference between \( A_{ratio} \) and \( B_{ratio} \): It is denoted by \( B_{ratio}/A_{ratio} \). It indicates that \( B_{ratio} \) is \( B_{ratio}/A_{ratio} \) times larger than \( A_{ratio} \). This is a valid measure of difference because the origin is an absolute one and is the same for both observations. Note that the result carries no unit, as it is a ratio. It is also equivalent to the arithmetic difference of the log-transformed observations, \( \log(B_{ratio}) - \log(A_{ratio}) \).
Location transformation: If we shift the observations by \( x \) units, we get \( Ax_{ratio} = A_{ratio} + x \) and \( Bx_{ratio} = B_{ratio} + x \). Arithmetic difference between the two transformed observations, \( Bx_{ratio} - Ax_{ratio} = B_{ratio} - A_{ratio} \), which is the same as original observations.
Scale transformation: If we multiply each of the observations by \( x \) units, we get \( Ax_{ratio} = A_{ratio} \cdot x \) and \( Bx_{ratio} = B_{ratio} \cdot x \). Ratio difference between the two transformed observations, \( Bx_{ratio}/Ax_{ratio} = B_{ratio}/A_{ratio} \), which is the same as original observations.
So, for the ratio scale, both arithmetic and ratio differences are valid measures of difference between observations: the arithmetic difference is unchanged by a location transformation and the ratio difference is unchanged by a scale transformation.
Any transformation (\( X_{trans} \)) of the original ratio scale, say \( X_{ratio} \) can be depicted as follows
\[ X_{ratio} = f(X_{trans},S(X_{trans}),L(X_{trans})) \]
where \( S(X_{trans}) \) denotes the scale transformation parameter and \( L(X_{trans}) \) the location transformation parameter, each expressed as a function of location in the transformed scale.
If we assume \( S \) and \( L \) to be constant with respect to location in the transformed scale, one of the simplest transformations will be:
\[ X_{ratio} = (X_{trans} + L) \cdot S \]
where, \( S \neq 0 \)
and the interval scale of measurement (\( X_{int} \)) will be the one with \( L \neq 0 \) in addition to the above constraints.
In an interval scale, zero does not mean absolute nothingness; it is arbitrarily chosen and corresponds to a distance of \( L \) from the real origin in the ratio scale.
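As a concrete instance (my addition, not in the original post): temperature in degrees Celsius is an interval scale. Its zero is arbitrary and lies \( L = 273.15 \) units away from the real origin, so with \( S = 1 \) the mapping to the Kelvin (ratio) scale is \( X_{kelvin} = (X_{celsius} + 273.15) \cdot 1 \).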
We continue our example from the above section:
Let us say that we make two observations in interval scale, \( A_{int} \) and \( B_{int} \), and want to assess difference between both the observations as done earlier.
Observation \( A_{int} \) will be mapped as \( (A_{int} + L) \cdot S \) and observation \( B_{int} \) as \( (B_{int} + L) \cdot S \) in the ratio scale. We have to use the values in the ratio scale for comparison, as it is the scale with the "real origin".
So, for interval scale, only arithmetic difference is a valid measure of difference between observations.
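Continuing the Celsius/Kelvin instance introduced above (my addition, not in the original post), a quick numerical check in R shows why:
A.cel <- 20    # temperatures on the Celsius (interval) scale
B.cel <- 40
A.kel <- A.cel + 273.15   # mapped to the Kelvin (ratio) scale: L = 273.15, S = 1
B.kel <- B.cel + 273.15
B.cel - A.cel   # 20: the arithmetic difference is preserved across scales
B.kel - A.kel   # 20
B.cel / A.cel   # 2: a ratio computed on the interval scale, misleading
B.kel / A.kel   # ~1.068: the valid ratio, computed on the ratio scale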
The aim of this post is to express what I understand of the interval and ratio scale of measurements. Comments, suggestions and criticisms are welcome.
Bye.
For the past few weeks, a question has lingered in my mind: "Is the traditional approach of assessing the difference in proportions (both as arithmetic difference and as ratio) between intervention A and intervention B an appropriate way to ascertain the relative performance of the two interventions?" The question came to my mind after observing practitioners dogmatically approving a particular intervention over another based on a so-called statistically significant difference (p << 0.05, or a minuscule confidence interval not including the value of no effect, usually 0 for a difference or 1 for a ratio) in the proportion of success (say, patients surviving at a certain fixed time point) between the two groups.
I will discuss the actual meaning of p values and confidence intervals of differences in proportions. I will also try to highlight the ill effects and limitations of using confidence intervals to decide which of two interventions is better in a properly conducted experimental design (no fallacy in study design and conduct).
I will be using simulation techniques to explain my point.
Let us assume that we have discovered a new drug B, which we believe is more effective than the standard-of-care drug A, and we are interested in proving that drug B is more effective than drug A. Our outcome of interest is the "proportion of patients dead at the end of 6 months of starting the treatment". Let us assume that patients on drug A have a 6-month mortality of 60% and patients on drug B have a 6-month mortality of 40% (an absolute reduction in mortality of 20 percentage points). These are population parameters, usually unknown to us.
We take 2n patients, randomly assign drug A and drug B to n patients each, and observe them for 6 months (we assume appropriate randomisation, complete follow-up and independence of all patients) to observe the proportion of deaths in each group.
We will use 95% CI for all the measures throughout this post.
library(plyr)
library(ggplot2)
## Package SparseM (0.99) loaded.
## To cite, see citation("SparseM")
library(Hmisc, quietly = T)
## Loading required package: splines
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:plyr':
##
## is.discrete, summarize
##
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
N <- list(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 1000, 2000, 10000)
res.ci <- function(n, seed, prob.a1, prob.b1) {
samp.n <- function(n, prob.a1, prob.b1) {
a <- sample(c(0,1), size = n, replace = T, prob = c(1 - prob.a1, prob.a1))
b <- sample(c(0,1), size = n, replace = T, prob = c(1 - prob.b1, prob.b1))
df.wide <- data.frame(deathA = a, deathB = b)
df.narrow <- data.frame(treat = c(rep("a", length = n), rep("b", length = n)), death = c(a, b))
return(list(df.wide = df.wide, df.narrow = df.narrow))
}
samp <- samp.n(n, prob.a1, prob.b1)
# calculate ci of propA and propB separately
bt <- function(x) {
r <- binom.test(table(factor(x, levels = c(1,0))))$conf.int
return(c(lcl = r[1], ucl = r[2]))
}
ciA <- bt(samp$df.wide$deathA)
ciB <- bt(samp$df.wide$deathB)
# compute ci of difference in proportion and p value
matAB <- laply(.data = samp$df.wide, .fun = function(x) return(c(`1` = sum(x), `0` = sum(x == 0))))
propDiff <- prop.test(matAB)
  # logistic regression analysis: OR and their CIs
l.mod <- glm(death ~ treat, data = samp$df.narrow, family = binomial(link = "logit"), x = T, y = T)
or.b <- exp(coef(l.mod)[2])
conf.int.b <- exp(confint(l.mod)[2,])
# Prediction of l.mod for "a" and "b" with CI
nd <- data.frame(treat = c("a", "b"))
pred.lp <- predict(l.mod, nd, se.fit = T, type = "link")
pred.prop <- plogis(pred.lp$fit)
pred.ucl.prop <- plogis(pred.lp$fit + 1.96*pred.lp$se.fit)
pred.lcl.prop <- plogis(pred.lp$fit - 1.96*pred.lp$se.fit)
res.df <- data.frame(num = nrow(samp$df.wide), lclA = ciA[1], uclA = ciA[2], lclB = ciB[1], uclB = ciB[2], lclDiff = propDiff$conf.int[1], uclDiff = propDiff$conf.int[2], pDiff = propDiff$p.value, orB = or.b, lclOR = conf.int.b[1], uclOR = conf.int.b[2], pred.propA = pred.prop[1], pred.lcl.propA = pred.lcl.prop[1], pred.ucl.propA = pred.ucl.prop[1], pred.propB = pred.prop[2], pred.lcl.propB = pred.lcl.prop[2], pred.ucl.propB = pred.ucl.prop[2])
names(res.df) <- Cs(num, lclA, uclA, lclB, uclB, lclDiff, uclDiff, pDiff, orB, lclOR, uclOR, pred.propA, pred.lcl.propA, pred.ucl.propA, pred.propB, pred.lcl.propB, pred.ucl.propB)
return(res.df)
}
pa1 <- 0.6
pb1 <- 0.4
s <- 3333
set.seed(s)
sim.res <- ldply(.data = N, .fun = res.ci, prob.a1 = pa1, prob.b1 = pb1)
Below, we plot graphs depicting the different measures of difference in proportions and their confidence intervals with respect to sample size.
library(gridExtra)
p <- ggplot(aes(x = factor(num)), data = sim.res) + xlab("Sample size") + theme_bw()
p1 <- p + geom_linerange(aes(ymin = lclA, ymax = uclA), alpha = 0.4, lwd = 1.5) + geom_linerange(aes(ymin = lclB, ymax = uclB), color = "red", alpha = 0.4, lwd = 1.5) + geom_hline(yintercept = c(pa1,pb1), color = c("black", "red"), linetype = "dashed") + ylab("Proportion dead")
p2 <- p + geom_linerange(aes(ymin = lclDiff, ymax = uclDiff), lwd = 1.5, colour = "blue") + geom_hline(yintercept = c(0, pa1 - pb1), color = c("red", "blue"), linetype = "dashed") + ylab("Difference in proportion")
p3 <- p + geom_point(aes(y = pDiff), pch = "x", size = 5) + geom_hline(yintercept = 0.05, color = "red", linetype = "dashed") + ylab("p value of difference in Proportion") + scale_y_log10()
p4 <- p + geom_crossbar(aes(y = pred.propA, ymin = pred.lcl.propA, ymax = pred.ucl.propA), alpha = 0.4, lwd = 0.2, fill = "gray") + geom_crossbar(aes(y = pred.propB, ymin = pred.lcl.propB, ymax = pred.ucl.propB), fill = "red", alpha = 0.4, lwd = 0.2) + geom_hline(yintercept = c(pa1,pb1), color = c("black", "red"), linetype = "dashed") + ylab("Proportion dead \n as predicted from \n logistic regression model")
grid.arrange(p1, p2, p3, p4, nrow = 4)
Depiction of each of the figures:
The X axis of each figure depicts sample size (as a factor) from 10 to 10000.
Topmost figure: It depicts 95% confidence intervals of the proportion dead for drug A (black) and drug B (red). The dashed horizontal lines depict the actual proportion of deaths for each group.
Second figure: It depicts the difference between the proportions of deaths for patients receiving drug A and drug B. The blue dashed line depicts the actual difference (0.2) and the red dashed line depicts the line of no difference (difference of 0).
Third figure: It depicts the p value (on log scale) for the difference between proportions dead in the two groups (drug A and drug B). The red dashed line depicts the level of significance (p = 0.05).
Lowermost figure: It depicts the predicted proportion of deaths for drug A and drug B and their confidence intervals from the fitted logistic regression model. This figure is almost identical to the topmost figure.
As can be seen from the above figures, confidence intervals depict how sure we are that the population proportion (or difference in proportions) lies within a given range. The CI becomes narrower with increasing sample size, so with increasing sample size we simply become more and more sure of the population parameters. Figures 2 and 3 depict the confidence interval of the difference between proportions in the two groups and the corresponding p values. We see that the confidence interval surrounds the actual difference in proportions more tightly with increasing sample size and converges on the actual difference (0.2). In doing so, the lower limit of the interval gets further away from the line of no difference (0), with the actual difference (0.2) as its limit. Now, the p value depends on the distance between the lower limit of the confidence interval and the line of no effect: its value reduces with increasing distance.
We can see that confidence intervals can be made narrow enough, and the p value made smaller than any given arbitrary significance level, by increasing the sample size appropriately. A research team that has invented a new drug and wants it proved better can do so by appropriately selecting the sample size. Also, all of the above mentioned measures only state that the population proportions of deaths are 0.6 for drug A and 0.4 for drug B and that the difference is 0.2, thereby meaning drug B is better than drug A. A proportion is a property of a group of patients. None of the above parameters tells us anything about the performance of drug B vs drug A when compared head to head.
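To make this concrete, here is a small check (my addition, not in the original post) showing how the 95% CI of the very same observed difference (0.6 vs 0.4) narrows as the sample size grows:
# identical observed proportions, increasing sample size per group
for (n in c(50, 500, 5000)) {
  ci <- prop.test(x = c(0.6, 0.4) * n, n = c(n, n))$conf.int
  cat("n =", n, "-> 95% CI of difference:", round(ci, 3), "\n")
}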
Another way to state the problem would be: "If drug A is given to a patient and drug B is given to an identical patient, what is the probability that only the patient on drug A dies, only the patient on drug B dies, the patients on both drugs die, or neither of them dies?"
Mathematically, if we assume that a patient receiving drug A (A) and a patient receiving drug B (B) behave as independent Bernoulli trials with probabilities of death pA and pB respectively, then \( P(A \cap B) = P(A) \times P(B) \). So, the probabilities of the various events will be as under:
\( P(A = 1 \cap B = 0) = 0.6 \times (1-0.4) = 0.36 \) (patient on drug A dies but patient on drug B survives) … 1
\( P(A = 0 \cap B = 1) = (1-0.6) \times 0.4 = 0.16 \) (patient on drug A survives but patient on drug B dies) … 2
\( P(A = 1 \cap B = 1) = 0.6 \times 0.4 = 0.24 \) (patients on both drugs die, equivalent response) … 3
\( P(A = 0 \cap B = 0) = (1-0.6) \times (1-0.4) = 0.24 \) (patients on both drugs survive, equivalent response) … 4
We can see that only 36% of the time does the patient taking drug A die while the patient taking drug B survives (eqn 1), so drug B outperforms drug A only 36% of the time. Similarly, 16% of the time the patient taking drug B dies while the patient taking drug A survives, making drug A the better drug in those pairs. 48% of the time the performance of drug A and drug B is equivalent (either both die or both survive). So, despite drug B being significantly better than drug A (as depicted by a highly significant p value and a narrow confidence interval), only 36% of the time (less than 50%) does the patient taking drug B survive while the patient taking drug A dies.
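As a quick check (my addition, not in the original post), the four probabilities can be computed directly in R:
pA <- 0.6   # probability of death on drug A
pB <- 0.4   # probability of death on drug B
c(onlyA.dies = pA * (1 - pB),        # 0.36
  onlyB.dies = (1 - pA) * pB,        # 0.16
  both.die   = pA * pB,              # 0.24
  none.die   = (1 - pA) * (1 - pB))  # 0.24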
Also, despite the fact that drug B is better than drug A by 20 percentage points (40% death rate vs 60% death rate), on head-to-head comparison the two drugs perform differently only 52% of the time and equivalently 48% of the time. The following figures depict the same.
set.seed(3333)
foo <- function(nSamp, nsim, probA, probB) {
  # simulate nsim trials, each column one trial of nSamp paired patients per drug
  sampA <- replicate(n = nsim, sample(c(0,1), size = nSamp, replace = T, prob = c(1 - probA, probA)))
  sampB <- replicate(n = nsim, sample(c(0,1), size = nSamp, replace = T, prob = c(1 - probB, probB)))
  deathOnlyA <- (sampA > sampB)               # only the patient on drug A dies
  deathOnlyB <- (sampB > sampA)               # only the patient on drug B dies
  deathBoth <- ((sampA == 1) & (sampB == 1))  # both patients die
  deathNone <- ((sampA == 0) & (sampB == 0))  # both patients survive
  deathConcordant <- (sampA == sampB)         # equivalent response
  deathDiscordant <- (sampA != sampB)         # discordant response
  L <- list(deathOnlyA, deathOnlyB, deathBoth, deathNone, deathConcordant, deathDiscordant)
  # proportion of pairs in each category for every simulated trial
  props <- lapply(L, function(y) colSums(y)/nSamp)
meds <- lapply(props, function(x) median(x))
lcls <- lapply(props, function(x) quantile(x, 0.025))
ucls <- lapply(props, function(x) quantile(x, 0.975))
return(data.frame(num = nSamp, medOnlyA = meds[[1]], medOnlyB = meds[[2]], medBoth = meds[[3]], medNone = meds[[4]], medConcordant = meds[[5]], medDiscordant = meds[[6]],
lclOnlyA = lcls[[1]], lclOnlyB = lcls[[2]], lclBoth = lcls[[3]], lclNone = lcls[[4]], lclConcordant = lcls[[5]], lclDiscordant = lcls[[6]],
uclOnlyA = ucls[[1]], uclOnlyB = ucls[[2]], uclBoth = ucls[[3]], uclNone = ucls[[4]], uclConcordant = ucls[[5]], uclDiscordant = ucls[[6]]))
}
res <- ldply(.data = N, .fun = foo, nsim = 100, probA = 0.6, probB = 0.4)
p <- ggplot(data = res, aes(x = factor(num))) + xlab("Sample size") + theme_bw()
p + geom_linerange(aes(ymin = lclOnlyA, ymax = uclOnlyA), alpha = 0.4, colour = "red", lwd = 2) + ylab("Proportion deaths only with drug A") + geom_hline(yintercept = c(0.36, 0.5), colour = c("red", "black"), linetype = "dashed")
p + geom_crossbar(aes(y = medConcordant, ymin = lclConcordant, ymax = uclConcordant), fill = "red", alpha = 0.4) + geom_crossbar(aes(y = medDiscordant, ymin = lclDiscordant, ymax = uclDiscordant), fill = "green", alpha = 0.5) + ylab("Proportion concordant performance (red)\n and discordant performance (green)") + geom_hline(yintercept = 0.5, linetype = "dashed")
The traditional way of analysing the performance of a given drug based on the difference in proportions is not adequate; a more appropriate metric should be used by researchers and checked by peer reviewers.
I am not sure about the adequacy and correctness of the above methodology. Comments and criticisms are welcome.
Bye.
Suman.
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=en_IN.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_IN.UTF-8 LC_COLLATE=en_IN.UTF-8
## [5] LC_MONETARY=en_IN.UTF-8 LC_MESSAGES=en_IN.UTF-8
## [7] LC_PAPER=en_IN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] splines grid stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] gridExtra_0.9.1 Hmisc_3.14-4 Formula_1.1-1 survival_2.37-7
## [5] lattice_0.20-24 ggplot2_1.0.0 plyr_1.8 knitr_1.6
##
## loaded via a namespace (and not attached):
## [1] cluster_1.14.4 colorspace_1.2-4 dichromat_2.0-0
## [4] digest_0.6.4 evaluate_0.5.5 formatR_0.10
## [7] gtable_0.1.2 labeling_0.2 latticeExtra_0.6-26
## [10] lubridate_1.3.3 MASS_7.3-29 memoise_0.1
## [13] munsell_0.4.2 proto_0.3-10 RColorBrewer_1.0-5
## [16] reshape2_1.2.2 scales_0.2.3 SparseM_0.99
## [19] stringr_0.6.2 tools_3.0.2
library(xtable)
##
## Attaching package: 'xtable'
##
## The following objects are masked from 'package:Hmisc':
##
## label, label<-
library(stringr)
library(whisker)
While writing an Rmd document, I had a character string containing line breaks (\n, produced by ENTER keystrokes). The example is shown below:
string 1\nstring 2\nstring 3\nstring 4
I faced problems when I was trying to render this text as the following in the resulting html document:
1. string 1
2. string 2
3. string 3
4. string 4
s <- "string 1\nstring 2\nstring 3\nstring 4"
I substituted the \n with <br>, the html tag for a line break, using str_replace_all.
s1 <- str_replace_all(string = s, pattern = "\\n", replacement = "<br>")
s1
## [1] "string 1<br>string 2<br>string 3<br>string 4"
Then I tried to put the string s1 into the html document as follows (actually I was working with a dataframe with multiple rows and wanted to present the data in table format):
print(xtable(data.frame(s1)), type = "html")
The rendered table came out as below: the <br> tags were not converted into line breaks and appeared in the output verbatim.
  s1
1 string 1<br> string 2<br> string 3<br> string 4
I checked the underlying html code and found the following:
<TABLE border=1>
<TR> <TH> </TH> <TH> s1 </TH> </TR>
<TR>
<TD align="right"> 1 </TD>
<TD> string 1&lt;br&gt; string 2&lt;br&gt; string 3&lt;br&gt; string 4 </TD>
</TR>
</TABLE>
What happened internally was that, while parsing the document, the printing routine escaped the < and > characters into &lt; and &gt; respectively. The problem I faced was how to prevent the <br> from being escaped and thereby insert real line breaks.
First workaround: I split the string s at the line breaks and trimmed the resultant components.
s2 <- str_trim(unlist(str_split(s, "\\n")))
s2
## [1] "string 1" "string 2" "string 3" "string 4"
I made a dataframe out of the character vector and printed the required output.
d <- data.frame(str_c(seq(from = 1, by = 1, along.with = s2), "."), s2)
print(xtable(d), type = "html", include.colnames = F, include.rownames = F,
    html.table.attributes = "style='border-width:0;'")
1. string 1
2. string 2
3. string 3
4. string 4
Note that rather than preventing the <br> from getting escaped, I have bypassed the issue.
Second workaround: I used the whisker package, an R implementation of the Mustache templating system. We can use the string s1 and take the following steps:
l <- list(s1 = s1)
html.templ <- "<table><tr><td>{{{s1}}}</td></tr></table>"
cat(whisker.render(template = html.templ, data = l))
## <table><tr><td>string 1<br>string 2<br>string 3<br>string 4</td></tr></table>
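For contrast (my addition, not in the original post): with double braces, whisker follows the Mustache convention and html-escapes the value, so the tags should again come out verbatim in the rendered page.
# same data list l as above; double braces instead of triple braces
html.templ2 <- "<table><tr><td>{{s1}}</td></tr></table>"
cat(whisker.render(template = html.templ2, data = l))
# expected output (assumption): the <br> tags escaped as &lt;br&gt;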
The triple-brace construct {{{}}} prevents the <br> from getting escaped, whereas double braces escape the < and > for html rendering, as in the contrast above. When the unescaped output is rendered in the browser, the table cell shows the four strings on separate lines:
string 1
string 2
string 3
string 4
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=en_IN.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_IN.UTF-8 LC_COLLATE=en_IN.UTF-8
## [5] LC_MONETARY=en_IN.UTF-8 LC_MESSAGES=en_IN.UTF-8
## [7] LC_PAPER=en_IN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] datasets grid grDevices splines graphics utils stats
## [8] methods base
##
## other attached packages:
## [1] whisker_0.3-2 xtable_1.7-1 knitr_1.5 mypackage_1.0
## [5] devtools_1.4.1 dplyr_0.1.2 ggplot2_0.9.3.1 rms_4.0-0
## [9] SparseM_0.99 Hmisc_3.13-0 Formula_1.1-1 cluster_1.14.4
## [13] car_2.0-19 stringr_0.6.2 lubridate_1.3.3 lattice_0.20-24
## [17] epicalc_2.15.1.0 nnet_7.3-7 MASS_7.3-29 survival_2.37-4
## [21] foreign_0.8-57 deSolve_1.10-8
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.1 colorspace_1.2-4 dichromat_2.0-0
## [4] digest_0.6.4 evaluate_0.5.1 formatR_0.10
## [7] gtable_0.1.2 httr_0.2 labeling_0.2
## [10] memoise_0.1 munsell_0.4.2 parallel_3.0.2
## [13] plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5
## [16] Rcpp_0.11.0 RCurl_1.95-4.1 reshape2_1.2.2
## [19] scales_0.2.3 tools_3.0.2
Bye and regards.
Initialise a git repository, which adds a .git folder into the SRCFOLDER, and stage the files:
SRCFOLDER$ git init
SRCFOLDER$ git add *.*
Commit the staged files to the default branch, master:
SRCFOLDER$ git commit -m "initial commit"
Create an empty repository on GitHub (say, https://www.github.org/sumprain/XXX). Cloning it connects the local repository with the remote .git folder and gives the name origin to the remote source:
SRCFOLDER$ git clone https://www.github.org/sumprain/XXX
SRCFOLDER$ git push origin master
After making further changes, commit the changes and then run another push.