Biostatistics Review Times

It's always great to see new analytic tools being used to describe everyday activities. Shiny apps are a great way of allowing readers to interact with data and analysis. Recently, the co-editors of Biostatistics, Jeff Leek and Dimitris Rizopoulos, wrote a Shiny app that lets you examine the recent review times for manuscripts. This way, you can see the historical “review survival time” data to get an idea of how long your manuscript will take to review. Very cool idea!

R Markdown to WordPress Demo

This is an R Markdown document that will end up as a WordPress post, describing the process of publishing to WordPress directly from an R Markdown document in RStudio. You need a few things in order to get started, but they are easy to install. The biggest thing needed is the RWordPress package, which is not included in the standard repositories and is no longer available from the omegahat website. Don't worry, it can still be installed from GitHub. So we need two things:

1- The dependencies for RWordPress (namely RCurl, XML and XMLRPC)
2- A way to install packages directly from GitHub (namely, devtools)

(Thanks to Koehlern at “Scripts and Statistics” for this workaround).

Let's get cracking. Simply run this to get the required packages:

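# CRAN packages: devtools plus the two CRAN dependencies (RCurl and XML)
install.packages(c("devtools", "RCurl", "XML"))
# XMLRPC and RWordPress are not on CRAN, so install them from GitHub
devtools::install_github("duncantl/XMLRPC")
devtools::install_github("duncantl/RWordPress")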

Great. Now run the following, replacing with your values for 'user', 'password', 'wordpressurl', 'yourmarkdownfile', and 'desired post title':

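library(RWordPress)
# Option names are case-sensitive: 'WordpressLogin' and 'WordpressURL' (lowercase 'p')
options(WordpressLogin=c(user='password'), WordpressURL='https://user.wordpress.com/xmlrpc.php')
library(knitr)
# Knit the R Markdown file and send the result to WordPress
knit2wp('yourmarkdownfile.Rmd', title="desired post title")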

If you get errors, check carefully for typos. Commands and option names are case-sensitive. If all went well, the R Markdown document you specified should be posted to your WordPress blog! I'll check mine right now. Code can be found on GitHub.

ASA Statement on P-Values

This rarely happens (in fact, this specific type of thing has never happened before), but the American Statistical Association formed a committee and published a statement on p-values.

Basically, p-values have come under attack in recent years, and many scattered discussions have debated aspects of them and of current practice. The ASA decided it would be helpful to centralize and organize these thoughts a little and to explain the most common pitfalls in how p-values are currently used and interpreted, since some folks haven’t yet fully understood these issues, and some folks have over-reacted to them.

The ASA boiled their thoughts down to six principles:

1- P-values can indicate how incompatible the data are with a specified statistical model.
2- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4- Proper inference requires full reporting and transparency.
5- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

The statement is aimed at, and should be read in more detail by, anyone involved in research today. I share the opinion that p-values are like cars: very useful, but you really shouldn’t use one without a license.

Are Coin Flips Actually Unbiased?

This one has always bugged me. As subjects, math and stats tend to be a bit more unforgiving in terms of assumptions, and yet even at higher levels coins are sometimes still used in practice as 50/50 binary random variable generators, without much question from some. I looked into it, and found that I am far from being the first skeptic.

Here is a detailed paper resulting from a collaboration between mathematicians on the subject. They did some very impressive experimental and theoretical work, which yielded potentially surprising results.

An interesting interpretation of the paper can also be found here.

Basically, aside from existing ways to “cheat” with a coin flip through skilled technique, some hypothesize that a regular coin toss has a 51% chance of landing with the same side face up as when flipped. One possible basic idea can be understood by looking at the finite sequence of “Heads” up or “Tails” up that any flipped coin must go through.

(ex: HTHTHTHTHTHTHTHTHTHT…. where the landing position is either H or T)

By starting a coin flip with “Heads” up, you are guaranteeing that there will never be a higher proportion of “Tails” positions in the sequence. At best, they will be tied. So, there is a small boost in the probability of landing “Heads”, or whatever the initial position of the coin was... Or is there? It can be debated, and although the ideas are interesting, no real consensus has been reached.

The actual arguments in the paper are more physical in nature and are a little different, but are also in favor of landing on the same side up as when flipped.

Suppose we are still interested in using a coin toss to, say, allocate patients to treatments under a certain randomization scheme. Is all well and fair, so long as the coin-flipper has no technique and is blind to the initial position of the coin?

(Note: I would use simulation software for this, but I wonder if anyone actually does still flip coins).

Or should another investigator be around, and randomly decide to turn the flipped coin over (as is customary in many playgrounds)? Or maybe the coin should be caught sideways?

What about professional sports? Coin tosses play a big part in those.

Just brainstorming, but the main idea I’m exploring is that coin tosses might not deserve the innocent appreciation that they are currently enjoying, and we perhaps should all discuss it more.

Many dispute the interpretations of the results of the paper, and there are some pretty lively debates about all aspects of the study. Basically, the authors suggest that at least 250,000 genuine flips would be required to detect the kind of bias they are hypothesizing, which far surpasses any world record yet set for consecutive flips.
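To get a feel for why the number of flips has to be so large, here is a back-of-the-envelope sketch. It is my own rough check, not the authors' calculation, and the function name `standard_error` is just made up for the illustration. It assumes independent flips with a heads probability close to 0.5 and prints the standard error of the observed share of heads for a few sample sizes.

```python
import math

def standard_error(n_flips, p=0.5):
    """Standard error of the observed share of heads after n_flips independent flips."""
    return math.sqrt(p * (1 - p) / n_flips)

# Compare the noise in the observed proportion for small and very large experiments.
for n in (100, 10_000, 250_000):
    print(f"{n:>7} flips: standard error about {standard_error(n):.4f}")
```

With 250,000 flips the standard error is about 0.001, roughly a tenth of the hypothesized 0.01 edge. With only 100 flips it is about 0.05, which would completely swamp a bias that small.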

Better solution? There is a well-known way to get a fair flip out of an unfair coin.
Everyone should just do this if they suspect the coin they are flipping may not be fair.

Turn an unfair coin into a fair flip:

Flip the coin two times.
If it ends up HT or TH, the result is the first flip of the pair.
If it ends up HH or TT, discard both flips and flip two more times, repeating until you get HT or TH.

This works because, if the probability of getting H is p and the probability of getting T is (1-p), and the flips are independent, we have p(1-p) = (1-p)p, meaning that the events HT and TH are equally likely. So, by choosing between two equally likely events (HT or TH), we have turned this into a fair game.
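Here is a minimal simulation sketch of the two-flip trick described above. The function names (`biased_flip`, `fair_flip`) and the 70/30 coin are my own choices for illustration, and the sketch assumes independent flips with a heads probability strictly between 0 and 1 (otherwise the loop would never end).

```python
import random

def biased_flip(p_heads, rng):
    """Return 'H' with probability p_heads, otherwise 'T'."""
    return "H" if rng.random() < p_heads else "T"

def fair_flip(p_heads, rng):
    """Flip the biased coin in pairs until the two results differ,
    then return the first result of that pair."""
    while True:
        first = biased_flip(p_heads, rng)
        second = biased_flip(p_heads, rng)
        if first != second:  # HT or TH: keep the first flip
            return first
        # HH or TT: discard the pair and flip again

rng = random.Random(42)  # fixed seed, only so the run is reproducible
n = 100_000
heads = sum(fair_flip(0.7, rng) == "H" for _ in range(n))
print(f"Share of heads from a 70/30 coin after the trick: {heads / n:.3f}")
```

The share of heads should come out close to 0.5 even though the underlying coin lands heads 70% of the time. The price is extra flips: the more lopsided the coin, the more pairs get thrown away.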

I’m interested in learning more about this, so as always feel free to comment.


Penney's Game

I’ve been thinking a bit about coin flips recently and reading some work that was done attempting to establish if coin flips are as unbiased as we think they are (more on that soon). Along the way, I learned about something neat called Penney’s game.

Penney’s game can be played with a single coin and it involves two players betting on which triplet of results (such as heads-tails-heads or heads-heads-tails) will appear first if the coin is continuously flipped and all results are recorded.

E.g. Suppose player 1 chooses HHH (heads-heads-heads) and player 2 chooses TTT (tails-tails-tails) and the flip results are as follows:
HHTHTTHTTT

Then player 2 wins, as the triplet TTT has appeared first.

Now, if both players conceal their choices, this game is fair. However, if player 2 is allowed to choose their triplet after player 1 has declared theirs, it turns out that player 2 can actually gain a significant advantage.

Suppose player 1 chooses HHH again and tells player 2, who then chooses THH.
Suppose the first three flip results are as follows:
HHT.. (if that last flip had been heads, player 1 would have won)

Now let’s continue (including the first three flips):

HHT HTHH (there, since the last three flips were THH, player 2 has won. In fact, we can see that if the first three flips aren’t HHH, then player 1 can never win: as soon as a tails pops up, the next two heads in a row will complete THH, making it impossible for HHH to come before THH).

In fact, the general rule for player 2 to always have the edge is to pick the opposite of player 1's middle result, followed by player 1's first and middle results.
e.g. If player 1 chooses HTH, then player 2 should choose H (the opposite of player 1's middle result) and then HT (the first two results from player 1), making HHT in total.

The take-home message here is that the game played this way is a bit like rock-paper-scissors, in that the game is “intransitive” (meaning that every choice by one player can be countered by the second player under optimal play).
So while you cannot win Penney’s game 100% of the time (different triplets give different edges), you can always gain a substantial advantage, and most people wouldn’t suspect that at first glance.
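If you want to see the edge for yourself, here is a small simulation sketch. The helper names (`counter_triplet`, `play`) are my own, and the code assumes a fair coin and that player 2 always follows the rule above.

```python
import random

def counter_triplet(triplet):
    """Player 2's reply: the opposite of player 1's middle result,
    followed by player 1's first two results."""
    flipped = "T" if triplet[1] == "H" else "H"
    return flipped + triplet[:2]

def play(first_choice, second_choice, rng):
    """Flip a fair coin until one of the two triplets appears.
    Return 1 if player 1's triplet shows up first, otherwise 2."""
    recent = ""
    while True:
        recent = (recent + rng.choice("HT"))[-3:]  # keep only the last three flips
        if recent == first_choice:
            return 1
        if recent == second_choice:
            return 2

rng = random.Random(0)  # fixed seed, only so the run is reproducible
games = 20_000
for p1 in ["HHH", "HTH", "TTH"]:
    p2 = counter_triplet(p1)
    wins = sum(play(p1, p2, rng) == 2 for _ in range(games))
    print(p1, "vs", p2, "-> player 2 wins", round(wins / games, 3))
```

For HHH against THH, player 2's share should land near 7/8, since player 1 only wins when the very first three flips are all heads. The other pairings give smaller but still clear edges.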

Simpson's Paradox

Here’s a good article I came across on Simpson’s Paradox, with a concrete example involving researchers who concluded that, despite claims that women were discriminated against during the admissions process, there was actually an admission bias in FAVOR of women at UC-Berkeley. How can you be so off? The overall stats for admissions showed that a higher percentage of men were admitted, but if you break the numbers down by department, women generally had higher acceptance rates than men in the top departments. In this example, the reversal is caused by the fact that women tended to apply to more selective programs than men, who mainly applied to more specialized science-related programs that have higher acceptance rates. I especially liked the interactive tables, which make the concept more accessible to non-math people. That matters, since it’s a dangerous fallacy in research (e.g. omitting variables that can make a big difference). Play around with it.
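To make the reversal concrete, here is a tiny sketch with made-up numbers. These counts are invented purely for illustration and are NOT the Berkeley data. Two hypothetical departments are enough to reproduce the effect.

```python
# Hypothetical counts, invented for illustration (NOT the Berkeley data):
# department -> group -> (applicants, admitted)
applications = {
    "less selective dept": {"men": (800, 480), "women": (100, 70)},
    "more selective dept": {"men": (200, 40), "women": (900, 225)},
}

def rate(applicants, admitted):
    """Admission rate as a fraction."""
    return admitted / applicants

# Within each department, women have the higher admission rate.
for dept, groups in applications.items():
    for group, (applicants, admitted) in groups.items():
        print(f"{dept:20s} {group:6s} {rate(applicants, admitted):.1%}")

# Pooled over departments, the ordering flips.
overall = {}
for group in ("men", "women"):
    total_applicants = sum(groups[group][0] for groups in applications.values())
    total_admitted = sum(groups[group][1] for groups in applications.values())
    overall[group] = rate(total_applicants, total_admitted)
    print(f"{'overall':20s} {group:6s} {overall[group]:.1%}")
```

In this toy setup women are admitted at a higher rate in both departments (70% vs 60%, and 25% vs 20%), yet the pooled rate is 52% for men and 29.5% for women, simply because most of the women applied to the more selective department.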