# It's raining fake pills

## Pill or placebo?

The FDA is approving medications that do not quite work.

This is the conclusion of Don van Ravenzwaaij’s research which he conducted for the RUG in collaboration with John Ioannidis.

The FDA approves medication when two separate studies show the medication to have a positive effect. However, it does not take into account the total number of studies that were carried out.

To show that the FDA’s policy is flawed, Van Ravenzwaaij created simulations based on real data.

Only through simulations were the researchers able to show how likely it is for the FDA to reach a correct decision.

This problem arises not only from policy issues, but also from the traditional way that data is interpreted.

Van Ravenzwaaij argues for new policies in which the FDA considers all tests that were done and assesses how strong the evidence of each separate study is.


The FDA has been the institution responsible for testing and approving medication in America since 1962. To ensure that only effective and safe medication enters the market, the administration follows a strict protocol. That means that ‘bad’ medicine only enters the market when the FDA deviates from this protocol. Or so we like to think.

According to its own policy, the FDA approves medication when two clinical trials convincingly prove a positive effect (i.e. when said effect is statistically proven). ‘However, the policy does not state whether that’s two successful trials out of two, or out of five or even out of 20’, says Van Ravenzwaaij. ‘And that’s not just a theoretical problem. It regularly turns out to be that only two out of multiple trials are actually successful.’

## Hit or miss

You do not need to understand statistics to grasp that this is not great. It is like calling someone who pulls the trigger on a gun 20 times and hits the target twice a good marksperson.

But how often does the FDA come to the wrong conclusion? To find out, Van Ravenzwaaij simulated several situations based on real studies. That is because, according to him, the number of times things go wrong in reality can only be shown with simulated data. ‘Only simulated data allow us to see the real truth.’

Using ‘real’ data, we would not know whether the medication actually worked or not. After all, we would only have the statistical evidence of the tested samples. But it is impossible to test whether the FDA is making the right decision without knowing what the right decision is. That can be done in a simulation.

## Financial interests

A simulation can be used to generate fictional data, for example based on effective or ineffective medication. Next, we can check what goes wrong when the FDA procedure is applied to this virtual medication. This enables us to test how often the FDA would approve medication under the assumption that it is effective, as well as how often that would happen if they assumed medication did not work. This is the only method to test how (un)likely it is for the FDA’s policy to lead to the right decision, according to Van Ravenzwaaij. ‘It is our conclusion that it goes wrong in a large number of cases, and that means that the strict implementation of this policy leads to new medications that don’t work entering the market.’
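The idea of applying the approval rule to virtual medication can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual simulation: it assumes a drug with no real effect, a hypothetical 5 per cent chance per trial of a falsely ‘successful’ result, and independent trials. The function names are invented for this sketch.

```python
import random
from math import comb

# Assumption: each trial of an ineffective drug still comes out "significant"
# 5% of the time. This rate is illustrative, not taken from the study.
FALSE_POSITIVE_RATE = 0.05


def exact_approval_probability(n_trials: int, hit_rate: float) -> float:
    """Exact chance that at least two of n independent trials succeed."""
    p_fewer_than_two = sum(
        comb(n_trials, k) * hit_rate**k * (1 - hit_rate) ** (n_trials - k)
        for k in range(2)
    )
    return 1 - p_fewer_than_two


def simulated_approval_rate(n_trials, hit_rate, n_drugs=20_000, seed=0):
    """Monte Carlo version: simulate many virtual drugs and count approvals."""
    rng = random.Random(seed)
    approved = 0
    for _ in range(n_drugs):
        successes = sum(rng.random() < hit_rate for _ in range(n_trials))
        if successes >= 2:  # the "two successful trials" rule
            approved += 1
    return approved / n_drugs


for n in (2, 5, 20):
    exact = exact_approval_probability(n, FALSE_POSITIVE_RATE)
    simulated = simulated_approval_rate(n, FALSE_POSITIVE_RATE)
    print(f"{n:2d} trials: exact {exact:.4f}, simulated {simulated:.4f}")
```

Under these assumptions, an ineffective drug passes the rule about 0.25 per cent of the time when exactly two trials are run, but roughly a quarter of the time when 20 trials are run. The rule itself never asks which of those situations applies.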

## The validity of conclusions

Don van Ravenzwaaij is an instructor and researcher at the Centre for Psychometry and Statistics at the Faculty of Behavioural and Social Sciences at the RUG. Among other things, he researches the validity of conclusions that scientists reach by applying traditional statistical methods to their research data.

Traditional ‘p-value statistics’ look at how plausible the observed data are under a single hypothesis. Van Ravenzwaaij compares this way of interpreting data with other methods, such as Bayesian statistics, which weighs several possible explanations by their relative probability to reach a more nuanced result.

Does that also mean that there is medication on the market that has negative side effects that have been withheld? That certainly can happen, says Van Ravenzwaaij. ‘The approval of new medication comes with great financial interests. My colleague John Ioannidis has published a lot on this, but that’s not what this research is about.’

He would rather not name any specific medications, not because he cannot think of any but because he does not want to get burnt. Besides, it is not just about any particular pharmaceutical company getting things wrong, but rather about the procedure as a whole, says Van Ravenzwaaij. ‘The message we’re trying to send is that this policy can lead to the wrong decisions!’

## Murder

So how does a large administration like this, with such an important task, have this kind of blind spot in its policy? According to Van Ravenzwaaij, it has a lot to do with the way traditional statistics work. One example of how wrong this can go is the story of Sally Clark.

Sally Clark had two babies who died within a short period of each other. After the death of her second son, she was charged with and convicted of the murder of her children. The public prosecutor argued that the chance of two young children in the same family dying of cot death is too small for the occurrence to be coincidental. Paediatrician Roy Meadow explained that the chance of cot death is approximately one in 8,500. If the two deaths are treated as independent events, the chance of two cot deaths in one family becomes one in 8,500 squared. ‘That is a chance of one in approximately 73 million’, Van Ravenzwaaij calculates. ‘That is so incredibly unlikely that traditional statistics automatically reach the conclusion that there must be a different explanation: that Sally Clark murdered her two children.’

Just as the FDA only counts the successful trials, the court mistook one little sliver of reality for proper proof. A woman murdering her two children is approximately nine times less likely than two cases of cot death in one family, according to Van Ravenzwaaij. Besides, there are other possible explanations, such as the children suffering from a birth defect. It took the courts five years to realise that the evidence given in the case made no sense, and to release Sally Clark.
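The arithmetic behind this case can be checked in a short sketch. It uses only the rounded figures given in the text, and it assumes that double cot death and double murder are the only two explanations being compared, which leaves out the alternatives mentioned above.

```python
from fractions import Fraction

# Figure quoted in court, as rounded in the text: one cot death in 8,500.
p_one_cot_death = Fraction(1, 8500)

# Squaring is only valid if the two deaths are independent events.
p_two_cot_deaths = p_one_cot_death ** 2
print(f"one in {p_two_cot_deaths.denominator:,}")  # one in 72,250,000

# Relative plausibility from the text: a double murder is about nine times
# less likely than two cot deaths. Only the ratio matters here.
weight_cot_deaths = Fraction(9)
weight_murder = Fraction(1)

# Assumption: these are the only two explanations under consideration.
p_innocent = weight_cot_deaths / (weight_cot_deaths + weight_murder)
print(f"probability of two cot deaths: {float(p_innocent):.0%}")  # 90%
```

With the rounded 8,500 the square comes out at about 72 million; the quoted 73 million presumably reflects a less rounded starting figure. The more important point is the second half: a tiny probability for one explanation says little until it is weighed against the probability of the competing explanation.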

## P-hacking

Another problem putting pressure on traditional statistics is so-called p-hacking. P-hacking is the manipulation of research data to ensure that only the desired results occur. The name refers to the p-value used in traditional statistics to prove or reject hypotheses. The most common method of p-hacking is selectively removing research data that negatively influences the results, Van Ravenzwaaij explains. ‘The goal is to make the p-value drop just below the magical 5 per cent limit. This is obviously very unethical; it’s not the correct way to conduct research.’

Even researchers with the best intentions sometimes purge ‘wrong’ data. On the other hand, ‘if a researcher has malicious intent and consciously engages in data torture, it’s going to go wrong regardless of which type of statistics the researcher is using.’
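How much selective removal of data can distort results is easy to demonstrate with a toy simulation. The sketch below is an assumption-laden illustration rather than anything from the study: every simulated study measures a true effect of zero, the test is a simple one-sided z-test with known unit variance, and the ‘hacking’ consists of dropping up to five of the lowest observations until the p-value falls below 5 per cent.

```python
import random
from statistics import NormalDist, mean


def one_sided_p(sample):
    """p-value of a one-sided z-test for mean > 0, assuming unit variance."""
    z = mean(sample) * len(sample) ** 0.5
    return 1 - NormalDist().cdf(z)


def significant_rate(n_studies, n_obs, max_removed, rng):
    """Share of null studies that reach p < 0.05 after optional data removal."""
    hits = 0
    for _ in range(n_studies):
        # The true effect is zero: any "significant" result is a false alarm.
        sample = sorted(rng.gauss(0, 1) for _ in range(n_obs))
        for removed in range(max_removed + 1):
            # Drop the `removed` lowest observations, then test again.
            if one_sided_p(sample[removed:]) < 0.05:
                hits += 1
                break
    return hits / n_studies


rng = random.Random(1)
honest = significant_rate(2000, 30, 0, rng)  # no data removed
hacked = significant_rate(2000, 30, 5, rng)  # up to 5 low values removed
print(f"honest: {honest:.1%} significant, p-hacked: {hacked:.1%} significant")
```

The honest analysis produces a false alarm in roughly one study in twenty, as the 5 per cent limit implies. Once inconvenient observations may be dropped, the share of ‘significant’ results for a non-existent effect rises well above that.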

## Meanwhile, in the Netherlands…

The procedures and conditions for medication approval in Europe and the Netherlands are largely similar to those of the FDA. Typically, pharmaceutical companies submit a European application to the European Medicines Agency (EMA). This agency sends the application to the national authorities to be assessed. The Dutch Medicines Evaluation Board (MEB) plays an important role in this.

If the EMA approves a medication, that approval in theory applies to the whole of Europe. ‘Manufacturers do have to hand over everything they know about the medication’s effectiveness and side effects when applying for approval’, emphasises professor Marcel Bouvy, an MEB member. ‘Cherry picking, a.k.a. only showing the positive results, is obviously not allowed.’

Determining which medication is allowed on the market varies per drug. ‘It’s done on a case-by-case basis’, says Bouvy. ‘Sometimes there are diseases for which no other remedy exists. So then we have to decide whether to have nothing to offer patients, or to give them access to a drug and be prepared to accept any possible risks.’

Fortunately, Van Ravenzwaaij has a good alternative to the FDA policy: Bayesian statistics. This is a different type of statistics that combines various scenarios and their relative probabilities. The difference between the two methods can be imagined as follows.

## A bit crazy

If you see your colleague Don go outside carrying his umbrella, there are two possible explanations: either it is raining, or it is not raining and Don is a little crazy (or there is a different reason he is bringing his umbrella). To find out the probability of it actually raining, we have to find out how probable rain is at this time of year (say, 30 per cent) and how probable a dry day is (70 per cent). We then link those percentages to the respective probability of Don taking his umbrella because it is raining (say, 80 per cent) and the probability that Don is taking his umbrella on a dry day (say, 10 per cent).

In order to explain Don’s behaviour, you combine these probabilities and calculate the probability of it raining and Don bringing his umbrella (30 per cent x 80 per cent = 24 per cent) as well as the probability of it not raining and Don bringing his umbrella (70 per cent x 10 per cent = 7 per cent). We have now calculated the probability of Don taking his umbrella for different possible reasons. We now know that rain is by far the more probable explanation when Don goes outside carrying his umbrella: 24 per cent is more than three times as much as 7 per cent, so rain is slightly more than three times as probable as no rain. In order to explain Don’s behaviour, we have combined various possible explanations. ‘We’re looking at a model of the world relative to a different model of the world.’

‘And that makes all the difference’, according to Van Ravenzwaaij. ‘Traditional statistics only test one possible explanation. And because it’s highly unlikely for Don to be carrying his umbrella when it’s not raining, the only conclusion is that it is raining.’
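The umbrella calculation can be written out as a short Python sketch. It uses the illustrative percentages from the example and assumes that rain and no rain are the only two scenarios; the variable names are chosen for this sketch.

```python
# Priors and likelihoods from the umbrella example (all values illustrative).
p_rain = 0.30
p_dry = 0.70
p_umbrella_given_rain = 0.80
p_umbrella_given_dry = 0.10

# Probability of each scenario occurring together with the umbrella.
joint_rain = p_rain * p_umbrella_given_rain  # 0.24
joint_dry = p_dry * p_umbrella_given_dry     # 0.07

# How much more probable rain is than no rain, given the umbrella.
ratio = joint_rain / joint_dry

# Probability of rain given the umbrella, if these are the only two scenarios.
posterior_rain = joint_rain / (joint_rain + joint_dry)

print(f"rain vs. no rain: {ratio:.2f} to 1")
print(f"probability of rain given the umbrella: {posterior_rain:.1%}")
```

Dividing 24 by 7 gives a ratio of about 3.4 to 1, which corresponds to a probability of rain of roughly 77 per cent. That is a likely explanation, but far from the certainty that testing a single explanation suggests.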

Thanks to the FDA’s current testing policy, there is a wealth of medication on the market that might work no better than a placebo. Therefore, Van Ravenzwaaij is arguing for a new policy, one in which the administration takes all the tests into account and verifies how strong the evidence of each individual trial actually is. ‘The FDA needs to distinguish on the basis of the number of trials that were needed to get two successful tests’, according to Van Ravenzwaaij. ‘And they need to incorporate that distinction in their final assessment.’