Why Stats is Pretty Much Magic

The methods provided by the field of statistics are more powerful than most people are aware. They allow us to test hypotheses, and if a hypothesis is likely, it can then be used to predict what is likely to happen in the future. I say likely because there is always some uncertainty that is impossible to get rid of. But when different methods yield the same prediction, there is a good chance that you have an idea of what is going to happen in the future! That’s HUGE!!! I am going to go through two ways of testing hypotheses and judging how credible they are: the Frequentist and the Bayesian approach.

1. Randomization test (the Frequentist approach)

So many factors out there could be correlated. There could be a correlation between the number of times I eat out and the number of birds on campus. But obviously, it is unreasonable to conclude from that correlation that one causes the other. Causation is not impossible, but no such inference can be drawn from the correlation alone.

Say that we survey Western students and collect the responses into a dataset. We notice something interesting: hours of sleep and student grades show a very high degree of correlation. As intuitive as it is to say that sleeping less causes students to perform more poorly on a test, we can’t prove it from the correlation alone, for the same reason as in the example above. So then… how do we back up our intuition?

We run a randomization test! The test keeps every student’s grade exactly as it is but shuffles the hours of sleep among the students, which breaks any real link between sleep and grades. After each shuffle, we calculate the difference in mean grades between students who sleep 6 hours or more and students who sleep less than 6 hours. If we do this 1000 times, we end up with 1000 differences, and together they show what kinds of differences turn up by random chance alone. We then calculate a p-value (a type of conditional probability that I will not go into detail about): the fraction of those 1000 shuffled differences that are at least as large as the difference in our real data. If the p-value is small, the relationship is unlikely to be due to random chance; if it is large, random chance is a perfectly good explanation for what we observed. One caution: because this is survey data, even a tiny p-value cannot rule out that the difference comes from some other characteristic of the students surveyed, so the test backs up our intuition without proving causation. Powerful, huh? That’s only one of the uses of a randomization test.
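To make the shuffling idea concrete, here is a minimal sketch in Python. The survey numbers are made up for illustration, and the function names (mean_difference, randomization_test) are my own, not part of any library. The p-value is taken as the share of shuffled differences that are at least as extreme as the real one.

```python
import random
from statistics import mean

def mean_difference(sleep_hours, grades, cutoff=6):
    """Mean grade of students sleeping >= cutoff hours minus mean grade of the rest."""
    more = [g for s, g in zip(sleep_hours, grades) if s >= cutoff]
    less = [g for s, g in zip(sleep_hours, grades) if s < cutoff]
    return mean(more) - mean(less)

def randomization_test(sleep_hours, grades, n_shuffles=1000, seed=0):
    """Shuffle sleep hours among students and see how often chance alone
    gives a difference at least as extreme as the observed one."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = mean_difference(sleep_hours, grades)
    shuffled = list(sleep_hours)       # copy, so the real data stays untouched
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)          # grades stay put, sleep hours get reassigned
        if abs(mean_difference(shuffled, grades)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_shuffles

# Hypothetical survey answers (invented for this example, not real data)
sleep_hours = [4, 5, 5, 7, 8, 6, 5, 7, 9, 4, 6, 8]
grades = [62, 70, 65, 81, 88, 75, 68, 79, 90, 60, 77, 85]

observed, p_value = randomization_test(sleep_hours, grades)
print(f"observed difference: {observed:.2f}, p-value: {p_value:.3f}")
```

A small p-value here only says that shuffled data rarely produces a gap this big. It does not say why the gap exists.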

2. Bayes’ theorem (the Bayesian approach)

This is another method of testing that can help us find out whether our hypothesis is credible or whether the relationships seen are just due to random chance. To demonstrate the power of the Bayesian way of thinking, consider the following probability: P(A|B). That means the probability of A happening under the assumption of B. Now, in the case of hypothesizing based on data, A is our data and B is our hypothesis. Measuring the probability of the data happening under our hypothesis is not that difficult. What we are really interested in is P(B|A), the probability that our hypothesis is true given the data we have. Bayes’ theorem can calculate this probability!!
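Written out in the notation above, the theorem looks like this. The term P(A), the overall probability of seeing the data, is not mentioned in the text; it simply rescales the result so that the probabilities add up to 1.

```latex
P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}
```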

However, it does need some starting point. In other words, it needs P(B), the “prior” probability of our hypothesis before seeing any data. Once this challenge is overcome (if nothing else is known, P(B) can be assigned a value of 50%), P(B|A) can be calculated. It can be interpreted as the updated probability in light of new evidence; we now have an updated “posterior” probability. We can calculate two posterior probabilities: one for our hypothesis that the amount of sleep can affect academic performance and the other for the opposite (null) hypothesis, which states that sleep and grades are not related and the data observed is due to random chance. The Bayes factor measures how much better our hypothesis predicts the data than the null does: it is the probability of the data under our hypothesis divided by the probability of the data under the null. When both hypotheses start at 50%, this is the same as the ratio of the two updated probabilities. If this ratio is 3 or greater (a common rule of thumb rather than a universal standard), the data is usually taken as real evidence for our hypothesis over the null.
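Here is a small sketch of that update in Python. All the numbers are invented for illustration, and the function name posterior is my own. With a 50% starting point, the ratio of the two updated probabilities matches the ratio of how well each hypothesis predicts the data, which is how the Bayes factor is usually defined.

```python
def posterior(prior_h, likelihood_h, likelihood_null):
    """P(hypothesis | data) when the hypothesis and the null are the only options."""
    prior_null = 1 - prior_h
    # Overall probability of the data, averaged over both hypotheses
    evidence = likelihood_h * prior_h + likelihood_null * prior_null
    return likelihood_h * prior_h / evidence

# Hypothetical inputs (assumptions for this example only)
prior_h = 0.5            # starting point: nothing else is known
likelihood_h = 0.30      # P(data | sleep affects grades)
likelihood_null = 0.05   # P(data | sleep and grades are unrelated)

post_h = posterior(prior_h, likelihood_h, likelihood_null)
post_null = 1 - post_h

bayes_factor = likelihood_h / likelihood_null   # how much better H predicts the data
posterior_odds = post_h / post_null             # ratio of the updated probabilities

print(f"posterior for hypothesis: {post_h:.3f}")
print(f"posterior for null:       {post_null:.3f}")
print(f"Bayes factor:             {bayes_factor:.2f}")
print(f"posterior odds:           {posterior_odds:.2f}")
```

Try changing prior_h to something other than 0.5: the Bayes factor stays the same, but the posterior odds move with the starting point.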

I hope that by now I have given you a glimpse of why the field of stats is powerful enough to be known as the closest tool we have to magic. Prediction is one of the many interests explored by this field, and the methods noted above are only two of the many ways (some of them more direct than the ones explained here) that statisticians have developed for predicting the unseen.

Nadia Aiaseh

Western '21

I am a second-year student in integrated science with a specialization in physics. I aim to write about topics that are most relatable to students on campus. So if you are excited/bothered by something and want others to know about it, don't hesitate to contact me for a potential article :)