How to understand statistics?

Submitted by LindaLane on Mon, 01/29/2018 - 23:01

Scientists frequently use statistics to analyze their results. Why? Statistics can help researchers understand a phenomenon by confirming or rejecting a hypothesis, and they are central to how knowledge is acquired in most scientific fields.

You do not need to be a scientist, though: anyone who wants to learn how statistics can help researchers can read this tutorial on statistics for the scientific method.

Research data

This section of the tutorial explains how data is acquired and used.

The results of a scientific investigation often contain more information than the researcher needs. This unprocessed information is called raw data.

To analyze the data sensibly, the raw data is processed into "output data". There are many methods for processing data, but essentially the scientist organizes and summarizes the raw data into a more compact, readable form. Any organized collection of information can be called a "dataset".
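As a rough illustration of turning raw data into a small dataset, here is a minimal Python sketch. The records, the field names ("group", "score", "comment") and the `summarize` helper are all invented for the example; a real study would define its own fields and processing steps.

```python
from collections import defaultdict

# Hypothetical raw survey records. They hold more than the analysis needs
# (for example the free-text "comment" field).
raw_data = [
    {"id": 1, "group": "control", "score": 12, "comment": "n/a"},
    {"id": 2, "group": "treated", "score": 15, "comment": "arrived late"},
    {"id": 3, "group": "control", "score": 10, "comment": ""},
    {"id": 4, "group": "treated", "score": 17, "comment": ""},
]


def summarize(records):
    """Keep only group and score, then report count and mean per group."""
    scores = defaultdict(list)
    for record in records:
        scores[record["group"]].append(record["score"])
    return {
        group: {"n": len(vals), "mean": sum(vals) / len(vals)}
        for group, vals in scores.items()
    }


# The organized, summarized result is the "dataset" used for analysis.
dataset = summarize(raw_data)
print(dataset)
```

The raw records are left untouched, so it is still possible to go back to them later.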

Researchers can then apply various statistical methods to analyze the data and understand it more accurately. Depending on the research, they may also use statistics descriptively or for exploratory research.

The great thing about raw data is that you can go back and check things if you suspect that something different is going on than you originally thought. This usually happens after you have analyzed the meaning of the results.

As you see more clearly what is happening, the raw data can inspire new hypotheses. You can also control for variables that might influence the conclusion (e.g. third variables). In statistics, a parameter is any numerical value that characterizes a given population or some aspect of it.
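To make the idea of a parameter concrete, the following Python sketch contrasts a value computed from a whole population (a parameter) with the same value computed from a sample (usually called a statistic, which serves as an estimate of the parameter). The population is simulated and all numbers in it are assumptions made for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated measurements.
population = [random.gauss(50, 5) for _ in range(10_000)]

# Parameter: describes the entire population.
population_mean = statistics.fmean(population)

# Statistic: the same quantity computed from a sample of 100 members,
# used as an estimate of the parameter.
sample = random.sample(population, 100)
sample_mean = statistics.fmean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

In real research the full population is rarely available, which is why the sample value has to stand in for the parameter.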

Central tendency and normal distribution

This part of the tutorial will help you understand distributions, central tendency and how they relate to datasets.

Much real-world data is approximately normally distributed. The normal distribution is a bell-shaped frequency distribution in which the most frequent values lie near the middle. Many experiments rely on the assumption of a normal distribution. This is one reason why researchers so often measure central tendency in statistical research, using the average (arithmetic mean or geometric mean), the median or the mode.
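The measures of central tendency named above can be computed with Python's standard `statistics` module. This is only a small sketch on made-up numbers; `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Made-up measurements, used only for illustration.
values = [2, 3, 4, 4, 5, 5, 5, 6, 7, 9]

arithmetic_mean = statistics.mean(values)           # sum divided by count
geometric_mean = statistics.geometric_mean(values)  # n-th root of the product
median = statistics.median(values)                  # middle value when sorted
mode = statistics.mode(values)                      # most frequent value

print("arithmetic mean:", arithmetic_mean)
print("geometric mean: ", round(geometric_mean, 3))
print("median:         ", median)
print("mode:           ", mode)
```

For this roughly symmetric sample the arithmetic mean, median and mode coincide, while the geometric mean is slightly smaller.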

The central tendency can give a fairly good idea of the nature of the data (the mean, median and mode each indicate a "middle value"), especially when combined with a measure of how the data are spread out. To measure this spread, scientists usually calculate the standard deviation, but there are other measures: the variance, the standard error of the mean, the standard error of the estimate or the "range" (the distance between the extremes in the data).
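Several of these measures can be sketched in a few lines of Python. The numbers are made up, and the standard error of the mean is computed here as the sample standard deviation divided by the square root of the sample size, which is the usual textbook definition. The standard error of the estimate belongs to regression analysis and is left out of this sketch.

```python
import math
import statistics

# Made-up measurements, used only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

data_range = max(values) - min(values)         # distance between the extremes
sample_variance = statistics.variance(values)  # divides by n - 1
sample_sd = statistics.stdev(values)           # square root of the sample variance
population_sd = statistics.pstdev(values)      # divides by n instead of n - 1
sem = sample_sd / math.sqrt(n)                 # standard error of the mean

print("range:                ", data_range)
print("sample variance:      ", round(sample_variance, 3))
print("sample std deviation: ", round(sample_sd, 3))
print("population std dev.:  ", round(population_sd, 3))
print("std error of the mean:", round(sem, 3))
```

Whether to divide by n or by n - 1 depends on whether the values are the whole population or a sample from it.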

To draw the normal distribution of a variable, you will normally use the arithmetic mean of a "sufficiently large sample", and you will also need to calculate the standard deviation.
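As a sketch of how the mean and standard deviation define the curve, the Python snippet below fits a normal distribution to a simulated sample using `statistics.NormalDist` (Python 3.8 or later). The sample, its size of 1,000 and its values are assumptions for the example; the printed curve heights are the points you would plot.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Simulated sample standing in for a "sufficiently large sample".
sample = [random.gauss(170, 10) for _ in range(1000)]

# Estimate the mean and the sample standard deviation from the data.
curve = statistics.NormalDist.from_samples(sample)
print(f"mean = {curve.mean:.2f}, standard deviation = {curve.stdev:.2f}")

# Height of the bell curve at the mean and at 1 and 2 standard deviations.
for k in (-2, -1, 0, 1, 2):
    x = curve.mean + k * curve.stdev
    print(f"x = {x:7.2f}   density = {curve.pdf(x):.5f}")

# Share of the fitted curve lying within one standard deviation of the mean.
share_within_one_sd = (
    curve.cdf(curve.mean + curve.stdev) - curve.cdf(curve.mean - curve.stdev)
)
print(f"share within one standard deviation: {share_within_one_sd:.4f}")
```

For any normal curve, roughly 68% of the area lies within one standard deviation of the mean.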

However, the sampling distribution will not be normal if the underlying distribution is (naturally) skewed or contains outliers (often rare results or measurement errors) that distort the data. The F-distribution is an example of a distribution that is not normal; it is skewed to the right.
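The effect of a single extreme value can be seen in a small Python sketch. It uses the moment-based skewness coefficient (third central moment divided by the 1.5th power of the second), which is one common definition among several; the `skewness` helper and the data are made up for the example.

```python
import statistics


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
with_extreme_value = [1, 2, 2, 3, 3, 3, 4, 4, 50]  # 50 is a made-up extreme value

for name, data in (("symmetric", symmetric), ("with extreme value", with_extreme_value)):
    print(
        f"{name}: mean = {statistics.fmean(data):.2f}, "
        f"median = {statistics.median(data)}, "
        f"skewness = {skewness(data):.2f}"
    )
```

The extreme value pulls the mean well above the median, while the median barely moves. A gap of this kind between mean and median is a simple warning sign that the data may not be normally distributed.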

For this reason, researchers often double-check that their results are normally distributed using the range, median and mode. If the data are not normally distributed, this will influence the choice of method or statistical test for the analysis.