### Hypothesis, Significance level and other basics

If you are delving into data analytics and statistics, it is essential to get a firm grasp of hypothesis testing and its related terms.

In this post, I list the key concepts with simple explanations.

### What is a hypothesis?

A hypothesis is a prediction about what your research will find.

It proposes a relationship between two or more variables: the independent and dependent variables.

The independent variable is the one you change or control in an experiment. The dependent variable is the one you measure in response.

*Null hypothesis H0* – assumes no relationship between the variables in the experiment. For example:

Eating an apple every day doesn’t lead to fewer doctor visits.

More screen time doesn’t lead to a higher chance of myopia in children.

*Alternative hypothesis Ha* – assumes a relationship does exist between the variables in the experiment. For example:

Eating an apple every day leads to fewer doctor visits.

More screen time leads to a higher chance of myopia in children.

### What is a significance level in hypothesis testing?

The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true.

For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.

*The higher the significance level, the more lenient the test: it becomes easier to reject the null hypothesis, but the risk of a false positive rises.*
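To see what the significance level means in practice, here is a minimal simulation sketch. It assumes a two-sided one-sample z-test with a known standard deviation, which is my choice for illustration and not something the post prescribes. The function names `rejects_h0` and `false_positive_rate` are hypothetical. The data is generated so that the null hypothesis is true, so every rejection is a false positive.

```python
import random
from statistics import NormalDist, mean

def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided one-sample z-test (assumes the population sigma is known)."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    # Critical value: reject when |z| falls in the outer alpha share of the normal curve.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit

def false_positive_rate(alpha, trials=2000, n=30, seed=0):
    """Share of experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true here: the data really does come from a mean of 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if rejects_h0(sample, mu0=0.0, sigma=1.0, alpha=alpha):
            rejections += 1
    return rejections / trials

for alpha in (0.01, 0.05, 0.10):
    rate = false_positive_rate(alpha)
    print(f"alpha={alpha:.2f}  observed false-positive rate={rate:.3f}")
```

The observed rate should land close to the chosen alpha each time, and it grows as alpha grows, which is what "more lenient" means here.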

### What is statistical significance / p-value?

Statistical significance means the observed result would be unlikely to occur by chance alone if the null hypothesis were true, so it supports the theory being investigated.

It is quantified by the *p-value* (probability value), a number between 0 and 1. The p-value is the probability of obtaining results at least as extreme as the ones observed, assuming H0 is true.

The lower the p-value, the stronger the evidence against the null hypothesis.

For a significance level of 5%, p <= 0.05 means there is enough evidence to reject the null hypothesis. p > 0.05 means there is not, and you fail to reject the null hypothesis.
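The decision rule above can be sketched in a few lines. This is only an illustration: it assumes a two-sided one-sample z-test with a known standard deviation, the sample values are made up, and the names `two_sided_p_value` and `decide` are hypothetical.

```python
from statistics import NormalDist, mean

def two_sided_p_value(sample, mu0, sigma):
    """p-value of a one-sample z-test (assumes the population sigma is known)."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    # Probability of a z at least this far from 0, in either direction, if H0 is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Made-up measurements; H0 says the population mean is 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9, 5.3, 5.5]
p = two_sided_p_value(sample, mu0=5.0, sigma=0.5)
print(f"p-value = {p:.4f} -> {decide(p)}")
```

For real data where the standard deviation is unknown, a t-test would usually be the more appropriate choice.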

### Confidence interval and confidence limits

A confidence interval (CI) is a range of values that is likely to contain the true population parameter, such as the population mean.

For example, a 95% confidence interval, whose boundaries are called the confidence limits, means we are 95% confident that the interval contains the true population value.

The wider the interval, the higher the chance it contains the true value.

The confidence level determines how wide the interval is. It is conventionally set at 95% to coincide with the 5% convention for statistical significance in hypothesis testing.

In some studies, narrower (e.g. 90%) or wider (e.g. 99%) confidence intervals will be required. This depends on the nature of your study. You should consult a statistician before using CIs other than 95%.
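To see how the confidence level changes the width of the interval, here is a rough sketch for the mean of a sample. It assumes a normal approximation, the sample values are made up, and `confidence_interval` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    # z multiplier for the chosen level, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * stdev(sample) / n ** 0.5
    centre = mean(sample)
    return centre - margin, centre + margin  # lower and upper confidence limits

# Made-up measurements for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.9, 5.3, 5.5]

for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(sample, level)
    print(f"{level:.0%} CI: [{low:.3f}, {high:.3f}]  width={high - low:.3f}")
```

The 90% interval comes out narrowest and the 99% interval widest. With a sample this small, a t-distribution would normally give slightly wider and more accurate limits.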

Be sure to remember these basics when you delve deeper into data analytics.
