Student's t-tests

One of the most frequent statistical problems is testing hypotheses about the mean of the samples considered.

One-sample t-test

This test is used to check hypotheses about the fact that the mean of random variable X equals to given μ. Testing sample should be a sample of a normal random variable. During its work, the test calculates t-statistic:

If X has a normal distribution, the t-statistic will have Student's distribution with N-1 degrees of freedom. This allows the use of the Student's distribution to define the significance level which corresponds to the value of t-statistic.

Note #1
If X is not normal, t will have an unknown distribution and, strictly speaking, the t-test is inapplicable. However, according to the central limit theorem, as the sample size increases, the distribution of t tends to be normal. Therefore, if the sample size is big, we can use the t-test even if X is not normal. But there is no way to find out what value is big enough. This value depends on how X deviates from the normal distribution. Some sources claim that N should be greater than 30, but sometimes even this size is not enough. Alternatively, we can use non-parametric test: sign test or Wilcoxon rank-sign test.

Subroutine StudentTTest1 returns three p-values:

Two-sample pooled test

This test checks hypotheses about the fact that the means of two random variables X and Y which are represented by samples xS and yS are equal. The test works correctly under the following conditions:

During its work, the test calculates t-statistic:

If X and Y have a normal distribution, the t-statistic will have Student's distribution with NX+NY-2 degrees of freedom. This allows the use of the Student's distribution to define a significance level which corresponds to the value of t-statistic.

Note #2
If X or Y is not normal, t will have an unknown distribution and, strictly speaking, the t-test is inapplicable. However, according to the central limit theorem, as the sample sizes increase, the distribution of t tends to be normal. Therefore, if sample sizes are big enough, we can use the t-test even if X or Y is not normal. But there is no way to find what values for NX and NY are big enough. These values depend on how X and Y deviate from the normal distribution. Some sources claim that NX+NY should be greater than 40, but sometimes even these sizes are not enough. If you are not confident that distributions are normal, it's better to use non-parametric test: Mann-Whitney U-test.

Subroutine StudentTTest2 returns three p-values:

Two-sample unpooled test

This test checks hypotheses about the fact that the means of two random variables X and Y which are represented by samples xS and yS are equal. The test works correctly under the following conditions:

Dispersion equality is not required.

During its work, the test calculates the t-statistic:

If X and Y have a normal distribution, the t-statistic will have Student's distribution with DF degrees of freedom:

This allows the use of the Student's distribution to define the significance level which corresponds to the value of the t-statistic.

Note #3
If X or Y is not normal, t will have an unknown distribution and, strictly speaking, the t-test is inapplicable. However, according to the central limit theorem, as the sample sizes increase, the distribution of t tends to be normal. Therefore, if sample sizes are big enough, we can use the t-test even if X or Y is not normal. But there is no way to find what values for NX and NY are big enough. These values depend on how X and Y deviate from the normal distribution. Some sources claim that NX +NY should be greater than 40, but sometimes even these sizes are not enough. If you are not confident that the distributions are normal, it's better to use non-parametric test: Mann-Whitney U-test.

Subroutine UnequalVarianceTTest returns three p-values:

Links

  1. 'Hypothesis testing', Wikipedia
  2. 'P-value', Wikipedia
  3. 'T-test', Wikipedia

This article is licensed for personal use only.

Download ALGLIB for C++ / C# / Java / Python / ...

ALGLIB Project offers you two editions of ALGLIB:

ALGLIB Free Edition:
+delivered for free
+offers full set of numerical functionality
+extensive algorithmic optimizations
-no multithreading
-non-commercial license

ALGLIB Commercial Edition:
+flexible pricing
+offers full set of numerical functionality
+extensive algorithmic optimizations
+high performance (SMP, SIMD)
+commercial license with support plan

Links to download sections for Free and Commercial editions can be found below:

ALGLIB 4.04.0 for C++

C++ library.
Delivered with sources.
Monolithic design.
Extreme portability.
Editions:   FREE   COMMERCIAL

ALGLIB 4.04.0 for C#

C# library with native kernels.
Delivered with sources.
VB.NET and IronPython wrappers.
Extreme portability.
Editions:   FREE   COMMERCIAL

ALGLIB 4.04.0 for Java

Java wrapper around HPC core.
Delivered with sources.
Seamless integration with Java.
Editions:   FREE   COMMERCIAL

ALGLIB 4.04.0 for Delphi

Delphi wrapper around C core.
Delivered as precompiled binary.
Compatible with FreePascal.
Editions:   FREE   COMMERCIAL

ALGLIB 4.04.0 for CPython

CPython wrapper around C core.
Delivered as precompiled binary.
Editions:   FREE   COMMERCIAL