ECON6037 Experimental Economics

Statistical tests

Gerhard Riener

February 19, 2024

Data Analysis and Experiments

  • Good experimental design makes for clean data analysis.
  • Knowing which statistical techniques you will use for the analysis helps you plan your design.
  • Choose the statistical approach that best fits your needs (graphs, tests, confidence intervals, regressions).
  • Think of what kind of data you can collect, to get the cleanest possible test of your hypothesis.
  • Compute the sample size necessary to meaningfully test your hypotheses.

Random Variable Definition

A random variable \(X\) is a measurable function \(X: \Omega \rightarrow E\) from a sample space \(\Omega\) (a set of possible outcomes) to a measurable space \(E\).

Technical Axiomatic Definition

  • Requires the sample space \(\Omega\) to be part of a probability triple \((\Omega, \mathcal{F}, P)\), as per the measure-theoretic definition.
  • A random variable is often denoted by capital Roman letters such as \(X, Y, Z, T\).

Probability of a Random Variable

  • The probability that \(X\) takes on a value in a measurable set \(S \subseteq E\) is denoted as:

\[ P(X \in S) = P(\{\omega \in \Omega \mid X(\omega) \in S\}) \]

Examples of Random Variables

Experiment                                 Random variable
Toss two dice                              \(Y =\) sum of the numbers
Toss a coin 25 times                       \(Y =\) number of heads in 25 tosses
Apply different amounts of fertilizer      \(Y =\) yield per acre
Can communicate / cannot communicate       \(Y =\) contribution to the public good
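
As a small illustration of the notation above, the first example (\(Y =\) sum of two dice) can be simulated in R and \(P(Y \in S)\) estimated empirically, here for \(S = \{10, 11, 12\}\). This is a minimal sketch; the seed and variable names are illustrative.

# Simulate Y = sum of the numbers on two dice
set.seed(1)
n <- 10000
y <- sample(1:6, n, replace = TRUE) + sample(1:6, n, replace = TRUE)

# Empirical estimate of P(Y >= 10); the exact value is 6/36 = 1/6
mean(y >= 10)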

Variable types

What types of variables exist?

Variable
  • Quantitative
    • Discrete
    • Continuous
  • Qualitative
    • Nominal
    • Ordinal

Quantitative

A quantitative variable reflects a notion of magnitude: the values it can take are numbers. It thus represents a measure and is numerical.

  • Discrete: takes only a countable number of distinct values.
    • Number of children per family
    • Number of students in a class
    • Number of citizens of a country
  • Continuous: takes values in an interval and is not countable.
    • Age
    • Weight
    • Height

Note: for continuous measurements, we usually record values only up to a standard level of granularity.

Qualitative

Qualitative variables, also referred to as categorical variables or factors, are variables that are not numerical and whose values fit into categories.

In other words, a qualitative variable takes modalities, categories or levels as its values, in contrast to a quantitative variable, which measures a quantity on each individual.

  • Nominal: no ordering
    • Gender: female/male/ungendered/others
    • Eye color: blue/brown/green

Note: a qualitative variable with exactly 2 levels is also referred to as a binary or dichotomous variable.

Qualitative: Ordinal

On the other hand, a qualitative ordinal variable is a qualitative variable with an order implied in the levels. For instance, if the severity of road accidents has been measured on a scale such as light, moderate and fatal accidents, this variable is a qualitative ordinal variable because there is a clear order in the levels.

Another good example is health, which can take values such as poor, reasonable, good, or excellent. Again, there is a clear order in these levels so health is in this case a qualitative ordinal variable.
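
In R, ordinal variables are typically stored as ordered factors. Below is a minimal sketch using the accident-severity example; the data values are made up for illustration.

# Severity of road accidents as an ordered (ordinal) factor
severity <- factor(c("light", "fatal", "moderate", "light"),
                   levels = c("light", "moderate", "fatal"),
                   ordered = TRUE)
severity
table(severity)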

Distribution of Variables

Parametric vs. Non-parametric Statistics

Parametric

  • The error distribution is assumed to come from a known parametric family of distributions (e.g., the normal distribution).
  • Includes tests such as:
    • t-test
    • F-test
    • etc…

Non-parametric

  • Fewer distributional assumptions.
  • Typically requires only independence.
    • Can be less powerful when the parametric assumptions are true (see the comparison sketch after this list).
  • Includes tests such as:
    • Wilcoxon rank-sum test
    • Kruskal-Wallis test
    • etc…
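
To make the contrast concrete, the sketch below runs a parametric and a nonparametric two-sample test on the same data; the simulated values are made up for illustration.

# Two independent groups of simulated (normal) data
set.seed(123)
x <- rnorm(30, mean = 0.5)   # treatment group
y <- rnorm(30, mean = 0.0)   # control group

t.test(x, y)        # parametric: assumes approximately normal data
wilcox.test(x, y)   # nonparametric: rank-based, fewer assumptions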

Tests: A decision aid

One Variable

1 variable
  • Qualitative
    • 2 groups: One-proportion test
    • > 2 groups: Chi-square goodness of fit test
  • Quantitative
    • Parametric: One-sample Student's t-test
    • Nonparametric: One-sample Wilcoxon test
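
For reference, the tests in this tree can be called in R as follows; the counts and measurements are made up for illustration.

# One qualitative variable, 2 groups: one-proportion test
prop.test(x = 42, n = 100, p = 0.5)

# One qualitative variable, > 2 groups: chi-square goodness of fit test
observed <- c(20, 30, 50)
chisq.test(observed, p = rep(1/3, 3))

# One quantitative variable: one-sample t-test and one-sample Wilcoxon test
y <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3)
t.test(y, mu = 5)
wilcox.test(y, mu = 5)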

Two qualitative Variables

2 qualitative variables
  • 2 groups for each variable
    • Paired samples
      • 2 paired samples: McNemar's test
      • > 2 paired samples: Cochran's Q test
    • Independent samples
      • Expected frequencies < 5: Fisher's exact test
      • Expected frequencies >= 5: Chi-square test of independence
  • > 2 groups for at least one variable: Chi-square test of independence

Two quantitative variables

2 quantitative variables
  • Parametric: Pearson correlation test
  • Nonparametric: Spearman rank correlation test

More than Two variables

> 2 variables
  • Quantitative dependent variable: Multiple linear regression
  • Qualitative dependent variable
    • 2 groups: Binary logistic regression
    • > 2 groups: Multinomial logistic regression
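
A minimal sketch of how these models are fitted in R, using the built-in mtcars data purely for illustration:

# Quantitative dependent variable: multiple linear regression
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit_lm)

# Qualitative dependent variable with 2 groups: binary logistic regression
fit_logit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(fit_logit)

# Qualitative dependent variable with > 2 groups: multinomial logistic
# regression, e.g. with nnet::multinom (not shown here)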

Two groups to compare

2 groups
  • Independent samples
    • Parametric
      • Equal population variances: Student's t-test for 2 independent samples
      • Unequal population variances: Welch's t-test for 2 independent samples
    • Nonparametric: Mann-Whitney U test
  • Paired samples
    • Parametric: Student's t-test for paired samples
    • Nonparametric: Wilcoxon signed-rank test
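
The corresponding R calls, on simulated data made up for illustration:

# Independent samples
set.seed(42)
g1 <- rnorm(20, mean = 10, sd = 2)
g2 <- rnorm(20, mean = 11, sd = 2)

t.test(g1, g2, var.equal = TRUE)   # Student's t-test (equal variances)
t.test(g1, g2)                     # Welch's t-test (R's default)
wilcox.test(g1, g2)                # Mann-Whitney U test

# Paired samples
before <- rnorm(15, mean = 100, sd = 10)
after  <- before + rnorm(15, mean = -2, sd = 5)
t.test(before, after, paired = TRUE)        # paired t-test
wilcox.test(before, after, paired = TRUE)   # Wilcoxon signed-rank test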

More than two groups to compare

> 2 groups
  • Independent samples
    • Parametric
      • Equal population variances: One-way ANOVA
      • Unequal population variances: Welch's ANOVA
    • Nonparametric: Kruskal-Wallis test
  • Paired samples
    • Parametric: Repeated measures ANOVA
    • Nonparametric: Friedman test
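
A minimal sketch for the independent-samples branch, using the built-in PlantGrowth data purely for illustration:

# Independent samples: one-way ANOVA, Welch's ANOVA, Kruskal-Wallis test
oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)   # one-way ANOVA
oneway.test(weight ~ group, data = PlantGrowth)                     # Welch's ANOVA
kruskal.test(weight ~ group, data = PlantGrowth)                    # Kruskal-Wallis test

# For paired samples, see the Friedman test example later in these slides.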

Statistical Tests

The Binomial Test (Siegel and Castellan, 1988)

Details

  • Let \(X_{i}\) be a (Bernoulli) random variable (two categories - success/failure) with success probability \(p\).
  • \(E(X_{i}) = p\) and \(Var(X_{i}) = p(1-p)\).
  • Statistical problems:
    • Hypothesis testing on \(p\).
    • Confidence interval for \(p\).
    • Estimator for \(p\).
  • Let \(B = \sum_{i=1}^{n}X_{i}\) be the total number of successes.
  • \(B \sim \text{Binomial}(n, p)\).

Binomial Test

Hypothesis Test

  • Hypothesis test: \(H_{0}: p = p_{0}\) versus \(H_{A}: p \neq p_{0}\).
  • Test statistic: \(B = \sum_{i=1}^{n}X_{i} \sim \text{Binomial}(n, p_{0})\).

Rejection Regions

  • \(H_{A}: p > p_{0}\): Reject \(H_{0}\) if \(B \geq b_{\alpha; n, p_{0}}\).

  • \(H_{A}: p < p_{0}\): Reject \(H_{0}\) if \(B \leq c_{\alpha; n, p_{0}}\).

  • \(H_{A}: p \neq p_{0}\): Reject \(H_{0}\) if \(B \geq b_{\alpha_{1}; n, p_{0}}\) or \(B \leq c_{\alpha_{2}; n, p_{0}}\), where \(\alpha_{1}+\alpha_{2}=\alpha\).

  • Due to the discreteness of \(B\), exact tests cannot be conducted at every nominal \(\alpha\) level (see the sketch below).
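
A short sketch of this last point: the critical value is chosen so that the attained level stays at or below \(\alpha\). The numbers (n = 20, p0 = 0.5, alpha = 0.05) are chosen for illustration.

n <- 20; p0 <- 0.5; alpha <- 0.05

# Smallest b such that P(B >= b) <= alpha under H0 (upper-tailed test)
b_crit <- qbinom(1 - alpha, n, p0) + 1
b_crit

# Attained significance level: P(B >= b_crit) under H0
pbinom(b_crit - 1, n, p0, lower.tail = FALSE)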

R Code: binom.test Function

Use the binom.test function in R to perform an exact binomial test.

x <- 4
n <- 20
binom.test(x, n, p = 0.5,
           alternative = "two.sided",  # one of "two.sided", "less", "greater"
           conf.level = 0.95)

    Exact binomial test

data:  x and n
number of successes = 4, number of trials = 20, p-value = 0.01182
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.057334 0.436614
sample estimates:
probability of success 
                   0.2 

Parameters Explained

  • \(x\): Number of successes, or a vector of length 2 giving the numbers of successes and failures, respectively.
  • \(n\): Number of trials; ignored if x has length 2.
  • \(p\): Hypothesized probability of success.
  • alternative: Specifies the alternative hypothesis.
  • conf.level: Confidence level of the interval.

\(\chi^2\)-Test (Siegel and Castellan, 1988)

Overview

  • The Chi-Squared (\(\chi^2\)) test is used to determine if there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.
  • Commonly used in testing relationships between categorical variables.

Example Question

  • “Are males more likely to accept Offer 3 (O3) than females?”

Test Details

  • The test statistic is based on the observed and expected frequencies.
  • It is a non-parametric test: it does not rely on parameter estimates from an assumed error distribution.
  • Calculated as: \(\chi^2 = \sum \frac{(O - E)^2}{E}\)
    • Where \(O\) = Observed frequency, \(E\) = Expected frequency.

Degrees of Freedom

  • Calculated as: (Number of rows - 1) × (Number of columns - 1), which equals 1 for a 2 × 2 table.
  • Critical values are looked up in a \(\chi^2\) distribution table.
  • For a one-tailed test at the 5% significance level, compare the test statistic to the upper-tail critical value from the table (see the sketch below).
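
The sketch below computes the statistic by hand for a 2 × 2 table with made-up counts (the same matrix is reused on the next slide). Note that R's chisq.test applies Yates' continuity correction to 2 × 2 tables by default, so its statistic is somewhat smaller than the uncorrected value computed here.

# Observed counts (made up for illustration)
observed <- matrix(c(10, 20, 30, 40), nrow = 2)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic (without continuity correction) and degrees of freedom
chi2 <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
chi2

qchisq(0.95, df)                       # critical value at the 5% level
pchisq(chi2, df, lower.tail = FALSE)   # p-value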

Chi-Squared Test in R

# Sample data: Observed frequencies
observed <- matrix(c(10, 20, 30, 40), nrow = 2)

# Perform the Chi-squared test
test.result <- chisq.test(observed)

# Print the test results
print(test.result)

    Pearson's Chi-squared test with Yates' continuity correction

data:  observed
X-squared = 0.44643, df = 1, p-value = 0.504

Fisher Exact Test

What is the Fisher Exact Test?

  • The Fisher Exact Test is a statistical significance test used for small sample sizes.
  • It’s particularly useful for examining the association or independence of row and column variables in 2x2 contingency tables.

When to Use It?

  • Ideal when sample sizes are too small for the Chi-squared test.
  • When you have categorical data and want to compare proportions.

Key Features

  • It calculates the exact probability of observing the data assuming the null hypothesis of no association.
  • More accurate than Chi-squared test for small sample sizes.

Fisher Exact Test in R

R Code Example

# Sample data: Contingency table
observed <- matrix(c(8, 2, 1, 9), nrow = 2, byrow = TRUE)

# Perform Fisher Exact Test
test.result <- fisher.test(observed)

# Display the test results
print(test.result)

    Fisher's Exact Test for Count Data

data:  observed
p-value = 0.005477
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
    2.057999 1740.081669
sample estimates:
odds ratio 
  27.32632 

Wilcoxon Mann-Whitney Test

Overview

  • The Wilcoxon Mann-Whitney test is a nonparametric test that compares two independent samples.
  • It is used to determine whether there is a significant difference between the two groups, often interpreted as a difference in medians (under a location-shift assumption).

When to Use It?

  • When data are not normally distributed or when dealing with ordinal data.
  • Appropriate for small sample sizes.

Key Features

  • It tests the null hypothesis that two samples come from the same distribution.
  • Does not assume normal distribution of the data.
  • Often used as an alternative to the independent samples t-test.

Wilcoxon Mann-Whitney Test in R

R Code Example

# Sample data
group1 <- c(7, 8, 9, 5, 6)
group2 <- c(1, 2, 3, 4, 5)

# Perform the Wilcoxon Mann-Whitney test
test.result <- wilcox.test(group1, group2)

# Display the test results
print(test.result)

    Wilcoxon rank sum test with continuity correction

data:  group1 and group2
W = 24.5, p-value = 0.01597
alternative hypothesis: true location shift is not equal to 0

Output includes the test statistic (W) and the p-value. A low p-value indicates a significant difference between the two groups (a location shift).

Wilcoxon Signed-Rank Test

What is the Wilcoxon Signed-Rank Test?

  • A nonparametric test used to compare two related samples.
  • It assesses whether their population mean ranks differ.

When to Use It?

  • Ideal for paired data when the data are not normally distributed.
  • Commonly used as an alternative to the paired t-test.

Key Features

  • It tests the null hypothesis that the median of the differences between pairs of observations is zero.
  • Assumes that the differences between pairs are symmetrically distributed.

Interpretation: The test helps assess the effect of an intervention or treatment by comparing measurements taken before and after.

Wilcoxon Signed-Rank Test in R

R Code Example

# Sample paired data
before <- c(120, 101, 130, 108, 143)
after <- c(115, 107, 132, 101, 138)

# Perform the Wilcoxon signed-rank test
test.result <- wilcox.test(before, after, paired = TRUE)

# Display the test results
print(test.result)

    Wilcoxon signed rank test with continuity correction

data:  before and after
V = 10, p-value = 0.5879
alternative hypothesis: true location shift is not equal to 0

  • The wilcox.test function is used with paired = TRUE for paired data.

  • Output includes the test statistic (V) and the p-value.

  • A low p-value suggests a significant difference in the median values of the two groups.

Friedman Test

  • The Friedman test is a non-parametric alternative to one-way ANOVA with repeated measures.
  • It’s used to detect differences in treatments across multiple test attempts.

When to Use

  • Appropriate when dealing with ranked (ordinal) data.
  • Useful for non-normally distributed data or when the assumption of homogeneity of variances is violated.

Friedman test in R

# Example data: the 'selfesteem' data set from the datarium package,
# with self-esteem scores measured at three time points (t1, t2, t3)

library(tidyverse)
library(ggpubr)
library(rstatix)

data("selfesteem", package = "datarium")



selfesteem <- selfesteem %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
# Perform the Friedman test
res.fried <- selfesteem %>% friedman_test(score ~ time |id)
res.fried
# A tibble: 1 × 6
  .y.       n statistic    df        p method       
* <chr> <int>     <dbl> <dbl>    <dbl> <chr>        
1 score    10      18.2     2 0.000112 Friedman test

Effect Sizes

The Importance of Effect Sizes

  • Statistical significance alone is not enough.
  • Effect sizes measure the magnitude of the relationship or difference.

Pearson’s r: Correlation Coefficient

Strength of Relationships

  • Measures the strength and direction of a linear relationship.
  • Ranges from -1 to 1.

R Implementation

# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

# Calculate Pearson's r
cor(x, y, method = "pearson")
[1] 1

Interpreting the Correlation Coefficient

Calculating in R

# Perform Pearson's correlation test
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

cor.test(x, y, method = "pearson")

    Pearson's product-moment correlation

data:  x and y
t = 82191237, df = 3, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 1 1
sample estimates:
cor 
  1 

Large Correlation?

  • Interpretation varies by discipline.
  • Greater than 0.5 often considered large in social sciences.

Standardized Mean Difference

Differences in Means

  • Understanding the difference between two group means.
  • Expressed as a ratio of mean difference to standard deviation.

R Implementation

# Assuming 'group1' and 'group2' are vectors of data

group1 <- c(1, 4, 3, 4, 5)
group2 <- c(2, 3, 4, 5, 6)
# Standardized difference using the SD of the combined sample (a simple variant)
theta <- (mean(group1) - mean(group2)) / sd(c(group1, group2))

theta
[1] -0.4014898

Cohen’s d

Calculating Effect Size

  • Measure of effect size for the difference between two means.

R Implementation
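
A minimal sketch of one way to obtain Cohen's d in R, assuming the effsize package and the group1/group2 vectors from the previous slide; it produces output of the form shown below.

# group1 and group2 as defined on the previous slide
group1 <- c(1, 4, 3, 4, 5)
group2 <- c(2, 3, 4, 5, 6)

library(effsize)   # assumed package providing cohen.d()
cohen.d(group1, group2)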


Cohen's d

d estimate: -0.3872983 (small)
95 percent confidence interval:
    lower     upper 
-1.859353  1.084756 

Interpreting Cohen’s d

  • d = 0.2: Small effect.
  • d = 0.5: Medium effect.
  • d = 0.8: Large effect.