ECON6037 Experimental Economics

Statistical tests

Gerhard Riener

February 19, 2024

Data Analysis and Experiments

  • Good experimental design makes for clean data analysis.
  • Knowing which statistical techniques you will use for the analysis helps you plan your design.
  • Choose the statistical approach that best fits your needs (graphs, tests, confidence intervals, regressions).
  • Think of what kind of data you can collect, to get the cleanest possible test of your hypothesis.
  • Compute the sample size necessary to meaningfully test your hypotheses.

Random Variable Definition

A random variable \(X\) is a measurable function \(X: \Omega \rightarrow E\) from a sample space \(\Omega\) (a set of possible outcomes) to a measurable space \(E\).

Technical Axiomatic Definition

  • Requires the sample space \(\Omega\) to be part of a probability triple \((\Omega, \mathcal{F}, P)\), as per the measure-theoretic definition.
  • A random variable is often denoted by capital Roman letters such as \(X, Y, Z, T\).

Probability of a Random Variable

  • The probability that \(X\) takes on a value in a measurable set \(S \subseteq E\) is denoted as:

\[ P(X \in S) = P(\{\omega \in \Omega \mid X(\omega) \in S\}) \]

Examples of Random Variables

Experiment                                 Random variable
Toss two dice                              \(Y =\) sum of the numbers
Toss a coin 25 times                       \(Y =\) number of heads in 25 tosses
Apply different amounts of fertilizer      \(Y =\) yield per acre
Can communicate / cannot communicate       \(Y =\) contribution to the public good
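
As a small illustration of the notation above, the first example (\(Y =\) sum of two dice) can be simulated in R and \(P(Y \in S)\) estimated empirically, here for \(S = \{10, 11, 12\}\). This is a minimal sketch; the seed and variable names are illustrative.

# Simulate Y = sum of the numbers on two dice
set.seed(1)
n <- 10000
y <- sample(1:6, n, replace = TRUE) + sample(1:6, n, replace = TRUE)

# Empirical estimate of P(Y >= 10); the exact value is 6/36 = 1/6
mean(y >= 10)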

Variable types

What types of variables exist?

Variable
  • Quantitative
    • Discrete
    • Continuous
  • Qualitative
    • Nominal
    • Ordinal

Quantitative

A quantitative variable reflects a notion of magnitude: the values it can take are numbers. It thus represents a measure and is numerical.

  • Discrete: takes only a countable number of distinct values.
    • Number of children per family
    • Number of students in a class
    • Number of citizens of a country
  • Continuous: takes values in an interval and is not countable.
    • Age
    • Weight
    • Height

Note: for continuous measurements, we usually record values only up to a standard level of granularity.

Qualitative

Qualitative variables, also referred to as categorical variables or factors, are variables that are not numerical and whose values fit into categories.

In other words, a qualitative variable takes modalities, categories or levels as its values, in contrast to a quantitative variable, which measures a quantity on each individual.

  • Nominal: no ordering
    • Gender: female/male/ungendered/others
    • Eye color: blue/brown/green

Note: a qualitative variable with exactly 2 levels is also referred to as a binary or dichotomous variable.

Qualitative: Ordinal

On the other hand, a qualitative ordinal variable is a qualitative variable with an order implied in the levels. For instance, if the severity of road accidents has been measured on a scale such as light, moderate and fatal accidents, this variable is a qualitative ordinal variable because there is a clear order in the levels.

Another good example is health, which can take values such as poor, reasonable, good, or excellent. Again, there is a clear order in these levels so health is in this case a qualitative ordinal variable.
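
In R, ordinal variables are typically stored as ordered factors. Below is a minimal sketch using the accident-severity example; the data values are made up for illustration.

# Severity of road accidents as an ordered (ordinal) factor
severity <- factor(c("light", "fatal", "moderate", "light"),
                   levels = c("light", "moderate", "fatal"),
                   ordered = TRUE)
severity
table(severity)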

Distribution of Variables

Parametric vs. Non-parametric Statistics

Parametric

  • The error distribution is assumed to come from a known parametric family of distributions (e.g., the normal distribution).
  • Includes tests such as:
    • t-test
    • F-test
    • etc…

Non-parametric

  • Fewer distributional assumptions.
  • Typically requires only independence.
    • Can be less powerful when the parametric assumptions are true (see the comparison sketch after this list).
  • Includes tests such as:
    • Wilcoxon rank-sum test
    • Kruskal-Wallis test
    • etc…
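
To make the contrast concrete, the sketch below runs a parametric and a nonparametric two-sample test on the same data; the simulated values are made up for illustration.

# Two independent groups of simulated (normal) data
set.seed(123)
x <- rnorm(30, mean = 0.5)   # treatment group
y <- rnorm(30, mean = 0.0)   # control group

t.test(x, y)        # parametric: assumes approximately normal data
wilcox.test(x, y)   # nonparametric: rank-based, fewer assumptions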

Tests: A decision aid

One Variable

1 variable
  • Qualitative
    • 2 groups: One-proportion test
    • > 2 groups: Chi-square goodness of fit test
  • Quantitative
    • Parametric: One-sample Student's t-test
    • Nonparametric: One-sample Wilcoxon test
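
For reference, the tests in this tree can be called in R as follows; the counts and measurements are made up for illustration.

# One qualitative variable, 2 groups: one-proportion test
prop.test(x = 42, n = 100, p = 0.5)

# One qualitative variable, > 2 groups: chi-square goodness of fit test
observed <- c(20, 30, 50)
chisq.test(observed, p = rep(1/3, 3))

# One quantitative variable: one-sample t-test and one-sample Wilcoxon test
y <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3)
t.test(y, mu = 5)
wilcox.test(y, mu = 5)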

Two qualitative Variables

2 qualitative variables
  • 2 groups for each variable
    • Paired samples
      • 2 paired samples: McNemar's test
      • > 2 paired samples: Cochran's Q test
    • Independent samples
      • Expected frequencies < 5: Fisher's exact test
      • Expected frequencies >= 5: Chi-square test of independence
  • > 2 groups for at least one variable: Chi-square test of independence

Two quantitative variables

2 quantitative variables
  • Parametric: Pearson correlation test
  • Nonparametric: Spearman rank correlation test

More than Two variables

> 2 variables
  • Quantitative dependent variable: Multiple linear regression
  • Qualitative dependent variable
    • 2 groups: Binary logistic regression
    • > 2 groups: Multinomial logistic regression
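
A minimal sketch of how these models are fitted in R, using the built-in mtcars data purely for illustration:

# Quantitative dependent variable: multiple linear regression
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit_lm)

# Qualitative dependent variable with 2 groups: binary logistic regression
fit_logit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
summary(fit_logit)

# Qualitative dependent variable with > 2 groups: multinomial logistic
# regression, e.g. with nnet::multinom (not shown here)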

Two groups to compare

2 groups
  • Independent samples
    • Parametric
      • Equal population variances: Student's t-test for 2 independent samples
      • Unequal population variances: Welch's t-test for 2 independent samples
    • Nonparametric: Mann-Whitney U test
  • Paired samples
    • Parametric: Student's t-test for paired samples
    • Nonparametric: Wilcoxon signed-rank test
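
The corresponding R calls, on simulated data made up for illustration:

# Independent samples
set.seed(42)
g1 <- rnorm(20, mean = 10, sd = 2)
g2 <- rnorm(20, mean = 11, sd = 2)

t.test(g1, g2, var.equal = TRUE)   # Student's t-test (equal variances)
t.test(g1, g2)                     # Welch's t-test (R's default)
wilcox.test(g1, g2)                # Mann-Whitney U test

# Paired samples
before <- rnorm(15, mean = 100, sd = 10)
after  <- before + rnorm(15, mean = -2, sd = 5)
t.test(before, after, paired = TRUE)        # paired t-test
wilcox.test(before, after, paired = TRUE)   # Wilcoxon signed-rank test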

More than two groups to compare

> 2 groups
  • Independent samples
    • Parametric
      • Equal population variances: One-way ANOVA
      • Unequal population variances: Welch's ANOVA
    • Nonparametric: Kruskal-Wallis test
  • Paired samples
    • Parametric: Repeated measures ANOVA
    • Nonparametric: Friedman test
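
A minimal sketch for the independent-samples branch, using the built-in PlantGrowth data purely for illustration:

# Independent samples: one-way ANOVA, Welch's ANOVA, Kruskal-Wallis test
oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)   # one-way ANOVA
oneway.test(weight ~ group, data = PlantGrowth)                     # Welch's ANOVA
kruskal.test(weight ~ group, data = PlantGrowth)                    # Kruskal-Wallis test

# For paired samples, see the Friedman test example later in these slides.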

Statistical Tests

The Binomial Test (Siegel and Castellan, 1988)

Details

  • Let \(X_{i}\) be a (Bernoulli) random variable (two categories - success/failure) with success probability \(p\).
  • \(E(X_{i}) = p\) and \(Var(X_{i}) = p(1-p)\).
  • Statistical problems:
    • Hypothesis testing on \(p\).
    • Confidence interval for \(p\).
    • Estimator for \(p\).
  • Let \(B = \sum_{i=1}^{n}X_{i}\) be the total number of successes.
  • \(B \sim \text{Binomial}(n, p)\).

Binomial Test

Hypothesis Test

  • Hypothesis test: \(H_{0}: p = p_{0}\) versus \(H_{A}: p \neq p_{0}\).
  • Test statistic: \(B = \sum_{i=1}^{n}X_{i} \sim \text{Binomial}(n, p_{0})\).

Rejection Regions

  • \(H_{A}: p > p_{0}\): Reject \(H_{0}\) if \(B \geq b_{\alpha; n, p_{0}}\).

  • \(H_{A}: p < p_{0}\): Reject \(H_{0}\) if \(B \leq c_{\alpha; n, p_{0}}\).

  • \(H_{A}: p \neq p_{0}\): Reject \(H_{0}\) if \(B \geq b_{\alpha_{1}; n, p_{0}}\) or \(B \leq c_{\alpha_{2}; n, p_{0}}\), where \(\alpha_{1}+\alpha_{2}=\alpha\).

  • Due to the discreteness of \(B\), exact tests cannot be conducted at every nominal \(\alpha\) level (see the sketch below).
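
A short sketch of this last point: the critical value is chosen so that the attained level stays at or below \(\alpha\). The numbers (n = 20, p0 = 0.5, alpha = 0.05) are chosen for illustration.

n <- 20; p0 <- 0.5; alpha <- 0.05

# Smallest b such that P(B >= b) <= alpha under H0 (upper-tailed test)
b_crit <- qbinom(1 - alpha, n, p0) + 1
b_crit

# Attained significance level: P(B >= b_crit) under H0
pbinom(b_crit - 1, n, p0, lower.tail = FALSE)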

R Code: binom.test Function

Use the binom.test function in R to perform an exact binomial test.

x <- 4
n <- 20
binom.test(x, n, p = 0.5,
           alternative = "two.sided",  # one of "two.sided", "less", "greater"
           conf.level = 0.95)

    Exact binomial test

data:  x and n
number of successes = 4, number of trials = 20, p-value = 0.01182
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.057334 0.436614
sample estimates:
probability of success 
                   0.2 

Parameters Explained

  • \(x\): Number of successes, or a vector of length 2 giving the numbers of successes and failures, respectively.
  • \(n\): Number of trials; ignored if x has length 2.
  • \(p\): Hypothesized probability of success.
  • alternative: Specifies the alternative hypothesis.
  • conf.level: Confidence level of the interval.

\(\chi^2\)-Test (Siegel and Castellan, 1988)

Overview

  • The Chi-Squared (\(\chi^2\)) test is used to determine if there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.
  • Commonly used in testing relationships between categorical variables.

Example Question

  • “Are males more likely to accept Offer 3 (O3) than females?”

Test Details

  • The test statistic is based on the observed and expected frequencies.
  • It is a non-parametric test: it does not rely on parameter estimates from an assumed error distribution.
  • Calculated as: \(\chi^2 = \sum \frac{(O - E)^2}{E}\)
    • Where \(O\) = Observed frequency, \(E\) = Expected frequency.

Degrees of Freedom

  • Calculated as: (Number of rows - 1) × (Number of columns - 1), which equals 1 for a 2 × 2 table.
  • Critical values are looked up in a \(\chi^2\) distribution table.
  • For a one-tailed test at the 5% significance level, compare the test statistic to the upper-tail critical value from the table (see the sketch below).
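
The sketch below computes the statistic by hand for a 2 × 2 table with made-up counts (the same matrix is reused on the next slide). Note that R's chisq.test applies Yates' continuity correction to 2 × 2 tables by default, so its statistic is somewhat smaller than the uncorrected value computed here.

# Observed counts (made up for illustration)
observed <- matrix(c(10, 20, 30, 40), nrow = 2)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic (without continuity correction) and degrees of freedom
chi2 <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
chi2

qchisq(0.95, df)                       # critical value at the 5% level
pchisq(chi2, df, lower.tail = FALSE)   # p-value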

Chi-Squared Test in R

# Sample data: Observed frequencies
observed <- matrix(c(10, 20, 30, 40), nrow = 2)

# Perform the Chi-squared test
test.result <- chisq.test(observed)

# Print the test results
print(test.result)

    Pearson's Chi-squared test with Yates' continuity correction

data:  observed
X-squared = 0.44643, df = 1, p-value = 0.504

Fisher Exact Test

What is the Fisher Exact Test?

  • The Fisher Exact Test is a statistical significance test used for small sample sizes.
  • It’s particularly useful for examining the association or independence of row and column variables in 2x2 contingency tables.

When to Use It?

  • Ideal when sample sizes are too small for the Chi-squared test.
  • When you have categorical data and want to compare proportions.

Key Features

  • It calculates the exact probability of observing the data assuming the null hypothesis of no association.
  • More accurate than Chi-squared test for small sample sizes.

Fisher Exact Test in R

R Code Example

# Sample data: Contingency table
observed <- matrix(c(8, 2, 1, 9), nrow = 2, byrow = TRUE)

# Perform Fisher Exact Test
test.result <- fisher.test(observed)

# Display the test results
print(test.result)

    Fisher's Exact Test for Count Data

data:  observed
p-value = 0.005477
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
    2.057999 1740.081669
sample estimates:
odds ratio 
  27.32632 

Wilcoxon Mann-Whitney Test

Overview

  • The Wilcoxon Mann-Whitney test is a nonparametric test that compares two independent samples.
  • It is used to determine whether there is a significant difference between the two groups, often interpreted as a difference in medians (under a location-shift assumption).

When to Use It?

  • When data are not normally distributed or when dealing with ordinal data.
  • Appropriate for small sample sizes.

Key Features

  • It tests the null hypothesis that two samples come from the same distribution.
  • Does not assume normal distribution of the data.
  • Often used as an alternative to the independent samples t-test.

Wilcoxon Mann-Whitney Test in R

R Code Example

# Sample data
group1 <- c(7, 8, 9, 5, 6)
group2 <- c(1, 2, 3, 4, 5)

# Perform the Wilcoxon Mann-Whitney test
test.result <- wilcox.test(group1, group2)

# Display the test results
print(test.result)

    Wilcoxon rank sum test with continuity correction

data:  group1 and group2
W = 24.5, p-value = 0.01597
alternative hypothesis: true location shift is not equal to 0

Output includes the test statistic (W) and the p-value. A low p-value indicates a significant difference between the two groups (a location shift).

Wilcoxon Signed-Rank Test

What is the Wilcoxon Signed-Rank Test?

  • A nonparametric test used to compare two related samples.
  • It assesses whether their population mean ranks differ.

When to Use It?

  • Ideal for paired data when the data are not normally distributed.
  • Commonly used as an alternative to the paired t-test.

Key Features

  • It tests the null hypothesis that the median of the differences between pairs of observations is zero.
  • Assumes that the differences between pairs are symmetrically distributed.

Interpretation: The test helps assess the effect of an intervention or treatment by comparing measurements taken before and after.

Wilcoxon Signed-Rank Test in R

R Code Example

# Sample paired data
before <- c(120, 101, 130, 108, 143)
after <- c(115, 107, 132, 101, 138)

# Perform the Wilcoxon signed-rank test
test.result <- wilcox.test(before, after, paired = TRUE)

# Display the test results
print(test.result)

    Wilcoxon signed rank test with continuity correction

data:  before and after
V = 10, p-value = 0.5879
alternative hypothesis: true location shift is not equal to 0

  • The wilcox.test function is used with paired = TRUE for paired data.

  • Output includes the test statistic (V) and the p-value.

  • A low p-value suggests a significant difference in the median values of the two groups.

Friedman Test

  • The Friedman test is a non-parametric alternative to one-way ANOVA with repeated measures.
  • It’s used to detect differences in treatments across multiple test attempts.

When to Use

  • Appropriate when dealing with ranked (ordinal) data.
  • Useful for non-normally distributed data or when the assumption of homogeneity of variances is violated.

Friedman test in R

# Example data: the 'selfesteem' data set from the datarium package,
# with self-esteem scores measured at three time points (t1, t2, t3)

library(tidyverse)
library(ggpubr)
library(rstatix)

data("selfesteem", package = "datarium")



selfesteem <- selfesteem %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
# Perform the Friedman test
res.fried <- selfesteem %>% friedman_test(score ~ time |id)
res.fried
# A tibble: 1 × 6
  .y.       n statistic    df        p method       
* <chr> <int>     <dbl> <dbl>    <dbl> <chr>        
1 score    10      18.2     2 0.000112 Friedman test

Effect Sizes

The Importance of Effect Sizes

  • Statistical significance alone is not enough.
  • Effect sizes measure the magnitude of the relationship or difference.

Pearson’s r: Correlation Coefficient

Strength of Relationships

  • Measures the strength and direction of a linear relationship.
  • Ranges from -1 to 1.

R Implementation

# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

# Calculate Pearson's r
cor(x, y, method = "pearson")
[1] 1

Interpreting the Correlation Coefficient

Calculating in R

# Perform Pearson's correlation test
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 4, 5, 6)

cor.test(x, y, method = "pearson")

    Pearson's product-moment correlation

data:  x and y
t = 82191237, df = 3, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 1 1
sample estimates:
cor 
  1 

Large Correlation?

  • Interpretation varies by discipline.
  • Greater than 0.5 often considered large in social sciences.

Standardized Mean Difference

Differences in Means

  • Understanding the difference between two group means.
  • Expressed as a ratio of mean difference to standard deviation.

R Implementation

# Assuming 'group1' and 'group2' are vectors of data

group1 <- c(1, 4, 3, 4, 5)
group2 <- c(2, 3, 4, 5, 6)
# Standardized difference using the SD of the combined sample (a simple variant)
theta <- (mean(group1) - mean(group2)) / sd(c(group1, group2))

theta
[1] -0.4014898

Cohen’s d

Calculating Effect Size

  • Measure of effect size for the difference between two means.

R Implementation
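
A minimal sketch of one way to obtain Cohen's d in R, assuming the effsize package and the group1/group2 vectors from the previous slide; it produces output of the form shown below.

# group1 and group2 as defined on the previous slide
group1 <- c(1, 4, 3, 4, 5)
group2 <- c(2, 3, 4, 5, 6)

library(effsize)   # assumed package providing cohen.d()
cohen.d(group1, group2)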


Cohen's d

d estimate: -0.3872983 (small)
95 percent confidence interval:
    lower     upper 
-1.859353  1.084756 

Interpreting Cohen’s d

  • d = 0.2: Small effect.
  • d = 0.5: Medium effect.
  • d = 0.8: Large effect.