# Chapter 9 Independent Samples *t*-Test

The independent samples *t*-test compares the means of two different (or independent) samples.

For example, let’s say that we were interested in determining if the salary of professors was different depending on whether they were part of an applied or theoretical discipline.

For this example, we will use the `datasetSalaries` data set.

## 9.1 Null and research hypotheses

### 9.1.1 Traditional approach

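\(H_0: \mu_{Applied} - \mu_{Theoretical} = 0\) or equivalently \(\mu_{Applied} = \mu_{Theoretical}\)

\(H_1: \mu_{Applied} - \mu_{Theoretical} \ne 0\) or equivalently \(\mu_{Applied} \ne \mu_{Theoretical}\)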

The null hypothesis states there is no difference in the salary of professors who are in an applied discipline compared to a theoretical discipline. The research hypothesis states there is a difference in the salary of professors who are in an applied discipline compared to a theoretical discipline.

### 9.1.2 GLM approach

\[Model: Salary = \beta_0 + \beta_1*Discipline + \varepsilon\]

\[H_0: \beta_1 = 0\]

\[H_1: \beta_1 \ne 0\]

In addition to the intercept (\(\beta_0\)), we now have a predictor `discipline` along with its associated slope (\(\beta_1\)). In this model, the slope represents the change in `salary` over the change in `discipline`, and the intercept (\(\beta_{0}\)) represents the value of `salary` when `discipline` is 0.

The null hypothesis states that the slope associated with `discipline` is equal to zero. In other words, there is no difference in the salary of professors who are in different disciplines. The alternative hypothesis states that the slope associated with `discipline` is not equal to zero. In other words, there is a difference in the salary of professors who are in different disciplines.

The interpretation of the slope and intercept depends on how `discipline` is coded. Thus, it is always a good idea to check how this categorical IV is coded, which can be done using the `contrasts()` function.

```
##             Theoretical
## Applied               0
## Theoretical           1
```
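The call that produces this kind of output is not shown above. A minimal sketch, assuming `discipline` is stored as a factor, is given below. The `toy` data frame and its salary values are made up for illustration and stand in for `datasetSalaries`.

```r
# Hypothetical toy data standing in for datasetSalaries (values are made up)
toy <- data.frame(
  salary = c(120, 110, 130, 100, 90, 95),
  discipline = factor(c("Applied", "Applied", "Applied",
                        "Theoretical", "Theoretical", "Theoretical"))
)

# Factor levels are sorted alphabetically by default
levels(toy$discipline)

# Show the coding scheme currently attached to the factor
# (R's default for unordered factors is dummy/treatment coding)
contrasts(toy$discipline)
```

With the real data, the equivalent call would presumably be `contrasts(datasetSalaries$discipline)`.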

We can see that `Applied` is coded as `0` and `Theoretical` is coded as `1`. Given that the difference between the two codes for `discipline` is 1, the slope represents the mean difference in salary for professors in theoretical disciplines compared to applied disciplines.^{9} Additionally, the intercept (\(\beta_{0}\)) represents the mean salary of professors in the applied discipline, since 0 represents `Applied` in the `discipline` coding scheme.

If we wanted to change the coding scheme and code `Theoretical` as `0` and `Applied` as `1`, then our interpretation of the intercept would be the mean salary of professors in the theoretical discipline since `Theoretical` is now 0. The slope would still have the same interpretation of the mean difference of salary for professors in different disciplines as the difference in the coding scheme is still 1. However, the sign would change.^{10}
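The effect of swapping the reference group can be checked on a small example. The sketch below uses a made-up `toy` data frame (not `datasetSalaries`) and `relevel()` as one way to make `Theoretical` the group coded 0; the text does not say which function the authors would use for this.

```r
# Hypothetical toy data (values are made up): Applied mean = 120, Theoretical mean = 95
toy <- data.frame(
  salary = c(120, 110, 130, 100, 90, 95),
  discipline = factor(c("Applied", "Applied", "Applied",
                        "Theoretical", "Theoretical", "Theoretical"))
)

# Default dummy coding: Applied = 0, Theoretical = 1
fit_default <- lm(salary ~ 1 + discipline, data = toy)
coef(fit_default)   # intercept = Applied mean; slope = Theoretical - Applied

# Reversed dummy coding: Theoretical = 0, Applied = 1
toy_rev <- transform(toy, discipline = relevel(discipline, ref = "Theoretical"))
fit_reversed <- lm(salary ~ 1 + discipline, data = toy_rev)
coef(fit_reversed)  # intercept = Theoretical mean; slope = Applied - Theoretical
```

In both fits the slope has the same magnitude; only its sign and the meaning of the intercept change.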

These two coding schemes are both examples of dummy coding, which is R’s default coding scheme for categorical variables. Specifically, dummy coding is when one level of an IV is coded as 1 and all others are coded as 0. However, there are other coding schemes such as effects (also known as deviation), Helmert, polynomial, and orthogonal. For a good description of different contrasts in addition to applying and interpreting them, check out UCLA Statistical Consulting Group’s description. We will also go over coding categorical variables in more detail in the next chapter.

Our preferred contrast for the independent samples *t*-test is to use -0.5 for one group and 0.5 for the other group. We prefer this coding scheme because the slope will still provide the mean difference;^{11} however, since 0 lies in between the two groups, the intercept will now represent the mean of the group means. In this example case, it will be the mean of the mean salary of professors in the applied discipline and the mean salary of professors in the theoretical discipline.

We recommend assigning the group expected to have a higher value in the dependent variable as 0.5 and the other group as -0.5. In our example, we might expect that those in the applied discipline will have higher salaries and thus assign that group to 0.5, while we expect the theoretical discipline to have lower salaries and thus assign that group to -0.5. We can do this by using the concatenate `c()` function to group the numbers together and assign them back to the contrast. The order inside the `c()` function must follow the order of the levels of the independent variable, which is alphanumerical by default.

```
##             [,1]
## Applied      0.5
## Theoretical -0.5
```
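The assignment itself is not shown above. A minimal sketch of how it could look is given below, again on a made-up `toy` data frame rather than `datasetSalaries`.

```r
# Hypothetical toy data standing in for datasetSalaries (values are made up)
toy <- data.frame(
  salary = c(120, 110, 130, 100, 90, 95),
  discipline = factor(c("Applied", "Applied", "Applied",
                        "Theoretical", "Theoretical", "Theoretical"))
)

# Values are matched to levels(toy$discipline) in order:
# first value -> "Applied", second value -> "Theoretical"
contrasts(toy$discipline) <- c(0.5, -0.5)

# Confirm the new coding scheme
contrasts(toy$discipline)
```

With the real data, the equivalent assignment would presumably be `contrasts(datasetSalaries$discipline) <- c(0.5, -0.5)`.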

Note: If the difference between the two codes for `discipline` were not equal to 1, the estimate would equal the mean difference divided by that difference. For example, if `Applied` were coded as `-1` and `Theoretical` as `1`, the difference in `discipline` would be 2 and the estimate would represent half of the mean salary difference.^{12} Thus, if we multiplied the estimate by 2, we would obtain the mean salary difference. Even though the estimate changes, the *t*-statistic and *p*-value will not change, because the estimate and its standard error are rescaled by the same factor (as long as the two groups are assigned different values).

## 9.2 Statistical analysis

### 9.2.1 Traditional approach

To perform the traditional independent samples *t*-test, we can again use the `t.test()` function. However, we will now enter the formula of the GLM into the first argument. For this test, we will assume the variances of each group are equal (not significantly different from each other); however, this should be tested.

```
## 
##  Two Sample t-test
## 
## data:  datasetSalaries$salary by datasetSalaries$discipline
## t = 3.1406, df = 395, p-value = 0.001813
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3545.70 15414.83
## sample estimates:
##     mean in group Applied mean in group Theoretical 
##                  118028.7                  108548.4
```

Note: There are other ways to enter the formula into the `t.test()` function depending on how the dataset is formatted.

From this output we can see that the *t*-statistic (`t`) is `3.1406`, degrees of freedom (`df`) is `395`, and `p-value` is `0.001813`, with the mean salary of professors in the `Applied` discipline being `$118,028.70` and the mean salary of professors in the `Theoretical` discipline being `$108,548.40`. Therefore, professors in `Applied` disciplines earn significantly higher salaries than professors in `Theoretical` disciplines.
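The call behind the output in this section is not shown. The sketch below shows one way to run an equal-variance *t*-test with a formula, on a made-up `toy` data frame rather than `datasetSalaries`. It also includes `var.test()` as one possible check of the equal-variance assumption; this is our choice for illustration, and the text does not say which test the authors have in mind.

```r
# Hypothetical toy data (values are made up): Applied mean = 120, Theoretical mean = 95
toy <- data.frame(
  salary = c(120, 110, 130, 100, 90, 95),
  discipline = factor(c("Applied", "Applied", "Applied",
                        "Theoretical", "Theoretical", "Theoretical"))
)

# One option for checking equal variances: an F test comparing the two group variances
var_check <- var.test(salary ~ discipline, data = toy)
var_check$p.value

# Independent samples t-test assuming equal variances (pooled variance)
result <- t.test(salary ~ discipline, data = toy, var.equal = TRUE)
result$statistic  # t-statistic (first level minus second level: Applied - Theoretical)
result$parameter  # degrees of freedom: n1 + n2 - 2
result$p.value
result$estimate   # group means
```

Setting `var.equal = TRUE` matters here: by default `t.test()` runs Welch's test, which does not assume equal variances and generally gives non-integer degrees of freedom.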

### 9.2.2 GLM approach

```
## 
## Call:
## lm(formula = salary ~ 1 + discipline, data = datasetSalaries)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -50748 -24611  -4429  19138 113516 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   113289       1509  75.060  < 2e-16 ***
## discipline1     9480       3019   3.141  0.00181 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29960 on 395 degrees of freedom
## Multiple R-squared:  0.02436, Adjusted R-squared:  0.02189 
## F-statistic: 9.863 on 1 and 395 DF,  p-value: 0.001813
```

Notice that the *t*-statistic (`t value`) of `3.14` with `395` degrees of freedom (df) and the *p*-value of `.002` are identical to the output from the `t.test()` function.

We can also see that if we subtract the mean salary for professors in the theoretical discipline from the mean salary for professors in the applied discipline in the `t.test()` results, we obtain the same mean difference as the `Estimate` for `discipline1` in the GLM results (i.e., `$118,028.70` - `$108,548.40` = `$9,480.30`).

Furthermore, we can also see that the intercept is the mean of the two group means, that is, of the mean salary in the applied discipline and the mean salary in the theoretical discipline (\(\frac{118028.70+108548.40}{2} = 113288.55\), which matches the intercept of 113289 up to rounding).
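These equivalences can be verified directly. The sketch below fits the GLM with -0.5/0.5 coding on a made-up `toy` data frame (not `datasetSalaries`) and compares the coefficients with the group means and with the equal-variance `t.test()` result.

```r
# Hypothetical toy data (values are made up): Applied mean = 120, Theoretical mean = 95
toy <- data.frame(
  salary = c(120, 110, 130, 100, 90, 95),
  discipline = factor(c("Applied", "Applied", "Applied",
                        "Theoretical", "Theoretical", "Theoretical"))
)

# Preferred coding: Applied = 0.5, Theoretical = -0.5
contrasts(toy$discipline) <- c(0.5, -0.5)

# GLM approach
fit <- lm(salary ~ 1 + discipline, data = toy)
glm_coefs <- summary(fit)$coefficients

# Traditional approach
tt <- t.test(salary ~ discipline, data = toy, var.equal = TRUE)

# Group means computed by hand
group_means <- tapply(toy$salary, toy$discipline, mean)

# Intercept vs. mean of the group means
c(intercept = glm_coefs["(Intercept)", "Estimate"],
  mean_of_means = mean(group_means))

# Slope vs. difference between the group means (Applied - Theoretical)
c(slope = glm_coefs["discipline1", "Estimate"],
  mean_diff = group_means[["Applied"]] - group_means[["Theoretical"]])

# t-statistic from the GLM vs. t-statistic from t.test()
c(glm_t = glm_coefs["discipline1", "t value"],
  ttest_t = unname(tt$statistic))
```

Each pair of values printed by the last three expressions should agree.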

## 9.3 Statistical decision

Given the *p*-value of `.002` is less than the alpha level (\(\alpha\)) of 0.05, we will reject the null hypothesis.

## 9.4 APA statement

An independent samples *t*-test was performed to test if the salary of professors was different depending on their discipline. The salary of professors was significantly higher for professors in applied disciplines (*M* = $118,029, *SD* = $29,459) than for professors in theoretical disciplines (*M* = $108,548, *SD* = $30,538), *t*(395) = 3.14, *p* = .002.

## 9.5 Visualization

```
library(dplyr)    # %>%, mutate(), group_by(), summarize()
library(ggplot2)  # plotting

# calculate descriptive statistics along with the 95% CI
dataset_summary <- datasetSalaries %>%
  # recode discipline numerically: Applied = 0.5, Theoretical = -0.5
  mutate(discipline = ifelse(discipline == "Applied", 0.5, -0.5)) %>%
  group_by(discipline) %>%
  summarize(
    mean = mean(salary),
    sd = sd(salary),
    n = n(),
    sem = sd / sqrt(n),
    tcrit = abs(qt(0.05 / 2, df = n - 1)),
    ME = tcrit * sem,
    LL95CI = mean - ME,
    UL95CI = mean + ME
  )

# mean of the two group means (the intercept under -0.5/0.5 coding)
mean_of_means <- mean(dataset_summary$mean)

# plot
datasetSalaries %>%
  mutate(discipline = ifelse(discipline == "Applied", 0.5, -0.5)) %>%
  ggplot(., aes(discipline, salary)) +
  geom_jitter(alpha = 0.1, width = 0.05) +
  geom_line(data = dataset_summary, aes(x = discipline, y = mean), color = "#3182bd") +
  geom_errorbar(data = dataset_summary, aes(x = discipline, y = mean, ymin = LL95CI, ymax = UL95CI), width = 0.02, color = "#3182bd") +
  geom_point(data = dataset_summary, aes(x = discipline, y = mean), size = 3, color = "#3182bd") +
  # single point marking the mean of the group means at x = 0
  annotate(geom = "point", x = 0, y = mean_of_means, size = 3, color = "#3182bd") +
  labs(
    x = "Discipline",
    y = "9-Month Academic Salary (USD)",
    caption = ""
  ) +
  theme_classic() +
  scale_y_continuous(
    labels = scales::dollar
  ) +
  scale_x_continuous(breaks = c(-1, -.5, 0, .5, 1)) +
  # group labels at their coded x positions (Theoretical = -0.5, Applied = 0.5)
  annotate(geom = "text", x = -.5, y = 0, label = "Theoretical", size = 4) +
  annotate(geom = "text", x = .5, y = 0, label = "Applied", size = 4)
```

9. \[b_1 = \frac{\Delta Y}{\Delta X} = \frac{\Delta Salary}{\Delta Discipline} = \frac{\Delta Salary}{1-0} = \frac{\Delta Salary}{1} = \Delta Salary\]

10. \[b_1 = \frac{\Delta Y}{\Delta X} = \frac{\Delta Salary}{\Delta Discipline} = \frac{\Delta Salary}{0-1} = \frac{\Delta Salary}{-1} = -\Delta Salary\]

11. \[b_1 = \frac{\Delta Y}{\Delta X} = \frac{\Delta Salary}{\Delta Discipline} = \frac{\Delta Salary}{0.5-(-0.5)} = \frac{\Delta Salary}{1} = \Delta Salary\]

12. \[b_1 = \frac{\Delta Y}{\Delta X} = \frac{\Delta Salary}{\Delta Discipline} = \frac{\Delta Salary}{1-(-1)} = \frac{\Delta Salary}{2}\]