Abstract
The likelihoodR package has been developed to allow users to obtain statistics according to the likelihood approach to statistical inference. Commonly used tests are available in the package, such as: t tests, ANOVA, correlation, regression and a range of categorical analyses. In addition, there is a sample size calculator for t tests, based upon the concepts of strength of evidence, and the probabilities of misleading and weak evidence.Maximum likelihood estimation (MLE) is well-understood and widely
used throughout statistics. In contrast, the use of the likelihood
function as a basis for inference is much less understood and even
confused with MLE. As Edwards wrote in his excellent book : "At one
recent international conference at which I laboured for three-quarters
of an hour to make clear the advantages of likelihood inference, the
chairman thanked me for my lecture on the Method of Maximum Likelihood"
(Edwards 1992) p 101. In the Epilogue of
this book on p 212, Edwards says that MLE is “a red herring”. To
clarify: MLE is used to estimate a parameter value according to a
supposed probability distribution, while likelihood inference is used to
compare two hypotheses through the ratio of their values on a likelihood
function.
All the main statistical approaches to scientific inference
(frequentist, Bayesian and information criterion) are based upon
calculated probabilities. The frequentist approach, for example,
typically uses a sampling distribution centred on a null hypothesis and
calculates the probability of obtaining the observed value or values
more extreme from the null. The likelihood approach, also known as the
evidential approach, differs in that it is simply based upon the
evidence provided by the observed data (not including more extreme
values) and represented by the likelihood function. The ratio of the
heights under the likelihood function according to the hypothesis values
tested provides the likelihood ratio. To create a linear scale of
evidence, the natural logarithm of this is taken to give us the log
likelihood ratio, also known as the support, a term that was first
defined by Harold Jeffreys (Jeffreys
1936).
The likelihood approach is not subject to the same criticisms often
leveled at the frequentist and Bayesian approaches (Goodman and Royall 1988; Edwards 1992; R. M. Royall
1997; Goodman 1999; Dixon 2003; Dienes 2008; Wasserstein and Lazar 2016;
Lakens 2021). Likelihood ratios provide objective measures of
evidence between competing hypotheses, unaffected by the intentions of
the investigator. Log likelihood ratio (support, see below)
values are proportional to the quantity of data, representing the weight
of evidence. This means that support values from independent studies can
simply be added together, e.g. for meta-analysis. Unlike other
approaches based on probabilities, likelihood ratios are unaffected by
transformations of variables.
Despite the apparent advantages of the evidential approach there is a
dearth of available resources in statistical computing. None of the
major commercial packages (e.g. SPSS, SAS, Minitab) provide likelihood
ratios or support values. This should not be confused with the wide
availability of likelihood ratio tests (also known as G-tests), which
ultimately provide p values according to the frequentist
approach. Because virtually no software is available for analysis, this
impacts on the use of the evidential approach in scientific reporting.
This also impacts on the teaching of the evidential approach, which then
negatively feeds back to reduced scientific reporting. The
likelihoodR package for R (P. M.
B. Cahusac 2021) is an attempt to address this situation and
calculations are based upon the recent book (P.
M. B. Cahusac 2020). R (Ihaka and
Gentleman 1996) is a widely used statistical platform with a huge
variety of packages.
The package always reports the support (log likelihood ratio). This
relative evidence scale ranges from -\(\infty\) to +\(\infty\), with zero representing no
evidence either way. A working interpretation for this scale has been
offered by (Goodman and Royall 1988), see
Table 1. No threshold need be applied, and S values are given
to just one decimal place, e.g. 2.3.
| S | Interpretation of H\(_{1}\) vs H\(_{2}\) |
|---|---|
| 0 | No evidence either way |
| 1 | Weak evidence |
| 2 | Moderate evidence |
| 3 | Strong evidence |
| 4 | Extremely strong evidence |
The support (S) values reported in the package are distinct
from the surprise/surprisal S-values described by (Palm 2012) based on Shannon information theory,
and that by (Greenland 2019) produced by
simply taking the negative log base 2 of the p value.
The likelihood intervals are reported wherever possible and these are
given in terms of support rather than likelihood ratio. A typical
likelihood interval (support interval) is the S-2 interval,
which numerically often closely corresponds to the frequentist 95%
confidence interval (P. M. B. Cahusac
2020). The S-2 interval represents an interval based
upon e\(^{-2}\) = 0.135 = 1 / 7.40
likelihood ratio interval. All the values within the interval would have
a likelihood ratio of no less than 1 / 7.40, or an S = -2. The
S-3 interval would be e\(^{-3}\) = 0.05 = 1 / 20.09, and so on.
Analyses also report other relevant statistics, such as t, F
and \(\chi^{2}\), as well as the
corresponding frequentist p value. Where possible the
likelihood function is given, decorated with hypothesis parameter values
shown as coloured lines. Currently there are 14 different statistical
tests implemented by the package, the summary for each are given in
Table 2 (listed alphabetically).
| Function | Description |
|---|---|
L_1way_ANOVA |
Independent samples one-way ANOVA |
L_1way_cat |
One-way categorical data analysis for binomial and multinomial |
L_1way_RM_ANOVA |
One-way repeated measures ANOVA |
L_2S_ttest |
Independent samples t test |
L_2way_cat |
Two-way categorical data analysis |
L_2way_Factorial_ANOVA |
Two-way independent samples factorial ANOVA |
L_corr |
Bivariate normal correlation |
L_efficacy |
Efficacy analysis for binomial categorical data |
L_logistic_regress |
Multiple logistic regression |
L_OR |
Odds ratio |
L_regress |
Bivariate regression for linear, quadratic and cubic comparisons |
L_RR |
Relative risk |
L_ttest |
One sample and related samples t test |
L_t_test_sample_size |
Sample size calculation using the evidential approach for t tests |
A range of tests for differences of continuous variables is
available. One function L_ttest performs a one sample and
related samples test. As well as specifying the null value, two
alternative hypothesis values can be specified, one in terms of the
measurements used and the other in terms of Cohen’s d.
L_ttest(data1, data2, null=0, d=0.5, alt.2=NULL, L.int=2, verb=TRUE)
Where the arguments are:
| data1 | a (non-empty) numeric vector of data values |
| data2 | a (non-empty) numeric vector of data values for related sample, default = NULL |
| null | value for the null hypothesis, default = 0 |
| d | Cohen’s effect size, default = 0.5 |
| alt.2 | value for an alternative hypothesis, in units used for data, default = NULL |
| L.int | likelihood interval given for a given support value, e.g. 2 or 3, default = 2 |
| verb | show output, default = TRUE |
As an example:
> # one sample Gosset's original additional hours of sleep data Cahusac (2020) p 29
> mysample <- c(0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0)
> a1=L_ttest(mysample, d=0.5, alt.2=2, L.int=2)
Maximum support for the observed mean 0.75 (dashed line) against the null 0 (black line) = 0.892
Support for d of 0.5 (0.8945048, blue line) versus null = 0.856
Support for d versus 2nd alt Hypothesis 2 (green line) = 2.131
Support for 2nd alt Hypothesis versus null = -1.275
S-2 likelihood interval (red line) is from -0.44025 to 1.94025
t(9) = 1.326, p = 0.2175978, d = 0.419
L_ttest. The likelihood function
for the mean value given the data, with the MLE (sample mean)
represented by the vertical dashed line. The black line is the specified
null value (here at 0), the blue line is for a specified alternative
hypothesis (for an effect size of 0.5), and the green line represents a
second alternative hypothesis (for 2 hours). The red horizontal line
shows the S-2 likelihood interval.The assigned object a1 contains values of interest which
can be assessed in the usual way with a1$..., but entering
a1 on its own gives all the values:
$obs.mean |
the observed mean or difference in mean for related samples |
$df |
degrees of freedom |
$alt.H1 |
mean value according to specified d |
$alt.H2 |
specified second hypothesis value |
$S_max |
maximum support for observed mean against the null |
$S_10 |
support for d versus null |
$S_12 |
support for d versus specified second hypothesis |
$S_20 |
support for second hypothesis versus the null |
$like.int |
likelihood interval |
$L.int.spec |
specified likelihood interval in units of support |
$null.value |
null value |
$t.val |
t value for test against null |
$p.val |
p value for test against null |
$d.obs |
observed effect size |
For the related sample test a second vector of values of equal length
to data1 (and paired) is entered for data2. The independent
samples test function is L_2S_ttest. The data is entered as
the first argument followed by the vector of the same length coding
groups. Similar arguments to the one sample/related sample test can be
specified and similar output, but this time showing the difference in
means.
The package includes one-way ANOVA (L_1way_ANOVA), one-way
repeated measures ANOVA (L_1way_RM_ANOVA) and two-way
between-participants factorial ANOVA
(L_2way_Factorial_ANOVA). All these functions use
contrasts, employing the model comparison approach espoused by Glover
and Dixon (Dixon 2003, 2013; Glover and Dixon
2004). For the one-way analyses if the arguments for the
contrasts are not specified then they default to testing a linear and a
quadratic contrast. In the factorial ANOVA, no contrast comparisons are
made if the contrasts are left unspecified. If the first contrast is
specified then this is compared to the main effects, and if the second
contrast is specified then this is compared to the first contrast. Such
contrast comparisons are easy to do in this approach, and are difficult
or impossible to do using frequentist p values. The support
values for the between-participants analyses are adjusted using Akaike’s
correction (Hurvich and Tsai 1989). We
will look at L_2way_Factorial_ANOVA. The first line of
output gives the S for comparing the full model of main effects and
interaction with the null model. Using data from P. M. B. Cahusac (2020) p 91
> time <- c(6.4, 4.6, 6.4, 5.6, 5.9, 6.1, 6.3, 4.5, 4.8, 6.6, 7, 9.3, 7.9, 9.4, 8.2,
4.4, 4.2, 5, 6.9,4.5, 4, 4.3, 6.9, 5.5, 5.8, 4.4, 4.2, 5.1, 6.9, 4.5)
> Treatment = gl(3,5,30, labels=c("T1","T2","T3"))
> Health = gl(2,15,30, labels=c("Hemophiliac","Normal"))
> contrast1 <- c(-1, -1, 5, -1, -1, -1) # interaction Hemo T3 higher than others
> contrast2 <- c(-1, -1, -1, 1, 1, 1) # main effect of health status (Hemo higher)
> m1=L_2way_Factorial_ANOVA(time, Treatment, Health, contrast1, contrast2, verb=TRUE)
Support for full model (including interaction) versus null = 7.036
Support for full model versus main effects = 2.968
Support for contrast 1 versus main effects = 7.587
Support for contrast 1 versus contrast 2 = 8.462
First factor main effect F(2,24) = 5.039, p = 0.01488452, partial eta-squared = 0.296
Second factor main effect F(1,24) = 15.992, p = 0.0005281828, partial eta-squared = 0.4
Interaction F(2,24) = 6.219, p = 0.006664992, partial eta-squared = 0.341
Contrast 1 F(1,24) = 36.048, p = 3.373928e-06
L_2way_Factorial_ANOVA function. The interaction plot shows
the means for the 2 groups of patients (dashed line is hemophiliac,
solid line is normal). The horizontal axis represents the 3 different
treatments. An interaction is apparent from the plot, where the 3rd
treatment shows a clear difference between the hemophiliac and normal
patients.As before, the assigned object m1 contains values of
interest which can be assessed in the usual way with
m1$..., but entering m1 on its own gives all
the values.
Bivariate normal correlation uses the L_corr function, and
is demonstrated by using data from the heptathlon (P. M. B. Cahusac 2020) p 104
> m200 <- c(22.6,23.7,23.1,23.6,23.6,23.6,25.5,23.9,24.5,23.9,24.9,24.8,24.7,
25.0,24.6,24.9,25.0,25.6,24.8,25.5,25.7,24.9,26.6,25.2,26.2)
> m800 <- c(128.5,126.1,124.2,132.5,134.7,132.5,138.5,127.9,133.7,132.2,136.1,142.8,
125.8,131.5,137.1,134.9,146.7,133.9,146.4,144.0,133.4,138.0,139.2,137.3,163.4)
> m2=L_corr(m200, m800, null=0, exp.r=0.5, L.int=3, alpha=.05, verb=TRUE)
Support for observed correlation 0.6198 (dashed line) versus null of 0 (black line) = 5.776
Support for specified correlation of 0.5 (blue line) versus observed r = -0.338
Support for specified correlation versus null = 5.438
S-3 likelihood interval (red line) is from 0.19968 to 0.8474
P value = 0.00095
N = 25
L_corr function. This is the likelihood function for the
correlation coefficient, given the data. As before, the vertical dashed
line is the MLE (the sample r), the blue line is a specified
alternative value (0.5), and red horizontal line is the S-3
likelihood interval. The null value at 0 is outside of this interval and
the evidence against it is extremely strong.The assigned object m2 contains values of interest which can be
assessed in the usual way with m2$..., but entering m2 on its own gives
all the values.
The regression function L_regress only accommodates one
predictor, while the logistic regression function
L_logistic_regress allows up to 6 predictors, which need to
be dummy coded for nominal data with more than 2 levels.
There are 5 different tests included in the package (excluding
logistic regression mentioned above). The simplest is the one-way
categorical data analysis using the function L_1way_cat.
Two categories represents the binomial (giving the likelihood function
plot), while multiple categories represents the multinomial
distribution. The two-way categorical analysis uses the
L_2way_cat function. For these two functions an additional
evidence-based statistic S for the variance is calculated. This uses the
formula given by P. M. B. Cahusac (2020) p
158 and derived from Edwards (1992) p
187:
\[\label{eqn}
S = \frac{df}{2} \left( log \frac{df}{\chi_{df}^{2}} \right) -
\frac{1}{2}(df - \chi_{df}^{2}), (\#eq:eqn)\] where df
is the degrees of freedom. This is most useful to test the variance in
the model, specifically whether data are "too good to be true", i.e. the
data fit a particular hypothesis closer than we would expect by chance
(Edwards 1986). Using just 2 categories in
the one-way analysis can be demonstrated:
> obs <- c(18,5); exp.p <- c(0.7, 0.3) # observed and expected values
> m3 <- L_1way_cat(obs, exp.p, verb = TRUE)
Binomial support for difference of MLE 0.7826087 (dashed line)
from 0.7 (blue line) with 1 df = 0.398
Support for variance differing more than expected = 0.019
S-2 likelihood interval (red line) from 0.5853 to 0.91794
Chi-square(1) = 0.747, p = 0.3872968
Likelihood ratio test G(1) = 0.795, p = 0.37258, N = 23
Likelihood-based 95% confidence interval from 0.58958 to 0.91597
L_1way_cat function. This shows the likelihood function for
the proportion, given the data. The vertical dashed line is the MLE, the
blue line is the alternative hypothesis value (0.7), and the red
horizontal line is the S-2 likelihood interval.As previously, the assigned object m3 contains values of
interest which can be assessed in the usual way with
m3$..., but entering m3 on its own gives all
the values.
Some of the functions provide likelihood-based % confidence interval
(Aitkin et al. 1989).
Finally, there are functions to calculate the odds ratio
L_OR, relative risk L_RR and binomial efficacy
L_efficacy.
The main challenge faced by the evidential researcher is of obtaining
sufficiently strong evidence for or against one of two specified
hypotheses. Like the probability of a Type II error, this probability is
large with a small sample size and decreases as sample size increases.
The function L_t_test_sample_size can be used to calculate
the pre-study sample size for all the t tests (P. M. B. Cahusac and Mansour 2022). The
combined misleading and weak probability (R. M.
Royall 1997, 2000; R. Royall 2004) is entered, with a default of
0.05, together with the strength of evidence desired (default = 3). For
a paired samples test, where we wish to calculate sample size with a
combined .2 probability of obtaining misleading or weak evidence,
strength of evidence S = 2 and effect size 0.5, we would obtain
a value of 38 by using the following:
> L_t_test_sample_size(MW = 0.2, sd = 1, d = 0.5, S = 2, paired = TRUE)
For 1 sample, or related samples, t test with M1 + W1 probability of 0.2
Strength of evidence required is 2, and effect size of 0.5
Required sample size = 38
The somewhat comparable calculation for Type II error of 0.2, two-sided alpha = 0.05 and same effect size of 0.5 produces a sample size of 34 (using stats::power.t.test(power = .80, delta = 0.5, sig.level=0.05, type="paired")).
The functions described for the likelihoodR package may be useful for
those researchers and statisticians who wish to use the evidential
approach for their data analysis (P. Cahusac
2020). In addition to the advantages mentioned earlier in the
introduction there are other desirable features. First, categorical data
analyses are not restricted by normality assumptions, and support values
for independent components of cross-tabulated data sum precisely and
algebraically (unlike such calculations in chi-square analyses). Second,
in categorical and measurement analyses it is possible to show that the
data fit the null (or other) hypothesis too well (e.g. for detection of
data fraud). Third, analyses are versatile with unlimited complexity for
model comparisons within a dataset, for example in ANOVA (Glover and Dixon 2004).
As far as the author is aware, no other packages are available in R or
other platforms. Currently users calculate likelihood ratios manually.
This package addresses this shortcoming and hopefully will encourage
more users to express their results in terms of log likelihood
ratios.
One of the products of the likelihoodR package is that a module has been
developed for jamovi (The jamovi project
2021), named jeva, which includes many of these
functions. Hopefully this will encourage further interest in the
likelihood approach and facilitate teaching and research. The equivalent
jamovi analysis is given earlier for the L_ttest produces
output given in Figure 5. The output produced by jamovi is identical to
that produced earlier by the package, although simpler in that it lacks
the option of comparing an effect size (d, illustrated by the blue line
in package output). The null versus observed (1st line of Support
output) is -0.892, while in the package it is 0.892, the positive value
being due to comparing observed versus the null. Other outputs match
apart from decimal rounding, which can be selected in jamovi. The
t, degrees of freedom (df) and p are the same for the
null versus observed, although the jamovi output includes another line
giving these statistics for the alternative hypothesis versus the
observed. The jamovi output includes group descriptive statistics
(although these are available from the assigned object in the package,
e.g. $obs.mean. The jamovi output also includes the option
of a descriptives plot (not shown) which displays the mean with
specified likelihood interval.
The likelihoodR package described in this article provides a large
number of functions, currently many more than envisaged for the jamovi
module. As such, it will provide a major reference package for users
interested in the likelihood approach.
L_ttest function). The screenshot should
be compared directly with Figure 1. On the left side is the dialog box
where a variable can be selected (Drug A), and various settings chosen,
including the null and alternative hypothesis values, and to display the
likelihood function. An additional option gives an explanation about the
obtained support values and likelihood interval, and their
interpretation. On the right side is a summary of the analysis, at the
top showing the S values for hypothesis comparisons (see text
for the interpretation of the tabulated values). Below this is shown the
support interval, and finally below that the likelihood function. The
line colours for the likelihood function are the same as those given in
Figure 1, although there is no option for a specified effect size (no
green line).