JASP stands for Jeffreys’s Amazing Statistics Program in recognition of the pioneer of Bayesian

inference Sir Harold Jeffreys. This is a free multi-platform open-source statistics package, developed

and continually updated by a group of researchers at the University of Amsterdam. They aimed to

develop a free, open-source programme that includes both standard and more advanced statistical

techniques with a major emphasis on providing a simple intuitive user interface.

In contrast to many statistical packages, JASP provides a simple drag and drop interface, easy access

menus, intuitive analysis with real-time computation and display of all results. All tables and graphs

are presented in APA format and can be copied directly and/or saved independently. Tables can also

be exported from JASP in LaTeX format.

JASP can be downloaded free from the website and is available for Windows,

Mac OS X and Linux. You can also download a pre-installed Windows version that will run directly from

a USB or external hard drive without the need to install it locally. The WIX installer for Windows

enables you to choose a path for the installation of JASP – however, this may be blocked in some

institutions by local Administrative rights.

The programme also includes a data library with an initial collection of over 50 datasets from Andy Field’s book, Discovering Statistics Using IBM SPSS Statistics¹, and The Introduction to the Practice of Statistics² by Moore, McCabe and Craig.

Since May 2018 JASP can also be run directly in your browser via rollApp™ without having to install it

on your computer. However, this may not be the latest version

of JASP.

Keep an eye on the JASP site since there are regular updates as well as helpful videos and blog posts!!

This book is a collection of standalone handouts covering the most common standard (frequentist)

statistical analyses used by students studying Biological Sciences. Datasets used in this document are

available for download from

Dr Mark Goss-Sampson

Centre for Science and Medicine in Sport & Exercise

University of Greenwich


1 A Field. (2017) Discovering Statistics Using IBM SPSS Statistics (5th Ed.) SAGE Publications.

2 D Moore, G McCabe, B Craig. (2011) Introduction to the Practice of Statistics (7th Ed.) W H Freeman.

The main menu can be accessed by clicking on the top-left icon.


JASP has its own .jasp format but can open a variety of

different dataset formats such as:

• .csv (comma separated values) can be saved in Excel

• .txt (plain text) also can be saved in Excel

• .tsv (tab-separated values) also can be saved in Excel

• .sav (IBM SPSS data file)

• .ods (Open Document spreadsheet)

You can open recent files, browse your computer files,

access the Open Science Framework (OSF) or open the

wide range of examples that are packaged with the Data

Library in JASP.

Save/Save as:

Using these options the data file, any annotations and the analysis

can be saved in the .jasp format


Results can be exported to either an HTML file or as a PDF

Data can be exported to either a .csv, .tsv or .txt file

Sync data:

Used to synchronize with any updates in the current data file (also

can use Ctrl-Y)


Close:

Closes the current file but not JASP.


There are four sections (Data, Results, Interface and Advanced) that users can use to tweak JASP to suit their needs.

In the Data Preferences section users can:

• Synchronize/update the data automatically when the data file is saved (default)

• Set the default spreadsheet editor (i.e. Excel, SPSS etc)

• Change the threshold so that JASP more readily distinguishes between nominal and scale data

• Add a custom missing value code

In the Results Preferences section users can:

• Set JASP to return exact p values i.e. P=0.00087 rather than P<.001

• Fix the number of decimals for data in tables – makes tables easier to read/publish

• Change the pixel resolution of the graph plots

• Select when copying graphs whether they have a white or transparent background.

In the Interface Preferences section users can now define a user font and pick between two different

themes; a light theme (default) and a dark theme. The preferred language currently supports English,

German and Dutch only. In this section, there is also the ability to change the system size (zoom) for

accessibility and the scroll speeds.

In the Advanced Preferences section, most users will probably never have to change any of the default settings.


Comparison of the dark and light themes in JASP

JASP has a streamlined interface to switch between the spreadsheet, analysis and results views.

The vertical bars highlighted above allow the windows to be dragged right or left by clicking and

dragging the three vertical dots

The individual windows can also be completely collapsed using the right or left arrow icons

If you click the Results icon a range of options is provided including:

• Edit title

• Copy

• Export results

• Add notes

• Remove all

• Refresh all

The ‘add notes’ option allows the results output to be easily annotated and then exported to an HTML

or PDF file by going to File > Export Results.

The Add notes menu provides many options to change text font, colour, size etc.

You can change the size of all the tables and graphs using ctrl+ (increase), ctrl- (decrease) and ctrl= (back

to default size). Graphs can also be resized by dragging the bottom right corner of the graph.

As previously mentioned, all tables and figures are APA standard and can just be copied into any other document. All images can be copied/saved with either a white or transparent background; this can be selected in the Results Preferences section as described earlier.

There are many further resources on using JASP on the website

DATA HANDLING IN JASP

For this section open England injuries.csv.

All files must have a header label in the first row. Once loaded, the dataset appears in the window:

For large datasets, there is a hand icon which allows easy scrolling through the data.

On import JASP makes a best guess at assigning data to the different variable types:

Nominal Ordinal Continuous

If JASP has incorrectly identified the data type just click on the appropriate variable data icon in the

column title to change it to the correct format.

If you have coded the data you can click on the variable name to open up the following window in

which you can label each code. These labels now replace the codes in the spreadsheet view. If you

save this as a .jasp file these codes, as well as all analyses and notes, will be saved automatically. This

makes the data analysis fully reproducible.

In this window, you can also carry out simple filtering of data, for example, if you untick the Wales

label it will not be used in subsequent analyses.

Clicking this icon in the spreadsheet window opens up a much more comprehensive set of data

filtering options:

Using this option will not be covered in this document. For detailed information on using more

complex filters refer to the following link:


By default, JASP plots data in the Value order (i.e. 1-4). The order can be changed by highlighting the

label and moving it up or down using the appropriate arrows:

Move up

Move down

Reverse order


If you need to edit the data in the spreadsheet just double click on a cell and the data should open up

in the original spreadsheet i.e. Excel. Once you have edited your data and saved the original

spreadsheet JASP will automatically update to reflect the changes that were made, provided that you

have not changed the file name.

The main analysis options can be accessed from the main toolbar. Currently, JASP offers the following

frequentist (parametric and non-parametric standard statistics) and alternative Bayesian tests:


• Descriptive stats


• Correlation

• Linear regression

• Logistic regression


• Independent

• Paired

• One sample


• Binomial test

• Multinomial test

• Contingency tables

• Log-linear regression*


• Independent

• Repeated measures




• Principal Component Analysis (PCA)*

• Exploratory Factor Analysis (EFA)*

• Confirmatory Factor Analysis (CFA)*

Mixed Models*

• Linear Mixed Models

• Generalised linear mixed models

* Not covered in this document

By clicking on the + icon on the top-right menu bar you can also access advanced options that allow the addition of optional modules. Once ticked they will be added to the main analysis ribbon. These include:

• Audit

• BAIN

• Distributions

• Equivalence tests

• JAGS

• Machine learning

• Meta-analysis (included in this guide)

• Network analysis

• Reliability analysis

• SEM

• Summary statistics

• Visual modelling

• Learning Bayes

• R (beta)

See the JASP website for more information on these advanced modules.

Once you have selected your required analysis all the possible statistical options appear in the left

window and output in the right window.

JASP provides the ability to rename and ‘stack’ the results output thereby organising multiple


The individual analyses can be renamed using the pen icon or deleted using the red cross.

Clicking on an analysis in this list takes you to the appropriate part of the results output

window. They can also be rearranged by dragging and dropping each of the analyses.

The green + icon produces a copy of the chosen analysis

The blue information icon provides detailed information on each of the statistical procedures used

and includes a search option.

13 | P a g e JASP 0.14 - Dr Mark Goss-Sampson

DESCRIPTIVE STATISTICS

Presentation of all the raw data is very difficult for a reader to visualise or to draw any inference on.

Descriptive statistics and related plots are a succinct way of describing and summarising data but do

not test any hypotheses. There are various types of statistics that are used to describe data:

• Measures of central tendency

• Measures of dispersion

• Percentile values

• Measures of distribution

• Descriptive plots

To explore these measures, load Descriptive data.csv into JASP. Go to Descriptives > Descriptive

statistics and move the Variable data to the Variables box on the right.

The Statistics menu can now be opened to see the various options available.

Central tendency can be defined as the tendency for variable values to cluster around a central value. The three

ways of describing this central value are mean, median or mode. If the whole population is considered

the term population mean/median/mode is used. If a sample/subset of the population is being

analysed the term sample mean/ median/mode is used. The measures of central tendency move

toward a constant value when the sample size is sufficient to be representative of the population.

In the Statistics options make sure that everything is unticked apart from mean, median and mode.

The mean, M or x̅ (17.71) is equal to the sum of all the values divided by the number of values in the

dataset i.e. the average of the values. It is used for describing continuous data. It provides a simple

statistical model of the centre of distribution of the values and is a theoretical estimate of the ‘typical

value’. However, it can be influenced heavily by ‘extreme’ scores.

The median, Mdn (17.9) is the middle value in a dataset that has been ordered from the smallest to

largest value and is the normal measure used for ordinal or non-parametric continuous data. It is less sensitive to outliers and skewed data than the mean.

The mode (20.0) is the most frequent value in the dataset and is usually the highest bar in a distribution plot.
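The three measures of central tendency can be reproduced outside JASP with a few lines of Python. This is a minimal sketch using the standard `statistics` module; the values are hypothetical and are not the contents of Descriptive data.csv. It also illustrates why the mean is more sensitive to an extreme score than the median.

```python
import statistics

# Hypothetical values for illustration; not the contents of Descriptive data.csv
values = [12.5, 15.0, 17.3, 17.9, 20.0, 20.0, 21.3]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle value once the data are ordered
mode = statistics.mode(values)      # most frequent value

# Adding one extreme score pulls the mean much further than the median
with_outlier = values + [100.0]
mean_shift = statistics.mean(with_outlier) - mean
median_shift = statistics.median(with_outlier) - median

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"shift in mean={mean_shift:.2f}, shift in median={median_shift:.2f}")
```

The single extreme value moves the mean by far more than it moves the median, which is why the median is preferred for skewed data.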



In the Statistics options make sure that the following options are ticked

Standard deviation, S or SD (6.94) is used to quantify the amount of dispersion of data values around

the mean. A low standard deviation indicates that the values are close to the mean, while a high

standard deviation indicates that the values are dispersed over a wider range.

Variance (S² = 48.1) is another estimate of how far the data is spread from the mean. It is also the

square of the standard deviation.

The standard error of the mean, SE (0.24) is a measure of how far the sample mean of the data is

expected to be from the true population mean. As the size of the sample data grows larger the SE

decreases compared to S and the true mean of the population is known with greater precision.

MAD, the median absolute deviation, is a robust measure of the spread of data. It is relatively unaffected

by data that is not normally distributed. Reporting median +/- MAD for data that is not normally

distributed is equivalent to mean +/- SD for normally distributed data.

MAD Robust: Median absolute deviation of the data points, adjusted by a factor for asymptotically

normal consistency.

IQR - Interquartile Range is similar to the MAD but is less robust (see Boxplots).
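The relationships between these dispersion measures can be sketched in Python. The data are hypothetical, and two assumptions are made that the text does not confirm: that the SD uses the sample (n - 1) formula, and that the "MAD Robust" consistency factor is the commonly used constant 1.4826.

```python
import math
import statistics

# Hypothetical values for illustration only
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)

sd = statistics.stdev(values)           # sample standard deviation (n - 1 denominator)
variance = statistics.variance(values)  # sample variance, the square of the SD
se = sd / math.sqrt(n)                  # standard error of the mean

centre = statistics.median(values)
# Median absolute deviation: the median distance of the values from the median
mad = statistics.median([abs(v - centre) for v in values])
# Assumed scaling constant so that the MAD estimates the SD for normal data
mad_robust = 1.4826 * mad

print(f"SD={sd:.3f} variance={variance:.3f} SE={se:.3f}")
print(f"MAD={mad:.3f} MAD robust={mad_robust:.4f}")
```

Because the SE divides the SD by the square root of the sample size, it shrinks as more data are collected while the SD does not.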

Confidence intervals (CI), although not shown in the general Descriptive statistics output, are

used in many other statistical tests. When sampling from a population to get an estimate of the mean,

confidence intervals are a range of values within which you are n% confident the true mean is

included. A 95% CI is, therefore, a range of values that one can be 95% certain contains the true mean

of the population. This is not the same as a range that contains 95% of ALL the values.

For example, in a normal distribution, 95% of the data are expected to be within ± 1.96 SD of the mean

and 99% within ± 2.576 SD.

95% CI = M ± 1.96 * the standard error of the mean.

Based on the data so far, M = 17.71, SE = 0.24, this will be 17.71 ± (1.96 * 0.24) or 17.71 ± 0.47.

Therefore the 95% CI for this dataset is 17.24 - 18.18, a range that one can be 95% confident contains the true mean of the population.
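The arithmetic above can be checked with a short Python sketch. It uses the mean and standard error quoted in the text and the normal-distribution multiplier of 1.96; the variable names are illustrative only.

```python
# Values quoted in the text for this dataset
mean = 17.71
se = 0.24

# 1.96 is the multiplier for a 95% interval under a normal distribution
z = 1.96

margin = z * se
lower = mean - margin
upper = mean + margin

print(f"95% CI = {lower:.2f} to {upper:.2f}")
```

Swapping `z` for 2.576 would give the wider 99% interval.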


In the Statistics options make sure that everything is unticked apart from Quartiles.

Quartiles are where datasets are split into 4 equal quarters, normally based on rank ordering of

median values. For example, in this dataset

1 1 2 2 3 3 4 4 4 4 5 5 5 6 7 8 8 9 10 10 10

25% 50% 75%

The median value that splits data by 50% = 50th percentile = 5

The median value of left side = 25th percentile = 3

The median value of right side = 75th percentile = 8

From this the interquartile range (IQR) can be calculated; this is the difference between the 75th

and 25th percentiles i.e. 5. These values are used to construct the descriptive boxplots later. The IQR

can also be shown by ticking this option in the Dispersion menu.
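The median-of-halves approach described above can be sketched in Python as follows. The function name `quartiles_by_halves` is hypothetical, and other software (possibly including JASP) may use a different interpolation rule that gives slightly different quartiles for some datasets.

```python
import statistics

# The example dataset from the text
data = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 8, 8, 9, 10, 10, 10]


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median of each half of the data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd count the middle value belongs to neither half
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )


q1, q2, q3 = quartiles_by_halves(data)
iqr = q3 - q1
print(f"25th={q1} 50th={q2} 75th={q3} IQR={iqr}")
```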


Skewness describes the shift of the distribution away from a normal distribution. Negative skewness

shows that the mode moves to the right resulting in a dominant left tail. Positive skewness shows

that the mode moves to the left resulting in a dominant right tail.

Kurtosis describes how heavy or light the tails are. Positive kurtosis results in an increase in the

“pointiness” of the distribution with heavy (longer) tails while negative kurtosis exhibits a much more

uniform or flatter distribution with light (shorter) tails.

[Figure: example distributions showing negative skewness, positive skewness, positive (+) kurtosis and negative (-) kurtosis]

In the Statistics options make sure that everything is unticked apart from skewness, kurtosis and

Shapiro-Wilk test.

We can use the Descriptives output to calculate skewness and kurtosis. For a normal data distribution,

both values should be close to zero. The Shapiro-Wilk test is used to assess whether or not the data is

significantly different from a normal distribution. (see - Exploring data integrity in JASP for more


DESCRIPTIVE PLOTS IN JASP

Currently, JASP produces a range of descriptive plots:

Again, using Descriptive data.csv with the variable data in the Variables box, go to the statistics

options and under Plots tick Distribution plots, Boxplots – Boxplot Element and Q-Q plots.

The Distribution plot is based on splitting the data into frequency bins, this is then overlaid with the

distribution curve. As mentioned before, the highest bar is the mode (most frequent value of the

dataset). In this case, the curve looks approximately symmetrical suggesting that the data is

approximately normally distributed. The second distribution plot is from another dataset which shows

that the data is positively skewed.


The boxplots visualise several statistics described above in one plot:

• Median value

• 25 and 75% quartiles

• Interquartile range (IQR) i.e. 75% - 25% quartile values

• Maximum and minimum values plotted with outliers excluded

• Outliers are shown if requested

[Figure: annotated boxplot showing the maximum value, 75% quartile, median value, 25% quartile and minimum value; the top 25% and bottom 25% of the data are also marked]


Go back to the statistics options, in Descriptive plots tick both Boxplot and Violin Element, look at how

the plot has changed. Next tick Boxplot, Violin and Jitter Elements. The Violin plot has taken the

smoothed distribution curve from the Distribution plot, rotated it 90° and superimposed it on the

boxplot. The jitter plot has further added all the data points.

A Q-Q plot (quantile-quantile plot) can be used to visually assess if a set of data comes from a normal

distribution. Q-Q plots take the sample data, sort it in ascending order, and then plot them against

quantiles (percentiles) calculated from a theoretical distribution. If the data is normally distributed,

the points will fall on or close to the 45-degree reference line. If the data is not normally distributed,

the points will deviate from the reference line.

[Figure: Boxplot + Violin plot; Boxplot + Violin + Jitter plot]

Scatter Plots

JASP can produce scatterplots of various types, which can include smooth or linear regression

lines. There are also options to add distributions to these either in the form of density plots or


Pie charts

Also, users can plot pie charts when working with categorical or other frequency data.

Plot colour palettes

Users can choose between 5 different colour palettes using the drop-down menu.

SPLITTING DATA FILES

If there is a grouping variable (categorical or ordinal) descriptive statistics and plots can be produced

for each group. Using Descriptive data.csv with the variable data in the Variables box now add Group

to the Split box.

The output will be as follows:


EXPLORING DATA INTEGRITY

Sample data is used to estimate parameters of the population whereby a parameter is a measurable

characteristic of a population, such as a mean, standard deviation, standard error or confidence

intervals etc.

What is the difference between a statistic and a parameter? If you randomly polled a selection of students about the quality of their student bar and found that 75% of them were happy with it, that is a sample statistic since only a sample of the population was asked. You calculated what the population was likely to do based on the sample. If you asked all the students in the university and 90% were happy, you have a parameter since you asked the whole university population.

Bias can be defined as the tendency of a measurement to over or under-estimate the value of a

population parameter. There are many types of bias that can appear in research design and data

collection including:

• Participant selection bias – some being more likely to be selected for study than others

• Participant exclusion bias - due to the systematic exclusion of certain individuals from the


• Analytical bias - due to the way that the results are evaluated

However statistical bias can affect a) parameter estimates, b) standard errors and confidence intervals

or c) test statistics and p values. So how can we check for bias?


Outliers are data points that lie abnormally far from all other data points. Outliers can be due to a variety of things such as errors in data input or analytical errors at the point of data collection. Boxplots are an easy way to visualise such data points, where outliers lie above the upper limit (75% quartile + 1.5 * IQR) or below the lower limit (25% quartile - 1.5 * IQR).
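The 1.5 * IQR rule can be sketched in Python. The function name `iqr_outliers` and the data are hypothetical, and the quartiles are computed with the "inclusive" method of `statistics.quantiles`, which may differ slightly from the rule JASP applies.

```python
import statistics


def iqr_outliers(values):
    """Return the values lying beyond 1.5 * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return [v for v in values if v < lower_limit or v > upper_limit]


# Hypothetical measurements with one suspicious entry
sample = [12.1, 12.8, 13.0, 13.4, 13.9, 14.2, 14.7, 47.0]
flagged = iqr_outliers(sample)
print(flagged)
```

A flagged value is only a candidate for checking; the cause still has to be investigated before deciding what to do with it.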

Boxplots show:

• Median value

• 25 & 75% quartiles

• IQR – Inter quartile range

• Max & min values plotted

with outliers excluded

• Outliers shown if requested

Load Exploring Data.csv into JASP. Under Descriptives > Descriptive Statistics, add Variable 1 to the

Variables box. In Plots tick the following Boxplots, Label Outliers, and BoxPlot Element.

The resulting Boxplot on the left looks very compressed and an obvious outlier is labelled as being in

row 38 of the dataset. This can be traced back to a data input error in which 91.7 was input instead of

917. The graph on the right shows the BoxPlot for the ‘clean’ data.


How you deal with an outlier depends on the cause. Most parametric tests are highly sensitive to

outliers while non-parametric tests are generally not.

Correct it? – Check the original data to make sure that it isn’t an input error, if it is, correct it, and

rerun the analysis.

Keep it? - Even in normally distributed datasets, outliers may be expected for large sample

sizes and should not automatically be discarded if that is the case.

Delete it? – This is a controversial practice in small datasets where a normal distribution cannot be

assumed. Outliers resulting from an instrument reading error may be excluded but it should be verified


Replace it? – Also known as winsorizing. This technique replaces the outlier values with the relevant

maximum and/or minimum values found after excluding the outlier.

Whatever method you use must be justified in your statistical methodology and subsequent analysis.
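The replacement approach described above (winsorizing) can be sketched in Python as follows. The function name `winsorize_outliers` is hypothetical, the data are invented, and the outliers are assumed to have been identified already, for example from a boxplot.

```python
def winsorize_outliers(values, outliers):
    """Replace outliers with the min/max found after excluding them."""
    kept = [v for v in values if v not in outliers]
    low, high = min(kept), max(kept)
    # Clamp every value into the range of the retained data;
    # values that were not outliers are left unchanged
    return [min(max(v, low), high) for v in values]


# Hypothetical measurements; 47.0 has been identified as an outlier
values = [12.1, 12.8, 13.0, 13.4, 13.9, 14.2, 14.7, 47.0]
outliers = [47.0]

winsorized = winsorize_outliers(values, outliers)
print(winsorized)
```

The dataset keeps its original size, which is the main difference from deleting the outlier.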


When using parametric tests, we make a series of assumptions about our data and bias will occur if

these assumptions are violated, in particular:

• Normality

• Homogeneity of variance or homoscedasticity

Many statistical tests are an omnibus of tests of which some will check these assumptions.

Normality does not mean necessarily that the data is normally distributed per se but it is whether or

not the dataset can be well modelled by a normal distribution. Normality can be explored in a variety

of ways:

• Numerically

• Visually / graphically

• Statistically

Numerically we can use the Descriptives output to calculate skewness and kurtosis. For a normal data

distribution, both values should be close to zero. To determine the significance of skewness or kurtosis

we calculate their z-scores by dividing them by their associated standard errors:

Skewness Z = skewness / skewness standard error

Kurtosis Z = kurtosis / kurtosis standard error

Z score significance: p<0.05 if z >1.96, p<0.01 if z >2.58, p<0.001 if z >3.29

Using Exploring data.csv, go to Descriptives>Descriptive Statistics move Variable 3 to the Variables

box, in the Statistics drop-down menu select Mean, Std Deviation, Skewness and Kurtosis as shown below with

the corresponding output table.

It can be seen that both skewness and kurtosis are not close to 0. The positive skewness suggests that

data is distributed more on the left (see graphs later) while the negative kurtosis suggests a flat

distribution. When calculating their z scores it can be seen that the data is significantly skewed p<0.05.

Skewness Z = 0.839 / 0.337 = 2.49

Kurtosis Z = kurtosis / 0.662 = 0.614

[As a note of caution skewness and kurtosis may appear significant in large datasets even though the

distribution is normal.]
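The z-score calculation and the significance thresholds listed above can be sketched in Python. The function names are hypothetical; the skewness and standard error are the values quoted in the text for Variable 3.

```python
def z_score(statistic, standard_error):
    """Divide a skewness or kurtosis value by its standard error."""
    return statistic / standard_error


def significance(z):
    """Map a z-score onto the thresholds given in the text."""
    size = abs(z)
    if size > 3.29:
        return "p < 0.001"
    if size > 2.58:
        return "p < 0.01"
    if size > 1.96:
        return "p < 0.05"
    return "not significant"


# Skewness and its standard error for Variable 3, as quoted in the text
skew_z = z_score(0.839, 0.337)
print(f"Skewness Z = {skew_z:.2f} ({significance(skew_z)})")

# The kurtosis z-score reported in the text
print(f"Kurtosis Z = 0.614 ({significance(0.614)})")
```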

Now add Variable 2 to the Variables box and in Plots tick Distribution plot. This will show the following

two graphs:

It is quite easy to visualise that Variable 2 has a symmetrical distribution. Variable 3 is positively skewed, with most of the data on the left and a longer tail to the right, as confirmed by the skewness Z score.

Another graphical check for normality is a Q-Q plot. Q-Q plots are available in Descriptives and are

also produced as part of the Assumption Checks used in linear regression and ANOVA. Q-Q plots show

the quantiles of the actual data against those expected for a normal distribution.

If data are normally distributed all the points will be close to the diagonal reference line. If the points

‘sag’ above or below the line there is a problem with kurtosis. If the points snake around the line then

the problem is skewness. Below are Q-Q plots for Variables 2 and 3. Compare these to the previous

distribution plots and the skewness/kurtosis z scores above.

Variable 2 Variable 3


The following Q-Q plot scenarios are possible:

The Shapiro-Wilk test is a statistical way used by JASP to check the assumption of normality. It is also

used in the Independent (distribution of the two groups) and Paired (distribution of differences

between pairs) t-tests. The test results in a W value; small values indicate that your sample is not normally distributed (the null hypothesis that your population is normally distributed can, therefore, be rejected if the W value is below a certain threshold).

In Descriptives, the Shapiro-Wilk test can be selected in the Distribution tests. The Shapiro-Wilk

output table shows no significant deviation in normality for Variable 2 but a significant deviation

(p<.001) for Variable 3.

The most important limitation is that the test can be biased by sample size. The larger the sample,

the more likely you’ll get a statistically significant result.

Testing the assumption of normality – A cautionary note!

For most parametric tests to be reliable, one of the assumptions is that the data is approximately

normally distributed. A normal distribution peaks in the middle and is symmetrical about the mean.

However, data does not need to be perfectly normally distributed for the tests to be reliable.

So, having gone on about testing for normality – is it necessary?

The Central Limit Theorem states that as the sample size gets larger i.e. >30 data points the

distribution of the sampling means approaches a normal distribution. So the more data points you

have the more normal the distribution will look and the closer your sample mean approximates the

population mean.
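The behaviour described by the Central Limit Theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the raw data are drawn from an exponential distribution (strongly positively skewed), the sample size of 30 follows the rule of thumb in the text, and the `skewness` helper is a simple moment-based estimate, not necessarily the formula JASP uses.

```python
import random
import statistics


def skewness(values):
    """Simple moment-based skewness (population formula)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Raw data from a strongly skewed (exponential) distribution
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of 2000 independent samples, each containing 30 data points
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(30)])
    for _ in range(2000)
]

raw_skew = skewness(raw)
means_skew = skewness(sample_means)
print(f"skewness of raw data:     {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The sample means are far less skewed than the raw data, even though the underlying population is not normal.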

Large datasets may result in significant tests of normality i.e. Shapiro-Wilk or significant skewness and

kurtosis z-scores when the distribution graphs look fairly normal. Conversely, small datasets will

reduce the statistical power to detect non-normality.

However, data that does not meet the assumption of normality is going to result in poor results for

certain types of test (i.e. ones that state that the assumption must be met!). How closely does your

data need to be normally distributed? This is a judgment call best made by eyeballing the data.


Transform the data and redo the normality checks on the transformed data. Common transformations

include taking the log or square root of the data.

Use non-parametric tests since these are distribution-free tests and can be used instead of their

parametric equivalent.


Levene’s test is commonly used to test the null hypothesis that variances in different groups are equal.

The result from the test (F) is reported as a p-value, if not significant then you can say that the null

hypothesis stands — that the variances are equal; if the p-value is significant then the implication is

that the variances are unequal. Levene’s test is included in the Independent t-test and ANOVA in

JASP as part of the Assumption Checks.

Using Exploring data.csv, go to T-Tests>Independent Samples t-test move Variable 1 to the Variables

box and Group to the Grouping variable and tick Assumption Checks > Equality of variances.

In this case, there is no significant difference in variance between the two groups F (1) = 0.218, p=.643.
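The F statistic behind Levene's test can be sketched in Python. This sketch assumes the classic form of the test, which runs a one-way ANOVA on the absolute deviations of each value from its group mean; the text does not say whether JASP centres on the mean or the median. The function name `levene_f` and the data are hypothetical, and the p-value (which requires the F distribution) is not computed here.

```python
import statistics


def levene_f(*groups):
    """Return (F, df_between, df_within) for Levene's test (mean-centred)."""
    # Absolute deviation of every value from its own group mean
    deviations = [
        [abs(x - statistics.fmean(group)) for x in group] for group in groups
    ]
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = statistics.fmean([d for dev in deviations for d in dev])
    group_means = [statistics.fmean(dev) for dev in deviations]

    # One-way ANOVA sums of squares on the deviations
    between = sum(
        len(dev) * (gm - grand_mean) ** 2
        for dev, gm in zip(deviations, group_means)
    )
    within = sum(
        (d - gm) ** 2 for dev, gm in zip(deviations, group_means) for d in dev
    )
    f_value = ((n_total - k) / (k - 1)) * between / within
    return f_value, k - 1, n_total - k


# Hypothetical groups: same spread but different means
group_a = [1, 2, 3, 4, 5]
group_b = [11, 12, 13, 14, 15]
f_equal, df1, df2 = levene_f(group_a, group_b)

# Hypothetical group with a much wider spread
group_c = [-10, 0, 3, 6, 16]
f_unequal, _, _ = levene_f(group_a, group_c)

print(f"equal spread:   F({df1}, {df2}) = {f_equal:.3f}")
print(f"unequal spread: F({df1}, {df2}) = {f_unequal:.3f}")
```

A difference in group means alone does not affect the statistic; only a difference in spread increases F.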

The assumption of homoscedasticity (equal variance) is important in linear regression models as is

linearity. It assumes that the variance of the data around the regression line is the same for all

predictor data points. Heteroscedasticity (the violation of homoscedasticity) is present when the

variance differs across the values of an independent variable. This can be visually assessed in linear

regression by plotting the residuals against the predicted values.


If homoscedasticity and linearity are not violated there should be no relationship between what the

model predicts and its errors as shown in the graph on the left. Any sort of funnelling (middle graph)

suggests that homoscedasticity has been violated and any curve (right graph) suggests that linearity

assumptions have not been met.

DATA TRANSFORMATION

The ability to compute new variables or to transform data was introduced in version 0.9.1. In

some cases, it may be useful to compute the differences between repeated measures or, to

make a dataset more normally distributed, you can apply a log transform for example. When

a dataset is opened there will be a plus sign (+) at the end of the columns.

Clicking on the + opens up a small dialogue window where you can:

• Enter the name of a new variable or the transformed variable

• Select whether you enter the R code directly or use the commands built into JASP

• Select what data type is required

Once you have named the new variable and chosen the other options – click create.

If you choose the manual option rather than the R code, this opens all the built-in create and

transform options. Although not obvious, you can scroll the left and right-hand options to see

more variables or more operators respectively.

For example, we want to create a column of data showing the difference between variable 2

and variable 3. Once you have entered the column name in the Create Computed Column

dialogue window, its name will appear in the spreadsheet window. The mathematical

operation now needs to be defined. In this case drag variable 2 into the equation box, drag

the ‘minus’ sign down and then drag in variable 3.

If you have made a mistake, i.e. used the wrong variable or operator, remove it by dragging

the item into the dustbin in the bottom right corner.


When you are happy with the equation/operation, click compute column and the data will be


If you decide that you do not want to keep the derived data, you can remove the column by

clicking the other dustbin icon next to the R.

Another example is to do a log transformation of the data. In the following case variable 1 has

been transformed by scrolling the operators on the left and selecting the log10(y) option.

Replace the “y” with the variable that you want to transform and then click Compute column.

When finished, click the X to close the dialogue.


The two graphs below show the untransformed and the log10 transformed data. The skewed

data has been transformed into a profile with a more normal distribution

The Export function will also export any new data variables that have been created.
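The two computed columns described in this section, a difference between two variables and a log10 transform, can be sketched in Python. All data here are hypothetical: the skewed values are simulated from a log-normal distribution, and the `skewness` helper is a simple moment-based estimate used only to show the effect of the transform.

```python
import math
import random
import statistics


def skewness(values):
    """Simple moment-based skewness (population formula)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Computed column 1: difference between two hypothetical variables
variable_2 = [5, 7, 9]
variable_3 = [2, 3, 4]
difference = [a - b for a, b in zip(variable_2, variable_3)]

# Computed column 2: log10 transform of hypothetical positively skewed data
rng = random.Random(1)  # fixed seed so the run is repeatable
variable_1 = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
log_variable_1 = [math.log10(v) for v in variable_1]

skew_before = skewness(variable_1)
skew_after = skewness(log_variable_1)
print(f"difference column: {difference}")
print(f"skewness before: {skew_before:.2f}, after log10: {skew_after:.2f}")
```

The log10 transform is only defined for positive values, so data containing zeros or negative numbers would need a different approach.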


[Figure: distribution plots of the untransformed and Log10 transformed data]

EFFECT SIZE

When performing a hypothesis test on data we determine the relevant statistic (r, t, F etc) and p-value

to decide whether to accept or reject the null hypothesis. A small p-value, <0.05 in most analyses

provides evidence against the null hypothesis whereas a large p-value >0.05 only means that there is

not enough evidence to reject the null hypothesis. A lower p-value is sometimes incorrectly

interpreted as meaning there is a stronger relationship or difference between variables. So what is

needed is not just null hypothesis testing but also a method of determining precisely how large the

effects seen in the data are.

An effect size is a statistical measure used to determine the strength of the relationship or difference

between variables. Unlike a p-value, effect sizes can be used to quantitatively compare the results

of different studies.

For example, comparing heights between 11 and 12-year-old children may show that the 12-year-

olds are significantly taller but it is difficult to visually see a difference i.e. small effect size. However,

a significant difference in height between 11 and 16-year-old children is obvious to see (large effect size).


The effect size is usually measured in three ways:

• the standardized mean difference

• correlation coefficient

• odds ratio

When looking at differences between groups most techniques are primarily based on the differences

between the means divided by the average standard deviations. The values derived can then be used

to describe the magnitude of the differences. The effect sizes calculated in JASP for t-tests and ANOVA

are shown below:

When analysing bivariate or multivariate relationships the effect sizes are the correlation coefficients.


When analysing categorical relationships via contingency tables (i.e. the chi-square test), Phi is only used for

2x2 tables while Cramer’s V can be used for any table size.

For a 2 × 2 contingency table, we can also define the odds ratio measure of effect size.
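As a rough illustration of two of the effect sizes described in this section, the sketch below computes a standardized mean difference (Cohen's d) and an odds ratio by hand. It assumes that the "average" standard deviation is the pooled standard deviation, which is one common convention and may differ from the exact variant JASP reports. The function names and all data values are hypothetical and exist only for this example.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled standard deviation (an assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 contingency table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Hypothetical heights (cm), invented for illustration only
heights_12_year_olds = [150.0, 152.0, 154.0, 156.0, 158.0]
heights_11_year_olds = [148.0, 150.0, 152.0, 154.0, 156.0]

d_value = cohens_d(heights_12_year_olds, heights_11_year_olds)
or_value = odds_ratio(20, 10, 10, 20)  # hypothetical cell counts
print(f"Cohen's d = {d_value:.3f}, odds ratio = {or_value:.1f}")
```

Because d is expressed in standard deviation units, it can be compared across studies that measured the outcome on different scales, which is the point made above about effect sizes versus p-values.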

ONE SAMPLE T-TEST

Research is normally carried out in sample populations, but how closely does the sample reflect the

whole population? The parametric one-sample t-test determines whether the sample mean is

statistically different from a known or hypothesized population mean.

The null hypothesis (Ho) tested is that the sample mean is equal to the population mean.


Four assumptions are required for a one-sample t-test to provide a valid result:

• The test variable should be measured on a continuous scale.

• The test variable data should be independent i.e. no relationship between any of the data points


• The data should be approximately normally distributed

• There should be no significant outliers.


Open one sample t-test.csv. This contains two columns of data representing the height (cm) and body

masses (kg) of a sample population of males used in a study. In 2017 the average adult male in the UK

population was 178 cm tall and had a body mass of 83.6 kg.

Go to T-Tests > One-Sample t-test and in the first instance add height to the analysis box on the right.

Then tick the following options and add 178 as the test value:



The output should contain three tables.

The assumption check of normality (Shapiro-Wilk) is not significant suggesting that the heights are

normally distributed, therefore this assumption is not violated. If this showed a significant difference

the analysis should be repeated using the non-parametric equivalent, Wilcoxon’s signed-rank test

tested against the population median height.

This table shows that there is no significant difference between the means (p = .706).

The descriptive data shows that the mean height of the sample population was 177.6 cm compared

to the average 178 cm UK male.

Repeat the procedure by replacing height with mass and change the test value to 83.6.


The assumption check of normality (Shapiro-Wilk) is not significant suggesting that the masses are

normally distributed.

This table shows that there is a significant difference between the sample mean (72.9 kg) and

the population body mass (83.6 kg), p <.001.


A one-sample t-test showed no significant difference in height compared to the population mean (t

(22) = -0.382, p= .706), however, the participants were significantly lighter than the UK male

population average (t (22) =-7.159, p<.001).
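The t value and degrees of freedom reported above come from dividing the difference between the sample mean and the test value by the standard error of the mean. A minimal sketch of that calculation follows; the function name and the five height values are hypothetical and are not the contents of the example file.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, population_mean):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t_value = (mean(sample) - population_mean) / standard_error
    return t_value, n - 1


# Hypothetical heights (cm), invented for illustration only
heights_cm = [171.0, 174.0, 177.0, 180.0, 183.0]
t_value, df = one_sample_t(heights_cm, 178.0)
print(f"t({df}) = {t_value:.3f}")
```

A negative t simply means that the sample mean lies below the test value, as with both height and body mass in the write-up above. The p-value is then read from the t distribution with n - 1 degrees of freedom, which JASP does automatically.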

BINOMIAL TEST

The binomial test is effectively a non-parametric version of the one-sample t-test for use with

dichotomous (i.e. yes/no) categorical datasets. This tests whether or not the sample frequency is

statistically different from a known or hypothesized population frequency.

The null hypothesis (Ho) tested is that the sample data frequency is equal to the expected population frequency.


ASSUMPTIONS

Three assumptions are required for a binomial test to provide a valid result:

• The test variable should be a dichotomous scale (such as yes/no, male/female etc.).

• The sample responses should be independent

• The sample size is smaller than, but representative of, the population


Open binomial.csv. This contains one column of data showing the number of students using either a

Windows laptop or a MacBook at University. In January 2018, when comparing just the two operating

systems, the UK market share of Windows was 86% and Mac IOS 14%.3

Go to Frequencies > Binomial test. Move the Laptop variable to the data window and set the Test value

to 0.86 (86%). Also, tick Descriptive plots.




The following table and graph show that the frequencies of both laptops are significantly less than

86%. In particular, these students are using significantly fewer Windows laptops than was expected

compared to the UK market share.

Is this the same for MacBook users? Go back to the Options window and change the test value to

0.14 (14%). This time both frequencies are significantly higher than 14%. This shows that students

are using significantly more MacBooks than was expected compared to the UK market share.



The UK proportion of Windows and MacBook users was reported to be 86% and 14% respectively. In

a cohort of University students (N=90), a Binomial test revealed that the proportion of students using

Windows laptops was significantly less (59.6%, p<.001) and those using MacBooks significantly more

(40.4%, p<.001) than expected.
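To make the logic of the binomial test concrete, the sketch below computes an exact two-sided p-value by summing the probabilities of every outcome that is no more likely than the one observed. This is one common definition of the two-sided exact test and is assumed here rather than taken from JASP. The function names and the count of 54 Windows users out of 90 are hypothetical; they are not the values in binomial.csv.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_test_two_sided(successes, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely than the observed one."""
    observed = binomial_pmf(successes, n, p)
    tolerance = observed * 1e-7  # guards against floating-point ties
    total = sum(
        pmf
        for pmf in (binomial_pmf(k, n, p) for k in range(n + 1))
        if pmf <= observed + tolerance
    )
    return min(1.0, total)


# Hypothetical counts, invented for illustration only
windows_users, cohort_size = 54, 90
p_value = binomial_test_two_sided(windows_users, cohort_size, 0.86)
print(f"Observed proportion = {windows_users / cohort_size:.3f}, p = {p_value:.2e}")
```

With only two categories, testing the Windows proportion against 0.86 and the MacBook proportion against 0.14 are two views of the same comparison.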

MULTINOMIAL TEST

The multinomial test is effectively an extended version of the Binomial test for use with categorical

datasets containing three or more factors. This tests whether or not the sample frequency is

statistically different from a hypothesized population frequency (multinomial test) or a known

population frequency (Chi-square ‘goodness-of-fit’ test).

The null hypothesis (Ho) tested is that the sample frequency is equal to the expected population frequency.


ASSUMPTIONS

Three assumptions are required for a multinomial test to provide a valid result:

• The test variable should be a categorical scale containing 3 or more factors

• The sample responses should be independent

• The sample size is smaller than, but representative of, the population


Open multinomial.csv. This contains three columns of data showing the number of different coloured

M&Ms counted in five bags. Without any prior knowledge, it could be assumed that the different

coloured M&Ms are equally distributed.

Go to Frequencies > Multinomial test. Move colour of the M&Ms to Factor and the observed number

of M&Ms to counts. Tick Descriptives and Descriptives Plots.


As can be seen in the Descriptive table, the test assumes an equal expectation for the proportions of

coloured M&Ms (36 of each colour). The Multinomial test results show that the observed distribution

is significantly different (p<.001) to an equal distribution.

However, further research shows that the manufacturer produces coloured M&Ms in different ratios:

Colour          Blue  Brown  Green  Orange  Red  Yellow

Proportion (%)    24     13     16      20   13      14

These values can now be used as the expected counts, so move the Expected variable to the Expected

Counts box. This automatically runs the χ2 ‘goodness-of-fit’ test leaving the Hypothesis options greyed out.


As can be seen in the Descriptives table, JASP has calculated the expected numbers of the different

coloured M&Ms based on the manufacturer’s reported production ratio. The results of the test show

that the observed proportions of the different coloured M&Ms are significantly different (χ2 =74.5,

p<.001) to those proportions stated by the manufacturer.
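The χ2 ‘goodness-of-fit’ statistic can be reproduced by hand: scale the expected proportions to the total number of sweets counted, then sum (observed - expected)² / expected over the colours. The sketch below assumes this standard Pearson form. The observed counts are hypothetical (chosen only so that they total 216, i.e. 36 per colour on average) and are not the values in multinomial.csv, so the result will not match the χ2 reported above.

```python
def chi_square_gof(observed, expected_proportions):
    """Return (chi-square statistic, expected counts, degrees of freedom)."""
    total = sum(observed)
    scale = sum(expected_proportions)
    # Rescale the proportions so the expected counts add up to the observed total
    expected = [total * p / scale for p in expected_proportions]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, expected, len(observed) - 1


# Order: Blue, Brown, Green, Orange, Red, Yellow
observed_counts = [30, 40, 25, 45, 40, 36]        # hypothetical counts
manufacturer_ratio = [24, 13, 16, 20, 13, 14]     # proportions quoted in the text

chi_square, expected_counts, df = chi_square_gof(observed_counts, manufacturer_ratio)
print(f"chi-square({df}) = {chi_square:.2f}")
print("expected counts:", [round(e, 2) for e in expected_counts])
```

Passing equal proportions such as `[1, 1, 1, 1, 1, 1]` gives the equal-distribution comparison described for the first analysis.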


MULTINOMIAL AND Χ2 ‘GOODNESS-OF-FIT’ TEST

JASP also provides another option whereby both tests can be run at the same time. Go back to the

Options window and only add Colour to the Factor and Observed to the Counts boxes, remove the

expected counts if the variable is still there. In Hypotheses now tick the χ2 test. This will open up a

small spreadsheet window showing the colour and Ho (a) with each cell having 1 in it. This is assuming

that the proportions of each colour are equal (multinomial test).

In this window, add another column which will automatically be labelled Ho (b). The expected

proportions of each colour can now be typed in.

Now when the analysis is run, the results of the tests for the two hypotheses are shown. Ho (a) is

testing the null hypothesis that the proportions of each colour are equally distributed, while Ho (b) is

testing the null hypothesis that the proportions are the same as those expected. As can be seen, both

hypotheses are rejected. In particular, evidence indicates that the colours of plain M&M's do not

match the manufacturer’s published proportions.

INDEPENDENT T-TEST

The parametric independent t-test, also known as Student’s t-test, is used to determine if there is a

statistical difference between the means of two independent groups. The test requires a continuous

dependent variable (i.e. body mass) and an independent variable comprising 2 groups (i.e. males and females).


This test produces a t-score which is a ratio of the differences between the two groups and the

differences within the two groups:

$$t = \frac{\text{mean group 1} - \text{mean group 2}}{\text{standard error of the mean differences}}$$

A large t-score indicates that there is a greater difference between groups. The smaller the t-score,

the more similarity there is between groups. A t-score of 5 means that the groups are five times as

different from each other as they are within each other.

The null hypothesis (Ho) tested is that the population means from the two unrelated groups are equal


Group independence:

Both groups must be independent of each other. Each participant will only provide one data point for

one group. For example participant 1 can only be in either a male or female group – not both.

Repeated measures are assessed using the Paired t-test.

Normality of the dependent variable:

The dependent variable should also be measured on a continuous scale and be approximately

normally distributed with no significant outliers. This can be checked using the Shapiro-Wilk test. The

t-test is fairly robust and small deviations from normality are normally acceptable. However, this is

not the case if the group sizes are very different. A rule of thumb is that the ratio between the group

sizes should be <1.5 (i.e. group A = 12 participants and group B = >8 participants).

If normality is violated you can try transforming your data (for example log values, square root values)

or, if the group sizes are very different, use the Mann-Whitney U test which is a non-parametric

equivalent that does not require the assumption of normality (see later).

X = mean

S = standard deviation

n = number of data points

Homogeneity of variance:

The variances of the dependent variable should be equal in each group. This can be tested using

Levene's Test of Equality of Variances.

If the Levene's Test is statistically significant, indicating that the group variances are unequal we can

correct for this violation by using an adjusted t-statistic based on the Welch method.
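The difference between the equal-variance (Student) and unequal-variance (Welch) statistics lies in how the standard error and the degrees of freedom are built from the group means, standard deviations and sizes. The sketch below uses the standard textbook formulas, with the Welch-Satterthwaite approximation for the degrees of freedom; it is an illustration under those assumptions, not JASP's own code. The function names and weight-loss values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_1, group_2):
    """Equal-variance t statistic: pooled variance, df = n1 + n2 - 2."""
    n1, n2 = len(group_1), len(group_2)
    pooled = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group_1) - mean(group_2)) / standard_error, n1 + n2 - 2


def welch_t(group_1, group_2):
    """Unequal-variance t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group_1), len(group_2)
    v1, v2 = variance(group_1) / n1, variance(group_2) / n2
    t_value = (mean(group_1) - mean(group_2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_value, df


# Hypothetical weight loss (kg), invented for illustration only
females = [6.0, 7.0, 8.0, 9.0, 10.0]
males = [3.0, 4.0, 5.0, 6.0, 7.0]

t_student, df_student = student_t(females, males)
t_welch, df_welch = welch_t(females, males)
print(f"Student: t({df_student}) = {t_student:.3f}")
print(f"Welch:   t({df_welch:.2f}) = {t_welch:.3f}")
```

When the two groups have equal sizes and equal variances, as in this toy data, both versions give the same result. They diverge as the variances or group sizes become more unequal, which is when Welch's adjusted values should be reported.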


Open Independent t-test.csv. This contains weight loss data for men and women on a self-controlled

10-week diet. It’s good practice to check the Distribution and boxplots in Descriptives to visually check

for distribution and outliers.

Go to T-Tests > Independent Samples t-test and put weight loss in the Dependent variable box and

gender (independent variable) in the Grouping Variable box.

[Formula labels: Unequal variance; Equal variance]

In the analysis window tick the following options:


The output should consist of four tables and one graph. Firstly we need to check that the parametric

assumptions required are not violated.

Shapiro-Wilk test shows that both groups have normally distributed data, therefore, the assumption

of normality is not violated. If one or both were significant you should consider using the non-

parametric equivalent Mann-Whitney test.


Levene’s test shows that there is no difference in the variance, therefore, the assumption of

homogeneity of variance is not violated. If Levene’s test was significant Welch’s adjusted t-statistic,

degrees of freedom and p values should be reported.

This table shows the two computed t-statistics (Student and Welch). Remember the t-statistic is

derived from the mean difference divided by the standard error of the difference. Both show that

there is a significant statistical difference between the two groups (p<.001) and Cohen’s d suggests

that this is a large effect.

From the descriptive data, it can be seen that females had a higher weight loss than males.


An independent t-test showed that females lost significantly more weight over 10 weeks dieting than

males t(85)=6.16, p<.001. Cohen’s d (1.322) suggests that this is a large effect.

MANN-WHITNEY U TEST

If you find that your data is not normally distributed (significant Shapiro-Wilk test result) or is ordinal

by nature, the equivalent non-parametric independent test is the Mann-Whitney U test.

Open Mann-Whitney pain.csv which contains subjective pain scores (0-10) with and without ice

therapy. NOTE: make sure that Treatment is categorical and pain score is ordinal. Go to T-Tests >

Independent Samples t-test and put pain score in the Dependent variable box and use Treatment as the

grouping variable.

In the analysis options only tick:

✓ Mann-Whitney

✓ Location parameter

✓ Effect size

There is no reason to repeat the assumption checks since Mann-Whitney does not require the

assumption of normality or homogeneity of variance required by parametric tests.


This time you will only get one table:

The Mann-Whitney U-statistic (JASP reports this as W since the test is equivalent to Wilcoxon’s

rank-sum test) is highly significant. U=207, p<.001.

The location parameter, the Hodges–Lehmann estimate, is the median difference between the two

groups. The rank-biserial correlation (rB) can be considered as an effect size and is interpreted the

same as Pearson’s r, so 0.84 is a large effect size.
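The rank-biserial correlation can be understood as the proportion of between-group pairs in which the first group scores higher, minus the proportion in which it scores lower. The sketch below computes U by direct pairwise comparison (ties count as one half) and converts it to rB with rB = 2U / (n1 × n2) - 1. The function names and pain scores are hypothetical, and the sign of rB depends on which group is listed first.

```python
def mann_whitney_u(group_a, group_b):
    """U for group_a: number of (a, b) pairs with a > b, ties counting one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in group_a
        for b in group_b
    )


def rank_biserial(group_a, group_b):
    """Rank-biserial correlation derived from U; ranges from -1 to +1."""
    u = mann_whitney_u(group_a, group_b)
    return 2 * u / (len(group_a) * len(group_b)) - 1


# Hypothetical pain scores (0-10), invented for illustration only
control = [7, 8, 6, 9, 7]
ice_therapy = [3, 4, 2, 5, 6]

u_value = mann_whitney_u(control, ice_therapy)
r_b = rank_biserial(control, ice_therapy)
print(f"U = {u_value}, rank-biserial = {r_b:.2f}")
```

A value near +1 or -1 means that almost every score in one group exceeds almost every score in the other, while a value near 0 means the two groups overlap almost completely.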

For non-parametric data, you should report median and MAD (or IQR) values as your descriptive

statistics and use boxplots instead of line graphs and confidence intervals, SD/SE bars. Go to

Descriptive statistics, put Pain score into the variable box and Split the file by Treatment.



A Mann-Whitney test showed that Ice therapy significantly reduces pain scores (Mdn = 3) compared

to the control group (Mdn = 7), U=207, p<.001.

PAIRED SAMPLES T-TEST

As with the Independent t-test, there are both parametric and non-parametric options available in

JASP. The parametric paired-samples t-test (also known as the dependent sample t-test or repeated

measures t-test) compares the means between two related groups on the same continuous,

dependent variable. For example, looking at weight loss pre and post 10 weeks dieting.

The paired t statistic is:

$$t = \frac{\text{mean of the differences between group pairs}}{\text{standard error of the mean differences}}$$

With the paired t-test, the null hypothesis (Ho) is that the pairwise difference between the two

groups is zero.
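In other words, the paired test is a one-sample t-test carried out on the pairwise differences, tested against zero. A minimal sketch of the calculation is shown below; the function name and the body mass values are hypothetical and are not taken from the example file.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, degrees of freedom) for the pairwise differences before - after."""
    differences = [b - a for b, a in zip(before, after)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical body mass (kg) for five participants, invented for illustration only
pre_diet = [80.0, 85.0, 90.0, 95.0, 100.0]
post_diet = [77.0, 81.0, 86.0, 92.0, 95.0]

t_value, df = paired_t(pre_diet, post_diet)
print(f"t({df}) = {t_value:.2f}")
```

Because only the differences enter the calculation, the normality and outlier assumptions apply to the differences rather than to the two original columns.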


Four assumptions are required for a paired t-test to provide a valid result:

• The dependent variable should be measured on a continuous scale.

• The independent variable should consist of 2 categorical related/matched groups, i.e. each

participant is matched in both groups

• The differences between the matched pairs should be approximately normally distributed

• There should be no significant outliers in the differences between the 2 groups.


Open Paired t-test.csv in JASP. This contains two columns of paired data, body mass pre-diet and after

4 weeks of dieting. Go to T-Tests > Paired Samples t-test. Ctrl-click both variables and add them to the

analysis box on the right.

In the analysis options tick the following:



The output should consist of three tables and one graph.

The assumption check of normality (Shapiro-Wilk) is not significant suggesting that the pairwise

differences are normally distributed, therefore the assumption is not violated. If this showed a

significant difference the analysis should be repeated using the non-parametric equivalent,

Wilcoxon’s signed-rank test.

This shows that there is a significant difference in body mass between the pre and post dieting

conditions, with a mean difference (location parameter) of 3.783kg. Cohen’s d states that this is a

large effect.


The descriptive statistics and plot show that there was a reduction in body mass following 4 weeks of dieting.



On average participants lost 3.78 kg (SE: 0.29 kg) body mass following a 4-week diet plan. A paired

samples t-test showed this decrease to be significant (t (77) =13.04, p<.001). Cohen’s d suggests that

this is a large effect.

If you find that your data is not normally distributed (significant Shapiro-Wilk test result) or is ordinal

by nature, the equivalent non-parametric paired test is Wilcoxon’s signed-rank test. Open

Wilcoxon’s rank.csv. This has two columns, one with pre-hypnotherapy and one with post-hypnotherapy anxiety scores

(from 0 - 50). In the dataset view make sure that both variables are assigned to the ordinal data type.

Go to T-Tests > Paired Samples t-test and follow the same instructions as above but now only tick the

following options:

There will be only one table in the output:

The Wilcoxon W-statistic is highly significant, p<0.001.

The location parameter, the Hodges–Lehmann estimate, is the median difference between the two

groups. The rank-biserial correlation (rB) can be considered as an effect size and is interpreted the

same as Pearson’s r, so 0.48 is a medium to large effect size.

Effect size           Trivial   Small   Medium   Large

Rank-biserial (rB)    <0.1





For non-parametric data, you should report median values as your descriptive statistics and use

boxplots instead of line graphs and confidence intervals, SD/SE bars.


A Wilcoxon’s signed-rank test showed that hypnotherapy significantly reduces anxiety scores (Mdn =

15) compared to pre-therapy (Mdn =22) scores, W=322, p<.001.

CORRELATION ANALYSIS

Correlation is a statistical technique that can be used to determine if, and how strongly, pairs of

variables are associated. Correlation is only appropriate for quantifiable data in which numbers are

meaningful, such as continuous or ordinal data. It cannot be used for purely categorical data for which

we have to use contingency table analysis (see Chi-square analysis in JASP).

Essentially do different variables co-vary? i.e. are changes in one variable reflected in similar changes

to another variable? If one variable deviates from its mean does the other variable deviate from its

mean in either the same or opposite direction? This can be assessed by measuring covariance;

however, this is not standardised. For example, we can measure the covariance of two variables which

are measured in metres; however, if we convert the same values to centimetres, we get the same

relationship but with a completely different covariance value.

To overcome this, standardised covariance is used which is known as Pearson’s correlation coefficient (or "r"). It ranges from -1.0 to +1.0. The closer r is to +1 or -1, the more closely the two

variables are related. If r is close to 0, there is little or no linear relationship. If r is (+) then as one variable increases

the other also increases. If r is (-) then as one increases, the other decreases (sometimes referred to

as an "inverse" correlation).
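The effect of units on covariance, and the way standardisation removes it, can be checked with a few lines of code. The sketch below computes the sample covariance and Pearson's r for the same hypothetical measurements expressed in metres and then in centimetres. The function names and data are invented for illustration.

```python
from math import sqrt
from statistics import mean


def covariance(x, y):
    """Sample covariance (n - 1 in the denominator)."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_r(x, y):
    """Covariance standardised by the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical paired measurements in metres, invented for illustration only
x_m = [1.0, 2.0, 3.0, 4.0, 5.0]
y_m = [1.2, 1.9, 3.2, 3.8, 5.1]

# The same measurements converted to centimetres
x_cm = [value * 100 for value in x_m]
y_cm = [value * 100 for value in y_m]

cov_m, cov_cm = covariance(x_m, y_m), covariance(x_cm, y_cm)
r_m, r_cm = pearson_r(x_m, y_m), pearson_r(x_cm, y_cm)
print(f"covariance: {cov_m:.3f} (m) vs {cov_cm:.1f} (cm)")
print(f"Pearson's r: {r_m:.4f} (m) vs {r_cm:.4f} (cm)")
```

In this sketch both variables are rescaled, so the covariance grows by a factor of 100 × 100 while r stays the same.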

The correlation coefficient (r) should not be confused with R2 (coefficient of determination) or R

(multiple correlation coefficient as used in the regression analysis).

The main assumption in this analysis is that the data have a normal distribution and are linear. This

analysis will not work well with curvilinear relationships.

[Figure: two scatter plots of the same relationship, labelled Covariance = 4.7 and Covariance = 470]

The analysis tests the null hypothesis (H0) that there is no association between the two variables

From the example data, open Jump height correlation.csv which contains 2 columns of data, jump

height (m) and explosive leg power (W). Firstly run the Descriptive statistics and check the boxplots

for any outliers.

To run the correlation analysis go to Regression > Correlation. Move the 2 variables to the analysis

box on the right. Tick

✓ Pearson,

✓ Report significance,

✓ Flag significant correlations

Under Plots

✓ Scatter plots

✓ Heatmap



The first table shows the correlation matrix with Pearson’s r value and its p-value. This shows a highly

significant correlation (p<.001) with a large r value close to 1 (r= 0.984) and that we can reject the null hypothesis.


For simple correlations like this it is easier to look at the pairwise table (go back to analysis and tick

the Display pairwise table option). This replaces the correlation matrix in the results which may be

easier to read.

The Pearson’s r value is an effect size where <0.1 is trivial, 0.1 -0.3 is a small effect, 0.3 – 0.5 a moderate

effect and >0.5 a large effect.

The plot provides a simple visualisation of this strong positive correlation (r = 0.984, p<.001) which is

also highlighted by the heatmap (more relevant when looking at multiple correlations).



If you take the correlation coefficient r and square it you get the coefficient of determination (R2). This

is a statistical measure of the proportion of variance in one variable that is explained by the other

variable. Or:

R2= Explained variation / Total variation

R2 is always between 0 and 100% where:

• 0% indicates that the model explains none of the variability of the response data around its

mean and

• 100% indicates that the model explains all the variability of the response data around its mean.


In the example above r = 0.984, so R2 = 0.968. This suggests that jump height accounts for 96.8% of

the variance in explosive leg power.


Pearson’s correlation showed a significant correlation between jump height and leg power (r = 0.984,

p<.001) jump height accounting for 96.8% of the variance in leg power.


If your data is ordinal or is continuous data that has violated the assumptions required for parametric

testing (normality and/or variance) you need to use the non-parametric alternatives to Pearson’s

correlation coefficient.

The alternatives are Spearman’s (rho) or Kendall’s (tau) correlation coefficients. Both are based on

ranking data and are not affected by outliers or normality/variance violations.

Spearman's rho is usually used for ordinal scale data and Kendall's tau is used in small samples or when

there are many values with the same score (ties). In most cases, Kendall’s tau and Spearman’s rank correlation

coefficients are very similar and thus invariably lead to the same inferences.

The effect sizes are the same as Pearson’s r. The main difference is that rho2 can be used as an

approximate non-parametric coefficient of determination but the same is not true for Kendall’s tau.

From the example data, open Non-parametric correlation.csv which contains 2 columns of data, a

creativity score and position in the ‘World’s biggest liar’ competition (thanks to Andy Field).

Run the analysis as before but now using Spearman and Kendall’s tau-b coefficients instead of Pearson’s r.



As can be seen there is a significant correlation between creativity scores and final position in the

‘World’s biggest liar’ competition, the higher the score the better the final competition position.

However, the effect size is only moderate.



direction i.e. which variable causes the other to change. So it cannot be used to state that one thing

causes the other. Often a significant correlation means absolutely nothing and is purely by chance,

especially if you correlate thousands of variables. This can be seen in the following strange correlations:


Pedestrians killed in a collision with a railway train correlates with rainfall in Missouri:

Number of honey-producing bee colonies (1000’s) correlates strongly with the marriage rate in

South Carolina (per 1000 marriages)


REGRESSION

Whereas correlation tests for associations between variables, regression is the next step commonly

used for predictive analysis, i.e. to predict a dependent outcome variable from one (simple regression)

or more (multiple regression) independent predictor variables.

Regression results in a hypothetical model of the relationship between the outcome and predictor

variable(s). The model used is a linear one defined by the formula:

y = c + b*x + ε

• y = estimated dependent outcome variable score,

• c = constant,

• b = regression coefficient and

• x = score on the independent predictor variable

• ε = random error component (based on residuals)

Linear regression provides both the constant and regression coefficient(s).

Linear regression makes the following assumptions:

1. Linear relationship: important to check for outliers since linear regression is sensitive to their effects


2. Independence of variables

3. Multivariate normality: requires all variables to be normally distributed

4. Homoscedasticity: homogeneity of variance of the residuals

5. Minimal multicollinearity /autocorrelation: when the independent variables/residuals are

too highly correlated with each other.

With regard to sample sizes, there are many different ‘rules of thumb’ in the literature ranging from

10-15 data points per predictor in the model (i.e. 4 predictor variables will require between 40

and 60 data points) to 50 + (8 * number of predictors). So 4 predictors would require

82 data points. Effectively the bigger your sample size the better your model.


Most regression analysis will produce the best model available, but how good is it actually and how

much error is in the model?

This can be determined by looking at ‘the goodness of fit’ using the sums of squares. This is a measure

of how close the actual data points are to the modelled regression line.


The vertical difference between the data points and the predicted regression line is known as the

residuals. These values are squared to remove the negative numbers and then summed to give SSR.

This is effectively the error of the model or the ‘goodness of fit’, obviously the smaller the value the

less error in the model.

The vertical difference between the data points and the mean of the outcome variable can be

calculated. These values are squared to remove the negative numbers and then summed to give the

total sum of the squares SST. This shows how good the mean value is as a model of the outcome


[Figure annotations: values above the line are positive; values below the line are negative]

The vertical difference between the mean of the outcome variable and the predicted regression line

is now determined. Again these values are squared to remove the negative numbers and then

summed to give the model sum of squares (SSM). This indicates how much better the model is compared to

just using the mean of the outcome variable. SST is the total sum of the squares.

So, the larger the SSM the better the model is at predicting the outcome compared to the mean value

alone. If this is accompanied by a small SSR the model also has a small error.

R2 is similar to the coefficient of determination in correlation in that it shows how much of the

variation in the outcome variable can be predicted by the predictor variable(s).

$$R^2 = \frac{\text{SSM}}{\text{SST}}$$


In regression, the model is assessed by the F statistic based on the improvement in the prediction of

the model SSM and the residual error SSR. The larger the F value the better the model.

$$F = \frac{\text{Mean SSM}}{\text{Mean SSR}}$$
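The three sums of squares, R2 and F can be worked through for a simple regression with one predictor. The sketch below fits the line by ordinary least squares and assumes the usual degrees of freedom for the mean squares (1 for the model with a single predictor and n - 2 for the residuals). The function name and data are hypothetical.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares fit of y = c + b*x, returning the sums of squares, R2 and F."""
    mean_x, mean_y = mean(x), mean(y)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    c = mean_y - b * mean_x
    predicted = [c + b * xi for xi in x]

    ss_t = sum((yi - mean_y) ** 2 for yi in y)                    # data vs mean
    ss_r = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))    # data vs line
    ss_m = sum((pi - mean_y) ** 2 for pi in predicted)            # line vs mean

    mean_ss_m = ss_m / 1               # one predictor
    mean_ss_r = ss_r / (len(x) - 2)    # n - 2 residual degrees of freedom
    return {
        "constant": c,
        "coefficient": b,
        "SST": ss_t,
        "SSR": ss_r,
        "SSM": ss_m,
        "R2": ss_m / ss_t,
        "F": mean_ss_m / mean_ss_r,
    }


# Hypothetical predictor and outcome scores, invented for illustration only
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

results = simple_regression(x_values, y_values)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

For a least-squares fit SST = SSM + SSR, so a large SSM and a small SSR are two sides of the same result.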

SIMPLE REGRESSION

Regression tests the null hypothesis (Ho) that there will be no significant prediction of the dependent

(outcome) variable by the predictor variable(s).

Open Rugby kick regression.csv. This dataset contains rugby kick data including distance kicked,

right/left leg strength and flexibility and bilateral leg strength.

Firstly go to Descriptives > Descriptive statistics and check the boxplots for any outliers. In this case,

there should be none, though it is good practice to check.

For this simple regression go to Regression > Linear regression and put distance into the Dependent

Variable (outcome) and R_Strength into the Covariates (Predictor) box. Tick the following options in

the Statistics options:


You will now get the following outputs:

Here it can be seen that the correlation (R) between the two variables is high (0.784). The R2 value of

0.614 tells us that right leg strength accounts for 61.4% of the variance in kick distance. Durbin-

Watson checks for correlations between residuals, which can invalidate the test. This should be above

1 and below 3 and ideally around 2.
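The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, which is why it ranges from 0 to 4 and sits near 2 when successive residuals are uncorrelated. A minimal sketch follows; the function name and residual values are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals in observation order, invented for illustration only
example_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.4]
dw = durbin_watson(example_residuals)
print(f"Durbin-Watson = {dw:.2f}")
```

Values well below 2 suggest that neighbouring residuals tend to have the same sign (positive autocorrelation), and values well above 2 suggest that they tend to alternate.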


The ANOVA table shows all the sums of squares mentioned earlier, with Regression being the model

and Residual being the error. The F-statistic is significant (p = .002). This tells us that the model is a

significantly better predictor of kicking distance than the mean distance.

Report as F (1, 11) = 17.53, p = .002.

This table gives the coefficients (unstandardized) that can be put into the linear equation.

y = c + b*x

y = estimated dependent outcome variable score,

c = constant (intercept)

b = regression coefficient (R_strength)

x = score on the independent predictor variable

For example for a leg strength of 60 kg the distance kicked can be predicted by the following:

Distance = 57.105 + (6.452 * 60) = 444.2 m
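The same prediction can be wrapped in a small function so that other leg strength values can be tried. This sketch simply hard-codes the constant (57.105) and coefficient (6.452) quoted in the text; the function and variable names are hypothetical.

```python
INTERCEPT = 57.105  # constant (c) from the coefficients table
SLOPE = 6.452       # regression coefficient (b) for R_Strength


def predict_distance(right_leg_strength):
    """Predicted kick distance from y = c + b*x."""
    return INTERCEPT + SLOPE * right_leg_strength


predicted = predict_distance(60)
print(f"Predicted distance for a leg strength of 60 kg: {predicted:.1f}")
```

Predictions are only trustworthy within the range of leg strengths that were actually measured; the model says nothing reliable about values far outside it.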

FURTHER ASSUMPTION CHECKS

In Plots checks, tick the following two options:


This will result in two graphs:

This graph shows a balanced random distribution of the residuals around the baseline suggesting that

the assumption of homoscedasticity has not been violated. (See Exploring data integrity in JASP for

further details.)

The Q-Q plot shows that the standardized residuals fit nicely along the diagonal suggesting that both

assumptions of normality and linearity have also not been violated.


Linear regression shows that right leg strength can significantly predict kicking distance F (1, 11) =

17.53, p = .002 using the following regression equation:

Distance = 57.105 + (6.452 * Right leg strength)

MULTIPLE REGRESSION

The model used is still a linear one defined by the formula:

y = c + b*x + ε

▪ y = estimated dependent outcome variable score,

▪ c = constant,

▪ b = regression coefficient and

▪ x = score on the independent predictor variable

▪ ε = random error component (based on residuals)

However, we now have more than 1 regression coefficient and predictor score i.e.

y = c + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn + ε
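With several predictors the predicted score is the constant plus the sum of each coefficient multiplied by its predictor score. A minimal sketch of that sum is shown below; the function name, coefficients and scores are hypothetical and are not taken from the rugby kick dataset.

```python
def predict_outcome(constant, coefficients, scores):
    """Predicted y = c + b1*x1 + b2*x2 + ... + bn*xn (error term omitted)."""
    if len(coefficients) != len(scores):
        raise ValueError("Each coefficient needs exactly one predictor score")
    return constant + sum(b * x for b, x in zip(coefficients, scores))


# Hypothetical model with three predictors, invented for illustration only
example_prediction = predict_outcome(
    constant=10.0,
    coefficients=[2.0, 0.5, -1.0],
    scores=[3.0, 8.0, 4.0],
)
print(f"Predicted outcome = {example_prediction}")
```

The error term ε does not appear in the prediction because it represents the part of each observed score that the model cannot account for.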

Data entry methods

If predictors are uncorrelated their order of entry has little effect on the model. In most cases,

predictor variables are correlated to some extent and thus, the order in which the predictors are

entered can make a difference. The different methods are subject to much debate in the area.

Forced entry (Enter): This is the default method in which all the predictors are forced into the model

in the order they appear in the Covariates box. This is considered to be the best method.

Blockwise entry (Hierarchical entry): The researcher, normally based on prior knowledge and previous

studies, decides the order in which predictors are entered, with known predictors entered first depending on their

importance in predicting the outcome. Additional predictors are added in further steps.

Stepwise (Backward entry): All predictors are initially entered in the model and then the contribution

of each is calculated. Predictors with less than a given level of contribution (p>0.1) are removed. This

process repeats until all the predictors are statistically significant.

Stepwise (Forward entry): The predictor with the highest simple correlation with the outcome variable

is entered first. Subsequent predictors are selected based on the size of their semi-partial correlation with

the outcome variable. This is repeated until all predictors that contribute significant unique variance

to the model have been included in the model.

Stepwise entry: Same as the Forward method, except that every time a predictor is added to the

model, a removal test is made of the least useful predictor. The model is constantly reassessed to see

whether any redundant predictors can be removed.

There are many reported disadvantages of using stepwise data entry methods. However, Backward

entry methods can be useful for exploring previously unused predictors or for fine-tuning the model

to select the best predictors from the available options.

Open Rugby kick regression.csv that we used for simple regression. Go to Regression > Linear

regression and put distance into the Dependent Variable (outcome) and now add all the other

variables into the Covariates (Predictor) box.

In the Variable section leave the Method as Enter. Tick the following options in the Statistics options,

Estimates, Model fit, Collinearity diagnostics and Durbin-Watson.

75 | P a g e JASP 0.14 - Dr Mark Goss-Sampson


You will now get the following outputs:

This provides information on a model based on the H0 (no predictors) and the alternative H1.

The adjusted R2 (used for multiple predictors) shows that they can predict 68.1% of the outcome

variance. The Durbin-Watson statistic, which checks for correlations between residuals, is between 1 and 3 as required.

The ANOVA table shows the F-statistic to be significant p=0.017 suggesting that the model is a

significantly better predictor of kicking distance than the mean distance.

This table shows both the H0 and H1 models and the constant (intercept) and regression coefficients

(unstandardized) for all the predictors forced into the model. Even though the ANOVA shows the

model to be significant none of the predictor regression coefficients is significant!

The collinearity statistics, Tolerance and VIF (Variance Inflation Factor), check for multicollinearity

between the predictors. As a rule of thumb, if VIF >10 and tolerance <0.1 the assumption has been greatly

violated. If the average VIF >1 and tolerance <0.2 the model may be biased. In this case, the average

VIF is quite large (around 5).
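The link between the two collinearity statistics can be sketched in a few lines of Python. This is an illustration only, not JASP's own code: the function names are hypothetical, it assumes the standard definitions VIF = 1 / (1 - R2), where R2 comes from regressing one predictor on all the others, and tolerance = 1 / VIF, and it applies a simplified version of the rule of thumb to a single predictor rather than to the average VIF.

```python
def vif_from_r_squared(r_squared):
    """VIF of a predictor, given R^2 from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared)


def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the VIF."""
    return 1.0 / vif


def collinearity_flag(vif):
    """Simplified single-predictor version of the rule of thumb (an assumption)."""
    tolerance = tolerance_from_vif(vif)
    if vif > 10 and tolerance < 0.1:
        return "greatly violated"
    if tolerance < 0.2:
        return "possible bias"
    return "acceptable"


# Hypothetical VIF values, not taken from the rugby dataset.
for vif in (2.0, 8.0, 12.5):
    print(vif, round(tolerance_from_vif(vif), 3), collinearity_flag(vif))
```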


The casewise diagnostics table is empty! This is good news. This table highlights any cases (rows) that

have residuals which are 3 or more standard deviations away from the mean. These cases with the

largest errors may well be outliers. Too many outliers will have an impact on the model and should be

dealt with in the usual way (see Exploring Data Integrity).

As a comparison re-run the analyses but now choose Backward as the method of data entry.

The outputs are as follows:

JASP has now calculated 4 potential regression models. It can be seen that each consecutive model

increases the adjusted R2, with model 4 accounting for 73.5% of the outcome variance.

The Durbin-Watson score is also higher than with the forced entry method.

The ANOVA table indicates that each successive model is better as shown by the increasing F-value

and improving p-value.


Model 1 is the same as the forced entry method first used. The table shows that as the least

significantly contributing predictors are sequentially removed, we end up with a model with two

significant predictor regression coefficients, right leg strength and bilateral leg strength.

Both tolerance and VIF are acceptable.

We can now report that Backward predictor entry results in a highly significant model, F (2, 10) = 17.92,

p<.001, with a regression equation of

Distance = 46.251 + (3.914 * R_Strength) + (2.009 * Bilateral Strength)
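As a rough illustration of how such an equation is used for prediction, the sketch below plugs two strength scores into the backward-entry equation. The function name and the example scores are hypothetical; only the three coefficients come from the equation above.

```python
def predict_distance(r_strength, bilateral_strength):
    """Predicted kicking distance from the backward-entry regression equation."""
    intercept = 46.251    # constant (c)
    b_right = 3.914       # coefficient for right leg strength
    b_bilateral = 2.009   # coefficient for bilateral leg strength
    return intercept + b_right * r_strength + b_bilateral * bilateral_strength


# Made-up strength scores for one player, purely for illustration.
print(round(predict_distance(r_strength=10.0, bilateral_strength=20.0), 3))
```

With both predictors at zero the function returns the constant, which is why the constant is also called the intercept.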

TESTING FURTHER ASSUMPTIONS

As for the simple linear regression example, tick the following options.


The balanced distribution of the residuals around the baseline suggests that the assumption of

homoscedasticity has not been violated.

The Q-Q plot shows that the standardized residuals fit along the diagonal suggesting that both

assumptions of normality and linearity have also not been violated.


Multiple linear regression using backward data entry shows that right leg and bilateral strength can

significantly predict kicking distance F(2,10) = 17.92, p<.001 using a regression equation of

Distance = 46.251 + (3.914 * R_Strength) + (2.009 * Bilateral Strength)

R2 provides information on how much variance is explained by the model using the predictors


F-statistic provides information as to how good the model is.

The unstandardized (b)-value provides a constant which reflects the strength of the relationship

between the predictor(s) and the outcome variable.

Violation of assumptions can be checked using Durbin-Watson value, tolerance/VIF values, Residual

vs predicted and Q-Q plots.


LOGISTIC REGRESSION

In simple and multiple linear regression outcome and predictor variable(s) were continuous data.

What if the outcome was a binary/categorical measure? Can, for example, a yes or no outcome be

predicted by other categorical or continuous variables? The answer is yes if binary logistic regression

is used. This method is used to predict the probability of a binary yes or no outcome.

The null hypothesis tested is that there is no relationship between the outcome and the predictor variable(s).


As can be seen in the graph below, a linear regression line between the yes and no responses would

be meaningless as a prediction model. Instead, a sigmoidal logistic regression curve is fitted with a

minimum of 0 and a maximum of 1. It can be seen that some predictor values overlap between yes

and no. For example, a predictor value of 5 would give an equal 50% probability of being a yes or no

outcome. Thresholds are therefore calculated to determine if a predictor data value will be classified

as a yes or no outcome.
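The shape of the sigmoidal curve and the use of a threshold can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the intercept and slope are made-up values chosen only so that a predictor value of 5 gives a probability of 50%, as in the example above.

```python
import math


def logistic_probability(x, intercept, slope):
    """Probability of a 'yes' outcome for predictor value x under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def classify(x, intercept, slope, threshold=0.5):
    """Classify as 'yes' when the predicted probability reaches the threshold."""
    return "yes" if logistic_probability(x, intercept, slope) >= threshold else "no"


# Hypothetical coefficients: intercept + slope * 5 = 0, so p = 0.5 at x = 5.
INTERCEPT, SLOPE = -5.0, 1.0

for value in (2, 5, 8):
    p = logistic_probability(value, INTERCEPT, SLOPE)
    print(value, round(p, 3), classify(value, INTERCEPT, SLOPE))
```

Whatever the predictor value, the output stays between 0 and 1, which a straight regression line cannot guarantee.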


• The dependent variable must be binary i.e. yes or no, male or female, good or bad.

• One or more independent (predictor) variables, which can be continuous or categorical.


• A linear relationship between any continuous independent variables and the logit

transformation (natural log of the odds that the outcome equals one of the categories) of the

dependent variable.


AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) are measures of fit for the

model; the best model will have the lowest AIC and BIC values.

[Figure: sigmoidal logistic regression curve. Y-axis: probability of outcome = Yes. Data points labelled Outcome = No and Outcome = Yes.]


Four pseudo R2 values are calculated in JASP, McFadden, Nagelkerke, Tjur and Cox & Snell. These are

analogous to R2 in linear regression and all give different values. What constitutes a good R2 value

varies, however, they are useful when comparing different models for the same data. The model with

the largest R2 statistic is considered to be the best.

The Wald test is used to determine statistical significance for each of the independent variables.

The confusion matrix is a table showing actual vs predicted outcomes and can be used to determine

the accuracy of the model. From this sensitivity and specificity can be derived.

Sensitivity is the percentage of cases that had the observed outcome and were correctly predicted by the

model (i.e., true positives).

Specificity is the percentage of observations that were also correctly predicted as not having the

observed outcome (i.e., true negatives).
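These definitions translate directly into arithmetic on the four cells of a confusion matrix. The sketch below uses hypothetical function names and illustrative counts; it is not JASP output.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of cases with the outcome that were predicted to have it."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of cases without the outcome that were predicted not to have it."""
    return true_neg / (true_neg + false_pos)


def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Proportion of all cases that were classified correctly."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


# Illustrative counts for the four cells of a confusion matrix.
tp, tn, fp, fn = 15, 15, 5, 5
print(sensitivity(tp, fn), specificity(tn, fp), accuracy(tp, tn, fp, fn))
```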


Open Heart attack.csv in JASP. This contains 4 columns of data, Patient ID, did they have a second

heart attack (yes/no), whether they were prescribed exercise (yes/no) and their stress levels (high

value = high stress).

Put the outcome variable (2nd heart attack) into the Dependent variable, add the stress levels to

Covariates and Exercise prescription to Factors. Leave the data entry method as Enter.

In the Statistics options tick Estimates, Odds ratios, Confusion matrix, Sensitivity and Specificity.



The initial output should comprise 4 tables.

The model summary shows that H1 (with the lowest AIC and BIC scores) suggests a significant

relationship (χ2 (37) = 21.257, p<.001) between the outcome (2nd heart attack) and the predictor

variables (exercise prescription and stress levels).

McFadden's R2 = 0.383. It is suggested that a range from 0.2 to 0.4 indicates a good model fit.


Both stress level and exercise prescription are significant predictor variables (p=.031 and .022

respectively). The most important values in the coefficients table are the odds ratios. For the

continuous predictor, an odds ratio of greater than 1 suggests a positive relationship while < 1 implies

a negative relationship. This suggests that high-stress levels are significantly related to an increased

probability of having a second heart attack. Having an exercise intervention is related to a significantly

reduced probability of a second heart attack. The odds ratio of 0.13 means that the odds of a 2nd heart

attack with an exercise intervention are 13% of the odds without one (an odds ratio is not itself a probability).
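An odds ratio acts on odds rather than directly on probabilities, so the probability it implies depends on the baseline risk. The sketch below shows the conversion; the function names and the baseline probabilities are hypothetical and are not taken from the heart attack dataset, only the odds ratio of 0.13 is.

```python
def probability_to_odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability after multiplying the baseline odds by the odds ratio."""
    new_odds = probability_to_odds(baseline_probability) * odds_ratio
    return odds_to_probability(new_odds)


ODDS_RATIO = 0.13  # odds ratio reported for the exercise intervention

# Hypothetical baseline probabilities without the intervention.
for baseline in (0.2, 0.5, 0.8):
    print(baseline, round(apply_odds_ratio(baseline, ODDS_RATIO), 3))
```

The same odds ratio gives a different probability for each baseline, which is why the odds ratio should be reported as a change in odds.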

The confusion matrix shows that 15 true negative and 15 true positive cases were correctly predicted by the model

while the errors, false negatives and false positives, were each found in 5 cases. This is confirmed in the

Performance metrics, where sensitivity (% of cases that had the outcome correctly predicted) and

specificity (% of cases correctly predicted as not having the outcome, i.e. true negatives) are both 75%.



These findings can be easily visualised through the inferential plots.


As stress levels increase the probability of having a second heart attack increases.

The absence of an exercise intervention increases the probability of a 2nd heart attack, while the probability is

reduced when an intervention has been put in place.


Logistic regression was performed to ascertain the effects of stress and exercise intervention on the

likelihood that participants have a 2nd heart attack. The logistic regression model was statistically

significant, χ2 (37) = 21.257, p < .001. The model correctly classified 75.0% of cases. Increasing stress

was associated with an increased likelihood of a 2nd heart attack, but decreasing stress was associated

with a reduction in the likelihood. The presence of an exercise intervention programme reduced the

odds of a 2nd heart attack to 13% of the odds without it (odds ratio = 0.13).

Whereas t-tests compare the means of two groups/conditions, one-way analysis of variance (ANOVA) compares the means of 3 or more groups/conditions. There are both independent and repeated

measures ANOVAs available in JASP. ANOVA has been described as an ‘omnibus test’ which results in

an F-statistic that compares whether the dataset's overall explained variance is significantly greater

than the unexplained variance. The null hypothesis tested is that there is no significant difference

between the means of all the groups. If the null hypothesis is rejected, ANOVA just states that there

is a significant difference between the groups but not where those differences occur. To determine

where the group differences are, post hoc (from the Latin post hoc, "after this") tests are subsequently performed.


Why not just multiple pairwise comparisons? If there are 4 groups (A, B, C, D) for example and the

differences were compared using multiple t-tests:

• A vs. B P<0.05 95% no type I error

• A vs. C P<0.05 95% no type I error

• A vs. D P<0.05 95% no type I error

• B vs. C P<0.05 95% no type I error

• B vs. D P<0.05 95% no type I error

• C vs. D P<0.05 95% no type I error

Assuming that each test was independent, the overall probability would be:

0.95 * 0.95 * 0.95 * 0.95 * 0.95 * 0.95 = 0.735

This is known as familywise error, or cumulative Type I error, and in this case results in only a 73.5%

probability of no Type I error whereby the null hypothesis could be rejected when it is true. This is

overcome by using post hoc tests that make multiple pairwise comparisons with stricter acceptance

criteria to prevent familywise error.
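The arithmetic above generalises to any number of groups. A minimal sketch, assuming independent comparisons each tested at alpha = 0.05 (the function name is hypothetical):

```python
from math import comb


def familywise(n_groups, alpha=0.05):
    """Return (pairwise comparisons, P(no Type I error), familywise error rate)."""
    n_comparisons = comb(n_groups, 2)            # e.g. 4 groups -> 6 pairs
    p_no_error = (1.0 - alpha) ** n_comparisons  # all comparisons free of Type I error
    return n_comparisons, p_no_error, 1.0 - p_no_error


n, p_ok, fwe = familywise(4)
print(n, round(p_ok, 3), round(fwe, 3))  # 6 0.735 0.265
```

With 4 groups the chance of at least one Type I error is therefore about 26.5% rather than 5%, and it grows quickly as more groups are added.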


The independent ANOVA makes the same assumptions as most other parametric tests.

• The independent variable must be categorical and the dependent variable must be continuous.


• The groups should be independent of each other.

• The dependent variable should be approximately normally distributed.

• There should be no significant outliers.

• There should be homogeneity of variance between the groups otherwise the p-value for the

F-statistic may not be reliable.

The first 2 assumptions are usually controlled through the use of appropriate research method design.

If the last three assumptions are violated then the non-parametric equivalent, Kruskal-Wallis should

be considered instead.

Contrasts are ‘a priori’ tests (i.e. planned comparisons before any data were collected). As an

example, researchers may want to compare the effects of some new drugs to the currently

prescribed one. These should only be a small set of comparisons in an attempt to reduce

familywise error. The choice must be based on the scientific questions being asked, and

chosen during the experimental design, hence the term planned comparisons. Because

they look at specified mean differences, they can be used even if the ANOVA F test

is not significant.

JASP provides 6 planned contrasts enabling different types of comparisons:

Deviation: the mean of each level of the independent variable is compared to the overall

mean (the mean when all the levels are taken together).

Simple: the mean of each level is compared to the mean of a specified level, for example with

the mean of the control group.

Difference: the mean of each level is compared to the mean of the previous levels.

Helmert: the mean of each level is compared to the mean of the subsequent levels.

Repeated: By selecting this contrast, the mean of each level is compared to the mean of the

following level.

Polynomial: tests polynomial trends in the data.


Post hoc tests are tests that were decided upon after the data have been collected. They can

only be carried out if the ANOVA F test is significant.

JASP provides 5 alternatives for use with the independent group ANOVA tests:

Bonferroni – can be very conservative but gives guaranteed control over Type I error at the risk of

reducing statistical power. Does not assume independence of the comparisons.

Holm – the Holm-Bonferroni test which is a sequential Bonferroni method that is less conservative

than the original Bonferroni test.

Tukey – one of the most commonly used tests and provides controlled Type I error for groups with

the same sample size and equal group variance.

Scheffe – controls for the overall confidence level when the group sample sizes are different.

Sidak – similar to Bonferroni but assumes that each comparison is independent of the others. Slightly

more powerful than Bonferroni.
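The difference between three of these corrections can be seen by adjusting the same set of p-values. The sketch below uses the standard textbook definitions of the Bonferroni, Holm and Sidak adjustments; it is not JASP's internal code, and the function names and p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def sidak(p_values):
    """Sidak adjustment, which assumes the comparisons are independent."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def holm(p_values):
    """Holm's sequential Bonferroni adjustment."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # made-up unadjusted p-values for three comparisons
print([round(p, 4) for p in bonferroni(raw)])
print([round(p, 4) for p in holm(raw)])
print([round(p, 4) for p in sidak(raw)])
```

Holm's adjusted values are never larger than Bonferroni's, and Sidak's are slightly smaller than Bonferroni's, which matches the descriptions above.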


JASP also provides 4 types of post hoc test:

Standard – as above

Games-Howell – used when you are unsure about the equality of group variances

Dunnett’s – used to compare all the groups to one group i.e. the control group

Dunn – a non-parametric post hoc test used for testing small sub-sets of pairs.


JASP provides 3 alternative effect size calculations for use with the independent group ANOVA tests:

Eta squared (η2) - accurate for the sample variance explained but overestimates the population

variance. This can make it difficult to compare the effect of a single variable in different studies.

Partial Eta squared (ηp2) – this solves the problem relating to population variance overestimation

allowing for comparison of the effect of the same variable in different studies.

Omega squared (ω2) – Normally, statistical bias gets very small as sample size increases, but for small

samples (n<30) ω2 provides an unbiased effect size measure.
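For a one-way independent ANOVA these effect sizes can be computed from the sums of squares in the ANOVA table. The sketch below assumes the usual textbook formulas and uses made-up sums of squares; the function names are hypothetical.

```python
def eta_squared(ss_between, ss_within):
    """Proportion of the total sample variance explained by the factor."""
    return ss_between / (ss_between + ss_within)


def omega_squared(ss_between, ss_within, df_between, df_within):
    """Less biased estimate of the population variance explained (one-way design)."""
    ms_within = ss_within / df_within
    ss_total = ss_between + ss_within
    return (ss_between - df_between * ms_within) / (ss_total + ms_within)


# Hypothetical ANOVA table values: 3 groups, 33 participants in total.
ss_b, ss_w, df_b, df_w = 40.0, 60.0, 2, 30
print(round(eta_squared(ss_b, ss_w), 3))
print(round(omega_squared(ss_b, ss_w, df_b, df_w), 3))
```

Omega squared comes out smaller than eta squared for the same data, reflecting the overestimation described above. With a single factor, partial eta squared is equal to eta squared.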

[Table: levels of effect size (Trivial, Small, Medium, Large) by test and measure, including Omega squared]














Load Independent ANOVA diets.csv. This contains a column containing the 3 diets used (A, B and C)

and another column containing the absolute amount of weight loss after 8 weeks on one of 3 different

diets. For good practice check the descriptive statistics and the boxplots for any extreme outliers.

Go to ANOVA > ANOVA, put weight loss into the Dependent Variable and the Diet groupings into the

Fixed Factors box. In the first instance tick Descriptive statistics and ω2 as the effect size;


In Assumptions checks, tick all options:

This should result in 3 tables and one Q-Q plot.


The main ANOVA table shows that the F-statistic is significant (p<.001) and that there is a large effect

size. Therefore, there is a significant difference between the means of the 3 diet groups.



Before accepting this any violations in the assumptions required for an ANOVA should be checked.

Levene’s test is not significant, so homogeneity of variance can be assumed. However, if Levene’s test shows

a significant difference in variance, the Brown-Forsythe or Welch correction should be reported.

The Q-Q plot shows that the data appear to be normally distributed and linear.

The descriptive statistics suggest that diet C results in the highest weight loss after 8 weeks.



If, for example, one planned to compare the effects of diets B and C to diet A, click on the drop-down

menu and select ‘simple’ next to diet. This will test the significance between the first category

in the list and the remaining categories.

As can be seen, only diet C is significantly different from diet A (t(69) = 4.326, p<.001).

If the ANOVA reports no significant difference you can go no further in the analysis.


If the ANOVA is significant post hoc testing can now be carried out. In Post Hoc Tests add Diet to the

analysis box on the right, tick Standard type, use Tukey for the post hoc correction and tick flag

significant comparisons.


Also in Descriptive Plots add the Factor Diet to the horizontal axis and tick display error bars.

Post hoc testing shows that there is no significant difference between weight loss on diets A and B.

However, it is significantly higher in diet C compared to diet A (p<.001) and diet B (p=.001). Cohen’s d

shows that these differences have a large effect size.


Independent one way ANOVA showed a significant effect of the type of diet on weight loss after 8

weeks (F (2, 69) =46.184, p<.001, ω2 = 0.214).

Post hoc testing using Tukey’s correction revealed that diet C resulted in significantly greater weight

loss than diet A (p<.001) or diet B (p=.001). There were no significant differences in weight loss

between diets A and B (p=.777).

If your data fails parametric assumption tests or is ordinal, the Kruskal-Wallis H test is a

non-parametric equivalent to the independent samples ANOVA. It can be used for comparing two or more

independent samples of equal or different sample sizes. Like the Mann-Whitney and Wilcoxon’s tests,

it is a rank-based test.
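To show what rank-based means in practice, the sketch below pools the scores, ranks them and computes the H statistic from the rank sums of each group. It is a simplified illustration with hypothetical function names and made-up scores, and it omits the correction for tied ranks that statistical packages normally apply.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic from pooled ranks (no tie correction in this sketch)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Made-up scores for three independent groups.
print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 3))
```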

As with the ANOVA, Kruskal-Wallis H test (also known as the "one-way ANOVA on ranks") is an

omnibus test which does not specify which specific groups of the independent variable are statistically

significantly different from each other. To do this, JASP provides the option for running Dunn’s post

hoc test. This multiple comparisons test can be very conservative in particular for large numbers of comparisons.


Load Kruskal-Wallis ANOVA.csv dataset into JASP. This dataset contains subjective pain scores for

participants undergoing no treatment (control), cryotherapy or combined cryotherapy-compression

for delayed onset muscle soreness after exercise.


Go to ANOVA >ANOVA. In the analysis window add Pain score to the dependent variable and

treatment to the fixed factors. Check that the pain score is set to ordinal. This will automatically run

the normal independent ANOVA. Under Assumption Checks tick both Homogeneity tests and Q-Q plot.



Although the ANOVA indicates a significant result, the data has not met the assumptions of

homogeneity of variance as seen by the significant Levene’s test and only shows linearity in the middle

of the Q-Q plot and curves off at the extremities indicating more extreme values. This, added to the fact

that the dependent variable is based on subjective pain scores, suggests the use of a non-parametric test.


Return to the statistics options and open the Nonparametrics option at the bottom. For the Kruskal-Wallis

test, move the Treatment variable to the box on the right. In Post Hoc tests move treatment to

the right box and tick Dunn’s post hoc test and flag significant comparisons.


Two tables are shown in the output. The Kruskal-Wallis test shows that there is a significant difference

between the three treatment modalities.


The Dunn’s post hoc test provides its own p-value as well as those for Bonferroni and Holm’s

Bonferroni correction. As can be seen, both treatment conditions are significantly different from the

controls but not from each other.


Pain scores were significantly affected by treatment modality H (2) = 19.693, p<.001. Pairwise

comparisons showed that both cryotherapy and cryotherapy with compression significantly reduced

pain scores (p=.002 and p<.001 respectively) compared to the control group. There were no significant

differences between cryotherapy and cryotherapy with compression (p=.102).

The one-way repeated measures ANOVA (RMANOVA) is used to assess if there is a difference in means between 3 or more groups (where the participants are the same in each group) that have been

tested multiple times or under different conditions. Such a research design, for example, could be that

the same participants were tested for an outcome measure at 1, 2 and 3 weeks or that the outcome

was tested under conditions 1, 2 and 3.

The null hypothesis tested is that there is no significant difference between the means of the

differences between all the groups.

The independent variable should be categorical and the dependent variable needs to be a continuous

measure. In this analysis the independent categories are termed levels i.e. these are the related

groups. So in the case where an outcome was measured at weeks 1, 2 and 3, the 3 levels would be

week 1, week 2 and week 3.

The F-statistic is calculated by dividing the mean squares for the variable (variance explained by the

model) by its error mean squares (unexplained variance). The larger the F-statistic, the more likely it

is that the independent variable will have had a significant effect on the dependent variable.
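The calculation of the F-statistic for a one-way repeated measures design can be sketched from first principles. This is an illustration with made-up scores and a hypothetical function name, not JASP's implementation: the variability between participants is removed from the error term before the mean squares are formed.

```python
def rm_anova_f(scores):
    """scores[s][l] is the score of participant s at level l.

    Returns (F, df for the factor, df for the error term).
    """
    n_subjects = len(scores)
    n_levels = len(scores[0])
    all_values = [x for row in scores for x in row]
    grand_mean = sum(all_values) / len(all_values)

    level_means = [sum(row[l] for row in scores) / n_subjects for l in range(n_levels)]
    subject_means = [sum(row) / n_levels for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_levels = n_subjects * sum((m - grand_mean) ** 2 for m in level_means)
    ss_subjects = n_levels * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_error = ss_total - ss_levels - ss_subjects  # unexplained variance

    df_levels = n_levels - 1
    df_error = (n_levels - 1) * (n_subjects - 1)
    ms_levels = ss_levels / df_levels   # variance explained by the factor
    ms_error = ss_error / df_error      # error mean squares
    return ms_levels / ms_error, df_levels, df_error


# Made-up data: 3 participants measured at 3 levels (e.g. weeks 1, 2 and 3).
data = [[1, 2, 3], [2, 3, 5], [3, 4, 7]]
f_value, df1, df2 = rm_anova_f(data)
print(round(f_value, 2), df1, df2)
```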


The RMANOVA makes the same assumptions as most other parametric tests.

• The dependent variable should be approximately normally distributed.

• There should be no significant outliers.

• Sphericity, which relates to the equality of the variances of the differences between levels of

the repeated measures factor.

If the assumptions are violated then the non-parametric equivalent, Friedman’s test should be

considered instead and is described later in this section.


If a study has 3 levels (A, B and C) sphericity assumes the following:

Variance (A-B) ≈ Variance (A-C) ≈ Variance (B-C)

RMANOVA checks the assumption of sphericity using Mauchly’s (pronounced Mockley’s) test of

sphericity. This tests the null hypothesis that the variances of the differences are equal. In many

cases, repeated measures violate the assumption of sphericity which can lead to Type I error. If this is

the case corrections to the F-statistic can be applied.
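What sphericity compares can be made concrete by computing the variance of the difference scores for every pair of levels. The sketch below uses made-up scores and a hypothetical function name; it only computes the variances and does not carry out Mauchly's test itself.

```python
from itertools import combinations
from statistics import variance


def difference_variances(levels):
    """levels maps a level name to scores listed in the same participant order.

    Returns the sample variance of the difference scores for each pair of levels.
    """
    result = {}
    for (name_a, a), (name_b, b) in combinations(levels.items(), 2):
        diffs = [x - y for x, y in zip(a, b)]
        result[(name_a, name_b)] = variance(diffs)
    return result


# Made-up scores for 3 participants at levels A, B and C.
example = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [3.0, 5.0, 7.0],
}
for pair, var in difference_variances(example).items():
    print(pair, var)
```

In this made-up example the variance of A-B differs from the other two, which is the kind of inequality the sphericity assumption rules out.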

JASP offers two methods of correcting the F-statistic, the Greenhouse-Geisser and the Huynh-Feldt

epsilon (ε) corrections. A general rule of thumb is that if the ε values are <0.75 then use the

Greenhouse-Geisser correction and if they are >0.75 then use the Huynh-Feldt correction.

Post hoc testing is limited in RMANOVA, JASP provides two alternatives:

Bonferroni – can be very conservative but gives guaranteed control over Type I error at the risk of

reducing statistical power.

Holm – the Holm-Bonferroni test which is a sequential Bonferroni method that is less conservative

than the original Bonferroni test.

If you ask for either Tukey or Scheffe post hoc corrections JASP will return a NaN (not a number) error.


JASP provides the same alternative effect size calculations that are used with the independent group

ANOVA tests:

Eta squared (η2) - accurate for the sample variance explained but overestimates the population

variance. This can make it difficult to compare the effect of a single variable in different studies.

Partial Eta squared (ηp2) – this solves the problem relating to population variance overestimation

allowing for comparison of the effect of the same variable in different studies. This appears to be the

most commonly reported effect size in repeated measures ANOVA.

Omega squared (ω2) – Normally, statistical bias gets very small as sample size increases, but for small

samples (n<30) ω2 provides an unbiased effect size measure.

Levels of effect size:

[Table: levels of effect size (Trivial, Small, Medium, Large) by test and measure, including Partial Eta and Omega squared]














Load Repeated ANOVA cholesterol.csv. This contains one column with the participant IDs and 3

columns one for each repeated measurement of blood cholesterol following an intervention. For good

practice check the descriptive statistics and the boxplots for any extreme outliers.

Go to ANOVA > Repeated measures ANOVA. As stated above, the independent variable (repeated

measures factor) has levels, in this case, there are 3 levels. Rename RM Factor 1 to Time post-intervention

and then rename the 3 levels to Week 0, Week 3 and Week 6 accordingly.


Once these have been done they will appear in the Repeated Measures Cells. Now add the appropriate

data to the appropriate level.

Tick Descriptive Statistics, Estimates of effect size and ω2.

Under Assumption Checks tick Sphericity tests and all Sphericity correction options.

The output should consist of 4 tables. The third table, between-subject effects, can be ignored for this analysis.


The within-subjects effects table reports a large F-statistic which is highly significant (p<.001) and has

a small to medium effect size (0.058). This table shows the statistics for sphericity assumed (none) and

the two correction methods. The main differences are in the degrees of freedom (df) and the value of

the mean square. Under the table, it is noted that the assumption of sphericity has been violated.

The following table gives the results of Mauchly’s test of sphericity. It can be seen that there is a

significant difference (p<.001) in the variances of the differences between the groups. Greenhouse-

Geisser and the Huynh-Feldt epsilon (ε) values are below 0.75. Therefore the ANOVA result should be

reported based on the Greenhouse-Geisser correction:

To provide a cleaner table, go back to Assumption Checks and only tick Greenhouse-Geisser for

sphericity correction.

There is a significant difference between the means of the differences between all the groups F (1.235,

21.0) =212.3, p<.001, ω2 = 0.058.


The descriptive data suggest that blood cholesterol levels were higher at week 0 compared to weeks

3 and 6.

However, if the ANOVA reports no significant difference you can go no further in the analysis.



If the ANOVA is significant, post hoc testing can now be carried out. In Post Hoc Tests add Time post-

intervention to the analysis box on the right, tick Effect size and, in this case, use Holm for the post

hoc correction.

Also in Descriptive Plots add the Factor – Time post-intervention to the horizontal axis and tick display

error bars.

Post hoc testing shows that there are significant differences in blood cholesterol levels between all of

the time point combinations and that these are associated with large effect sizes.

Since Mauchly’s test of sphericity was significant, the Greenhouse-Geisser correction was used. This

showed that cholesterol levels differed significantly between time points, F (1.235, 21.0) =212.3, p<.001, ω2 = 0.058.


Post hoc testing using the Bonferroni correction revealed that cholesterol levels decreased

significantly as time increased, weeks 0 – 3 (mean difference=0.566 units, p<.001) and weeks 3 – 6

(mean difference = 0.063 units, p=.004).

FRIEDMAN’S REPEATED MEASURES ANOVA

If parametric assumptions are violated or the data is ordinal you should consider using the

non-parametric alternative, Friedman’s test. Similar to the Kruskal-Wallis test, the Friedman’s test is used

for one-way repeated measures analysis of variance by ranks and doesn’t assume the data comes from

a particular distribution. This test is another omnibus test which does not specify which specific groups

of the independent variable are statistically significantly different from each other. To do this, JASP

provides the option for running Conover’s post hoc test if Friedman’s test is significant.
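The idea of analysis of variance by ranks can be sketched as follows: each participant's scores are ranked across the levels, and the test statistic is built from the rank sums of each level. This is a simplified illustration with hypothetical function names and made-up ratings, and it omits the correction for tied ranks.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(scores):
    """scores[s][l] is the rating of participant s at level l (no tie correction)."""
    n = len(scores)      # number of participants
    k = len(scores[0])   # number of levels
    rank_sums = [0.0] * k
    for row in scores:
        for level, rank in enumerate(average_ranks(row)):
            rank_sums[level] += rank  # ranks are assigned within each participant
    return 12.0 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3.0 * n * (k + 1)


# Made-up ratings for 3 participants at 3 time points.
print(round(friedman_chi_square([[1, 2, 3], [2, 3, 5], [3, 4, 7]]), 3))
```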


Load Friedman RMANOVA.csv into JASP. This has 3 columns of subjective pain ratings measured at

18, 36 and 48 hours post-exercise. Check that the pain scores are set to ordinal data.


Go to ANOVA > Repeated measures ANOVA. The independent variable (repeated measures factor)

has 3 levels. Rename RM Factor 1 to Time and then rename the 3 levels to 18 hours, 36 hours and 48

hours accordingly.

Once these have been done they will appear in the Repeated Measures Cells. Now add the appropriate

dataset to the appropriate level.

This will automatically produce the standard repeated measures within-subjects ANOVA table. To run

the Friedman’s test, expand the Nonparametrics tab, move Time to the RM factor box and tick

Conover’s post hoc tests.



Two tables should be produced.

Friedman’s test shows that time has a significant effect on pain perception. Conover’s post hoc pairwise

comparisons show that pain perception is significantly different between each time point.


Time has a significant effect on subjective pain scores χ2 (2) = 26.77, p<.001. Pairwise comparisons

showed that pain perception is significantly different between each time point (all p<0.001).

ANOVA can be used to compare the means of one variable (dependent) in two or more groups,

whereas analysis of covariance (ANCOVA) sits between ANOVA and regression and compares the means of one (dependent) variable in two or more groups while taking into account the variability of

other continuous variables (COVARIATES). ANCOVA checks for differences in ‘adjusted’ means (i.e.

adjusted for the effects of the covariate). A covariate may not usually be part of the main research

question but could influence the dependent variable and therefore needs to be adjusted or controlled

for. As long as a good covariate is used ANCOVA will have improved statistical power and control over


Control for – to subtract statistically the effects of a variable (a control variable) to see what a

relationship would be without it (Vogt 1977).

Hold constant – to “subtract” the effects of a variable from a complex relationship to study what the

relationship would be if the variable were, in fact, a constant. Holding a variable constant essentially

means assigning it an average value (Vogt 1977).

Statistical control – using statistical techniques to isolate or “subtract” variance in the dependent

variable attributable to variables that are not the subject of the study (Vogt, 1999).

For example, when looking for a difference in weight loss between three diets it would be appropriate to take into account the individuals’ pre-trial body weight since heavier people may lose proportionately more weight.

[Figure: type of diet (independent variable), weight loss (dependent variable) and starting body weight (covariate).]








The null hypothesis tested is that there is no significant difference between the ‘adjusted’ means of

all the groups.
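To make the idea of an ‘adjusted’ mean concrete, here is a minimal sketch in plain Python. It assumes the usual textbook adjustment, in which each group mean is shifted along a pooled within-group regression slope to the grand mean of the covariate; the function name and the data are hypothetical, and this is not a description of JASP's internal code.

```python
def mean(values):
    return sum(values) / len(values)


def adjusted_means(groups):
    """groups maps a label to a list of (covariate, outcome) pairs."""
    grand_x = mean([x for pairs in groups.values() for x, _ in pairs])
    sxy = sxx = 0.0
    group_stats = {}
    for label, pairs in groups.items():
        mx = mean([x for x, _ in pairs])
        my = mean([y for _, y in pairs])
        group_stats[label] = (mx, my)
        # Pool the within-group cross-products and covariate sums of squares.
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    pooled_slope = sxy / sxx
    # Shift each group mean to where it would sit at the grand covariate mean.
    return {label: my - pooled_slope * (mx - grand_x)
            for label, (mx, my) in group_stats.items()}


# Hypothetical data: group B starts with higher covariate values than group A.
data = {
    "A": [(1, 2), (2, 3), (3, 4)],
    "B": [(3, 5), (4, 6), (5, 7)],
}
result = adjusted_means(data)
```

In this made-up example the unadjusted means differ by 3 units (3 vs 6), but after adjusting for the covariate the difference shrinks to 1 unit (4 vs 5).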


ANCOVA makes the same assumptions as the independent ANOVA makes. However, there are two

further assumptions:

• The relationship between the dependent variable and the covariate is linear.

• Homogeneity of regression i.e. the regression lines for each of the independent groups are

parallel to each other.


JASP provides 4 alternatives for use with the independent group ANOVA tests:

Bonferroni – can be very conservative but gives guaranteed control over Type I error at the risk of

reducing statistical power.

Holm – the Holm-Bonferroni test which is a sequential Bonferroni method that is less conservative

than the original Bonferroni test.

Tukey – one of the most commonly used tests and provides controlled Type I error for groups with

the same sample size and equal group variance.

Scheffe – controls for the overall confidence level when the group sample sizes are different.

JASP also provides 4 types of post hoc test:

Standard – as above

Games-Howell – used when you are unsure about the equality of group variances

Dunnett’s – used to compare all the groups to one group i.e. the control group

Dunn – a non-parametric post hoc test used for testing small sub-sets of pairs.
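The difference between the Bonferroni and Holm corrections described above can be sketched in a few lines of plain Python. The function names and the p-values are hypothetical; the sketch only illustrates why Holm is never more conservative than Bonferroni.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of comparisons (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Sequential Bonferroni: the smallest p is multiplied by m, the next by m - 1, and so on."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical uncorrected p-values for 3 comparisons
bonf = bonferroni(raw)
holm_adj = holm(raw)
```

For these made-up values Bonferroni leaves only the first comparison below .05, whereas the Holm-adjusted values for the other two comparisons are smaller (0.06 rather than 0.12 and 0.09).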


[Figure: two panels showing the dependent variable for Diet 1, Diet 2 and Diet 3. One panel illustrates homogeneity of regression; the other illustrates the assumption being violated.]

JASP provides 3 alternative effect size calculations for use with the independent group ANOVA tests:

Eta squared (η2) - accurate for the sample variance explained but overestimates the population

variance. This can make it difficult to compare the effect of a single variable in different studies.

Partial Eta squared (ηp2) – this solves the problem relating to population variance overestimation

allowing for comparison of the effect of the same variable in different studies.

Omega squared (ω2) – Normally, statistical bias gets very small as sample size increases, but for small

samples (n<30) ω2 provides an unbiased effect size measure.

Test     Measure          Trivial   Small   Medium   Large
ANOVA    Eta              <0.1      0.1     0.25     0.37
         Partial Eta      <0.01     0.01    0.06     0.14
         Omega squared    <0.01     0.01    0.06     0.14
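The three effect sizes can be computed from the sums of squares in an ANOVA table. The sketch below uses the standard textbook formulas in plain Python; the function name and the input values are hypothetical, and the formulas are given for a single effect with its own error term.

```python
def anova_effect_sizes(ss_effect, df_effect, ss_error, df_error, ss_total):
    """Return (eta squared, partial eta squared, omega squared) for one effect."""
    ms_error = ss_error / df_error
    eta_sq = ss_effect / ss_total
    partial_eta_sq = ss_effect / (ss_effect + ss_error)
    # Omega squared subtracts the part of the effect expected from error alone.
    omega_sq = (ss_effect - df_effect * ms_error) / (ss_total + ms_error)
    return eta_sq, partial_eta_sq, omega_sq


# Hypothetical one-way ANOVA table: SS effect = 20 (df 2), SS error = 80 (df 40).
eta_sq, partial_eta_sq, omega_sq = anova_effect_sizes(20.0, 2, 80.0, 40, 100.0)
```

With only one factor in the model, eta squared and partial eta squared are identical; omega squared is smaller because of its bias correction.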


Load ANCOVA hangover.csv. This dataset has been adapted from the one provided by Andy Field

(2017). The morning after a Freshers’ ball students were given either water, coffee or a Barocca to drink. Two hours later they reported how well they felt (from 0 – awful to 10 – very well). At the same time, data were collected on how drunk they were the night before (0-10).

Initially, run an ANOVA with wellness as the dependent variable and the type of drink as the fixed factor.


As can be seen from the results, homogeneity of variance has not been violated, while the ANOVA shows that there is no significant difference in the wellness scores between any of the morning drinks, F(2, 27) = 1.714, p=.199. However, this may be related to how drunk the students were the night before!


Go to ANOVA > ANCOVA, put wellness as the dependent variable and the type of drink as the fixed factor. Now add drunkenness to the Covariate(s) box. In the first instance tick Descriptive statistics and ω2 as the effect size.

In Assumption Checks tick both options.

In Marginal Means move drink to the right.


This should result in 4 tables and one Q-Q plot.


It can be seen that the covariate (drunkenness) significantly predicts wellness (p<.001). The effects of

the type of drink on wellness, when adjusted for the effects of drunkenness are now significant


It can be seen that Levene’s test is significant. Unlike in ANOVA, no homogeneity of variance

corrections (i.e. Welch) are provided. For ANCOVA this can be ignored. The Q-Q plot appears to be


The descriptive statistics show the unadjusted means for wellness in the three drink groups.


The marginal means are now the wellness means having been adjusted for the effects of the covariate



As previously mentioned the assumption of homogeneity of regression is important in ANCOVA. This

can be tested by looking at the interaction between the type of drink and the drunkenness scores. Go to Model; drink and drunkenness will have been automatically added as individual Model terms. Now highlight drink and drunkenness together and add them to Model terms as an interaction.

The ANOVA table now has an extra row showing the interaction between the type of drink and

drunkenness. This is not significant (p=.885), i.e. the relationships between drunkenness and wellness are the same in each drink group. If this were significant, there would be concerns over the validity of the main ANCOVA analysis.
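A quick descriptive way to see what homogeneity of regression means is to fit a separate least-squares slope of the outcome on the covariate within each group and compare them. The sketch below does this in plain Python with hypothetical data and function names; it is only a visual aid and does not replace the interaction test described above.

```python
def slope(pairs):
    """Least-squares slope of outcome on covariate for (covariate, outcome) pairs."""
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


def group_slopes(groups):
    """Return one slope per group so that they can be compared side by side."""
    return {label: slope(pairs) for label, pairs in groups.items()}


# Hypothetical groups: the first two have parallel lines, the third does not.
slopes = group_slopes({
    "group 1": [(0, 1), (1, 2), (2, 3)],
    "group 2": [(0, 4), (1, 5), (2, 6)],
    "group 3": [(0, 3), (1, 2), (2, 1)],
})
```

Similar slopes across groups are consistent with the assumption; a slope of opposite sign, as in the third made-up group, is the kind of pattern that a significant interaction term would flag.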

Having checked this, go back and remove the interaction term from the Model terms.

If the ANCOVA reports no significant difference you can go no further in the analysis.



If the ANCOVA is significant post hoc testing can now be carried out. In Post Hoc Tests add Drink to

the analysis box on the right, tick Effect size and, in this case, use Tukey for the post hoc correction.

Also, tick Flag significant comparisons.

Post hoc testing shows that there is no significant difference between coffee and water on wellness.

However, wellness scores were significantly higher after drinking a Barocca.

This can be seen from the Descriptive plots.



The covariate, drunkenness, was significantly related to the morning after wellness, F (1,26) = 33.03,

p<.001, ω2 = 0.427. There was also a significant effect of drink on wellness after controlling for

drunkenness, F (2, 26) = 7.47, p=.003, ω2 = 0.173.

Post hoc testing using Tukey’s correction revealed that drinking a Barocca resulted in significantly

greater wellness compared to water (p<.004) or coffee (p=.01). There were no significant differences

in wellness between water and coffee (p=.973).

TWO-WAY INDEPENDENT ANOVA

One-way ANOVA tests situations in which only one independent variable is manipulated; two-way ANOVA is used when more than 1 independent variable has been manipulated. In this case, independent variables are known as factors.


               Factor 2
Factor 1       Group 1               Group 2
CONDITION 1    Dependent variable    Dependent variable
CONDITION 2    Dependent variable    Dependent variable
CONDITION 3    Dependent variable    Dependent variable

The factors are split into levels, therefore, in this case, Factor 1 has 3 levels and Factor 2 has 2 levels.

A “main effect” is the effect of one of the independent variables on the dependent variable, ignoring

the effects of any other independent variables. There are 2 main effects tested both of which are

“between-subjects”: in this case comparing differences between factor 1 (i.e. condition) and

differences between factor 2 (i.e. groups). Interaction is where one factor influences the other factor.

The two-way independent ANOVA is another omnibus test that is used to test 2 null hypotheses:

1. There is no significant between-subject effect i.e. no significant difference between the

means of the groups in either of the factors.

2. There is no significant interaction effect i.e. no significant group differences across



Like all other parametric tests, two-way independent ANOVA makes a series of assumptions which should either be addressed in the research design or can be tested for.

• The independent variables (factors) should have at least two categorical independent groups


• The dependent variable should be continuous and approximately normally distributed for all

combinations of factors.

• There should be homogeneity of variance for each of the combination of factors.

• There should be no significant outliers.


Open 2-way independent ANOVA.csv in JASP. This comprises 3 columns of data: Factor 1 – gender with 2 levels (male and female), Factor 2 – supplement with 3 levels (control, carbohydrate CHO and protein) and the dependent variable (explosive jump power). In Descriptive statistics check the data for significant outliers. Go to ANOVA > ANOVA, add Jump power to the Dependent variable, and Gender and Supplement to the Fixed factors.


Tick Descriptive statistics and Estimates of effect size (ω2).

In Descriptive plots, add supplement to the horizontal axis and Gender to separate lines. In Additional




The output should comprise 2 tables and one plot.

The ANOVA table shows that there are significant main effects for both Gender and Supplement

(p=0.003 and p<.001 respectively) with medium and large effect sizes respectively. This suggests that

there is a significant difference in jump power between genders, irrespective of Supplement, and

significant differences between supplements, irrespective of Gender.

There is also a significant interaction between Gender and Supplement (p<.001) which also has a

medium to large effect size (0.138). This suggests that the differences in jump power between genders

are affected somehow by the type of supplement used.
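The way a two-way ANOVA splits the variability into two main effects, an interaction and error can be sketched in plain Python. The sketch assumes a balanced design (the same number of scores in every cell); the function name and the data are hypothetical, and JASP's own computation may differ for unbalanced designs.

```python
def two_way_sums_of_squares(cells):
    """cells[(a, b)] lists the scores for level a of factor A and level b of factor B.

    Assumes a balanced design (equal n in every cell).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    all_scores = [y for scores in cells.values() for y in scores]
    grand = sum(all_scores) / len(all_scores)
    cell_mean = {key: sum(v) / len(v) for key, v in cells.items()}
    a_mean = {a: sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels) for a in a_levels}
    b_mean = {b: sum(cell_mean[(a, b)] for a in a_levels) / len(a_levels) for b in b_levels}
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    # Whatever the cell means explain beyond the two main effects is the interaction.
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = sum((y - cell_mean[key]) ** 2 for key, v in cells.items() for y in v)
    return {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_error}


# Hypothetical 2 x 2 data: only one cell (male, protein) is raised.
ss = two_way_sums_of_squares({
    ("male", "control"): [1, 3],
    ("male", "protein"): [5, 7],
    ("female", "control"): [1, 3],
    ("female", "protein"): [1, 3],
})
```

Because the raised scores occur in a single cell of this made-up dataset, both main effects and the interaction all receive a share of the variability, which mirrors the pattern described in the text where the gender difference depends on the supplement.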

The Descriptive statistics and plot suggest that the main differences are between genders when using

a protein supplement.



In Assumption Checks, tick Homogeneity tests and Q-Q plot of residuals.

Levene’s test shows no significant difference in variance within the dependent variable groups, thus

homogeneity of variance has not been violated.

The Q-Q plot shows that the data appear to be normally distributed and linear. We can now accept

the ANOVA result since none of these assumptions has been violated.

However, if the ANOVA reports no significant difference you can go no further with the



Go to the analysis options and Simple Main Effects. Here add Gender to the Simple effect

factor and Supplement to the Moderator Factor 1. Simple main effects are effectively limited

pairwise comparisons.


This table shows that there are no gender differences in jump power in the control or CHO groups (p=.116 and p=.058 respectively). However, there is a significant difference (p<.001) in jump power between genders in the protein supplement group.


There are two ways of testing for a difference between (combinations of) cells: post hoc tests

and contrast analysis. JASP has a range of different contrast tests available, including custom

contrasts. For example, we can contrast the three different supplements. Open up the

Contrasts menu and next to Supplement click on the drop-down menu and select custom.

This will add another series of options to this window.


In this window contrasts can be added, in this case, there are three contrasts which can be


This will result in the following table:

Comparing this table to the post hoc analysis below the estimates of the differences in

marginal means are the same, as well as their standard errors and t-statistics. However, both

the p-values and confidence intervals vary: the corrected p-values are typically higher, and

the confidence intervals wider, for the post hoc analysis.

[Figure labels: CHO vs Protein; CHO vs Control; Control vs Protein]



If the ANOVA is significant post hoc testing can now be carried out. In Post Hoc Tests add Supplement

and the Gender*Supplement to the analysis box on the right, tick Effect size and, in this case, use

Tukey for the post hoc correction. Also for ease of viewing tick Flag significant comparisons.

Post hoc testing is not done for Gender since there are only 2 levels.

Post hoc testing shows no significant difference between the control and CHO supplement groups, irrespective of Gender, but significant differences between Control and Protein (p<.001) and between CHO and Protein (p<.001).

The post hoc comparisons for the interaction decompose the results further.


A two-way ANOVA was used to examine the effect of gender and supplement type on explosive jump

power. There were significant main effects for both gender (F (1, 42) = 9.59, p=.003, ω2 = 0.058) and


Supplement (F (2, 42) = 30.07, p<.001, ω2 = 0.477). There was a statistically significant interaction

between the effects of gender and supplement on explosive jump power (F (2, 42) = 11.1, p<.001, ω2

= 0.138).

Tukey’s post hoc correction showed that explosive leg power was significantly higher in the protein

group compared to the control or CHO groups (t=-1.919, p<.001 and t=-1.782, p<.001 respectively).

Simple main effects showed that jump power was significantly higher in males on a protein

supplement compared to females (F (1) =28.06, p<.001).

TWO-WAY REPEATED MEASURES ANOVA

Two-Way Repeated Measures ANOVA means that there are two factors in the experiment, for example, different treatments and different conditions. "Repeated-measures" means that the same subject received more than one treatment and/or more than one condition.

Independent variable                Independent variable (Factor 1) = time
(Factor 2)             Participant  Time 1               Time 2               Time 3
Condition 1            1            Dependent variable   Dependent variable   Dependent variable
                       2            Dependent variable   Dependent variable   Dependent variable
                       3            Dependent variable   Dependent variable   Dependent variable
Condition 2            1            Dependent variable   Dependent variable   Dependent variable
                       2            Dependent variable   Dependent variable   Dependent variable
                       3            Dependent variable   Dependent variable   Dependent variable

The factors are split into levels, therefore, in this case, Factor 1 has 3 repeated levels and Factor 2 has

2 repeated levels.

A “main effect” is the effect of one of the independent variables on the dependent variable, ignoring the effects of any other independent variables. There are 2 main effects tested, both of which are “within-subjects”: in this case comparing differences between the levels of factor 1 (i.e. time) and differences between the levels of factor 2 (i.e. condition). Interaction is where one factor influences the other factor.

The two-way repeated ANOVA is another omnibus test that is used to test the following main effect

null hypotheses:

H01: the dependent variable scores are the same for each level in factor 1 (ignoring factor 2).

H02: the dependent variable scores are the same for each level in factor 2 (ignoring factor 1).

The null hypothesis for the interaction between the two factors is:

H03: the two factors are independent or that interaction effect is not present.


Like all other parametric tests, two-way repeated ANOVA makes a series of assumptions which should either be addressed in the research design or can be tested for.

• The independent variables (factors) should have at least two categorical related groups


• The dependent variable should be continuous and approximately normally distributed for all

combinations of factors.


• Sphericity i.e. the variances of the differences between all combinations of related groups

must be equal.

• There should be no significant outliers.


Open 2-way repeated ANOVA.csv in JASP. This comprises 4 columns of data (“sit and reach” flexibility scores) for two factors: Factor 1 with 2 levels (stretch and no stretch) and Factor 2 with 2 levels (warm-up and no warm-up). In Descriptive statistics check the data for significant outliers. Go to ANOVA > Repeated Measures ANOVA. Firstly, each factor and its levels should be defined: define RM Factor 1 as Stretching and its levels as stretch and no stretch, then define RM Factor 2 as Warm-up and its levels as warm-up and no warm-up. Then add the appropriate column of data to the assigned repeated measures cells.

Also, tick Descriptive statistics and estimates of effect size - ω2.

In Descriptive plots add the Stretching factor to the horizontal axis and Warm-up factor to

separate lines. Tick the display error bars option.

The output should comprise 3 tables and one plot. The Between-Subjects Effects table can be ignored

in this analysis.

The ANOVA within-subjects effects table shows that there are significant main effects for both stretch

(p<.001) and warm-up (p<.001) on sit and reach distance. Both of these are associated with large effect

sizes. There is also a significant interaction between stretch and warm-up (p<.001). This suggests that the effects of performing a stretch on sit and reach distance are different depending on whether

or not a warm-up had been performed. These findings can be seen in both the descriptive data and




In this case, there are no assumption checks. Sphericity can only be tested when there are at least three levels and homogeneity requires at least two unrelated data sets. If a factor has more than 2 levels Mauchly’s test of Sphericity should also be run and the appropriate corrected F value used if necessary (See Repeated Measures ANOVA for details).

However, if the ANOVA reports no significant difference you can go no further with the




Now go to the analysis options and Simple Main Effects. Here add Warm up to the Simple effect factor

and Stretch to the Moderator Factor 1. Simple main effects are effectively pairwise comparisons.

This table shows that when moderating for warm-up there is a significant difference (p<.001) in sit

and reach performance when a stretch was also carried out but not without a stretch (p=.072).


We can now moderate for stretch by changing the Simple Main Effects to use Stretch as the simple effect factor and Warm-up as the moderator factor. We can also replot the descriptives with warm-up on the horizontal axis and stretch as separate lines.

In this case, when controlling for Stretch there were significant differences between both warm-up

and no warm-up.

Both of these simple main effects can be visualised in their descriptive plots.


If the ANOVA is significant post hoc testing can now be carried out. In Post Hoc Tests add stretch,

warm-up and the Stretching*warm-up interaction to the analysis box on the right, tick Effect size and,

in this case, use Holm for the post hoc correction. Tick Flag significant comparisons.


Post hoc testing for the main effects confirms that there are significant differences in sit and reach

distance when comparing the two levels of each factor. This is further decomposed in the Post hoc

comparisons for the interaction.



A two-way ANOVA was used to examine the effect of stretch and warm-up type on sit and reach

performance. There were significant main effects for stretch (F (1, 11) = 123.4, p<.001, ω2 = 0.647) and

warm-up (F (1, 11) = 68.69, p<.001, ω2 = 0.404). There was a statistically significant interaction

between the effects of stretch and warm up on sit and reach performance (F (1, 11) = 29.64, p<.001,

ω2 = 0.215).

Simple main effects showed that sit and reach performance was significantly higher when both a

stretch and warm-up had been done (F (1) =112.6, p<.001).


repeated measures ANOVA involving more than 1 independent variable (known as factors).

The factors are split into levels, therefore, in this case, Factor 1 has 3 levels and Factor 2 has

2 levels. This results in 6 possible combinations.

A “main effect” is the effect of one of the independent variables on the dependent variable,

ignoring the effects of any other independent variables. There are 2 main effects tested: in

this case comparing data across factor 1 (i.e. time) is known as the “within-subjects” factor

while comparing differences between factor 2 (i.e. groups) is known as the “between-

subjects” factor. Interaction is where one factor influences the other factor.

Independent variable    Independent variable (Factor 1) = time or condition
(Factor 2)              Time/condition 1      Time/condition 2      Time/condition 3
Group 1                 Dependent variable    Dependent variable    Dependent variable
Group 2                 Dependent variable    Dependent variable    Dependent variable

The main effect of time or condition tests the following i.e. irrespective of which group the data is in:

Independent variable    Independent variable (Factor 1) = time or condition
(Factor 2)              Time/condition 1      Time/condition 2      Time/condition 3
Group 1 and Group 2     All data              All data              All data

The main effect of group tests the following i.e. irrespective of which condition the data is in:

Independent variable    Independent variable (Factor 1) = time or condition
(Factor 2)              Time/condition 1, Time/condition 2 and Time/condition 3
Group 1                 All data
Group 2                 All data

Simple main effects are effectively pairwise comparisons:

Independent variable    Independent variable (Factor 1) = time or condition
(Factor 2)              Time/condition 1      Time/condition 2      Time/condition 3
Group 1                 Data                  Data                  Data
Group 2                 Data                  Data                  Data


A mixed factor ANOVA is another omnibus test that is used to test 3 null hypotheses:

1. There is no significant within-subject effect i.e. no significant difference between the means of the differences between all the conditions/times.

2. There is no significant between-subject effect i.e. no significant difference between the means of the groups.

3. There is no significant interaction effect i.e. no significant group differences across



Like all other parametric tests, mixed factor ANOVA makes a series of assumptions which should either be addressed in the research design or can be tested for.

• The “within-subjects” factor should contain at least two related (repeated measures)

categorical groups (levels)

• The “between-subjects” factor should have at least two categorical independent

groups (levels).

• The dependent variable should be continuous and approximately normally distributed

for all combinations of factors.

• There should be homogeneity of variance for each of the groups and (if there are more than 2 levels) sphericity between the related groups.

• There should be no significant outliers.


Open 2-way Mixed ANOVA.csv in JASP. This contains 4 columns of data relating to the type

of weightlifting grip and speed of the lift at 3 different loads (%1RM). Column 1 contains the

grip type, columns 2-4 contain the 3 repeated measures (30, 50 and 70%). Check for significant

outliers using boxplots then go to ANOVA > Repeated measures ANOVA.

Define the Repeated Measures Factor, %1RM, and add 3 levels (30, 50 and 70%). Add the

appropriate variable to the Repeated measures Cells and add Grip to the Between-Subjects



Additionally, tick Descriptive statistics and Estimates of effect size (ω2).

In Descriptive plots, move %1RM to the horizontal axis and Grip to separate lines. It is now

possible to add a title for the vertical axis.



The output should initially comprise 3 tables and 1 graph.

For the main effect for %1RM, the within-subjects effects table reports a large F-statistic

which is highly significant (p<.001) and has a large effect size (0.744). Therefore, irrespective

of grip type, there is a significant difference between the three %1RM loads.

However, JASP has reported under the table that the assumption of sphericity has been

violated. This will be addressed in the next section.

Finally, there is a significant interaction between %1RM and grip (p<.001) which also has a

large effect size (0.499). This suggests that the differences between the %1RM loads are

affected somehow by the type of grip used.

For the main effect for grip, the between-subjects table shows a significant difference

between grips (p< .001), irrespective of %1RM.

From the descriptive data and the plot, it appears that there is a larger difference between the two grips at the high 70% 1RM load.



In Assumptions Checks, tick Sphericity tests, Sphericity corrections and Homogeneity tests.

Mauchly’s test of sphericity is significant so that assumption has been violated, therefore, the

Greenhouse-Geisser correction should be used since epsilon is <0.75. Go back to Assumption

Checks and in Sphericity corrections leave Greenhouse-Geisser only ticked. This will result in

an updated Within-Subjects Effects table:


Levene’s test shows that there is no difference in variance in the dependent variable between

the two grip types.

However, if the ANOVA reports no significant difference you can go no

further in the analysis.


If the ANOVA is significant post hoc testing can now be carried out. In Post Hoc Tests add

%1RM to the analysis box on the right, tick Effect size and, in this case, use Holm for the post

hoc correction. Only Bonferroni or Holm’s corrections are available for repeated measures.


The post hoc tests show that irrespective of grip type each load is significantly different from each of

the other loads, and as seen from the plot, lift velocity significantly decreases as load increases.

Finally, in Simple main effects add Grip to the Simple effect factor and %1RM to Moderator factor 1.

These results show that there is a significant difference in lift speed between the two grips at

30% 1RM and also at the higher 70% 1RM loads (p=0.035 and p<0.001 respectively).



Using the Greenhouse-Geisser correction, there was a significant main effect of load (F (1.48,

26.64) = 115.45, p<.001). Bonferroni corrected post hoc testing showed that there was a

significant sequential decline in lift speed from 30-50% 1RM (p=.035) and 50-70% 1RM


There was a significant main effect for grip type (F (1, 18) = 20.925, p<.001) showing an overall

higher lift speed using the traditional rather than the reverse grip.

Using the Greenhouse-Geisser correction, there was a significant %1RM x Grip interaction (F

(1.48, 26.64) = 12.00, p<.001) showing that the type of grip affected lift velocity over the

%1RM loads.
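The fractional degrees of freedom in the report above come from multiplying the uncorrected degrees of freedom by the Greenhouse-Geisser epsilon. The sketch below shows this arithmetic in plain Python. The function name is hypothetical, and the inputs used in the example (epsilon = 0.74, 20 participants in 2 grip groups) are inferred from the reported values F (1.48, 26.64) and F (1, 18) rather than read from the JASP output.

```python
def greenhouse_geisser_df(epsilon, n_levels, n_subjects, n_groups=1):
    """Return the epsilon-corrected (effect df, error df) for a within-subjects effect."""
    df_effect = n_levels - 1
    # Error df for the within-subjects effect; n_groups=1 gives the purely repeated design.
    df_error = (n_levels - 1) * (n_subjects - n_groups)
    return epsilon * df_effect, epsilon * df_error


# Assumed inputs: 3 loads, 20 lifters split into 2 grip groups, epsilon = 0.74.
df1, df2 = greenhouse_geisser_df(0.74, n_levels=3, n_subjects=20, n_groups=2)
```

The F value itself is unchanged by the correction; only the degrees of freedom, and therefore the p-value, are adjusted.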


CHI-SQUARE TEST FOR ASSOCIATION

The chi-square (χ2) test for independence (also known as Pearson's χ2 test or the χ2 test of association) can be used to determine if a relationship exists between two or more categorical variables. The test

produces a contingency table, or cross-tabulation, which displays the cross-grouping of the categorical


The χ2 test checks the null hypothesis that there is no association between two categorical variables.

It compares the observed frequencies of the data with frequencies which would be expected if there

was no association between the two variables.

The analysis requires two assumptions to be met:

1. The two variables must be categorical data (nominal or ordinal)

2. Each variable should comprise two or more independent categorical groups

Most statistical tests fit a model to the observed data with a null hypothesis that there is no difference

between the observed and modelled (expected) data. The error or deviation of the model is calculated


$$\text{Deviation} = \sum (\text{observed} - \text{model})^2$$

Most parametric models are based around population means and standard deviations. The χ2 model,

however, is based on expected frequencies.

How are the expected frequencies calculated? For example, suppose we categorised 100 people as male or female and as short or tall. If there were an equal distribution between the 4 categories, the expected frequency in each would be 100/4 = 25 (25%), but the actual observed data do not have an equal frequency distribution.



                Male    Female    Row Total
Tall            25      25        50
Short           25      25        50
Column Total    50      50

Observed frequencies:

                Male    Female    Row Total
Tall            57      24        81
Short           14      5         19
Column Total    71      29

The model based on expected values can be calculated by:

Model (expected) = (row total x column total)/100

Model – tall male = (81 x 71) /100 = 57.5

Model – tall female = (81 x 29) /100 = 23.5

Model – short male = (19 x 71) /100 = 13.5

Model – short female = (19 x 29) /100 = 5.5


These values can then be added to the contingency table:

                Male (M)    Female (F)    Row Total
Tall (T)        57          24            81
  Expected      57.5        23.5
Short (S)       14          5             19
  Expected      13.5        5.5
Column Total    71          29

The χ2 statistic is derived from:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$




χ2 tests are only valid when you have a reasonable sample size, that is, less than 20% of cells have an

expected count of less than 5 and none have an expected count of less than 1.
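The expected counts, the χ2 statistic and the sample size rule can all be checked with a short plain Python sketch. It uses the observed tall/short by male/female counts from the worked example; the function names are hypothetical, and no continuity correction is applied.

```python
def expected_counts(observed):
    """Expected count for each cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


def sample_size_ok(observed):
    """Fewer than 20% of expected counts below 5 and none below 1."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 < 0.20 and min(cells) >= 1


observed = [[57, 24],   # tall: male, female
            [14, 5]]    # short: male, female
expected = expected_counts(observed)
statistic = chi_square(observed)
valid = sample_size_ok(observed)
```

For this table the observed and expected counts are almost identical, so the statistic is very small, and all four expected counts are above 5.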


The dataset Titanic survival is a classic dataset used for machine learning and contains data on 1309

passengers and crew who were on board the Titanic when it sank in 1912. We can use this to look at

associations between survival and other factors. The dependent variable is ‘Survival’ and possible independent variables are all the other variables.


By convention, the independent variable is usually placed in the contingency table columns and the

dependent variable is placed in the rows.

Open Titanic survival chi square.csv in JASP, add survived to rows as the dependent variable and sex

into columns as the independent variable.

In statistics, tick all the following options:

In Cells, tick the following:



First, look at the Contingency table output.

Remember that χ2 tests are only valid when you have a reasonable sample size, i.e. less than 20% of

cells have an expected count of less than 5 and none have an expected count of less than 1.

From this table, looking at % within rows, it can be seen that more males died on the Titanic compared

to females and more females survived compared to males. But is there a significant association

between gender and survival?

The statistical results are shown below:

The χ2 statistic (χ2 (1) = 365.9, p <.001) suggests that there is a significant association between gender and


χ2 continuity correction can be used to prevent overestimation of statistical significance for small

datasets. This is mainly used when at least one cell of the table has an expected count smaller than 5.


As a note of caution, this correction may overcorrect and result in an overly conservative result that

fails to reject the null hypothesis when it should (a type II error).

The likelihood ratio is an alternative to the Pearson chi-square. It is based on maximum-likelihood theory. For large samples, it gives almost the same result as Pearson χ2. It is recommended in particular for small sample sizes i.e. <30.

Nominal measures, Phi (2 x 2 contingency tables only) and Cramer's V (most popular) are both tests

of the strength of association (i.e. effect sizes). Both values are in the range of 0 (no association) to 1

(complete association). It can be seen that the strength of association between the variables has a

large effect size.

The Contingency coefficient is an adjusted Phi value and is only suggested for large contingency tables

such as 5 x 5 tables or larger.

Effect size4                     df   Small   Moderate   Large
Phi and Cramer’s V (2x2 only)    1    0.1     0.3        0.5
Cramer’s V                       2    0.07    0.21       0.35
Cramer’s V                       3    0.06    0.17       0.29
Cramer’s V                       4    0.05    0.15       0.25
Cramer’s V                       5    0.04    0.13       0.22
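As a rough illustration of where these effect sizes come from, the sketch below computes the Pearson χ2 and Cramer's V (which equals Phi for a 2 x 2 table) by hand. It assumes the four Titanic counts quoted in the odds calculation in this section; the function names are hypothetical and this is not how JASP is implemented. Results may differ slightly from the JASP output quoted in the text.

```python
import math

# Counts taken from the odds calculation in the text.
# Rows: sex (male, female); columns: died, survived.
observed = [[682, 162],
            [127, 339]]

def pearson_chi_square(table):
    """Return (chi-square, N) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (N * (min(rows, cols) - 1)))."""
    chi2, n = pearson_chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

chi2, n = pearson_chi_square(observed)
v = cramers_v(observed)
print(f"chi2 = {chi2:.1f}, N = {n}, Cramer's V = {v:.2f}")
```

With df = 1, a value above 0.5 falls in the "large" band of the table above.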

JASP also provides the Odds ratio (OR) which is used to compare the relative odds of the occurrence

of the outcome of interest (survival), given exposure to the variable of interest (in this case gender).

JASP reports the OR as a natural log. To convert this from a log value, calculate the natural
antilog (for example, using Microsoft Calculator, input the number, then click on Inv followed by e^x);
in this case, it is 11.3. This suggests that the odds of dying were 11.3 times higher for male passengers than for females.

4 Kim HY. Statistical notes for clinical researchers: Chi-squared test and Fisher's exact test. Restor. Dent. Endod. 2017; 42(2):152-155.

How is this calculated? Use the counts from the contingency table in the following:

Odds[males] = Died/Survived = 682/162 = 4.209

Odds[females] = Died/Survived = 127/339 = 0.374

OR = Odds[males] / Odds [females] = 11.3
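The same arithmetic can be sketched in a few lines of Python. This assumes only the four counts shown above; the variable names are illustrative. From these counts the ratio works out at roughly 11.2, close to the 11.3 obtained from the JASP output.

```python
import math

# Counts from the contingency table in the text
died_m, survived_m = 682, 162
died_f, survived_f = 127, 339

odds_males = died_m / survived_m        # odds of dying for males
odds_females = died_f / survived_f      # odds of dying for females
odds_ratio = odds_males / odds_females

# JASP reports the natural log of the OR; exp() reverses it
log_odds_ratio = math.log(odds_ratio)
back_converted = math.exp(log_odds_ratio)

print(f"Odds[males] = {odds_males:.3f}")
print(f"Odds[females] = {odds_females:.3f}")
print(f"OR = {odds_ratio:.2f}, log(OR) = {log_odds_ratio:.3f}")
```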


We can also further decompose the contingency table as a form of post hoc testing by converting the

counts and expected counts in each cell to a standardised residual. This can tell us if the observed

counts and expected counts are significantly different in each cell.

The standardized residual for a cell in a table is a version of the standard z-score, calculated as

\[ z = \frac{\text{observed} - \text{expected}}{\sqrt{\text{expected}}} \]

In the special case where df = 1, the calculation of the standardized residual incorporates a continuity correction:

\[ z = \frac{|\text{observed} - \text{expected}| - 0.5}{\sqrt{\text{expected}}} \]


The resulting value of z is then given a positive sign if observed>expected and a negative sign if

observed<expected. Z-score significances are shown below.

z-score P-value

<-1.96 or > 1.96 <0.05

<-2.58 or > 2.58 <0.01

<-3.29 or > 3.29 <0.001
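The cut-offs in this table come from the standard normal distribution. As a minimal sketch (the function name is hypothetical), a two-tailed P-value can be obtained from any z-score using only the Python standard library:

```python
import math

def two_tailed_p(z):
    """Two-tailed P-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (1.96, 2.58, 3.29):
    print(f"z = {z}: p = {two_tailed_p(z):.4f}")
```

The three cut-offs land at, or very close to, the 0.05, 0.01 and 0.001 thresholds, which is why they are used as rounded rules of thumb.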

When the z-scores are calculated for each cell in the contingency table we can see that significantly

fewer women died than expected and significantly more males died than expected p<.001.

Sex      Survived   z
Female   No         -9.5
Male     No         7.0
Female   Yes        12.0
Male     Yes        -8.9
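The residuals can be reproduced approximately with the sketch below. It assumes the counts quoted in the odds calculation earlier in this section and applies the df = 1 correction described above. The function name is hypothetical, and clamping the corrected difference at zero is an added assumption for very small differences. Values may differ in the last decimal place from those in the table.

```python
import math

def standardized_residual(observed, expected, df):
    """Signed standardized residual; subtracts 0.5 from |O - E| when df == 1."""
    diff = abs(observed - expected)
    if df == 1:
        diff = max(diff - 0.5, 0.0)  # assumption: never let the correction go below zero
    z = diff / math.sqrt(expected)
    return z if observed > expected else -z

# Rows: sex; columns: survived = No (died), survived = Yes
counts = {"Male": {"No": 682, "Yes": 162},
          "Female": {"No": 127, "Yes": 339}}

n = sum(sum(row.values()) for row in counts.values())
row_totals = {sex: sum(row.values()) for sex, row in counts.items()}
col_totals = {c: sum(counts[sex][c] for sex in counts) for c in ("No", "Yes")}

residuals = {}
for sex in counts:
    for c in ("No", "Yes"):
        expected = row_totals[sex] * col_totals[c] / n
        residuals[(sex, c)] = standardized_residual(counts[sex][c], expected, df=1)
        print(f"{sex:6s} {c:3s} z = {residuals[(sex, c)]:.1f}")
```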

A meta-analysis is a statistical analysis that integrates results from multiple studies, providing a single

numerical value of the overall treatment effect for that group of studies. The difference between

statistical data analysis and a meta-analysis is shown below. Effectively each study becomes a

“participant” in the meta-analysis.

Statistical analysis                  Meta-analysis
Participant 1   Individual data       Study 1   Study data
Participant 2   Individual data       Study 2   Study data
Participant 3   Individual data       Study 3   Study data
Participant 4   Individual data       Study 4   Study data
Participant 5   Individual data       Study 5   Study data
Overall group data & statistics       Overall study group data & statistics

Effect size and calculations

To perform a meta-analysis in JASP, the overall effect size (ES) and standard error (SE) of each study
need to be provided. An ES is a dimensionless estimate (no units) that indicates both direction and

magnitude of the effect/outcome. The standard error measures the dispersion of different sample

means taken from the same population and can be estimated from a single sample standard deviation.

Some studies will provide this information, although many do not. However, they will provide results

such as:

• Central tendencies and dispersion

• T or F statistics

• P-values

• Correlation coefficients

• Chi-square

All of these can be converted to an ES, and the SE also determined, using David Wilson’s “Meta-analysis
effect size calculator”.

For example, a study comparing a treatment to a control group may provide the variable mean, SD

and n of each group. From this the ES, a standardised mean difference (d) and estimated error variance

(v) can be calculated:

The 95% confidence intervals (CI) can be used to calculate the SE using the following equation:

SE = (upper CI limit – lower CI limit) / 3.92. NB: 3.92 = 2 x 1.96, because a 95% CI extends 1.96 SE on either side of the estimate

SE = (1.7487 – (-0.0791)) / 3.92

SE = 0.466

A quicker way is to find the square root of the estimated error variance (v)

SE = √0.2174 = 0.466

Therefore, for this study the overall ES = 0.835 and SE = 0.466.
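Both routes to the SE can be checked with a short sketch using the figures quoted above. Taking the ES as the midpoint of the interval assumes that the CI is symmetric about the estimate.

```python
import math

lower_ci, upper_ci = -0.0791, 1.7487   # 95% CI limits from the worked example
error_variance = 0.2174                # estimated error variance (v)

# A 95% CI spans 1.96 SE either side of the estimate, so its width is 3.92 SE
se_from_ci = (upper_ci - lower_ci) / (2 * 1.96)

# The SE is also the square root of the error variance
se_from_variance = math.sqrt(error_variance)

# Assumption: for a symmetric CI the ES sits at the midpoint of the limits
es = (upper_ci + lower_ci) / 2

print(f"SE from CI = {se_from_ci:.3f}")
print(f"SE from variance = {se_from_variance:.3f}")
print(f"ES = {es:.3f}")
```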

These data can be put into Excel and then used to perform a meta-analysis in JASP. The overall ES derived

from the meta-analysis is calculated by combining the effect sizes of the included studies.

To interpret the meta-analysis output, one needs to understand the following concepts,

heterogeneity, the model, effect size, and the forest plot.


Heterogeneity describes any variability that may exist between the different studies. It is the opposite

of homogeneity, which means that the population/data/results are the same. There are 3 types:

• Clinical: Differences in participants, interventions, or outcomes

• Methodological: Differences in study design, risk of bias

• Statistical: Variation in intervention effects or results

If there is no variability, then data can be described as homogeneous. Meta-analysis is concerned with
statistical heterogeneity. Statistical heterogeneity is used to describe variability among data/studies and

occurs when the treatment effect estimates of a set of data/studies vary among one another. Studies

with methodological flaws and small studies may overestimate treatment effects and can contribute

to statistical heterogeneity.

The diagram above5 shows examples of forest plots exhibiting low and high heterogeneity. In the low

condition, all the studies are generally lined up to the right of the vertical axis and all confidence

intervals are overlapping. In the high condition, the studies are spread over either side of the vertical

decision line and there is little overlap of the confidence intervals. Apart from visual observation of
the level of heterogeneity, meta-analysis provides quantitative statistical methods to measure it.


Tests for heterogeneity

Q Cochran Q test - values are given for 2 tests outlined below, (these tests have low statistical

power when only a few studies are included in the meta-analysis):

‘Omnibus test of the Model Coefficients’ tests the null hypothesis that all the ES are zero.

‘Test for residual heterogeneity’ tests the null hypothesis that all the effect sizes in the studies

are all equal (homogeneity of effect sizes).

τ2 Tau square is an estimate of the total amount of heterogeneity. This is interpreted as

systematic unexplained differences between the observed effects of the separate studies. It

is not affected by the number of studies but it is often hard to interpret how relevant the value

is from a practical standpoint.

I2 This measures the extent of heterogeneity not caused by sampling error. If there is high

heterogeneity possible sub-group analysis could be done. If the value is very low, there is no

point doing further subgroup analyses. I2 is not sensitive to changes in the number of studies

and is therefore used extensively in medical and psychological research, especially since there

is a “rule of thumb” to interpret it. A rough guide for interpretations has been suggested as follows:


• 0% to 40%: might not be important.

• 30% to 60%: may represent moderate heterogeneity.

• 50% to 90%: may represent substantial heterogeneity.

• 75% to 100%: considerable heterogeneity.

H2 The ratio of the total variability in the observed effects to the variability expected from
sampling error alone. H2 has a value of 1 in the case of homogeneity and heterogeneity is
assumed to be present when H2 > 1.
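To make these indices more concrete, the sketch below computes Q, τ2, I2 and H2 from a set of effect sizes and standard errors using the classic Q-based formulas, with the DerSimonian-Laird estimate of τ2. The three studies are made-up numbers and the function name is hypothetical. JASP may derive I2 and H2 differently depending on the estimator selected, so treat this as an illustration of the ideas rather than a reproduction of its output.

```python
def heterogeneity(effects, std_errors):
    """Q-based heterogeneity indices (DerSimonian-Laird tau squared)."""
    weights = [1 / se ** 2 for se in std_errors]          # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # % of variability not due to sampling error
    h2 = q / df                                           # 1 when the studies are homogeneous
    return {"Q": q, "df": df, "tau2": tau2, "I2": i2, "H2": h2}

# Hypothetical example: three studies with equal precision but different effects
result = heterogeneity(effects=[0.1, 0.9, 0.5], std_errors=[0.1, 0.1, 0.1])
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

With these made-up inputs I2 falls in the 75% to 100% band of the guide above.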


Meta-analysis models

There are two models commonly used in meta-analysis and each makes different assumptions relating

to the observed differences among the studies

Fixed effects model: this assumes that all studies share a common true ES i.e. the data is
homogeneous. All factors that could influence the ES are the same in all the study samples and
there is therefore very little heterogeneity. Between-study differences are assumed to be due to chance and
are thus not incorporated into the model. Therefore, each study included in the meta-analysis is
estimating the same population treatment effect, which, in theory, represents the true population
treatment effect. Each study is weighted, with more weight given to studies with large sample
sizes i.e. more information.

6 Cochrane Handbook for Systematic Reviews of Interventions

This model answers the following question “What is the best estimate of the population effect size?”

Random effects model: this assumes a distribution of the treatment effect for some populations. i.e.

that the different studies are estimating different, yet related, intervention effects. Therefore,

heterogeneity cannot be explained because it is due to chance. This model assigns a more balanced

weighting between studies.

This model answers the question “What is the average treatment effect”.

It is therefore important to check for significant heterogeneity to

select the correct model for the meta-analysis.
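The difference in weighting between the two models can be sketched with inverse-variance weights. In this sketch a fixed-effects weight is 1/SE² and a random-effects weight is 1/(SE² + τ2). The two studies and the value of τ2 are hypothetical and the function name is an assumption; JASP's own estimators are more involved.

```python
import math

def pooled_effect(effects, std_errors, tau2=0.0):
    """Inverse-variance pooled ES; tau2 = 0 gives the fixed-effects model."""
    weights = [1 / (se ** 2 + tau2) for se in std_errors]
    sum_w = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se = math.sqrt(1 / sum_w)
    relative_weights = [w / sum_w for w in weights]
    return estimate, se, relative_weights

# Hypothetical data: one large precise study and one small imprecise study
effects = [0.2, 0.8]
std_errors = [0.1, 0.4]

fixed_est, fixed_se, fixed_w = pooled_effect(effects, std_errors)
random_est, random_se, random_w = pooled_effect(effects, std_errors, tau2=0.15)

print("Fixed: ", round(fixed_est, 3), [round(w, 2) for w in fixed_w])
print("Random:", round(random_est, 3), [round(w, 2) for w in random_w])
```

Adding τ2 to every study's variance shrinks the gap between the weights, so the small study counts for more under the random-effects model and the pooled SE becomes wider.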

Random effects model selection.

JASP provides 8 random effects estimators to estimate the 4 indices of heterogeneity (see tests for

heterogeneity above). All use a slightly different approach resulting in different pooled ES

estimates and CIs. To date, the most often used estimator in medical research is the
DerSimonian-Laird estimator; however, recent studies have shown better estimates of
between-study variance using the Maximum-Likelihood, Empirical Bayes and Sidik-Jonkman estimators.


The Forest Plot.

This plot provides a visual summary of the meta-analysis findings. Graphically it represents the ES and

95% CI for each study and an estimate of the overall ES and 95% CI for all the studies that were

included. As previously mentioned, it can be used to visually assess the level of heterogeneity.

As a rule of thumb, if all the individual study CIs cover the final combined ES and its 95% CI (diamond)

then there is low heterogeneity (or high homogeneity). In this case, 6 studies do not intersect with the


Funnel plot

Looking like an inverted funnel, this is a scatter plot of the intervention effect estimates from individual

studies against each study’s size or precision. The precision of the estimated intervention effect should increase as the size of the
study increases. Studies with low power will cluster at the bottom while higher power studies will be

near the top. Ideally, the data points should have a symmetrical distribution. Funnel plot asymmetry

indicates that the meta-analysis results may be biased. Potential reasons for bias include:

• Publication bias (the tendency of authors/publishers to only publish studies with significant results)


• True heterogeneity

• Artefact (wrong choice of effect measure)

• Data irregularities (methodological design, data analysis etc)

Funnel plot asymmetry can be analysed statistically using either a rank test or Egger’s test.

Running a meta-analysis in JASP

If you have not added the Meta-analysis module, click the blue cross at the top right corner

of JASP and tick ‘Meta-Analysis’, this will add it to the main menu ribbon. Go to ‘Meta-

Analysis’ and click ‘Classical Meta-Analysis’.

Open meta analysis.csv. This contains the effect sizes and SEM from 14 studies for changes

in DOMS pain scores in groups receiving no therapy vs cryotherapy. To start, add ES to ‘Effect

Size’, SEM to ‘Effect Size Standard Error’ and Author to ‘Study Labels’. Change the ‘Method’

to ‘Fixed Effects’.

Initial output

This should produce two tables:

The significance of the ES is shown in the ‘Fixed and Random Effects’ table:

Omnibus test of model coefficients tests the H0 that all the coefficients in the second table

are all zero i.e. the interventions have no significant effect. As can be seen, this is significant

and H0 can be rejected, thus the intervention has a significant effect.

The test of residual heterogeneity tests the H0 that all the effect sizes in all the studies are

equal. If this is not significant a fixed-effects model should be used. However, in this case, it

is significant and therefore a random-effects model would be more appropriate.

The coefficients table provides an estimate (Wald test) of the overall effect size of the

combined study and its significance. This confirms the result of the Omnibus test in the first table.


Now we know that a random-effects model is more appropriate to this dataset, return to the

options and change the ‘Method’ to ‘Restricted ML (Maximum Likelihood)’. This will then add

another table showing the different estimates of residual heterogeneity:

Both τ2 and I2 show excess variance (heterogeneity) between the studies thus supporting the

use of a random-effects model.

Further options.

Return to the Statistics options and in model fit, tick ‘Forest Plot’, ‘Funnel Plot’ and ‘Rank Test

for Funnel Plot Asymmetry’:

The Forest plot shows the weighted effect sizes (size of the squares reflects the weight of each

study) and CIs used to determine the combined ES (diamond).

It can easily be seen that cryotherapy has a significant positive overall effect
on DOMS scores (ES = 0.66).

The funnel plot shows that the observed effects sizes appear to be symmetrically distributed

around the vertical axis (based on the overall effect size estimate, in this case, 0.66) and lie

within the 95% confidence triangle. Asymmetry is often reported as being indicative of

publication bias. This plot is accompanied by the ‘Rank Correlation Test’ for funnel plot

asymmetry which in this case is non-significant (p=.912).

Independent t-test

Design example:

Independent variable Group 1 Group 2

Dependent variable Data Data

Independent variable Dependent variable

Categorical Continuous

More dependent variables can be added if required

Paired samples t-test

Design example:

Independent variable Pre-test Post-test

Participant Dependent variable

1 Data Data

2 Data Data

3 Data Data

..n Data Data

Pre-test Post-test

Design example:

Simple correlation

Participant Variable 1 Variable 2 Variable 3 Variable 4 Variable ..n

1 Data Data Data Data Data

2 Data Data Data Data Data

3 Data Data Data Data Data

…n Data Data Data Data Data

Multiple correlations

Design example:

Simple Regression

Participant Outcome Predictor 1 Predictor 2 Predictor 3 Predictor ..n

1 Data Data Data Data Data

2 Data Data Data Data Data

3 Data Data Data Data Data

…n Data Data Data Data Data

Multiple regression

Logistic Regression

Design example:

Dependent Variable (categorical)

Factor (categorical)

Covariate (continuous)

Participant Outcome Predictor 1 Predictor 2

1 Data Data Data

2 Data Data Data

3 Data Data Data

…n Data Data Data

More factors and covariates can be added if required

One-way Independent ANOVA

Design example:

Independent variable Group 1 Group 2 Group 3 Group…n

Dependent variable Data Data Data Data

Independent variable Dependent variable

(Categorical) (Continuous)

More dependent variables can be added if required

One-way repeated measures ANOVA

Design example:

Independent variable (Factor)

Participant Level 1 Level 2 Level 3 Level..n

1 Data Data Data Data

2 Data Data Data Data

3 Data Data Data Data

4 Data Data Data Data

..n Data Data Data Data

Factor (time)


(Related groups)

More levels can be added if required

Two-way Independent ANOVA

Design example:

Factor 1 Supplement 1 Supplement 2

Factor 2 Dose 1 Dose 2 Dose 3 Dose 1 Dose 2 Dose 3

Dependent variable

Data Data Data Data Data Data

Factor 1 Factor 2 Dependent variable

More factors and dependent variables can be added if required

Two-way Repeated measures ANOVA

Design example:

Factor 1 (Interventions)   Level 1 i.e. intervention 1                 Level 2 i.e. intervention 2
Factor 2 (Time)            Level 1       Level 2       Level 3       Level 1       Level 2       Level 3
                           i.e. time 1   i.e. time 2   i.e. time 3   i.e. time 1   i.e. time 2   i.e. time 3
1                          Data          Data          Data          Data          Data          Data
2                          Data          Data          Data          Data          Data          Data
3                          Data          Data          Data          Data          Data          Data
..n                        Data          Data          Data          Data          Data          Data

Factor 1 levels 1-n Factor 2 levels 1-n

Two-way Mixed Factor ANOVA

Design example:

Factor 1 (Between subjects)

Group 1 Group 2

Factor 2 levels (Repeated measures)

Trial 1 Trial 2 Trial 3 Trial 1 Trial 2 Trial 3

1 Data Data Data Data Data Data

2 Data Data Data Data Data Data

3 Data Data Data Data Data Data

..n Data Data Data Data Data Data

Factor 1 Factor 2 levels

(Categorical) (Continuous)

Chi-squared - Contingency tables

Design example:

Participant Response 1 Response 2 Response 3 Response…n

1 Data Data Data Data

2 Data Data Data Data

3 Data Data Data Data

..n Data Data Data Data

All data should be categorical

SOME CONCEPTS IN FREQUENTIST STATISTICS

The frequentist approach is the most commonly taught and used statistical methodology. It

describes sample data based on the frequency or proportion of the data from repeated

studies through which the probability of events is defined.

Frequentist statistics uses rigid frameworks including hypothesis testing, p values and

confidence intervals etc.

Hypothesis testing

A hypothesis can be defined as “a supposition or proposed explanation made based on

limited evidence as a starting point for further investigation”.

There are two simple types of hypotheses, a null hypothesis (H0) and an alternative or

experimental hypothesis (H1). The null hypothesis is the default position for most statistical

analyses in which it is stated that there is no relationship or difference between groups. The

alternative hypothesis states that there is a relationship or difference between groups and may specify
a direction of the difference/relationship. For example, if a study was carried out to look at the
effects of a supplement on sprint time in one group of participants compared to a placebo group, the hypotheses could be:


H0 = there is no difference in sprint times between the two groups

H1 = there is a difference in sprint times between the two groups

H2 = group 1 is greater than group 2

H3 = group 1 is less than group 2

Hypothesis testing refers to the strictly predefined procedures used to decide whether to reject the
null hypothesis, taking into account the probability that the observed result could have arisen purely by chance.
The threshold used for this decision is called the level of significance. The level of
significance is denoted by α, usually 0.05 (5%). This means accepting a 5% probability of
rejecting the null hypothesis when it is, in fact, true.

Different types of hypothesis can easily be selected in JASP, however, the null hypothesis is

always the default.

Type I and II errors

The probability of rejecting the null hypothesis, when it is, in fact, true, is called Type I error

whereas the probability of accepting the null hypothesis when it is not true is called Type II error.


The verdict        The truth: Not guilty (H0)              The truth: Guilty (H1)
Guilty (H1)        Type I error                            Correct decision
                   (an innocent person goes to prison)
Not guilty (H0)    Correct decision                        Type II error
                                                           (a guilty person goes free)

Type I error is deemed the worst error to make in statistical analyses.

Statistical power is defined as the probability that the test will reject the null hypothesis when

the alternative hypothesis is true. For a set level of significance, if the sample size increases,

the probability of Type II error decreases, which therefore increases the statistical power.
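The link between sample size, Type II error and power can be sketched numerically. The example below assumes a two-sided one-sample z-test with a known standard deviation, which is a simplification chosen so that only the Python standard library is needed. The effect, SD and sample sizes are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean differs by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # critical value, about 1.96 for alpha = 0.05
    shift = effect * (n ** 0.5) / sd               # true difference in standard-error units
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# Same hypothetical effect and alpha, increasing sample size
for n in (10, 20, 40, 80):
    power = z_test_power(effect=0.5, sd=1.0, n=n)
    print(f"n = {n:3d}: power = {power:.3f}, Type II error = {1 - power:.3f}")
```

When the true effect is zero the function returns α itself, which is the Type I error rate.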

Testing the hypothesis

The essence of hypothesis testing is to first define the null (or alternative) hypothesis, set

the criterion level α, usually 0.05 (5%), collect and analyse sample data. Use a test statistic to

determine how far (or the number of standard deviations) the sample mean is from the

population mean stated in the null hypothesis. The test statistic is then compared to a critical

value. This is a cut-off value defining the boundary where less than 5% of the sample means

can be obtained if the null hypothesis is true.

If the probability of obtaining a difference between the means by chance is less than 5% when

the null hypothesis has been proposed, the null hypothesis is rejected and the alternative

hypothesis can be accepted.

The p-value is the probability of obtaining a sample outcome at least as extreme as the one
observed, given that the value stated in the null hypothesis is true. If the p-value is less than
5% (p < .05) the null hypothesis is rejected. When the p-value is greater than 5% (p > .05), we
fail to reject the null hypothesis.

Effect size

An effect size is a standard measure that can be calculated from any number of statistical

analyses. If the null hypothesis is rejected the result is significant. This significance only

evaluates the probability of obtaining the sample outcome by chance but does not indicate

how big a difference (practical significance), nor can it be used to compare across different


The effect size indicates the magnitude of the difference between the groups. So for example,

if there was a significant decrease in 100m sprint times in a supplement compared to a

placebo group, the effect size would indicate how much more effective the intervention was.

Some common effect sizes are shown below.

Test                  Measure                                Trivial   Small   Medium   Large
Between means         Cohen’s d                              <0.2      0.2     0.5      0.8
Correlation           Correlation coefficient (r)            <0.1      0.1     0.3      0.5
                      Rank-biserial (rB)                     <0.1      0.1     0.3      0.5
                      Spearman’s rho                         <0.1      0.1     0.3      0.5
Multiple Regression   Multiple correlation coefficient (R)   <0.10     0.1     0.3      0.5
ANOVA                 Eta                                    <0.1      0.1     0.25     0.37
                      Partial Eta                            <0.01     0.01    0.06     0.14
                      Omega squared                          <0.01     0.01    0.06     0.14
Chi-squared           Phi (2x2 tables only)                  <0.1      0.1     0.3      0.5
                      Cramer’s V                             <0.1      0.1     0.3      0.5
                      Odds ratio (2x2 tables only)           <1.5      1.5     3.5      9.0
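As an illustration of the first row of the table, the sketch below computes Cohen's d for two independent groups using the pooled standard deviation and labels it with the thresholds above. The data and function names are hypothetical, and JASP reports this value directly, so the code is only meant to show what is being calculated.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def label_d(d):
    """Label |d| using the thresholds in the table: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Hypothetical scores for two small groups
d = cohens_d([2, 4, 6], [1, 3, 5])
print(f"d = {d:.2f} ({label_d(d)})")
```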

In small datasets, there may be a moderate to large effect size but no significant differences.

This could suggest that the analysis lacked statistical power and that increasing the number

of data points may show a significant outcome. Conversely, when using large datasets,

significant testing can be misleading since small or trivial effects may produce statistically

significant results.


Most research collects information from a sample of the population of interest, since it is normally

impossible to collect data from the whole population. We do, however, want to see how well

the collected data reflects the population in terms of the population mean, standard

deviations, proportions etc. based on parametric distribution functions. These measures are

the population parameters. Parameter estimates of these in the sample population are

statistics. Parametric statistics require assumptions to be made of the data including the

normality of distribution and homogeneity of variance.

In some cases these assumptions may be violated in that the data may be noticeably skewed:

Sometimes transforming the data can rectify this but not always. It is also common to collect

ordinal data (i.e. Likert scale ratings) for which terms such as mean and standard deviation

are meaningless. As such there are no parameters associated with ordinal (non-parametric)

data. The non-parametric counterparts include median values and quartiles.

In both of the cases described non-parametric statistical tests are available. There are

equivalents of most common classical parametric tests. These tests don’t assume normally

distributed data or population parameters and are based on sorting the data into ranks from

lowest to highest values. All subsequent calculations are done with these ranks rather than

with the actual data values.
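The ranking step can be sketched as follows. Giving tied values the average of the ranks they would otherwise occupy is a common convention, but it is an assumption here since the text does not say how ties are handled. The function name is hypothetical.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

scores = [10, 20, 20, 5]
print(average_ranks(scores))
```

A rank-based test then works with these ranks in place of the raw scores, so one extreme value has no more influence than any other highest or lowest value.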

[Flowchart: Comparing one sample to a known or hypothesized population mean. Branches by data type (continuous, ordinal, nominal), with nominal data split into 2 categories and >2 categories. Tests named: one-sample t-test, one-sample median test, binomial test, multinomial test or chi-square ‘goodness of fit’. One option is marked as currently not available in JASP.]

[Flowchart: Testing relationships between two or more variables. Branches by data type (continuous, ordinal, nominal) and, for continuous data, by whether parametric assumptions are met. Surviving test names: Spearman’s or Kendall’s tau.]

[Flowchart: Predicting outcomes. Branches by data type (continuous, ordinal, nominal) and by whether there is more than one predictor variable. One option is marked as currently not available in JASP.]

[Flowchart: Testing for differences between two independent groups. Branches by data type (continuous, ordinal, nominal) and, for continuous data, by whether parametric assumptions are met. Surviving test names: Mann-Whitney U, chi-square or Fisher’s exact.]

[Flowchart: Testing for differences between two related groups. Branches by data type (continuous, ordinal, nominal) and, for continuous data, by whether parametric assumptions are met. Tests named: paired samples t-test, Wilcoxon’s test, McNemar’s test. One option is marked as currently not available in JASP.]

[Flowchart: Testing for differences between three or more independent groups. Branches by data type (continuous, ordinal, nominal) and, for continuous data, by whether parametric assumptions are met. Tests named: ANOVA, Kruskal-Wallis, chi-square contingency tables.]

[Flowchart: Testing for differences between three or more related groups. Branches by data type (continuous, ordinal, nominal) and, for continuous data, by whether parametric assumptions are met. Tests named: RMANOVA, Friedman test, repeated measures logistic regression.]

[Flowchart: Test for interactions between 2 or more independent variables. Branches by data type (continuous, ordinal, nominal) and, for continuous data, by whether parametric assumptions are met. Surviving test names: ordered logistic, factorial logistic.]

Applied Economics and Finance

Vol. 3, No. 2; May 2016

ISSN 2332-7294 E-ISSN 2332-7308

Published by Redfame Publishing



Capital Budgeting Theory and Practice:

A Review and Agenda for Future Research

Lingesiya Kengatharan 1

1 Department of Financial Management, University of Jaffna, Sri Lanka.

Correspondence: Lingesiya Kengatharan, Department of Financial Management, University of Jaffna, Sri Lanka.

Received: December 21, 2015 Accepted: January 14, 2016 Available online: February 4, 2016

doi:10.11114/aef.v3i2.1261 URL:


The main purpose of this research was to delineate and unearth lacunae in the extant capital budgeting theory and practice
of the last two decades and ipso facto become a springboard for future scholarship. Web of Science and iCat
searches were used to locate research papers published during the last twenty years. Four criteria were applied in
selecting research papers: each had to be an empirical study, published in English, published in a peer-reviewed journal,
and available in full text. These papers were collected from multiple databases including OneFile (GALE), SciVerse
ScienceDirect (Elsevier), Informa - Taylor & Francis (CrossRef), Wiley (CrossRef), Business (JSTOR), Arts & Sciences
(JSTOR), ProQuest, MEDLINE (NLM), and Wiley Online Library. Search parameters covered capital budgeting,
capital budgeting decision, capital budgeting theory, capital budgeting practices, capital budgeting methods, capital
budgeting models, capital budgeting tools, capital budgeting techniques, capital budgeting process and investment
decision. Thematic text analysis was used to analyse the papers. Recent studies lend credence to the use of more
sophisticated capital budgeting techniques along with many capital budgeting tools for incorporating risk.
Nonetheless, the review drew a distinction between developed and developing countries. Moreover, factors impinging on
the choice of capital budgeting practice were identified, and the absence of behavioral finance and event study methodological
approaches was highlighted. More extensive studies are imperative to build robust knowledge of capital budgeting theory
and practice in a chaotic environment. This research was well thought out in its design and contributes by stating the
known and unknown areas of capital budgeting during the last two decades. This scholarship is of use to academics,
practitioners, policy makers, and stakeholders of the company.

Keywords: Capital budgeting theory and practices, capital budgeting tools for incorporating risk, discount rate

1. Introduction

The area of capital and capital budgeting within financial management has attracted many researchers
during the last five decades, and the seminal studies culminated in many theories (e.g., Markowitz, 1952;
Modigliani & Miller, 1958; Markowitz, 1959; Miller & Modigliani, 1961; Fama, 1970; Black & Scholes, 1973; Ross, 1976;
Roll, 1977; Myers, 1977; Myers, 1984; Jensen, 1986; Ritter, 1991; Graham & Harvey, 2001; Myers, 2003; Halov &
Heider, 2004; Atkeson & Cole, 2005) and models (e.g., Markowitz, 1952; Sharpe, 1963; Sharpe, 1964; Linter, 1965;
Roll, 1977) from time to time. Notwithstanding, due to globalization, environmental changes and cutting-edge
technological developments, theories and models developed in the past are not applicable today; many of them are
criticized and their applicability in practice is in question (e.g., Malkiel, 2003; Bornholt, 2013). A curious instance
illustrated by Brounen, de Jong and Koedijk (2004) is that ‘Nobel Prize winning concepts like the capital asset pricing
model and capital structure theorems have been praised and taught in class rooms, but to what the extent to these
celebrated notions have also found their way into corporate board rooms remains somewhat opaque’ (p.72). ‘Traditional
capital budgeting methods have been heavily criticized of discouraging the adoption of advanced manufacturing
technology and thus undermining the competitiveness of Western firms’ (Slagmulder, Bruggeman & Wassenhove, 1995,
p.121). In a similar vein, many scholars have argued in their seminal works that there are gaps in the theory of
capital budgeting and its applicability (e.g., Mukheijee & Henderson, 1987; Arnold & Hatzopoulos, 2000; Graham &
Harvey, 2001; Cooper, Morgan, Redman & Smith, 2002; Brounen et al., 2004; Kersyte, 2011).

Firms operating in a dynamic environment must respond to changes to beat competitors and to sustain, survive and
grow in markets (Ghahremani, Aghaie & Abedzadeh, 2012). Most changes impinge on capital investment decisions,
which invariably involve large sums of money over a long period (e.g., Peterson & Fabozzi, 2002; Cooper et al.,
2002; Dayananda, Irons, Harrison, Herbohn & Rowland, 2002), and these decisions are critical in managing strategic
change and sustaining long-term corporate performance (Emmanuel, Harris & Komakech, 2010). Capital investment
decisions can concern acquisitions, investing in new facilities, new product development, employing new technology and
adoption of new business processes, or some combination of these (Emmanuel et al., 2010). Capital budgeting
investment decisions are critical to the survival and long-term success of firms because of many factors, which are
commonly grouped together as uncertainty. The global financial crisis epitomized this truth. One of the most intractable issues
confronted by researchers is how to identify, capture, and evaluate uncertainties associated with long-term projects
(Haka, 2006). Sources of uncertainty range from the mundane (cash flow estimation, number and sources of estimation
error, etc.) to the more esoteric (complementarities among investments, options presented by investment opportunities,
opportunity cost of investments, etc.) (Haka, 2006). Since capital investment decisions deal with large sums of money,
scrupulous attention is given to making them. ‘Capital budgeting is as the procedures, routines, methods and
techniques used to identify investment opportunities, to develop initial ideas into specific investment proposals, to
evaluate and select a project and to control the investment project to assess forecast accuracy’ (Segelod, 1997). Although
a number of capital budgeting methods assist in making decisions, a number of other uncertainty factors have
a deleterious influence on capital budgeting decisions.

Nowadays, complex methods are used for making capital budgeting decisions, rather than depending purely on theories of
capital budgeting, because of uncertainty and other contingency factors (Singh, Jain & Yadav, 2012; Zhang, Huang
& Tang, 2011; Kersyte, 2011; Bock & Truck, 2011; Byrne & Davis, 2005; Cooper et al., 2002; Arnold & Hatzopoulos,
2000; Mao, 1970; and Dickerson, 1963). After the advent of full-fledged globalization and in the era of cutthroat
competition (Verma, Gupta & Batra, 2009), advanced developments in technologies, other macro-environmental factors
and demographic factors are intruding into capital budgeting practices (Verbeeten, 2006). In a world of geo-political,
social as well as economic uncertainty, strategic financial management is in a process of change, in turn requiring a
re-examination of the fundamental assumptions (e.g., the efficient market hypothesis, Fama, 1970) that cut across traditional
boundaries of financial management (Hill, 2008). With limited credit and other sources of financing in today’s
uncertain and challenging economic environment, scrupulously evaluating the profitability and
success of proposed capital investments and allocating limited capital is more vital than ever (Kester & Robbins,


Over the last 20 years, there have been many changes and challenges in financial decision making owing to the global financial crisis, fluctuations in the value of money, advanced technology, interest rate, exchange rate and inflation risks, and dramatic changes in the economic and business environment in both national and global markets. Thus, there is a need to re-examine and re-study capital budgeting practices, since they have a considerable impact on investment decision making. Investment decision making is not simple or straightforward, and risk is an important element in it. Companies employ a number of risk techniques for evaluating investment projects. However, there is a problem in setting up a theoretical model and applying that model in practice (e.g., Arnold & Hatzopoulos, 2000; Dickerson, 1963). Thus, theory cannot be applied unmodified at all times, and theories developed in the past are sometimes not applicable today. There is no doubt that over the last two decades corporate capital budgeting practices have not been static and have diverged from the theories.

This study presents a systematic review of the capital budgeting practices literature published in the last two decades. A systematic review of literature is described as 'principally justified by the manner in which the reviewer proceeds, stage by stage, with full transparency and explicitness about what is (and what is not) done, typically using a protocol to guide the process' (Young, Ashby, Boaz & Grayson, 2002, p. 220). This review updates information about the capital budgeting techniques used by firms and compares the current usage of various techniques and methods with that found in previous studies. The study thus cumulatively builds robust knowledge in the area of capital budgeting practices, and the gaps it unearths can become a springboard for future research. It therefore helps researchers to reflect on and assess where they are in the area of capital budgeting practices and guides future research directions.

1.1 Objective of the study

Examining empirical research on capital budgeting practices to date has been very useful in explaining the importance of capital budgeting practices for the long-term success of business organizations. Nowadays, complex methods are used for making capital budgeting decisions rather than relying purely on theories of capital budgeting. Advanced developments in technology, other macro-environmental factors and demographic factors are intruding into capital budgeting practices, and thus some of the theories (e.g., the payback period) have fallen out of use in well-developed countries. Thus, the main aim of this research is to uncover gaps in the existing capital budgeting practices literature and to suggest directions for future research. It will further attempt to:

- Explain the capital budgeting theories and practices in different countries and demonstrate the disparities between theories and practices of capital budgeting

- Identify the factors that determine use of capital budgeting practices of a country or firm

1.2 Problem Statement

During the past twenty years (1993-2013), the theory of capital budgeting has been characterized by a growing number of applications that address risk and uncertainty resulting from global economic, technological and educational changes (e.g., inflation risk, interest rate risk and exchange rate risk). Capital budgeting is the backbone of financial management. Modern financial management theory generally assumes that the primary objective of a firm is to maximize the wealth of its owners (Atrill, 2009). Uncertainty and risk are the major influences on investment decisions, and thus Mao (1970) says 'a central aspect of any theory of capital budgeting is the concept of risk' (p. 352). In order to implement the objective of modern financial management theory, 'financial executives need criteria for choosing between alternative time patterns of project evaluations within his planning horizon' (Mao, 1970). Investment decisions are complex, and the theory is not always applicable in all situations. The problem addressed by this study is how far capital budgeting theory differs from practice and what the nature of the gaps in the existing capital budgeting literature is.

1.3 Research Questions

On the basis of the research background, the following research questions have been developed as the way to attain the research objectives.

- What are the capital budgeting theories and practices used by firms? Are there any disparities between the capital budgeting theories and practices? If so, how?

- What factors determine the use of capital budgeting practices? Do they differ across countries? If so, how?

- What are the gaps in the existing capital budgeting literature?

2. Methodology

The main objective of this study is to find gaps in the extant capital budgeting literature of the past 20 years. The methodology covers research philosophy, research approach, research strategy, methods of data collection and data analysis. All of these methodological areas used throughout the research are discussed below in


2.1 Research Philosophy

One of the dominant philosophical concepts is the 'ontological assumption', which enquires about the nature of reality; any study lacking this assumption would be treated as 'blinded' (Easterby-Smith, Thorpe & Lowe, 2002, p. 27). This research assumes that capital budgeting practices differ across firms and nations and that the ways of looking at capital budgeting practices are not the same at all times. Even though there are a number of capital budgeting theories, we cannot expect them to be applied in the same way in all situations, and their application is therefore subject to change. Thus, the ontological assumption is one of constructionism. The constructionist ontology holds that the world is internally constructed and that people generate meaning both individually and collectively, so that we cannot be sure about what is real. Consequently, people infer the reality of the world from their experience of external indicators.

Another important philosophical assumption is the epistemological assumption. It enquires about what should be taken as acceptable knowledge in a particular field (Easterby-Smith et al., 2002). Traditional practices are not always applicable in contemporary borderless global business, and this study therefore tries to understand the factors that determine the use of capital budgeting practices. The epistemological assumption guides how we can understand and determine capital budgeting practices in different contexts and in different geographical locations. This knowledge can be attained by text analysis using subjective methods. Thus, by systematically reviewing the literature, the study sets out what is already known about capital budgeting practices and captures the gaps in the extant literature.

This research takes an interpretive approach to epistemology in answering the research questions. Reality is not independent of individual thought, and thus research findings are not all similar to one another (Blaikie, 2007). This multiple reality is called 'subjectivism'. Findings can vary with context, such as the nature of the measurement tools, geographical location, company size, organizational practices, type of sector and form of methodology used. Thus, this research is organized by collecting the relevant literature and interpreting its concepts, taking into account the relationship between researchers and research. An inductive approach, explored through thematic text analysis, is therefore suitable.

2.2 Research approach

The research strategy leads to a qualitative research approach. This research covers a substantial body of studies carried out during the past two decades in the area of capital budgeting. It analyzed the past literature by identifying relevant themes and then employing thematic text analysis. Thus, this research is 'subjective' and adopts an inductive approach in order to answer the research questions.

2.3 Research strategy

The research strategy describes how the research should be designed to answer the research questions and thereby attain the research aims. As this research covers the research papers published in the area of capital budgeting over the twenty years from 1993 to 2013, the study adopts a longitudinal research design. However, the collected literature covers broad areas, including different sectors, different locations and countries, and different sizes of firm. Thus, the systematic literature review at times also takes a comparative research design.

2.4 Data collection methods

Web of Science and iCat searches were used to locate research papers published during the last twenty years. Web of Science is a large search engine linked to a database covering more than 10,000 journals and 110,000 conference proceedings. In addition, almost all of the databases (online full-text electronic resources) are covered by the iCat search, which is subscribed to and provided by Kingston University, London. The Kingston University library's access service was used to collect all the research papers. The search terms included capital budgeting, capital budgeting decision, capital budgeting theory, capital budgeting practices, capital budgeting methods, capital budgeting models, capital budgeting tools, capital budgeting techniques, capital budgeting process and investment decision.

Initially, 363 research papers published during the last 20 years were identified. Of these, 201 research papers were retained for review after screening against the following criteria.

- An empirical study (i.e., sampling process, measurement, analysis): 363 papers were identified.

- Published in the English language: of the 363, 264 were published in English.

- Published in a peer-reviewed journal: of the 264, 239 were published in peer-reviewed journals.

- Full-text research paper: of the 239, 201 were available as full-text journal papers.
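The screening steps above form a simple funnel, and the number of papers excluded at each stage follows by subtraction. The short sketch below is only an illustration of that arithmetic: the stage counts are the ones reported in this section, while the stage labels and the function name `funnel` are paraphrases and assumptions introduced here.

```python
# Screening funnel for the literature search.
# Counts come from the text; the labels are paraphrased (assumption).
stages = [
    ("Empirical studies identified", 363),
    ("Published in English", 264),
    ("Published in a peer-reviewed journal", 239),
    ("Full text available", 201),
]


def funnel(stages):
    """Return (label, remaining, excluded_at_stage, share_of_initial) rows."""
    initial = stages[0][1]
    previous = initial
    rows = []
    for label, remaining in stages:
        rows.append((label, remaining, previous - remaining, remaining / initial))
        previous = remaining
    return rows


rows = funnel(stages)
total_excluded = sum(excluded for _, _, excluded, _ in rows)

for label, remaining, excluded, share in rows:
    print(f"{label:<38} {remaining:>4}  excluded: {excluded:>3}  retained: {share:.1%}")
print("Total excluded:", total_excluded)
```

On these counts, 99 papers drop out at the language criterion, 25 at the peer-review criterion and 38 at the full-text criterion, which accounts for the 162 papers between the initial 363 and the final 201.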

These papers were collected from the following databases: OneFile (GALE), SciVerse ScienceDirect (Elsevier), Informa - Taylor & Francis (CrossRef), Wiley (CrossRef), Business (JSTOR), Arts & Sciences (JSTOR), MEDLINE (NLM), SpringerLink, Wiley Online Library, Inderscience Journals, ERIC (U.S. Dept. of Education), Sage Publications (CrossRef), INFORMS Journals, Health Reference Center Academic (Gale), University of Chicago Press Journals, Emerald Management eJournals, Directory of Open Access Journals (DOAJ), IngentaConnect and IEEE (CrossRef). The papers were spread across many journals, including Journal of Banking and Finance, The Journal of Finance, Journal of Accounting and Economics, Management Decision, Journal of Cleaner Production, Journal of Financial Economics, Management Science, European Journal of Operational Research, Accounting Review, Journal of Economic Behavior and Organization, Long Range Planning, Energy Policy, Accounting, Organizations and Society, and Computers and Mathematics with Applications.

2.5 Data analysis

As discussed, the strategy proposed by Miles and Huberman (1984), which involves data collection, reduction, display and conclusion drawing, was followed at the outset. Based on the set criteria, the 363 research papers were reduced to 201, which were then analyzed using a coding procedure. Initially, the collected research papers were grouped into themes or topics. A theme represents a focused area of research and corresponds to selective coding in grounded theory (Corbin & Strauss, 1990). The themes were: current theory and practices of capital budgeting; factors influencing (determinants of) capital budgeting practices; capital budgeting methods and models; supplementary tools for the capital budgeting methods; influences of capital budgeting practices on investment decisions; components of the capital budgeting process; capital budgeting stages; and global capital budgeting practices.

A thematic analysis was employed to capture the key themes and concepts in the chosen research papers. In doing so, open coding, as suggested by Strauss and Corbin (1998), was adopted. The analysis focused on the concepts related to capital budgeting practices and theories, research design, research sampling techniques, research approach, year of publication, nature of industry and so on. The results of this analysis are presented below.

3. Results

3.1 Multi-disciplinary concepts of capital budgeting

During the past twenty years, a total of 201 research papers that appeared in peer-reviewed indexed journals were identified across many academic journals. The largest number of papers appeared in Engineering Economist (N = 32, or 15.92%), followed by Managerial Finance (27), Public Budgeting & Finance (16), Financial Management (9), Journal of Banking and Finance (8), Journal of Business Finance & Accounting (6), Accounting Education (5), Management Accounting Research (5), The Journal of Finance (5), Journal of Corporate Accounting & Finance (4), Management Decision (4) and The Review of Financial Studies (4). Together, these journals accounted for 62.20% of the research papers on capital budgeting in the last two decades. The remainder of the research papers appeared in many other journals. Capital budgeting is thus a multi-disciplinary topic applied across many disciplines. Table 1 below summarizes the entire list of journals that contained capital budgeting research papers.

Table 1. Journals in which capital budgeting research papers appeared during the past twenty years

| Name of the Journal | Number of papers | Percentage |
|---|---|---|
| Engineering Economist | 32 | 15.92% |
| Managerial Finance | 27 | 13.43% |
| Public Budgeting & Finance | 16 | 7.96% |
| Financial Management | 9 | 4.48% |
| Journal of Banking and Finance | 8 | 3.98% |
| Journal of Business Finance & Accounting | 6 | 2.99% |
| Accounting Education | 5 | 2.49% |
| Management Accounting Research | 5 | 2.49% |
| The Journal of Finance | 5 | 2.49% |
| Journal of Corporate Accounting & Finance | 4 | 1.99% |
| Management Decision | 4 | 1.99% |
| The Review of Financial Studies | 4 | 1.99% |
| Three papers in each journal: Healthcare Financial Management, Information Sciences, International Journal of Energy Research, International Journal of Production Economics, Journal of Financial Economics, Management Science, Operations Research, The Journal of Business, Theoretical and Applied Economics | 3 | 1.49% |
| Two papers in each journal: Accounting & Finance, Accounting and Business Research, Accounting, Organizations and Society, Computers & Industrial Engineering, Computers and Mathematics with Applications, Contemporary Accounting Research, European Financial Management, European Journal of Operational Research, Health care strategic management, Industrial Management & Data Systems, International Journal of Business and Management, Journal of Accounting and Economics, Journal of Computational and Applied Mathematics, Journal of Information Technology, Journal of Marketing Management, Journal of Small Business Management, Journal of the International Academy for Case Studies, Journal of the Operational Research Society, Long Range Planning, Managerial and Decision Economics, The Bond Buyer, The Financial | 2 | 1.00% |
| One paper in each journal: Academy of Marketing Studies Journal, Accounting Review, Agricultural Finance Review, Applied Financial Economics, Australasian Radiology, Australian Journal of Management, BuR: Business Research, Business Forum, Wntr-Spring, Business Process Management Journal, Canadian Journal of Anesthesia/Journal canadien d'anesthésie, Computational Management Science, Computers and Chemical Engineering, Cornell Hotel and Restaurant Administration Quarterly, Energy Policy, European Management Journal, Expert Systems With Applications, Forest Products Journal, Fuzzy Sets and Systems, Healthcare financial management, IEEE Transactions on Engineering Management, Industrial Management, International Journal of Commerce and Management, International Journal of Information Technology & Decision Making, International Journal of Project Management, International Journal of Quality & Reliability Management, International Transactions in Operational Research, Journal of Accounting Research, Journal of Cleaner Production, Journal of Economic Behavior and Organization, Journal of Empirical Finance, Journal of Hospitality & Tourism Research, Journal of International Financial Management & Accounting, Journal of International Money and Finance, Journal of Management Accounting Research, Journal of Managerial Issues, Journal of Property Investment & Finance, Journal of Public Health Dentistry, Journal of Retail Banking, Journal of Risk and Insurance, Journal of Teaching in International Business, Journal of the Healthcare Financial Management, Knowledge-Based Systems, Management Accounting Quarterly, Mid-Atlantic Journal of Business, Naval Research Logistics (NRL), New Directions for Higher Education, Operations-Research-Spektrum, Quarterly Review of Economics and Finance, Real Estate Economics, Review of Agricultural Economics, Review of Business, Review of Finance and Banking, Review of Quantitative Finance and Accounting, Scandinavian Journal of Management, South East European Journal of Economics and Business, Strategic Finance, The Accounting Review, The European Journal of Finance, The Financier, Spring-Winter, The McKinsey Quarterly, Tsinghua Science & Technology, UTMS Journal of Economics, Vision: The Journal of Business Perspective, Journal of Advances in Management Research | 1 | 0.50% |

Percentages are calculated from the number of papers that appeared in each journal (N = 201).
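Each percentage in Table 1 is a journal's paper count divided by the total number of papers. The sketch below reproduces that arithmetic for the twelve journals listed individually. It assumes a denominator of 201, the number of papers retained for review, because that is the value the tabulated percentages imply (for example, 32 / 201 = 15.92%); the names `TOTAL_PAPERS` and `share` are introduced here for illustration only.

```python
# Paper counts for the twelve journals listed individually in Table 1.
counts = {
    "Engineering Economist": 32,
    "Managerial Finance": 27,
    "Public Budgeting & Finance": 16,
    "Financial Management": 9,
    "Journal of Banking and Finance": 8,
    "Journal of Business Finance & Accounting": 6,
    "Accounting Education": 5,
    "Management Accounting Research": 5,
    "The Journal of Finance": 5,
    "Journal of Corporate Accounting & Finance": 4,
    "Management Decision": 4,
    "The Review of Financial Studies": 4,
}

TOTAL_PAPERS = 201  # assumption: the 201 reviewed papers are the denominator


def share(count, total=TOTAL_PAPERS):
    """Percentage share of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {journal: share(n) for journal, n in counts.items()}
top_twelve_papers = sum(counts.values())
sum_of_rounded_shares = round(sum(shares.values()), 2)

for journal, pct in shares.items():
    print(f"{journal:<45} {counts[journal]:>3}  {pct:.2f}%")
print("Papers in the twelve journals:", top_twelve_papers)
print("Sum of rounded shares:", sum_of_rounded_shares)
```

Summing the rounded shares gives the 62.20% quoted in the text for these twelve journals, which together account for 125 papers.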

3.2 Major themes identified in Capital budgeting research

A total of 201 research papers on capital budgeting were meticulously reviewed, and the following major themes were identified: capital budgeting theory and practices, capital budgeting theory and practices in developed countries, capital budgeting theory and practices in developing countries, and factors affecting capital budgeting decisions. The findings are discussed under these themes.

3.3 Capital budgeting theory and practices

Capital budgeting decisions are crucial and complex and have attracted many research scholars to this field. According to Dayananda et al. (2002), capital budgeting is the process of deciding on investment projects that maximize shareholder value. Capital budgeting mostly deals with sizable investments in long-term assets. These assets can be either tangible, such as buildings, plant or equipment, or intangible, such as patents, new technology or trademarks (Brealey & Myers, 2003). Capital budgeting is not a short-term matter: budgets are generally prepared a year in advance and can extend five, ten or even fifteen years into the future (Brickley, 2006). Thus, Peterson and Fabozzi (2002) define capital budgeting as the process of analyzing and selecting investment opportunities in long-term assets whose benefits last for more than one year.

Capital budgeting is fundamental and is used everywhere as a tool for planning, control, and allocation of scarce resources among competing demands. It is a vital part of financial planning and decision making, since capital budgeting tools lead to better decisions and help to justify the selection of specific capital investments among competing alternatives (Sekwat, 1999). The decision to choose the best investment project among competing projects is critical and is taken by top management (Bowman & Hurry, 1993; McGrath, Ferrier & Mendelow, 2004), and considerable attention should therefore be given to investigating the methods used in evaluating and selecting investment projects (Sangster, 1993; Segelod, 1998).

The most prevalent capital budgeting techniques in the public finance literature include payback period (PB), accounting rate of return (ARR), net present value (NPV), internal rate of return (IRR), benefit-cost ratio (BCR), and profitability index (PI) (e.g., Sekwat, 1999; Cooper et al., 2002). Among these, four methods, viz. NPV, IRR, PB and ARR, have been identified as predominant and are used in many studies (e.g., Pike, 1996; Kester, Chang, Echanis, Haikal, Isa, Skully, Tsui & Wang, 1999; Hermes, Smid & Yao, 2007).

The PB model determines the length of time required to recover exactly the invested cash outlay. The ARR, on the other hand, is calculated as the ratio of the investment's average after-tax income to its average book value (Cooper et al., 2002). The PB period has been criticized for failing to assess project value accurately, as it does not consider cash flows, the time value of money or risk in a systematic manner, and it does not identify the investment projects that will maximize profits; PB therefore lacks theoretical justification (Pike, 1988; Lefley, 1996). Research scholars and practitioners have criticized the ARR for ignoring the time value of money (e.g., Cooper et al., 2002; Ross, Waterfield, Jordan & Roberts, 2005). The PB method also fails to consider the return from the capital investment after the initial outlay has been recovered, yet it remains an often-used method (e.g., Graham & Harvey, 2001; Brounen et al., 2004; Bennouna, Meredith & Marchant, 2010). Researchers have argued that the PB method is widely used because it is easy to apply and provides information about the recovery of the initial investment.
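The two non-discounted measures described above can be illustrated with a minimal sketch. The project figures are hypothetical, and the function names `payback_period` and `accounting_rate_of_return` are introduced here for illustration; the payback calculation additionally assumes that cash inflows arrive evenly within each year.

```python
def payback_period(outlay, cash_flows):
    """Years needed for cumulative inflows to recover the outlay.

    Assumes inflows arrive evenly within each year (an assumption made
    for this sketch). Returns None if the outlay is never recovered.
    """
    cumulative = 0.0
    for year, cf in enumerate(cash_flows, start=1):
        if cf > 0 and cumulative + cf >= outlay:
            return (year - 1) + (outlay - cumulative) / cf
        cumulative += cf
    return None


def accounting_rate_of_return(after_tax_incomes, book_values):
    """Average after-tax income divided by average book value."""
    average_income = sum(after_tax_incomes) / len(after_tax_incomes)
    average_book_value = sum(book_values) / len(book_values)
    return average_income / average_book_value


# Hypothetical projects with the same outlay and the same first three years.
outlay = 1000.0
project_a = [400.0, 400.0, 400.0, 400.0]
project_b = [400.0, 400.0, 400.0, 2000.0]  # much larger final-year inflow

pb_a = payback_period(outlay, project_a)
pb_b = payback_period(outlay, project_b)

# Hypothetical accounting figures: constant income, straight-line book value.
arr = accounting_rate_of_return(
    after_tax_incomes=[150.0, 150.0, 150.0, 150.0],
    book_values=[1000.0, 750.0, 500.0, 250.0, 0.0],
)

print(pb_a, pb_b, arr)
```

Both hypothetical projects recover the outlay after 2.5 years, even though project B returns far more in its final year. This is the criticism noted above: PB ignores what happens after the outlay is recovered, and neither measure discounts future amounts.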

Thus, in the next generation of methods, the NPV model came into practice; it measures the difference between the present value of the money coming in and the present value of the money going out (Cooper et al., 2002). If the NPV is positive, the capital investment is accepted; if it is negative, the investment is rejected. Alternatively, the IRR is the discount rate that equates the cost of the capital investment to the present value of the project's cash inflows, and so indicates the rate at which the investment becomes acceptable (Cooper et al., 2002). In finance, assessing capital budgeting decisions using the concept of the time value of money is called discounted cash flow (DCF) analysis. The NPV and IRR methods are DCF methods, whereas the PB and ARR methods are considered non-DCF methods. 'Capital budgeting theory assumes that projects are evaluated based on economic merit. Building upon certain economic assumptions, including the time value of money, risk aversion, and an assumed goal of value maximization, sophisticated investment appraisal techniques such as NPV and IRR, have been advocated in the literature' (Slagmulder et al., 1995, p. 123). Notwithstanding, several researchers have objected that the information required for NPV and IRR is commonly not known with certainty, owing to long time horizons, future uncertainty and a higher degree of risk, and that these methods ignore the size of the investment and lack a logical comparison on the time value of money (e.g., Sekwat, 1999; Cooper et al., 2002; Hermes et al., 2007). Thus, in order to account for both the time value of money and the size of the investment, the PI model emerged. It is the ratio of the present value of the capital investment's cash inflows to its outlay, and the decision is made in favour of the highest PI (Cooper et al., 2002). If this method is used carelessly when investment resources are constrained, it generates bad results (Brealey & Myers, 2003).
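The discounted measures discussed above can be made concrete with a small sketch. It is an illustration under stated assumptions, not a reproduction of any cited author's procedure: the cash flows and the 10% discount rate are hypothetical, PI is taken as the present value of inflows divided by the outlay, and the IRR is found by simple bisection, which assumes that the NPV changes sign exactly once on the search interval.

```python
def present_value(rate, cash_flows):
    """Present value of end-of-year cash flows for years 1, 2, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def npv(rate, outlay, cash_flows):
    """Net present value: PV of inflows minus the initial outlay."""
    return present_value(rate, cash_flows) - outlay


def profitability_index(rate, outlay, cash_flows):
    """PV of inflows per unit of outlay (assumed definition for this sketch)."""
    return present_value(rate, cash_flows) / outlay


def irr(outlay, cash_flows, low=0.0, high=1.0, tol=1e-9):
    """Discount rate at which NPV is zero, found by bisection.

    Assumes NPV changes sign exactly once between `low` and `high`.
    """
    npv_low = npv(low, outlay, cash_flows)
    npv_high = npv(high, outlay, cash_flows)
    if npv_low * npv_high > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while high - low > tol:
        mid = (low + high) / 2
        if npv(mid, outlay, cash_flows) * npv_low > 0:
            low = mid   # root lies above mid
        else:
            high = mid  # root lies at or below mid
    return (low + high) / 2


# Hypothetical project: outlay of 1000, then 400 a year for four years.
outlay = 1000.0
cash_flows = [400.0, 400.0, 400.0, 400.0]
required_return = 0.10  # hypothetical discount rate

project_npv = npv(required_return, outlay, cash_flows)
project_pi = profitability_index(required_return, outlay, cash_flows)
project_irr = irr(outlay, cash_flows)

print(f"NPV = {project_npv:.2f}, PI = {project_pi:.3f}, IRR = {project_irr:.2%}")
```

For this hypothetical project the three criteria agree: the NPV is positive, the PI exceeds one, and the IRR is above the required return. The criteria can diverge when projects are mutually exclusive or differ in scale, which is why the choice of method matters in practice.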

Graham and Harvey (2001) reported that twelve capital budgeting methods were in practice: NPV, IRR, annuity, earnings multiple (P/E), adjusted present value (APV), PB, discounted payback, PI, ARR, sensitivity analysis, value at risk and real options. However, not all of them are usable in all capital budgeting situations. For example, IRR is not the best method if investments are mutually exclusive or have multiple rates of return, yet IRR is an often-used method in practice (Graham & Harvey, 2001; Brounen et al., 2004; Bennouna et al.,


Of these methods, discounted payback considers the time value of money but still ignores cash flows after the initial outlay has been recovered. Value-at-risk (VAR), a relatively new method, measures 'the worst expected loss over a given horizon under normal market conditions at a given confidence level' (Jorion, 2006, p. 12). The APV adds the value of the financing side-effects of an investment to its NPV and is regarded as having no drawbacks in principle (Ross et al.,


The greatest problem of the traditional present value models is their complete reliance on quantifiable cash flows. However, in the contemporary high-tech world, many new projects entail a complete redesign of the manufacturing environment, and computerized design is of paramount importance for innovation, higher quality and speedier response (Cooper et al., 2002). Thus, the theory of capital budgeting has diverged from its practice.

The complex nature of capital investment in today's world has brought many new models into practice, including the multi-attribute decision model and the analytical hierarchy process, which are more subjective (Cooper et al., 2002). Modern theoretical developments in finance hold that DCF methods are not the best methods for selecting capital investment projects: they have severe drawbacks in the analysis of investment projects if information about future investment decisions is not available (Brennan & Schwartz, 1992; Trigeorgis, 1993; Dixit & Pindyck, 1994). In such situations, Real Options Reasoning (ROR) and Game Theory (GT) serve as better analytical tools for evaluating investment projects (Smit & Ankum, 1993). GT stresses that a firm has an incentive to invest early when it fears pre-emption (Smit, 2003).

Real option theory (ROT): Real options are closely related to corporate capital investment decision-making and have been

introduced as an alternative approach for investment appraisal under uncertainty. The starting point for real options

research was the criticism of traditional strategic investment decision-making and capital budgeting methods. In general,

a real option represents or reflects the option or options that a company has when it comes to deciding whether to invest

in a project, delay, put it on hold, expand or reduce an investment, or any other flexibility that it may have (Rigopoulos,

2014). ROT involves the use of investment evaluation tools and processes that properly account for both uncertainty and

the company's ability to react to new information (Verbeeten, 2006). ROT recognizes operating flexibility (which enables the

management to make or revise decisions at a future time, such as expansion or abandonment of the project) and the

strategic option value (resulting from interdependence with future and follow-up investments, such as implementation in

phases and the postponement of investments) (Verbeeten, 2006). Many researchers have argued that the use of real

options analysis has an advantage over NPV, since NPV is not able to capture the value of managerial flexibility (e.g.,

Ingersoll & Ross, 1992; Trigeorgis, 1993; Dixit & Pindyck, 1994). For example, the management could delay, expand,

abandon, temporarily close or alter the operation during the project's life. Ross et al. (2005) argued that most capital

investment projects have options (i.e., the option to expand, the option to modify, the option to abandon), which have

value per se. Although this method has not been applied on a large scale in practice (Hermes et al., 2007), it is mostly

applicable in specific industries or situations. DCF techniques are used concurrently with real options in order to

determine the true NPV (Amram & Howe, 2002). Many research scholars have found that only a few firms have

employed real options (Graham & Harvey, 2001; Ryan & Ryan, 2002; Brounen et al., 2004; Block, 2007; Truong,

Partington & Peat, 2008; Verma et al., 2009; Bennouna et al., 2010; Shinoda, 2010; Singh et al., 2012; Andres, Fuente &


The use of sophisticated capital budgeting has clearly become widespread during the last two decades. Many of the earliest studies investigated capital budgeting decision rules, whereas recent research has focused on the use of sophisticated capital budgeting practices (e.g., Miller & Waller, 2003). Applying sophisticated capital budgeting is more complex and requires firms to expend cost, time and effort (Busby & Pitts, 1997; Miller & Waller, 2003). Thus, it is important to weigh the net benefits against the costs when choosing the appropriate level of sophistication. Theory, however, suggests that if uncertainty exists, the use of sophisticated capital budgeting practices is valuable and the costs will be offset by the gains from successful investments (Verbeeten, 2006). If uncertainty exists, additional information is needed to resolve the investment dilemma (Miller & Waller, 2003). Canadian firms appear to be increasingly using sophisticated methods when dealing with risk (for example, sensitivity analysis, decision-tree analysis, Monte Carlo simulation, ROR and GT) (Bennouna et al., 2010). Nowadays, a number of other methods are used in capital budgeting practice, including the project-dependent (risk-adjusted) cost of capital (PDCC), the weighted average cost of capital (WACC) and the cost of debt (CD). Among them, PDCC and WACC are said to be sophisticated methods, and CD is the least sophisticated (Hermes et al., 2007).
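To show what distinguishes WACC from the cost of debt as a discount rate, the sketch below computes both for a hypothetical firm. The formula is the standard after-tax textbook definition of WACC, which the reviewed text names but does not spell out; all input values and the function name `wacc` are assumptions made for illustration.

```python
def wacc(equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate):
    """After-tax weighted average cost of capital (textbook definition).

    WACC = E/V * cost_of_equity + D/V * cost_of_debt * (1 - tax_rate),
    where V = E + D.
    """
    total_value = equity_value + debt_value
    equity_weight = equity_value / total_value
    debt_weight = debt_value / total_value
    return equity_weight * cost_of_equity + debt_weight * cost_of_debt * (1 - tax_rate)


# Hypothetical firm: 60% equity at 12%, 40% debt at 6%, 30% tax rate.
cost_of_debt = 0.06
discount_rate_wacc = wacc(
    equity_value=600.0,
    debt_value=400.0,
    cost_of_equity=0.12,
    cost_of_debt=cost_of_debt,
    tax_rate=0.30,
)

print(f"WACC = {discount_rate_wacc:.2%}, cost of debt = {cost_of_debt:.2%}")
```

In this hypothetical case the WACC is higher than the cost of debt because it also reflects the return required by equity holders, so a firm discounting at the cost of debt alone would apply a lower hurdle to its projects.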

3.4 Capital budgeting tools for incorporating risk

Overall, uncertainty affects future cash flows and makes them difficult to estimate. Various risk analysis and management science techniques have therefore been developed to supplement the traditional present-value-based decision models. Scholarship on the practice of capital budgeting in many countries has found that firms have, over several years, increasingly employed more sophisticated capital budgeting techniques to make investment decisions (Klammer, 1973; Klammer & Walker, 1984; Pike, 1988; Jog & Srivastava, 1995; Gilbert & Reichart, 1995; Farragher, Kleiman & Sahu, 1999; Arnold & Hatzopoulos, 2000; Brounen et al., 2004; Truong et al., 2008; Baker, Dutta & Saadi, 2011). In the contemporary world, there are a number of sophisticated capital budgeting methods, including the oft-cited Monte Carlo simulation, game theory decision rules, real option pricing, certainty equivalents, decision trees, CAPM analysis/β analysis, adjustment of expected values, sensitivity analysis/break-even analysis, scenario analysis, adaptation of the required return/discount rate, IRR, NPV, uncertainty absorption in cash flows, and PB (e.g., Arnold & Hatzopoulos, 2000; Hall, 2000; Graham & Harvey, 2001; Ryan & Ryan, 2002; Murto & Keppo, 2002; Cooper et al., 2002; Smit, 2003; Sandahl & Sjogren, 2003; Brounen et al., 2004; Lazaridis, 2004; Lord, Shanahan & Boyd, 2004; du Toit & Pienaar, 2005; Verbeeten, 2006; Elumilade, Asaolu & Ologunde, 2006; Hermes et al., 2007; Leon et al., 2008; Correia & Cramer, 2008; Verma et al., 2009; Bennouna et al., 2010; Shinoda, 2010; Hall & Millard, 2010; Dragota et al., 2010; Poudel et al., 2009; Kester & Robbins, 2011; Maroyi & Poll, 2012; Singh et al., 2012; Andres et al., 2015). Thus, the complex models of capital budgeting practice depend not only on the use of DCF techniques but also on proper cash flows, discount rates and risk analysis (Brigham & Ehrhardt, 2002).
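As an illustration of how one of these tools supplements a present-value model, the following is a minimal Monte Carlo sketch. It assumes a hypothetical project whose yearly inflows are independent and normally distributed around their expected values; the figures, the 20% volatility and the function names are illustrative assumptions, not taken from any of the cited studies.

```python
import random
import statistics


def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs today (t = 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def simulate_npv(rate, initial_outlay, expected_inflows, volatility,
                 trials=10_000, seed=42):
    """Draw uncertain inflows and return the simulated NPV distribution."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = []
    for _ in range(trials):
        # Assumption: each inflow is normal with std = volatility * mean.
        inflows = [rng.gauss(mu, volatility * mu) for mu in expected_inflows]
        results.append(npv(rate, [-initial_outlay] + inflows))
    return results


# Hypothetical project: outlay 1000, expected inflow 400 for four years, 10% rate.
results = simulate_npv(0.10, 1000.0, [400.0] * 4, volatility=0.20)
mean_npv = statistics.mean(results)
prob_loss = sum(r < 0 for r in results) / len(results)

print(f"mean NPV          = {mean_npv:.1f}")
print(f"std of NPV        = {statistics.stdev(results):.1f}")
print(f"P(NPV < 0)        = {prob_loss:.3f}")
```

The point estimate from a plain NPV calculation is replaced by a distribution, so the decision maker can also see the probability of a negative outcome.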

3.5 Classification of capital budgeting practices

Capital budgeting practices help managers to select n out of N investment projects with the highest profits and an acceptable 'risk of ruin' (Verbeeten, 2006, p. 108). By and large, all capital budgeting practices can be subsumed into three categories: naive, advanced and sophisticated (e.g., Haka, 1987; Haka, Gordon & Pinches, 1985; Verbeeten, 2006; Wolffsen, 2012). Naive practices include PB, the adaptation of required payback and ARR. Advanced (NPV-based) practices include sensitivity analysis/break-even analysis, scenario analysis, the adaptation of the required return/discount rate, IRR, NPV, uncertainty absorption in cash flows, MIRR and PI. Farragher et al. (2001) suggested that the degree of sophistication is represented by the use of DCF techniques and the incorporation of risk into the analysis. Sophisticated capital budgeting methods generally include Monte Carlo simulation, game theory (GT), real options (RO), certainty equivalents, decision trees, CAPM analysis/β analysis, and the adjustment of expected values (Verbeeten, 2006; Wolffsen, 2012).
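To make the difference between a naive measure (PB) and the NPV-based measures (NPV, IRR) concrete, the sketch below evaluates one hypothetical project with all three. The cash flows and the 10% discount rate are illustrative assumptions, and the IRR is found by simple bisection rather than by any particular library routine.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs today (t = 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def payback_period(cash_flows):
    """Undiscounted payback in years, interpolating within the recovery year."""
    if cash_flows[0] >= 0:
        return 0.0
    cumulative = cash_flows[0]
    for year in range(1, len(cash_flows)):
        previous = cumulative
        cumulative += cash_flows[year]
        if cumulative >= 0:
            return (year - 1) + (-previous / cash_flows[year])
    return None  # the outlay is never recovered


def irr(cash_flows, low=-0.99, high=1.0, tol=1e-9):
    """Internal rate of return by bisection on NPV(rate) = 0."""
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid              # root lies in [low, mid]
        else:
            low, f_low = mid, f_mid  # root lies in [mid, high]
    return (low + high) / 2


# Hypothetical project: outlay of 1000 followed by four inflows of 400.
flows = [-1000.0, 400.0, 400.0, 400.0, 400.0]
print(f"PB  = {payback_period(flows)} years")   # 2.5
print(f"NPV = {npv(0.10, flows):.2f} at 10%")
print(f"IRR = {irr(flows):.4f}")
```

PB ignores both the time value of money and every cash flow after the recovery point, which is why it is classed as naive, whereas NPV and IRR discount the whole stream.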

3.6 Capital budgeting theory and practices in developed countries

This section discusses capital budgeting theory and practice, especially in developed countries. As mentioned above, capital budgeting practices are investment decisions taken to increase shareholder value (Dayananda et al., 2002).

Many studies have examined capital budgeting practices in the U.S. and Europe (e.g., Pike, 1996; Sangster, 1993; Block, 2007; Hermes et al., 2007). Chadwell-Hatfield et al. (1997) surveyed 118 manufacturing firms in the U.S. The results showed that NPV (84%) and IRR (70%) were the preferred primary methods. However, two thirds of the firms relied on shorter PB periods rather than on IRR or NPV. In a seminal study, 'The theory and practice of corporate finance: evidence from the field', Graham and Harvey (2001) surveyed a sample of 392 CFOs in the USA. In larger firms with high debt ratios, CFOs with an MBA were more likely to use DCF (75% NPV and IRR) than their counterparts. Larger firms applied risk-adjusted discount rates, whereas small firms opted for Monte Carlo simulation to adjust for risk. In addition, they found that the PB method was not used as a primary tool but was retained as a vital secondary tool. Very similar results were reported in Ryan and Ryan's (2002) study, whose sample consisted of Fortune 1000 companies. NPV was the most popular technique, followed by IRR. Most of the firms used sensitivity analysis, scenario analysis, inflation-adjusted cash flows, economic value added and incremental IRR along with NPV and IRR. Block (1997) studied capital budgeting techniques among small business firms operating in the United States. The most popular method was PB (42.7%), followed by ARR (22.4%). Nevertheless, the author noted that small business owners seemed to be increasingly using DCF as the primary evaluation method.

Cooper et al. (2002) studied capital budgeting practices in Fortune 500 companies in America. The sample of 102 chief financial officers reported that the most commonly used primary capital budgeting model was IRR, followed by payback. Ken and Cherukuri (1991) found that IRR was the most preferred method in larger companies operating in the U.S., followed by NPV. The most widely used discount rate was WACC (78%), and risk was most commonly measured by sensitivity analysis (80%). Very similar results were reported in Bierman's 1993 survey of Fortune 100 firms.

Arnold and Hatzopoulos (2000) conducted a study, "The gap between theory and practice in Capital Budgeting: Evidence from the UK", of 300 UK companies (comprising 100 large, 100 medium and 100 small). The results indicate that UK companies have increasingly adopted the analyses prescribed by financial textbooks. A stage has been reached in which only a small minority do not make use of discounted cash flows, formal risk analysis, appropriate inflation adjustment and post-auditing. The study reported, however, that managers in the UK still use simple rule-of-thumb techniques.

Jog and Srivastava (1995) conducted a survey of capital budgeting practices in corporate Canada, and the results showed that the most preferred method was PB. Similar results were found in the UK in Pike's (1996) study. Further results indicated decreased use of ARR in both Canada and the United Kingdom. Canadian firms seem to be increasingly using sophisticated methods when dealing with risk (for example, sensitivity analysis, decision-tree analysis, Monte Carlo simulation, ROR, GT) (Bennouna et al., 2010).

Drury, Braund and Tayles (1993) surveyed 300 manufacturing companies in the UK about their capital budgeting practices. The results showed that PB (86%) and IRR (80%) were the most preferred methods across the sample. The most widely used risk analysis technique was sensitivity analysis. In a seminal study, Brounen et al. (2004) examined 313 companies in four European countries, viz., the U.K., France, Germany and the Netherlands, during 2002 and 2003. Their results showed that 47% and 67% of the UK companies used NPV and PB respectively as a primary tool for evaluating capital budgeting decisions, whereas 70% of companies in the Netherlands used NPV and 65% used PB. However, companies in France and Germany reported lower usage of both methods (42% for NPV and 50% for PB in France; 44% for NPV and 51% for PB in Germany). Previous studies have mainly been conducted in the U.S. and the UK, and a limited number of studies are also available for the Netherlands (e.g., Herst, Poirters & Spekreijse, 1997; Brounen et al.,


Many researchers have recognized that DCF is the dominant capital budgeting evaluation method in the UK (e.g., Arnold & Hatzopoulos, 2000), the USA (e.g., Ryan & Ryan, 2002) and Canada (e.g., Payne et al., 1999). However, US firms use DCF techniques more than firms in European countries do (e.g., Brounen et al., 2004). There is still some reluctance in this field due to the technical aspects of DCF (e.g., Cary, 2008; Magni, 2009). In 1993, Bierman and Smidt opined that DCF methods are the pre-eminent investment decision tools and that it is therefore imperative for managers to learn how to use them. In any case, NPV, IRR and PB are the most popular methods among North American and Western European companies (Graham & Harvey, 2001; Brounen et al., 2004).

Sekwat (1999) studied capital budgeting practices among 321 Tennessee municipal governments. His results showed that most municipal government organizations used the benefit-cost ratio (62.5%) and payback methods (61.5%), and that finance officers were reluctant to use IRR, ARR and even NPV. Holmen (2005) surveyed the capital budgeting techniques used by Swedish firms for FDI and found that larger firms preferred NPV and IRR. However, the most preferred method overall was PB (79%). In a survey of capital budgeting practices of Australian listed companies, Truong et al. (2008) found that NPV, IRR and PB were the most popular capital budgeting evaluation methods. The researchers also identified the use of real options across the sample, although these were not yet part of the mainstream.

In 2009, Kester and Robbins surveyed the capital budgeting techniques used by Irish listed companies. The results revealed that these companies use DCF methods and that the most prevalent method was NPV, followed by PB and IRR. Scenario analysis and sensitivity analysis were found to be the most important tools for incorporating risk. WACC was the most widespread method for calculating the discount rate. On the other hand, Lazaridis (2004) studied capital budgeting practices in Cyprus and found that PB, not NPV, was the most preferred method.

Shinoda (2010) carried out a survey of capital budgeting in Japan. A questionnaire was administered to collect data from a sample of 225 companies listed on the Tokyo Stock Exchange. The results showed that firms used a combination of PB and NPV to evaluate capital investment projects.

In summary, many studies have found increasing use of sophisticated capital budgeting techniques among companies in developed countries, namely the US, the UK, Europe and Australia (Freeman & Hobbes, 1991; Shao & Shao, 1996; Pike, 1996; Herst, Poirters & Spekreijse, 1997; Brounen et al., 2004; Truong et al., 2008). However, US companies seem to use DCF methods more than European companies do.

3.7 Capital budgeting theory and practices in developing countries

There is a dearth of studies on capital budgeting practices in developing countries during the last two decades. In comparison with developed countries, the results of most of these studies show a different picture: in most developing countries, PB was the dominant method for evaluating capital investments. Kester et al. (1999) surveyed a total of 226 companies across six countries: Australia, Hong Kong, Indonesia, Malaysia, the Philippines and Singapore. The results showed that PB was still an important method and that DCF methods had become increasingly important. In the five Asian countries, 95% of firms used PB and 88% used NPV in evaluating projects. Both methods were, however, treated as equally important. Kester et al. (1999) noted that the sophistication of capital budgeting techniques in the developing countries of Asia had increased very rapidly during the preceding decade.

Babu and Sharma (1996) studied the capital budgeting practices of Indian industries and found that 90% of the companies used capital budgeting methods. Of these, 75% reported adopting DCF methods, among which IRR was the most popular. Sensitivity analysis was found to be popular for assessing risk. In 1998, Jain and Kumar carried out a comparative study of capital budgeting practices in the Indian context, sampling 96 non-government companies listed on the Bombay Stock Exchange and five companies from South East Asia. They observed that the most preferred capital budgeting technique was PB (80% of companies), followed by NPV and IRR. Sensitivity analysis was the preferred risk assessment method.

Cherukuri (1996), in 'Capital budgeting practices: a comparative study of India and select South East Asian countries', compared practices in India with those of Hong Kong, Malaysia and Singapore, using a sample of the top 300 non-government companies. The study found that, among DCF methods, 51% of companies used IRR, followed by NPV (30%). Among non-DCF methods, PB (38%) was dominant, followed by ARR (19%). The non-DCF methods were used as a supplement to DCF methods. WACC was the most widely used discount rate, and sensitivity analysis was mainly used for risk assessment. A more recent survey of capital budgeting practices in corporate India, conducted by Verma et al. (2009), took a sample of 30 manufacturing companies. The results confirmed the findings of Cherukuri (1996): the most preferred method was IRR (56.7%), followed by NPV (50%) and PB (36.7%); WACC (43.3%) was the most widely used discount rate, and sensitivity analysis (36.7%) was mainly used for risk assessment. The researchers further observed increasing adoption of DCF rather than the traditional non-discounted techniques. In 2012, Singh et al. studied capital budgeting decisions in a sample of 31 listed companies in India. Although capital budgeting decisions continued in India, all sampled firms reported using DCF techniques in combination with non-DCF techniques. Among DCF techniques, more than three quarters of the sampled companies used IRR, which was preferred to NPV, used by half of the sampled companies. It was further reported that half of the companies used real option techniques in selecting their capital investment projects. Long-term capital is the source of finance for net fixed assets and net working capital in India. Because most of the variables are country-specific, the researchers called for further detailed research, suggesting that a sectoral analysis of the constituent sectors of the sample companies would shed new light on this area.

Hermes et al. (2007) carried out a comparative study of the capital budgeting practices of Dutch and Chinese firms. Of the Dutch CFOs, 66.7% stated that they used WACC and only 9.5% used PDCC. Small firms used CD more often (22.7%) than larger firms (5.0%). Among the Dutch firms, 89% of CFOs reported using NPV, whereas only 2% used ARR, the least popular method. In contrast, 53.3% of Chinese firms indicated that they used WACC, and just 15.7% of Chinese CFOs used PDCC. However, 28.9% of Chinese CFOs reported using CD, a higher share than among their Dutch counterparts. Chinese CFOs stated that they were more likely to use NPV and PB (89% and 84% respectively) in evaluating capital budgeting projects. Thus, on average, Dutch CFOs use more sophisticated capital budgeting techniques than Chinese CFOs do.

In 2008, Leon et al. conducted a survey of the capital budgeting practices of listed companies in Indonesia. DCF methods were mainly adopted by these companies as the primary evaluation tool for capital investment projects. The most prevalent risk assessment tools were scenario and sensitivity analysis. The results indicated that CAPM was not very popular.

More recently, Khamees, Al-Fayoumi and Al-Thuneibat (2010) conducted a survey of capital budgeting practices in Jordan. The results showed that both DCF and non-DCF methods were still popular for evaluating capital investments. Surprisingly, the most popular method was PI, followed by PB.

Most recently, Maroyi and Poll (2012) conducted a survey of capital budgeting practices in listed mining companies in South Africa. The results showed that NPV, IRR and PB were the most prevalent methods for evaluating larger investment projects and that PB remained in continual use. The following table summarizes the key findings of the capital budgeting literature.

Table 1. Key findings on capital budgeting studies during the last two decades (from 1993 to 2013)

| Author/s | Population | Most popular capital budgeting method | Methods for evaluating risk in capital budgeting |
|---|---|---|---|
| Drury, Braund & Tayles | 300 UK | PBP and IRR | Sensitivity analysis |
| Babu & Sharma (1995) | 73 Indian | DCF methods | Sensitivity analysis and adjustment of discount rate methods |
| Jog & Srivastava | 582 Canadian | IRR and PBP | Sensitivity analysis |
| Pike (1996) | Large UK | | |
| Kester & Chang (1996) | 54 companies | IRR and PBP | Scenario and sensitivity |
| Farragher, Kleiman & Sahu (1999) | 379 US companies in the Standard & Poor's industrial index | DCF methods: NPV | Capital Assets Pricing |
| Sekwat (1999) | 166 Finance Officers of | Cost-Benefit Ratio and | |
| Kester, Chang, Echanis, Haikal, Isa, Skully, Tsui & Wang | 226 companies in Australia, Hong Kong, Indonesia, Malaysia, the Philippines and Singapore in 1996- | Equal importance to discounted and non-discounted cash flow techniques in evaluating | Scenario analysis and sensitivity analysis |
| Arnold & Hatzopoulos | 300 UK companies | DCF is widely used by the selected UK firms. | |
| Hall (2000) | 65 respondents (South Africa) | | |
| Graham & Harvey | 392 Chief CFOs of companies in the | NPV and IRR | Large firms: risk-adjusted discount rate. Small firms: Monte Carlo simulation |
| Ryan & Ryan (2002) | 205 US companies | NPV and IRR | Sensitivity analysis, scenario analysis, inflation-adjusted cash flows, economic value added, and incremental |
| Sandahl & Sjogren | 129 Swedish | PBP | Annuity |
| Lord, Shanahan & Boyd | 29 local authorities of New Zealand Local Government | Cost Benefit Ratio | |
| Brounen, de Jong & Koedijk (2004) | Four European countries, viz., U.K., France, Germany and the, consisting of 313 companies during 2002 and 2003 | Primary tools were, in the UK: NPV and PBP; in the Netherlands: NPV and PBP. France and Germany reported lower usage of both methods (42% for NPV, 50% for PB and 44% for NPV, 51% for PB | |
| Lazaridis (2004) | Small Medium Sized Companies | PBP | Statistical risk analysis, scenario analysis |
| Elumilade, Asaolu & Ologunde (2006) | 94 firms from the Nigerian stock exchange (Nigeria) | PBP, ARR and NPV | Linear programming |
| Lam, Wang & Lam | 157 Hong Kong | PBP and Average Accounting Rate of | Shortening payback period, raising required rate of return |
| Dedi & Orsag (2007) | 200 firms selected from 400 of the best Croatian firms & 34 banks from a ranking of | IRR, PBP (cost of capital is calculated by WACC) | Risk-adjusted discount rate, certainty equivalents for cash flows |
| Truong, Partington & Peat (2008) | 87 Australian | NPV, IRR and PBP | Real options techniques have gained a foothold in capital budgeting but are not yet part of the mainstream. The Capital Assets Pricing Model is found to be the most popular method used in the estimation of the cost of equity capital |
| Leon, Isa & Kester | 229 Listed | DCF techniques | Scenario and sensitivity |
| Zubairi (2008) | 35 firms listed on KSE (Pakistan) | Bigger size companies give greater preference to IRR, while smaller firms rely more on NPV. Also, smaller firms are keener in estimating the PBP as compared to larger | |
| Verma, Gupta & Batra | 100 manufacturing companies (India) | NPV and IRR | Weighted Average Cost of Capital (WACC) was to calculate the cost of capital. Sensitivity |
| Hall & Millard (2010) | South African companies listed on the JSE Exchange for at least ten years. | | |
| Dragota, Tatu, Pele, Vintila & Semenescu | Professors in the economic field, competences in Corporate Finance and teaching in | NPV, IRR or PI. The discount rate used for the investment projects analysis is the weighted average cost of capital. | Sensitivity analysis, Monte Carlo method and the scenarios technique |
| Shinoda (2010) | 225 firms listed on the Tokyo Stock Exchange | | |
| Poudel, Sugimoto, Yamamoto, Nishiwaki & Kano (2010) | 50 farms (Nepal) | Benefit-Cost ratio (B/C), | Sensitivity analysis |
| Bennouna, Meredith & Marchant (2010) | 88 large firms | Trends towards sophisticated techniques (DCF) have continued. Of those which did, the | The majority of Canadian firms use risk analysis tools, mainly sensitivity analysis followed by |

Source: Survey data

3.7 Factors affecting capital budgeting

In practice, numerous factors heavily influence capital budgeting decisions. Behavioral finance, which draws on knowledge from sociology and psychology, has become increasingly important and has made inroads into capital budgeting theory. Behavioral finance holds that capital investment decisions do not depend solely on quantitative data but are also strongly influenced by qualitative factors, including institutional and personal values, tolerance of risk, situational context and so on. More recently, Ben-David, Graham and Harvey's (2008) study of CFOs found that overconfidence was a key driver of investment, whereas optimism had a more marginal effect on investment. Larrick, Burson and Soll (2007) found that the degree of individuals' overconfidence is strongly associated with their belief that they are better than average. Overconfident managers generally tend to overinvest, undertaking more mergers, starting more new firms and initiating more investment. Similarly, Brown and Sarma (2007) stated that CEO overconfidence affects the frequency of a firm's corporate acquisitions. If past returns on investment are high, CFOs become more confident in their estimates of future returns. Simon and Houghton (2003) studied a group of 55 managers working in small firms in the computer industry. Their findings showed that managers with greater overconfidence preferred to introduce riskier products and seemed to fail frequently. In the early 1990s, some studies found that managerial overconfidence leads to innovation (Staw, 1991) and to plant expansion (Nutt, 1993). Glaser, Schafers and Weber (2008) surveyed senior managers' behavior and observed that optimistic managers increase their exposure to firm-specific risk through their transactions, invest more and, in turn, increase investment-cash flow sensitivity.

The size of the firm is one of the major determinants of capital budgeting practices (e.g., Ho & Pike, 1992; Graham & Harvey, 2001; Farragher et al., 2001; Brounen et al., 2004; Verbeeten, 2006). Research supports the view that large firms adopt more innovative, that is, more sophisticated, capital budgeting practices to a greater extent than smaller firms do (e.g., Rogers, 1995; Williams & Seaman, 2001), since larger firms have the capacity and resources to use them (Ho & Pike, 1992). Payne et al. (1999) and Ryan and Ryan (2002) documented that large firms were more inclined to use more sophisticated capital budgeting practices. This is because larger firms undertake larger projects, for which the use of sophisticated capital budgeting practices becomes less costly (Payne et al., 1999; Hermes et al., 2007). There was a positive relationship between firm size and the use of DCF methods, a finding also confirmed in Hermes et al.'s (2007) study. Trahan and Gitman (1995) noted that large companies made greater use of DCF methods (88% for NPV and 91% for IRR) than small companies (65% for NPV and 54% for IRR). This was further confirmed by Segelod (1998), who found that major firms use the PB model for evaluating small investments, whereas for large investment decisions at least one of the DCF methods is used. In 2001, Graham and Harvey studied capital budgeting methods and firm size in the U.S., and their results showed a significant negative relationship between size and PB. Brounen et al. (2004) found that company size was positively correlated with the use of capital budgeting methods: large companies use NPV, IRR and sensitivity analysis more than small companies.

Table 1 (continued)

| Author/s | Population | Most popular capital budgeting method | Methods for evaluating risk in capital budgeting |
|---|---|---|---|
| (Bennouna, Meredith & Marchant, 2010, continued) | | majority favored NPV and | scenario analysis and risk-adjusted discount rate. Use of real options is limited (8%). |
| Kester & Robbins | 18 Chief Financial Officers of companies listed on the Irish Stock Exchange | More sophisticated Discounted Cash Flow (Weighted Average Cost of Capital is to evaluate all proposed capital | Scenario analysis and sensitivity analysis |
| Singh, Jain & Yadav | 31 listed companies (India) | More sophisticated DCF | Sensitivity analysis |
| Maroyi & Poll (2012) | 13 companies listed in the Mining Sector of the Johannesburg Exchange (JSE). | NPV | Real option |

Generally, ownership structure has a great influence on managerial decision making and, as a result, on firm performance (Warfield, Wild & Wild, 1995; Klassen, 1997). Greater managerial ownership has been found to be associated with increased use of the recommended capital budgeting methods and thus with a lower likelihood of financial distress (Donker, Santen & Zahir, 2009). It is often reported that managers actually forgo profitable investments (even those offering a positive NPV) if the accounting rate of return is too low, and that top management is thus willing to sacrifice long-term value to meet accounting targets (Graham, Harvey & Rajgopal, 2005). Ownership is sometimes classified as listed on the stock exchange or non-listed (Hermes et al., 2007). Listed firms made more accurate estimates of the cost of equity and the cost of capital and were more likely to use NPV or IRR than non-listed firms (Hermes et al., 2007).

The nature of the industry has also been identified as a determinant of capital budgeting practices. For example, the financial services industry and the building, construction and utilities industries have shown greater interest in using sophisticated capital budgeting practices than other industries (Verbeeten, 2006). Further, many past empirical studies have shown that capital budgeting practices differ across industries (e.g., Ho & Pike, 1998). For example, the use of real options or game theory is more prevalent in the pharmaceutical industry (e.g., Bowman & Moskowitz, 2001; McGrath & Nerkar, 2004), the extraction industry (e.g., Trigeorgis, 1993), and the financial services and high-tech industries (e.g., Billington, Johnson & Triantis, 2003).

The education of CFOs has been recognized as a determinant of capital budgeting practices. The general argument is that CFOs with higher education have fewer problems understanding more sophisticated capital budgeting techniques and thus have the capacity to use them. For example, in Chinese firms, CFOs with a higher level of education use the cost of debt less often than less educated CFOs. Thus, a positive relationship has been identified between the educational background of CFOs and the use of sophisticated methods (Hermes et al., 2007). In the U.S. sample, a positive association was found between CEO education and the use of IRR (Graham & Harvey, 2001), and this finding was confirmed in the Netherlands, Germany and France, but not in the UK (Brounen et al., 2004). The reasons for the more widespread use of DCF are the availability of computer software for the computations (e.g., Pike, 1996) and the increased level of formal education of managers (e.g., Pike, 1996; Sangster, 1993). A few studies found that the age of CFOs was also a determinant of capital budgeting methods. For example, older CFOs may be reluctant to adopt new techniques and instead prefer to rely on older methods (e.g., Hermes et al., 2007).

Since capital investment is long-term in nature, uncertainty/risk plays a vital role in capital investment decision making. Generally, uncertainty refers to the gap between the information available and the information required to make a decision. Complete information is unavailable in the long run, and thus uncertainty is the dominant factor in capital investment (Simerly & Li, 2000; Zhu & Weyant, 2003). Types of uncertainty include raw material, input market, labor, political, production, output market, liability, interest, inflation, policy, exchange rate, competitive and societal uncertainties. Uncertainties have been addressed by adopting sophisticated capital budgeting practices, for example ROR and/or GT tools (e.g., Bowman & Hurry, 1993; Zhu & Weyant, 2003). The main concepts of ROR demonstrate that specific uncertainties (rather than uncertainty in general) affect capital budgeting practices (Dixit & Pindyck, 1994). Game theory specifies that the optimal investment criterion can also be changed by specific uncertainties (Smit, 2003). Thus, specific uncertainties need to be tackled using different capital budgeting methods. Research findings support the view that sophisticated capital budgeting practices are crucial and useful when financial uncertainties (i.e., exchange rate and interest rate uncertainties) exist. However, social, market and input uncertainties have not been sufficiently shown to influence the use of sophisticated capital budgeting practices. Rather, a theoretical background and the presence of many experts in the capital budgeting area are expected to provide the capacity and willingness to adopt contemporary capital budgeting practices (e.g., Libby & Waterhouse, 1996; Williams & Seaman, 2001). Theory and a few empirical studies state that specific uncertainties affect capital budgeting practices. For example, Ho and Pike (1998) found a positive relationship between socioeconomic uncertainty (i.e., governmental regulations, trade union actions) and the application of risk analysis techniques. However, empirical evidence on the relationship between such uncertainties and sophisticated capital budgeting practices is scarce (Verbeeten, 2006).

The recognition, assessment and reflection of risk/uncertainty are intriguing. Nowadays, a number of risk analysis methods are available, such as sensitivity analysis, scenario analysis, decision trees, computer simulation and Monte Carlo analysis. In Graham and Harvey's (2001) study, participants recognized market risk and also reported other risk factors, including interest rate, inflation, size and foreign exchange rate. Surprisingly, they found that at least half of the firms did nothing to adjust WACC (the firm's average risk) to incorporate project risk. However, in 1996, Shao and Shao reported that firms relied more on risk-adjusted cash flows than on risk-adjusted discount rates. Across their sample, sensitivity analysis was the principal assessment technique. In contrast, Gitman and Vandenberg (2000) found that 39% of firms adjusted their discount rates rather than adjusting cash flows for risk. Although a number of sophisticated risk analysis models are available, their application faces barriers. The reported reasons for firms' reluctance are that the models are impractical, depend on unrealistic assumptions, are difficult to explain to top management and are difficult to apply (Trahan & Gitman, 1995). Although progress in risk identification, assessment and adjustment has been reported, none of the studies has looked at actual risk analysis, its process, or the management inputs needed to improve the use of existing risk assessment and adjustment models.
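The two adjustment routes contrasted above, together with a one-at-a-time sensitivity analysis, can be sketched as follows. The project, the 4% risk-free rate and the 6% risk premium are hypothetical, and the certainty-equivalent factors are deliberately chosen so that both routes give the same NPV; in practice the two approaches need not agree.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs today (t = 0)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


expected_flows = [-1000.0, 400.0, 400.0, 400.0, 400.0]  # hypothetical project
risk_free = 0.04      # assumed risk-free rate
risk_premium = 0.06   # assumed project risk premium

# Route 1: risk-adjusted discount rate applied to the expected cash flows.
npv_radr = npv(risk_free + risk_premium, expected_flows)

# Route 2: risk-adjusted cash flows (certainty equivalents) discounted at the
# risk-free rate. These factors are the ones implied by the rate in Route 1.
alphas = [((1 + risk_free) / (1 + risk_free + risk_premium)) ** t
          for t in range(len(expected_flows))]
ce_flows = [a * cf for a, cf in zip(alphas, expected_flows)]
npv_ce = npv(risk_free, ce_flows)


def sensitivity(rate, flows, shocks=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """NPV when all inflows are shifted by each percentage shock in turn."""
    return {shock: npv(rate, [flows[0]] + [cf * (1 + shock) for cf in flows[1:]])
            for shock in shocks}


table = sensitivity(risk_free + risk_premium, expected_flows)
print(f"NPV (risk-adjusted rate)     = {npv_radr:.2f}")
print(f"NPV (certainty equivalents)  = {npv_ce:.2f}")
for shock, value in table.items():
    print(f"inflows {shock:+.0%}: NPV = {value:.2f}")
```

The sensitivity table shows how far inflows can fall before the NPV turns negative, which is the kind of simple, explainable output that may account for the technique's popularity in the surveys.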

Sophisticated capital budgeting practices help to identify many different types of investment projects in terms of uncertainty. A range of risks across many investment projects creates diversification. Diversification generally helps to maximize the income from investments at minimum risk. A positive relationship has been found between diversification and the use of sophisticated capital budgeting practices (Verbeeten, 2006). Recently, Holmen and Pramborg (2009) reported that the use of the payback method is positively associated with political risk.
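The risk-reducing effect of diversification can be illustrated with the standard two-asset variance formula. This is a minimal sketch under assumed figures: two hypothetical projects with return standard deviations of 20% and 30%, equal weights, and a correlation that is varied.

```python
import math


def portfolio_std(weight_1, std_1, std_2, correlation):
    """Standard deviation of returns for a two-project portfolio."""
    weight_2 = 1 - weight_1
    variance = ((weight_1 * std_1) ** 2 + (weight_2 * std_2) ** 2
                + 2 * weight_1 * weight_2 * std_1 * std_2 * correlation)
    return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


# Hypothetical projects with equal weights.
for rho in (1.0, 0.2, -0.5):
    print(f"correlation {rho:+.1f}: std = {portfolio_std(0.5, 0.20, 0.30, rho):.4f}")
```

Only with perfectly correlated projects does the combined risk equal the weighted average of the individual risks; any lower correlation reduces it.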

Klammer (1993) and Shank and Govindarajan (1992) suggested that non-financial considerations have been integrated into capital budgeting practices. For example, corporate management has been integrated into capital budgeting, so that decisions depend on strategic management tools such as value chain analysis, cost driver analysis and competitive advantage analysis. According to Carr and Tomkins (1996), the most successful companies in their sample of 51 case studies of UK, U.S. and German companies were found to use non-financial strategic information in making investment decisions. However, it is argued that non-financial methods were prevalent when firms did not adequately implement DCF methods (Carr & Tomkins, 1996). No studies, however, have examined the use of non-financial methods in connection with DCF analysis. It has been argued that the increasing acceptance of DCF analysis has led to non-financial methods being ignored (e.g., Graham & Harvey, 2001; Ryan & Ryan, 2002).

Capital budgeting practices differ and may be subject to a "country effect". This can be attributed to the economic factors that determine the choice of capital budgeting practices. Further research is recommended to identify the country effect on capital budgeting practices with respect to the level of economic, human, financial and technological development. Shahrokh (2002) argued that capital budgeting is very complex and is determined by many factors, including terminal values, foreign currency fluctuations, long-term inflation rates, subsidized financing and political risk. In Sekwat's (1999) study of capital budgeting practices in Tennessee municipal governments, decisions about which capital budgeting techniques to use were based on the simplicity, versatility and flexibility of those techniques. He further argued that the techniques are used in practice in conjunction with qualitative factors such as ethical, legal or political considerations. He concluded that, since government funds the capital projects, political factors play a critical role in capital investment decisions.

3.8 Disparities between capital budgeting theory and practices

Capital budgeting theory recommends using DCF methods (NPV, IRR, MIRR, PI and DPB) and non-DCF methods

(PB and ARR) for making capital budgeting decisions. However, almost all firms in developed and developing

countries are inclined to use sophisticated capital budgeting methods along with many capital budgeting tools for

incorporating risk (such as sensitivity analysis and real options) and sophisticated discount rates (such as the Weighted Average Cost

of Capital, Cost of Debt and CAPM) (e.g., Arnold & Hatzopoulos, 2000; Graham & Harvey, 2001; Ryan & Ryan, 2002;

Cooper et al., 2002; Brounen et al., 2004; Hermes et al., 2007; Bennouna et al., 2010; Maquieira, Preve and Allende, 2012).
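
The discount rates named in this paragraph follow standard textbook formulas. As a minimal sketch, the CAPM cost of equity and the Weighted Average Cost of Capital can be computed as below. The function names and all input figures (risk-free rate, beta, market return, capital structure, tax rate) are hypothetical and assumed only for illustration; they are not taken from any of the cited surveys.

```python
def capm_cost_of_equity(risk_free, beta, market_return):
    """CAPM: risk-free rate plus beta times the market risk premium."""
    return risk_free + beta * (market_return - risk_free)


def wacc(equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate):
    """Weighted Average Cost of Capital with an after-tax cost of debt."""
    total = equity_value + debt_value
    equity_part = (equity_value / total) * cost_of_equity
    debt_part = (debt_value / total) * cost_of_debt * (1 - tax_rate)
    return equity_part + debt_part


# Hypothetical firm (all figures assumed for illustration only).
cost_of_equity = capm_cost_of_equity(risk_free=0.03, beta=1.2, market_return=0.08)
discount_rate = wacc(
    equity_value=600.0,
    debt_value=400.0,
    cost_of_equity=cost_of_equity,
    cost_of_debt=0.05,
    tax_rate=0.25,
)

print(f"cost of equity: {cost_of_equity:.3f}")
print(f"WACC: {discount_rate:.3f}")
```

Under these assumptions the cost of equity is 9% and the WACC is 6.9%; the latter is the rate a firm would typically plug into an NPV calculation.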


Numerous factors have been identified as determinants of capital budgeting during the last two decades, including size

of the firm, ownership structure, nature of industries, educational qualification of CFOs, experience of CFOs, age of

CFOs, uncertainty (for example, interest rate, inflation, foreign exchange rate), nonfinancial considerations and other

factors (i.e., economic, human, technology, finance, ethical and political). Among them, some factors (for example, size

of the firm, educational qualification of CFOs, experience of CFOs, age of CFOs) were positively associated with the

use of sophisticated capital budgeting practices. However, in some cases, economic, political and technological factors

directly and indirectly affect the choice of capital budgeting practices (e.g., Bowman & Moskowitz, 2001; Zhu &

Weyant, 2003; McGrath & Nerkar, 2004; Verbeeten, 2006; Donker et al., 2009). Moreover, the factors determining

capital budgeting practice suggest that, to a certain extent, capital budgeting practice is prone to a 'country effect influence',

for example economic factors, cutting-edge technology (i.e., decision support systems), political factors, accounting

policies, accounting standards and other infrastructure facilities. Although capital budgeting theory is applicable

regardless of country, to a certain extent the actual practices of capital budgeting (for example, selection of capital

investment) vary (e.g., Graham & Harvey, 2001; Shahrokh, 2002). 'In practice uncertainty, information asymmetry,

multiple (conflicting) objectives, real options and multi-period multi project considerations greatly complicate capital

budgeting beyond the focus of the theory' (Arnold & Hatzopoulos, 2000, p. 609). A consideration of the impact of

information asymmetry, real options and other complications on the capital budgeting exercise gives one the view that

there is no unique correct technique and that there is a need for multiple methods (Arnold & Hatzopoulos, 2000). Thus,

all these factors impinge on the choice of capital budgeting practices, and consequently, there are disparities between

theory and practice.
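
One reason firms rely on multiple methods is that a DCF method and a non-DCF method can rank the same projects differently. The sketch below illustrates this with NPV and the payback period (PB). It is a minimal example under stated assumptions: the two projects, their cash flows and the 10% discount rate are hypothetical, and the payback calculation assumes cash arrives evenly within each year.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the time-0 outlay (negative)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def payback_period(cash_flows):
    """Years until cumulative undiscounted cash flow is recovered.

    Interpolates linearly within the recovery year. Returns None if the
    outlay is never recovered.
    """
    cumulative = cash_flows[0]
    if cumulative >= 0:
        return 0.0
    for year, cf in enumerate(cash_flows[1:], start=1):
        if cumulative + cf >= 0:
            return (year - 1) + (-cumulative) / cf
        cumulative += cf
    return None


# Hypothetical projects (figures assumed for illustration only).
project_a = [-1000, 600, 500, 100, 50]   # fast recovery, little afterwards
project_b = [-1000, 100, 300, 600, 900]  # slow recovery, large later inflows
rate = 0.10

for name, flows in (("A", project_a), ("B", project_b)):
    print(name, round(npv(rate, flows), 2), payback_period(flows))
```

With these assumed figures, payback favours project A (1.8 years against 3.0 years), while NPV favours project B (about 404 against about 68). The disagreement arises because payback ignores both the time value of money and every cash flow after the recovery point.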

Studies on the practice of capital budgeting in many countries have found that firms increasingly employ more

sophisticated capital budgeting techniques to make investment decisions over several years (Klammer, 1973; Klammer

& Walker, 1984; Pike, 1988; Klammer, Koch & Wilner, 1991; Jog & Srivastava, 1995; Gilbert & Reichart, 1995;

Farragher et al., 1999; Arnold & Hatzopoulos, 2000; Graham & Harvey, 2001; Mustapha & Mooi, 2001; Ryan & Ryan,

2002; Brounen et al., 2004; Hermes et al., 2007; Truong et al., 2008; Baker et al., 2011; Singh et al., 2012). When

comparing a developed economy with an emerging economy, the developed economy has highly developed capital

markets with high levels of liquidity, meaningful regulatory bodies, large market capitalization, and high levels of per

capita income (Geary, 2012). An emerging market is in the process of rapid growth and development with lower per

capita income, less mature capital markets and very small capital projects, compared with developed countries. Therefore,

obviously, emerging market economies pose challenges in applying capital budgeting techniques, owing to less

developed capital markets and the difficulty of setting key parameters.

3.9 Answering the research questions: Summary of the findings

It is crucial to answer the research questions in order to attain the research aims. The first question asked: "What are

the capital budgeting theories and practices used by firms? Are there any disparities between the capital budgeting

theories and practices? If so, how?" The answers to these questions have been well documented in the studies of the last twenty

years. Capital budgeting theory recommends using DCF methods (NPV, IRR, MIRR, and DPB) and non-DCF

methods (PB and ARR) for making capital budgeting decisions. However, almost all firms in developed and

developing countries are inclined to use sophisticated capital budgeting methods along with many capital budgeting tools

for incorporating risk (such as sensitivity analysis and real options) and sophisticated discount rates (such as WACC, CD and CAPM)

(e.g., Arnold & Hatzopoulos, 2000; Graham & Harvey, 2001; Ryan & Ryan, 2002; Cooper et al., 2002; Brounen et al.,

2004; Hermes et al., 2007; Bennouna et al., 2010; Maquieira et al., 2012). Thus it can be concluded that there are

some disparities between capital budgeting theory and practice. The next research question further backs this up.


The second question asked: "What factors determine the use of capital budgeting practices? Do they

differ across countries? If so, how?" Numerous factors have been identified as determinants of capital budgeting

during the last two decades, including size of the firm, ownership structure, nature of industries, educational

qualification of CFOs, experience of CFOs, age of CFOs, uncertainty (for example, interest rate, inflation, foreign

exchange rate), nonfinancial considerations and other factors (i.e., economic, human, technology, finance, ethical and

political). Among them, some factors (for example, size of the firm, educational qualification of CFOs, experience of

CFOs, age of CFOs) were positively associated with the use of sophisticated capital budgeting practices. However, in

some cases, economic, political and technological factors directly and indirectly affect the choice of capital budgeting

practices (e.g., Bowman & Moskowitz, 2001; Zhu & Weyant, 2003; McGrath & Nerkar, 2004; Verbeeten, 2006;

Donker et al., 2009). Moreover, the factors determining capital budgeting practice suggest that, to a certain extent, capital

budgeting practice is prone to a "country effect influence", for example economic factors, cutting-edge technology (i.e.,

decision support systems), political factors, accounting policies, accounting standards and other infrastructure facilities.

Although capital budgeting theory is applicable regardless of country, to a certain extent the actual practices of capital

budgeting (for example, selection of capital investment) vary (e.g., Graham & Harvey, 2001; Shahrokh, 2002). Thus, all

these factors impinge on the choice of capital budgeting practices, and consequently, there are disparities between

theory and practice.

The last question asked: "What are the gaps in the existing capital budgeting literature?" Traditional financial

theory suggests that decision makers are rational; however, modern theory suggests that decisions are influenced by

many cognitive illusions (Leon et al., 2008; Tayib & Hussin, 2011). Thus behavioral finance came into play in capital

budgeting decision making. Capital budgeting research connected with behavioral finance has not been carried out in any

developing country during the last twenty years. The literature indicates that behavioral finance is a dominant theory in determining

capital budgeting decisions, as confirmed in many studies carried out in developed countries. Thus, there is a complete

dearth of Asian research on the influence of behavioral finance on capital budgeting practices.

No studies have attempted to identify the relationship between supportive capital information systems (software

products that make the required analysis easier than a manual system) and capital budgeting decision

making. Thus, a gap has been identified between information systems and the choice and practice of capital budgeting

(Bennouna et al., 2010). Similarly, the environment in which organizations operate affects the quality of decisions. Thus,

researchers should concentrate on scanning the organizational environment to make good investment decisions rather than

depending purely on financial theory. This is of paramount importance in the current context.

Almost all the research carried out during the last two decades adopted a limited range of methodologies. For example,

cross-sectional research designs, case studies and some forms of qualitative study were the most popular (e.g., Butler et al.,

1993; Verbeeten, 2006; Hermes et al., 2007; Maquieira et al., 2012). However, in the modern world, some form of event

study methodology would be valuable for providing greater insights into capital budgeting practices. Thus, a gap has

been identified in the use of methodological concepts.

Renowned researchers have found that nowadays most large companies are inclined to use sophisticated capital

budgeting practices (SCBP). However, it is an intriguing question whether SCBP are important to all types of investment (e.g.,

expansion, replacements, mergers and takeovers) and all types of industries, and whether those techniques outperform

non-SCBP. These questions need to be thoroughly investigated.

Many research scholars have argued that capital budgeting is subject to a "country effect" (e.g., Graham &

Harvey, 2001; Shahrokh, 2002; Hermes et al., 2007), for example, economic policies, taxation system, accounting

policies, conducive social climate, culture of people, technological factors (i.e., decision support systems), government

control, political factors and infrastructure facilities. Therefore, more extensive studies of unresearched

countries are imperative to build robust knowledge.

Many studies conducted in developed countries have found that firms use more sophisticated capital budgeting practices

(Graham & Harvey, 2001; Brounen et al., 2004). Nonetheless, compared with developed countries,

sophisticated capital budgeting practices are less prevalent in developing countries. Thus, future researchers need

to consider the challenges faced by CFOs with regard to the use of sophisticated capital budgeting practices (e.g.,

organizational barriers, knowledge gaps of CFOs, technological challenges), as these practices lead to increased performance.

Another opportunity for future research is the investigation of other organizational characteristics (e.g. business unit

strategies, reward and incentive structures, distribution of decision rights and financial structure) that have been shown

to affect capital budgeting practices. Renowned researchers have found that nowadays, most large companies are

inclined to use sophisticated capital budgeting practices (SCBP).

3.10 Policy recommendation

Many scholars have criticized much of the research on capital budgeting for merely testing the methods of capital

budgeting and their use. Such studies simply identified which methods were actually in practice. However, in practice,

numerous factors affect capital budgeting, and it is subject to a "country effect" too. In line with this

argument, this research was carefully designed and can serve as a springboard for future research. This study

contributes by setting out what is known and unknown in the field of capital budgeting over the last two decades.

In a world of cutting-edge technology, the way of doing things has changed and become more challenging. For example,

decision support systems have become more prevalent in decision making, and advanced technology penetrates

the assessment of capital budgeting practices more than ever before. Thus, this research should raise awareness among top

management, policy makers, practitioners and company stakeholders.


This work is licensed under a Creative Commons Attribution 3.0 License.

Use the following data set to answer the following questions. To earn full credit show all of your calculations and other work. Explain your answers. Don’t just write a number.

The 26 students who signed up for General Psychology reported their GPA. Each person was matched with another person on the basis of the GPAs, and two groups were formed. One group was taught with the traditional lecture method by Professor Nouveau. The other class could access the Web for the same lectures whenever they wished. At the end of the term, both classes took the same comprehensive final exam, and they also filled out a "Satisfaction Questionnaire." Scores on both measures are shown below.

Analyze the data with t tests and effect size indexes. Write a conclusion.

You can use the JASP Software to perform your analysis. Make sure you include the analysis output in your submission. Also, explain your results in detail.
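
Because each student was matched with a partner on GPA, the two sections form matched pairs, so a paired-samples t test is the natural choice. The sketch below shows the arithmetic such a test involves, together with one common effect size index for paired data (the mean difference divided by the standard deviation of the differences). The five pairs of scores are hypothetical placeholders, not the assignment data, and the function name is an assumption made for this illustration. The p-value is left to JASP or a t table with the reported degrees of freedom.

```python
import math
import statistics


def paired_t_test(first, second):
    """Paired-samples t statistic and effect size from matched scores."""
    if len(first) != len(second):
        raise ValueError("matched samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_value = mean_diff / (sd_diff / math.sqrt(n))
    effect_size = mean_diff / sd_diff  # Cohen's d for paired data
    return {
        "n_pairs": n,
        "df": n - 1,
        "mean_diff": mean_diff,
        "t": t_value,
        "d": effect_size,
    }


# Hypothetical scores for five matched pairs (NOT the assignment data).
traditional = [70, 75, 80, 85, 90]
online = [68, 70, 79, 80, 88]

result = paired_t_test(traditional, online)
print(result)
```

For the actual assignment, the same calculation would be run once on the final exam scores and once on the satisfaction scores, each with 13 matched pairs (12 degrees of freedom).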

Comprehensive Final Exam Scores



Traditional Section    Online Section    Traditional Section    Online Section