Week 10: Permutational techniques

Objectives

The aim of this component of the practical series is to introduce you to several common techniques for analysing multivariate data. At the end of them you will:

Be familiar with the principles underpinning principal components analysis (PCA), and non-metric multidimensional scaling.
Be familiar with the principles underpinning multivariate hypothesis testing using permutational techniques such as ANOSIM and PERMANOVA.
Planning and conducting experiments to test multivariate hypotheses.
Know how to perform these analyses using relevant software.
Be able to interpret, present and report on these analyses.

1 Outline

For this specific practical you will:

Present group presentations of model systems and experimental design.
Analyse sample data in the software package PRIMER or R to:
- visualize data using cluster analysis and nMDS.
- test one and two factor multivariate hypotheses using ANOSIM and PERMANOVA with a focus on permutational techniques (PERMANOVA).
- determine variables contributing to the differences between groups using SIMPER.

2 Analyses covered in practical 2

nMDS and multivariate hypothesis testing (ANOSIM, PERMANOVA and SIMPER) – Mainly used to produce an ordination in order to visualize patterns in the multivariate cloud of data (by reducing dimensions in some way) and also to rigorously test explicit hypotheses concerning those patterns by reference to a priori groups or relationships with predictors, such as environmental variables.
ANOSIM –- testing hypotheses comparing different groups based on their similarity (Among c.f. within) using distance measures. Most effective with simple designs.
PERMANOVA –- more powerful approach to multivariate hypothesis testing using permutational methods, capable of analysing more complex designs.
SIMPER –- used when a significant difference is found to identify which components of the data set are driving the differences (e.g. the contribution each species makes to differences among communities.

3 Community structure/Assemblage structure

The process for looking at community structure is straight forward. We will use both cluster analysis and nMDS/ANOSIM/SIMPER to do this. This involves;

Transforming/standardising your data as appropriate.
Calculating Bray Curtis dissimilarities (you could use something else from BC if the data are suitable) and generating dissimilarity matrix.
Performing MDS (and follow on below).
PERMANOVA (or ANOSIM) to test for difference between treatments.
SIMPER for identifying which variables contribute to the differences.

4 Using PRIMER (R instructions are given on Canvas within an R Studio files)

4.1 PERMANOVA, SIMPER and MDS analysis with PRIMER

Open the Excel document you wish to analyse.
Open PRIMER
In PRIMER go to Open and select Excel file you wish to analyse.
Select worksheet to be analysed. Keep “Data Type” as Sample Data. Clear button “Title”.
Check to ensure you have the correct number of rows and columns and all looks well.
Go to Edit, go to Factors, go to Add and name your new factor(s) appropriately, code each site for that factor and press OK.
Go to Pre-treatment and then to Transform (overall), and select an appropriate transformation for your data. (4th root or presence/absence, etc¹.)
To make an nMDS plot: Go to Analyse then Resemblance and click ‘analyse between samples’ and ‘Bray-Curtis similarity’. Go to Analyse then to Non-metric MDS, change number of restarts to 20 and select configuration plot. You can modify how the nMDS plot is displayed by right clicking on the figure.
To perform a PERMANOVA you need to firstly create a PERMANOVA design, and then run the PERMANOVA you have designed
1. To create a PERMANOVA design: Return to the resemblance matrix. Go to PERMANOVA+ then Create PERMANOVA design. Create a title for the design and select appropriate number of factors. Enter in each factor you want to incorporate into the design, and code as either fixed or random factor. Complex designs can be created at this point.
2. To run the PERMANOVA: Return to the resemblance matrix. Go to PERMANOVA+ then PERMANOVA. Select the correct design (usually the highest number). Select Mains test and your permutational method (reduced model is usually fine), the sum of squares type (type III) and the number of permutations (999 recommended) then OK.
Interpret the PERMANOVA output. If one or more of your factors are significant, then you run the PERMANOVA design again, selecting Pair-wise test and select the factor/interaction of interest. Press OK.
If appropriate, return to transformed data set and go to Analyse and then SIMPER. Select appropriate factor and click OK.
Save result(s) to a file.

4.2 ANOSIM

As an alternative to PERMANOVA, many studies will test differences between treatments using Analysis of Similarities (ANOSIM). Unlike PERMANOVA which can be used for multi-factorial designs, ANOSIM is best used for simple one-factor analyses. PERMANOVA and ANOSIM will generate the same (or very similar results) for one-factor analyses.

4.2.1 To perform an ANOSIM

Return to resemblance matrix, go to Analyse then to ANOSIM, select appropriate factor which you added before.

Significance level in % in ANOSIM becomes probability if divided by 100. e.g. 0.2 ÷ 100 = 0.002 = P (in this case P < 0.05). If ‘Significance level of sample statistic’ in ‘Global Test’ section of the output is more than 5 (which is > 0.05), than there is no significant difference among any of the treatments even if ‘Pairwise Tests’ section has significant differences between pairs of particular treatments. ANOSIM results are presented as: (P = 0.076, Global R = 0.148, 999 permutations) where P is a global significance level.

5 For next week

Our first steps involve organizing the data so that we can get it into our analytical software (SPSS, PRIMER, R etc,) and generating summary statistics and figures. Different analytical software requires slightly different formats but there are usually some generalities that apply across programs.

For the start of next week’s practical each group will need to have collated their group data into three spreadsheets; These spreadsheets will be marked out of 5 by demonstrators.

Habitat data: an SPSS or R ready excel file with sites as rows and habitat variables as columns (see below)
Assemblage/Community data: a PRIMER or R ready excel file with species ID as rows and sites as columns (see below)
Summary statistics: additional excel files as above (1 and 2) but with summary statistics (numbers of spp., individuals, habitat variables etc.) calculated for your data. This may be done on a site specific or treatment specific basis depending on your question.

5.1 1. Your habitat data for PCA

Habitat data – You will need to enter your data on habitat characteristics into SPSS or R for analysis next week. Make sure you are familiar with how to organise it for SPSS. Every site needs its own row where every habitat variable measured is listed, i.e. every site assessment has an independent row. If you have treatments you need to include a column labeling those treatments too (in addition to “site name”).

5.2 2. Organising and entering assemblage data for PRIMER

Assemblage/Community data – You will need to generate your spp*site matrix into a PRIMER ready file so that we are ready to analyse your data at the labs next week. These will take the form of a spreadsheet with a single column identifying species (unique species identifiers are fine) with the abundances for each site in rows (page 22). This sheet must start in the top left cell of that worksheet with only a single row of column labels. Make sure that you keep a reference file where you have recorded the specific identity of the unique species identifiers (e.g. A= Toyota Corolla, B= Leyland P76 etc.).

5.3 3. Generating summary statistics for additional (univariate) analyses

On an additional spreadsheet/workspace, you should calculate summary statistics to perform additional univariate analyses. Create a copy of your completed excel file and or worksheet. You should generate a column in the far right of your spreadsheet summing the number of things in each row. You can then sort the data (REMEMBER TO SELECT IT ALL!) by that column and delete all the zero rows (i.e. no evidence of that species in your system).

Owing to the high degree of variation and the number of singletons you will encounter it is possible (and likely) that some groups will pool species into more coarse groupings (i.e. “genera” or “family”).

The easiest way to calculate some of these summary statistics will be to use the pivot table command in Excel to sort your data for you. To do this:

Highlight the dataset you want to use
Choose Pivot Table from the Data menu
Work through the wizard creating a pivot table in a new worksheet
Then change the drop down menu in the box to “to data area” and click on the different sites to add them.

You should have a functional summary pivot table of your data. You can change the pivot tables to look at “counts” (equivalent of presence absence data) or “sums” (abundance data) for your spreadsheet.

Figure 1: Example of data sheet for importing data into SPSS for PCA

Figure 2: Example of data sheet for importing data into PRIMER

Footnotes

When you are analyzing your group data (week 11), you will be able to apply multiple transformations to your data. I’d encourage you to use a range of them to see how it affects your data. Note that PRIMER will be reluctant to undertake any analyses when data are not transformed!↩︎