Week 9: PCA and Report 2
Objectives
The aim of this component of the practical series is to introduce you to several commonly used techniques for analysing multivariate data. By the end of the series, you will:
- Understand the principles underlying Principal Components Analysis (PCA) and Non-metric Multidimensional Scaling (nMDS)
- Be familiar with the concepts behind multivariate hypothesis testing using permutational techniques such as ANOSIM and PERMANOVA
- Be able to plan and conduct experiments that test multivariate hypotheses
- Know how to carry out these analyses using relevant statistical software
- Be able to interpret, present, and report your findings clearly and effectively
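In this specific practical you will:
- Review what you need to submit for this module
- Use PCA to summarise the “perceptions of biology lecturers” dataset (Part 1)
- Begin designing your group’s multivariate experiment (Part 2)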
1 What you need to submit for this module
This assessment involves two separate submissions via Canvas:
Group Data Submission (Excel File)
- Submit one Excel file containing your group’s:
  - Species (assemblage) data
  - Habitat data
- Only one group member needs to upload the file on behalf of the group.
- Due: By 10:00am on the day of your Week 11 practical session
- Worth: 5% of your total unit mark
Make sure the file is clearly organised, with each dataset on a separate worksheet or in separate files, and correctly formatted for analysis in PRIMER or Jamovi.
Individual Report
- Submit your individual report based on your group’s study.
- Format: A short scientific article (see assessment guidelines on Canvas for details).
- Worth: 15% of your total unit mark.
- Due date: Refer to the Assessment on Canvas and the Unit Outline.
1.1 Analyses covered in practical 1
Principal Components Analysis (PCA) is a technique used to reduce the dimensionality of a dataset by extracting a smaller set of new variables (principal components) that capture as much of the variance in the original data as possible. These components summarise complex patterns across multiple variables, making the data easier to visualise and interpret.
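To make the idea concrete, the following is a minimal Python sketch (assuming NumPy is installed) of unrotated PCA on a correlation matrix, which is what PCA on standardised variables amounts to. It illustrates the technique and is not Jamovi's or R's implementation; the small ratings matrix is made up. The eigenvalues are the variances of the components. Because each standardised variable has variance 1, the eigenvalues sum to the number of variables, and each eigenvalue divided by that total is the proportion of variance explained (the quantity summarised by a scree plot).

```python
import numpy as np

def pca_correlation(X):
    """Unrotated PCA of an (n samples x p variables) array via its correlation matrix.

    Returns eigenvalues (largest first), loadings (p x p, one column per PC)
    and the proportion of total variance explained by each PC.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh, because R is symmetric
    order = np.argsort(eigvals)[::-1]         # sort components by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loading = eigenvector element * sqrt(eigenvalue): the correlation of a variable with a PC
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    explained = eigvals / eigvals.sum()       # eigenvalues sum to p for a correlation matrix
    return eigvals, loadings, explained

# Hypothetical ratings: 6 respondents x 4 variables (illustration only)
X = np.array([
    [5, 4, 2, 3],
    [4, 5, 1, 2],
    [2, 1, 5, 4],
    [1, 2, 4, 5],
    [3, 3, 3, 1],
    [5, 4, 1, 2],
])
eigvals, loadings, explained = pca_correlation(X)
print(np.round(eigvals, 2))
print(np.round(explained, 2))
```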
2 Part 1: Principal components analysis – “perceptions of biology lecturers”
You will use PCA to reduce your set of measured variables to a smaller, coherent set of components. You will also compare perceptions of lecturers among groups defined by preferred system (marine, freshwater or terrestrial), preferred taxon (animals or plants) and gender (male or female), using conventional univariate approaches (t-tests/ANOVAs). To run these univariate tests, make sure that you not only create the principal components (PCs) but also save the scores (see below) for each component.
The process in Jamovi is listed below. R instructions are given on Canvas in an R script file.
- Import your data
  - Download the dataset here. If the link does not work, go to Canvas, where a new Module 3 Resources link has been added to the front page.
  - Open Jamovi.
  - Go to the top-left menu (☰) → click Open → Browse and select your Excel/csv file (e.g. your lecturer dataset).
  - Once imported, check that all the columns (variables) are correctly recognised as numeric (look for a ruler icon in the column headers).
  - If any are not, you can convert them in several ways. One way is to use the Variables tab to select all the variables of interest (Shift+click or Ctrl+click), then click Edit and change “Measure Type” to “Continuous”.
- Run Principal Components Analysis
  - Click on the “Factor” tab in the top toolbar.
  - Select Principal Component Analysis.
- Select your variables
  - In the new panel, select the variables of interest from the list on the left and move them into the “Variables” box.
  - You can do this either by clicking the arrow button or by dragging and dropping them.
- Method and assumption checks
  - Under the “Methods” section, set Rotation to “Varimax” (the most commonly used rotation method).
  - Under Assumption checks, tick the box for Bartlett’s test of sphericity (to check whether your data are suitable for PCA).
- Number of components
  - Select “Based on eigenvalue”.
  - The default is to retain components with eigenvalues > 1 (Kaiser’s criterion). Change this only if you have a specific reason to do so.
- Additional Output
  - Select all of the options in this section. These provide useful information for interpreting your PCA results.
- Save component scores
  - Tick “Save component scores to data set” (usually found under the Scores section at the bottom).
  - This adds new variables to your dataset, one for each component.
  - You can view the new variables by clicking on the Data tab at the top of the screen and scrolling to the right.
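The Jamovi options above (Kaiser’s criterion, varimax rotation and saved component scores) can be sketched in Python as follows, assuming NumPy. The function names are hypothetical and the data are simulated from two made-up latent traits. Scores are computed with the regression method, which is one common choice; Jamovi’s defaults may differ, so treat this as a way to understand the steps rather than to reproduce Jamovi’s numbers exactly.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-6):
    """Varimax rotation of a p x k loading matrix, with Kaiser row normalisation."""
    h = np.sqrt((L ** 2).sum(axis=1))            # square root of each variable's communality
    A = L / h[:, None]                           # Kaiser normalisation: rows scaled to unit length
    p, k = A.shape
    T = np.eye(k)                                # current orthogonal rotation matrix
    d = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient of the varimax criterion, projected onto the nearest orthogonal matrix via SVD
        G = A.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:   # stop when the criterion stops improving
            break
    return (A @ T) * h[:, None], T               # undo the normalisation

def pca_kaiser_varimax(X):
    """PCA on the correlation matrix, keep eigenvalues > 1, varimax-rotate, compute scores."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardised variables
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    m = max(1, int((eigvals > 1).sum()))                # Kaiser's criterion (keep at least one)
    L = eigvecs[:, :m] * np.sqrt(eigvals[:m])           # unrotated loadings
    L_rot, _ = varimax(L)
    W = np.linalg.solve(R, L_rot)                       # regression-method score weights
    scores = Z @ W                                      # one column per retained component
    return eigvals, L, L_rot, scores

# Simulated (hypothetical) data: 50 respondents, 6 variables driven by 2 latent traits
rng = np.random.default_rng(42)
f1, f2 = rng.normal(size=(2, 50))
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + rng.normal(scale=0.5, size=(50, 6))

eigvals, L, L_rot, scores = pca_kaiser_varimax(X)
print("components retained:", L.shape[1])
print(np.round(L_rot, 2))
```

Because varimax is an orthogonal rotation, it redistributes variance among the retained components without changing how much of each variable’s variance they explain jointly (its communality).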
After PCA: Review Your Output
- Scree Plot: Helps determine how many components to retain.
- Saved Scores: You can now use these component scores in further analyses (e.g., regression or cluster analysis).
- Bartlett’s Test of Sphericity: tests whether your correlation matrix differs significantly from an identity matrix, i.e. a matrix in which all off-diagonal values are zero (no correlation between variables) and all diagonal values are one (each variable correlates only with itself). The test should be significant (p < .05): a significant result means that your correlation matrix is not an identity matrix, so the variables share enough correlation for PCA to be worthwhile.
That said, this test is most useful alongside visual inspection of your correlation matrix, because with large samples even weak correlations can make it significant. If a variable is only weakly correlated with all the others, it may contribute little to the PCA and could be removed.
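Bartlett’s statistic is χ² = −(n − 1 − (2p + 5)/6) ln|R| with p(p − 1)/2 degrees of freedom, where n is the number of observations, p the number of variables and |R| the determinant of the correlation matrix (|R| = 1 for an identity matrix, so the statistic is 0). The Python sketch below (assuming NumPy; function names hypothetical) computes it and also flags variables whose strongest correlation with any other variable is weak. The 0.3 cut-off is an assumed rule of thumb, not a fixed rule, and the tiny data matrix is constructed for illustration. A p-value can then be obtained from the chi-square distribution, e.g. with scipy.stats.chi2.sf(chi2, df) if SciPy is installed.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: returns the chi-square statistic and degrees of freedom."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)              # ln|R|, computed stably
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df

def weakly_correlated(X, threshold=0.3):
    """Indices of variables whose largest |r| with any other variable is below threshold."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    np.fill_diagonal(R, 0.0)                      # ignore each variable's correlation with itself
    return [j for j in range(R.shape[1]) if np.abs(R[:, j]).max() < threshold]

# Hypothetical data: columns 0 and 1 are strongly correlated; column 2 is uncorrelated with both
X = np.array([
    [1, 2, 1],
    [2, 1, 0],
    [3, 4, 0],
    [4, 3, 0],
    [5, 6, 0],
    [6, 5, 1],
])
chi2, df = bartlett_sphericity(X)
print(round(chi2, 2), df)
print(weakly_correlated(X))
```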
Interpreting the PCA
You need to:
- Look at the correlation matrix.
- Examine the scree plot and determine how many PCs to retain (eigenvalues > 1, Kaiser’s criterion).
- Establish a meaningful name for each component based on the loadings of the individual variables in the rotated solution.
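Naming components is a judgement call, but a simple aid is to list, for each component, the variables with large absolute rotated loadings. The Python sketch below does this using only the standard library; the function name is hypothetical, the 0.4 cut-off is an assumed rule of thumb, and the variable names and loadings are made up rather than taken from the lecturer dataset.

```python
def salient_variables(loadings, names, threshold=0.4):
    """For each component, list (variable, loading) pairs with |loading| >= threshold,
    strongest first. `loadings` has one row per variable and one column per component."""
    n_comp = len(loadings[0])
    result = []
    for k in range(n_comp):
        pairs = [(name, row[k]) for name, row in zip(names, loadings)
                 if abs(row[k]) >= threshold]
        pairs.sort(key=lambda pair: abs(pair[1]), reverse=True)
        result.append(pairs)
    return result

# Hypothetical rotated loadings for six lecturer attributes on two components
names = ["enthusiastic", "entertaining", "approachable", "organised", "clear", "punctual"]
loadings = [
    [0.85, 0.10],
    [0.78, -0.05],
    [0.62, 0.35],
    [0.08, 0.81],
    [0.21, 0.74],
    [-0.12, 0.66],
]
for k, pairs in enumerate(salient_variables(loadings, names), start=1):
    print(f"PC{k}:", ", ".join(f"{v} ({l:+.2f})" for v, l in pairs))
```

With these made-up values, PC1 groups variables a reader might summarise as “engagement” and PC2 as “organisation”. A negative loading would mean the variable decreases as the component score increases.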
After the PCA
Examine whether gender, system preference (marine v. terrestrial) and taxon preference (animal v. plant) affect perceptions of biology lecturers, using the saved component scores as the response variables. You will need to code the columns for each factor (e.g. males = 0, females = 1) for Jamovi to perform its standard one-way ANOVAs or t-tests.
Questions to consider:
- Do males and females perceive lecturer qualities the same way?
- Do botanists and zoologists perceive lecturer qualities the same way?
- Do marine and terrestrial folk perceive lecturer qualities the same way?
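Each saved component score is an ordinary continuous variable, so comparing it between two groups coded 0 and 1 is a standard two-sample test. The pure-Python sketch below computes Student’s (pooled-variance) t statistic and its degrees of freedom, which we assume corresponds to Jamovi’s default independent-samples t-test; the function name, scores and codes are hypothetical. For a factor with three levels (e.g. marine, freshwater, terrestrial) you would use a one-way ANOVA instead. If SciPy is installed, scipy.stats.ttest_ind(group0, group1) should return the same t together with a p-value.

```python
from math import sqrt
from statistics import mean, variance

def student_t(scores, codes):
    """Pooled-variance (Student's) t-test comparing component scores between
    two groups coded 0 and 1. Returns (t, df)."""
    g0 = [s for s, c in zip(scores, codes) if c == 0]
    g1 = [s for s, c in zip(scores, codes) if c == 1]
    n0, n1 = len(g0), len(g1)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom
    sp2 = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / (n0 + n1 - 2)
    t = (mean(g0) - mean(g1)) / sqrt(sp2 * (1 / n0 + 1 / n1))
    return t, n0 + n1 - 2

# Hypothetical PC1 scores for eight respondents, with gender coded 0 / 1
scores = [0.5, 1.2, -0.3, 0.8, -1.0, -0.6, 0.1, -0.7]
codes = [0, 0, 0, 0, 1, 1, 1, 1]
t, df = student_t(scores, codes)
print(f"t = {t:.3f}, df = {df}")
```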
3 Part 2: Designing your multivariate experiment
The data and your model system
Biologists are increasingly turning to multivariate approaches because the questions we ask—and the data we collect—are often inherently multivariate. This is especially true when studying both biotic assemblages and abiotic habitat variables.
To explore these analytical techniques, you will work in groups to generate your own multivariate datasets using non-biological model systems. Examples include:
- Cars and the car parks they inhabit
- Gargoyles and the buildings they reside on
- Beer types and the drinking establishments in which they appear
These creative analogues allow you to test and visualise multivariate patterns in a controlled and accessible way.
Example: Cars as a Model System
- Cars have been successfully used as a teaching model for ecological analysis (e.g., Gaston et al. 1993). In this context:
  - Different car types can be treated as “species”.
  - Different suburbs or locations represent “sites”.
  - The car park characteristics (e.g., size, surface, signage) act as “habitat variables”.
A sample proforma will be provided to illustrate how such a system can be structured to generate a multivariate dataset with testable hypotheses. See the Module 3 Resources section on Canvas to download the template.
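If you record one line per “individual” observed (here, one car per line), the records need to be tallied into a samples × species table before analysis. The Python sketch below shows one way to do this with the standard library; the function name, car park names, car types and the comma-separated output are hypothetical, so check the layout against the template on Canvas. Habitat variables would go in a separate table with the same site rows.

```python
from collections import Counter

def abundance_matrix(records):
    """Tally (site, species) observation records into a sites x species count table.
    Returns (sites, species, rows) where rows[i][j] is the count of species j at site i."""
    counts = Counter(records)
    sites = sorted({site for site, _ in records})
    species = sorted({sp for _, sp in records})
    # Counter returns 0 for combinations never observed, so absences are recorded as 0
    rows = [[counts[(site, sp)] for sp in species] for site in sites]
    return sites, species, rows

# Hypothetical observations: one record per car seen (site, car type)
records = [
    ("CarPark_A", "hatchback"), ("CarPark_A", "hatchback"), ("CarPark_A", "SUV"),
    ("CarPark_B", "ute"), ("CarPark_B", "SUV"), ("CarPark_B", "SUV"),
    ("CarPark_C", "hatchback"), ("CarPark_C", "sedan"),
]
sites, species, rows = abundance_matrix(records)
print("site", *species, sep=",")
for site, row in zip(sites, rows):
    print(site, *row, sep=",")
```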
Group work
As a group, you will:
- Design your own non-biological multivariate model system.
- Develop 2–3 testable hypotheses relating to:
  - The assemblages of “species” across different treatments (e.g., locations, types of environments),
  - The role of measured “habitat variables” in shaping those assemblages.
- Use a two-factor design where possible (e.g., site × time, or suburb × establishment type).
- Ensure adequate replication—you will need at least 3 replicates per treatment.
- Design a sampling strategy that allows you to collect the required data.
- Assign a data coordinator within your group to manage dataset preparation.
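Interpreting “at least 3 replicates per treatment” as at least 3 replicates in every combination of the two factors (an assumption; check with your demonstrator), a quick Python sketch like the following can flag cells that fall short before you start sampling. The function name, suburbs and establishment types are hypothetical.

```python
from collections import Counter
from itertools import product

def under_replicated(samples, min_reps=3):
    """Return the factor-level combinations (cells) with fewer than min_reps samples.

    `samples` is a list of (level_of_factor_A, level_of_factor_B) tuples, one per replicate."""
    counts = Counter(samples)
    levels_a = sorted({a for a, _ in samples})
    levels_b = sorted({b for _, b in samples})
    # Check every combination, including ones with no samples at all
    return {cell: counts[cell] for cell in product(levels_a, levels_b)
            if counts[cell] < min_reps}

# Hypothetical suburb x establishment-type design, one tuple per sampled site
samples = (
    [("Newtown", "pub")] * 3 + [("Newtown", "bar")] * 3
    + [("Glebe", "pub")] * 3 + [("Glebe", "bar")] * 2
)
print(under_replicated(samples))
```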
4 Week 10 group presentation (compulsory)
During the Week 10 practical session (Week 2 of the prac series), your group will deliver a brief presentation (<5 minutes) using PowerPoint slides to introduce your project.
This session is compulsory and designed to provide constructive feedback before you begin data collection. Don’t worry—feedback is friendly and intended to improve your design, not critique your creativity!
Your presentation should include:
- An introduction to your system (model species and treatments)
- Your 2–3 hypotheses
- A description of your taxonomy (how you’re classifying your “species”)
- Your sampling design, including sample size and sampling effort
- A list of habitat variables you will measure at each site (and how you will measure them)
- A simple map of your sites
5 Assessment overview
To give you the maximum opportunity to analyse your data and understand your results, demonstrators will check your group’s dataset during Week 11.
5.1 Group data submission (5%)
- This is a group assessment: only one group member needs to upload the data file (Excel format) to Canvas on behalf of the group.
- The dataset must be submitted before your Week 11 practical session.
- It is worth 5% of your total unit mark.
- Near-perfect datasets that are well-organised, correctly formatted, and ready for analysis will receive the full 5%.
- As this is a group submission, Special Consideration, Simple Extensions, and Academic Plans do not apply.
5.2 Individual report (15%)
Each student will submit an individual short report, focusing primarily on the Methods and Results sections, as outlined in the attached marking rubric.
- The report must include:
  - A brief Introduction with clearly stated hypotheses
  - A detailed Methods section, describing your data collection and analyses
  - A well-presented Results section with appropriate tables, figures, and descriptions
- You are not required to invent a detailed rationale or references for your study, although you are welcome to include them if you wish.
- Your report must be written in the format of a manuscript suitable for submission to Austral Ecology.
- Length limit: Maximum of four single-spaced pages (not including the cover sheet, title/abstract page, references, tables, or figures)
- Emphasis will be placed on:
  - Clear explanation of analytical methods
  - Accurate description of the data collected
  - Clear and concise presentation of results
6 Report submission details
Your individual report is due by 11:59pm at the end of Week 13. Please ensure it is submitted electronically and anonymously via Canvas.
We aim to return reports with summative feedback approximately one to two weeks after submission.
6.1 Academic integrity
You are expected to complete your report independently and uphold the principles of academic honesty.
- Plagiarism or copying from others will result in loss of marks and may lead to further disciplinary action.
- Refer to the University’s policies on academic integrity: Academic Integrity at the University of Sydney
6.2 Use of ChatGPT or similar tools
- You may use tools like ChatGPT to check grammar or assist with R coding.
- However, extensive use of AI to write your report will be detectable and penalised. We are looking for your own scientific reasoning and interpretation.
6.3 Report Format
Your report should follow the format of a short scientific article suitable for submission to Austral Ecology. The “Notice to Contributors” is available at: Austral Ecology – Author Guidelines
6.4 Marking scheme
Introduction (use your imagination) – 10 marks
- Aims of the experiment and hypotheses being tested
- Why is the study important and/or interesting
- Study system – species, location of experiment (brief)
- You are welcome to use your imagination for references and background; you will not be penalised if you cannot provide real examples.
Materials and Methods – 30 marks
- Set-up of experiment/survey
- Data / variables recorded and how
- Sampling design and justifications for analyses
- Statistical analyses used, including which are dependent/independent variable(s) (if applicable), unit of replication, actual test(s) and statistical package(s)
Results – 40 marks
- Statement of results (data and stats). Refer to tables and figures as needed.
- Summary presentation of data (no raw data) as tables, figures or in text as appropriate. Use tables for large amounts of data where detail is important, use figures to illustrate patterns. Indicate sample size and errors, and note whether the latter are standard errors or standard deviations. Format consistently.
- Evidence of thought in choices of analyses.
- Results of statistical analyses presented either in text, on figures, or in tables.
- Do not screenshot figures or tables
Discussion – 10 marks
- Main conclusions or findings
- Interpretation of results – what do they mean? How do they relate to original aims?
- Limitations of approach or methods
- Future studies – what would you do next to improve what you did and/or extend the understanding of the subject?
General
- Brief Abstract at start of report
- Written in plain English
- Presentation (format/length)
- No screenshotting of figures and tables
- References (if used) cited correctly
6.5 References & resources
You are encouraged to consult the references from the annotated reading list provided with the lectures on the unit website. Key references include:
- Gaston, K. J., Blackburn, T. M., & Lawton, J. H. (1993). Comparing animals and automobiles – a vehicle for understanding body size and abundance relationships in species assemblages. Oikos, 66, 172–179.
- Quinn, G. P., & Keough, M. J. (2002). Experimental Design and Data Analysis for Biologists. Cambridge: Cambridge University Press. (Relevant chapters)
- Quinn, G. P., & Keough, M. J. (2023). Experimental Design and Data Analysis for Biologists (2nd ed.). Cambridge: Cambridge University Press. (Relevant chapters)