Disclaimer: This is a very long post with a fair amount of statistical talk. I'll try my best to simplify some of it a bit and explain a few things, but if you don't like statistics, this post may not be for you! There is a TL;DR at the bottom condensing the overall findings into a more concise format.
Sorry to bump this particular thread, but following a really interesting topic I read on another forum earlier about 2012 vs 2023 in terms of UK theme park attendance and what effect various factors have on attendance, I was inspired to revisit this dataset and explore the relationship between UK theme park attendance and various extrinsic factors, as these have previously raised some interesting discussion points surrounding the topic of UK theme park attendance.
Before I explore various different factors and their effect upon UK theme park attendance, I should firstly set out that the attendance I use is the combined attendance of all four Merlin theme parks from 1997 (the first year where all four are listed under their current guise) through until 2021. I tried all of my tests for the dataset including both 2020 and 2021, the dataset excluding 2020 only and the dataset excluding both 2020 and 2021, as I felt that the circumstances of 2020 in particular were too anomalous not to consider and I was unsure whether to even place 2021 among "normal" years, as the parks were still restricted to some degree for part of or all of the season. As such, I tested the data both including and excluding the COVID years, so that we could see the relationships exhibited pre, during and post COVID.
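For anyone wanting to try something similar, the three versions of the dataset described above could be built along these lines. This is only a rough sketch: the attendance figures and the `subset` helper are invented for illustration and are not the real dataset.

```python
# Hypothetical combined attendance figures (millions); NOT the real dataset.
attendance_by_year = {2017: 9.1, 2018: 9.4, 2019: 9.6, 2020: 3.2, 2021: 7.8}


def subset(data, excluded_years=()):
    """Return a copy of data without the excluded years, in year order."""
    return {
        year: value
        for year, value in sorted(data.items())
        if year not in excluded_years
    }


# The three variants every test in this post is run against.
variants = {
    "including 2020 and 2021": subset(attendance_by_year),
    "excluding 2020 only": subset(attendance_by_year, excluded_years={2020}),
    "excluding 2020 and 2021": subset(attendance_by_year, excluded_years={2020, 2021}),
}

for name, data in variants.items():
    print(f"{name}: {len(data)} years -> {list(data)}")
```

Whatever external factor is being tested would need filtering with the same excluded years, so that each attendance value is still paired with the right year.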
To test whether a statistically significant linear relationship exists between two variables, I used a Pearson correlation coefficient test, and the two metrics I used to determine this were the correlation coefficient itself and the p-value. To explain what each is:
- The correlation coefficient is a number between -1 and 1 that denotes how strong the linear relationship between two variables is. A correlation coefficient of 1 indicates a perfect positive correlation (i.e. "as x increases, y also increases"), a correlation coefficient of 0 indicates no linear correlation (i.e. "x and y show no tendency to rise or fall together"), and a correlation coefficient of -1 indicates a perfect negative correlation (i.e. "as x increases, y decreases"). As it is staggeringly rare to have a perfect correlation, I will denote the strength of the correlation by using the absolute value of the correlation coefficient as follows: an absolute coefficient of 0-0.25 indicates no significant correlation, an absolute coefficient of 0.25-0.5 indicates a weak correlation, an absolute coefficient of 0.5-0.75 indicates a moderate correlation, and an absolute coefficient of 0.75 or higher indicates a strong correlation.
- The p-value is the probability of seeing a correlation at least as strong as the one observed if no relationship actually existed, and it is a decimal falling between 0 and 1. It can be represented as a percentage; for instance, a p-value of 0.55 indicates that, if there were truly no relationship, a correlation at least this strong would still turn up by chance about 55% of the time. In hypothesis testing, you want the p-value to be low if you are wanting to support your hypothesis (in this case, that a relationship exists) and reject the null hypothesis (in this case, that no relationship exists). I will denote how strong the evidence for a relationship is by using the p-value as follows: a p-value of 0.1 or higher indicates insufficient evidence in favour of a relationship, a p-value of 0.05-0.1 indicates marginally significant evidence in favour of a relationship, a p-value of 0.01-0.05 indicates significant evidence in favour of a relationship, and a p-value of less than 0.01 indicates extremely significant evidence in favour of a relationship.
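To make the two metrics a bit more concrete, here is a minimal sketch of how a Pearson coefficient and its two-sided p-value can be worked out using only Python's standard library. It is not necessarily how the figures in this post were produced (a stats package would normally do this for you). The function names are made up for illustration, and the p-value comes from numerically integrating the Student's t density, which is an approximation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(r, n, steps=2000):
    """Approximate two-sided p-value for 'no linear correlation', given r from n pairs."""
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the density on [0, t].
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # Probability of a t-statistic at least this far from zero in either direction.
    return max(0.0, 1 - 2 * area)


# Assumption: 1997-2021 inclusive gives 25 yearly pairs; -0.16 is a rounded coefficient.
print(f"p = {two_sided_p_value(-0.16, 25):.3f}")
```

With a coefficient of -0.16 over 25 yearly pairs this lands in the mid-0.4s, which is in the same region as the figure reported for rainfall further down; an exact match shouldn't be expected, because the coefficient fed in is already rounded to 2dp.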
Now I've explained some of my processes, let's move onto the analysis! The first external factor I tested out is one that has been particularly relevant this summer... it's that good old chestnut known as the weather!
The Weather
The weather is often referenced as a factor that could potentially be affecting UK theme park attendance, so I thought, why not test that theory out? Now, I hear you asking "Matt, there are so many different metrics of weather; which one did you test out?". That would be a fair question, and in answer, I tested out three different weather metrics: average rainfall in millimetres, average maximum temperature in degrees Celsius, and average number of hours of bright sunshine. To gain the relevant weather data, I took the months between April and October (the 7 months in which the parks are operating for the full month) for each metric for every year since 1997 from the Met Office weather data archive (
https://www.metoffice.gov.uk/research/climate/maps-and-data/uk-and-regional-series). I averaged out the values for the months from April-October of a given year and used that as that year's value for a given metric. I set the region as "England"; as all four Merlin parks are in England, I figured that the weather in Wales, Scotland and Northern Ireland was irrelevant for this particular investigation.
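The April-October averaging step could be sketched roughly as below. The monthly rainfall numbers are invented for illustration (they are not Met Office values), and the layout of one January-December list per year is just an assumption about how the downloaded data might be arranged.

```python
from statistics import mean

# Hypothetical monthly rainfall (mm), January to December; NOT Met Office values.
monthly_rainfall = {
    2019: [50.0, 45.0, 60.0, 40.0, 45.0, 80.0, 60.0, 70.0, 85.0, 100.0, 95.0, 90.0],
}

SEASON_MONTHS = range(4, 11)  # April (4) to October (10) inclusive


def season_average(monthly_values):
    """Mean of the April-October entries of a 12-value January-December list."""
    return mean(monthly_values[month - 1] for month in SEASON_MONTHS)


# One value per year, which is what gets paired with that year's attendance.
season_rainfall = {
    year: season_average(values) for year, values in monthly_rainfall.items()
}
print(season_rainfall)
```

The same function would work unchanged for temperature and sunshine hours, since all three metrics are reduced to a single April-October average per year.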
Average Rainfall (in millimetres)
So for our first metric, average rainfall in millimetres, the distribution of the data including 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.44 (2dp) |
Pearson Correlation Coefficient | -0.16 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | No Significant Correlation |
The distribution of the data excluding 2020 only was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.26 (2dp) |
Pearson Correlation Coefficient | -0.24 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | No Significant Correlation |
The distribution of the data excluding both 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.26 (2dp) |
Pearson Correlation Coefficient | -0.25 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | Weak Negative Correlation |
So I think we can conclude that even though some signs of a weak negative correlation between the two are shown when you remove 2020 and 2021 from the equation, the overall evidence for a significant relationship between UK theme park attendance and average rainfall is weak; with p-values of 0.26 or higher in every test, there isn't enough evidence to firmly argue in favour of a relationship, even if some signs point towards a weak negative correlation.
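As a quick illustration of how the labels in these tables follow from the bands set out earlier, here is a small sketch that applies them to the three rounded rainfall results. The function names are hypothetical, and how values sitting exactly on a boundary (such as 0.25) are treated is an assumption, chosen so that -0.25 comes out as weak like it does in the table above.

```python
def strength_label(r):
    """Label a coefficient using the bands from this post (boundary handling assumed)."""
    size = abs(r)
    if size < 0.25:
        return "No Significant Correlation"
    if size < 0.5:
        band = "Weak"
    elif size < 0.75:
        band = "Moderate"
    else:
        band = "Strong"
    direction = "Positive" if r > 0 else "Negative"
    return f"{band} {direction} Correlation"


def evidence_label(p):
    """Label a p-value using the bands from this post (boundary handling assumed)."""
    if p < 0.01:
        return "Extremely Significant"
    if p < 0.05:
        return "Significant"
    if p < 0.1:
        return "Marginally Significant"
    return "Insufficient"


# Rounded (coefficient, p-value) pairs from the three rainfall tables above.
rainfall_results = {
    "including 2020 and 2021": (-0.16, 0.44),
    "excluding 2020 only": (-0.24, 0.26),
    "excluding 2020 and 2021": (-0.25, 0.26),
}

for name, (r, p) in rainfall_results.items():
    print(f"{name}: {strength_label(r)} | {evidence_label(p)}")
```

It also shows how little separates "no significant correlation" from "weak" here: -0.24 and -0.25 fall either side of the cut-off. Because the tables quote values to 2dp, a p-value shown as 0.05 or 0.01 could likewise sit on either side of its band.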
Average Maximum Temperature (in degrees Celsius)
For our second metric, average maximum temperature in degrees Celsius, the distribution of the data including 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.56 (2dp) |
Pearson Correlation Coefficient | -0.12 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | No Significant Correlation |
The distribution of the data excluding 2020 only was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.95 (2dp) |
Pearson Correlation Coefficient | 0.01 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | No Significant Correlation |
The distribution of the data excluding both 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.96 (2dp) |
Pearson Correlation Coefficient | 0.01 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | No Significant Correlation |
So having tested the data both including and excluding the COVID data, I think it's safe to say that this dataset shows next to no sign of a significant relationship between UK theme park attendance and average maximum temperature. With a correlation coefficient of close to 0 once COVID data was removed, there is no compelling evidence in favour of a relationship existing.
Average Number of Hours of Bright Sunshine
For our final weather metric, average number of hours of bright sunshine, the distribution of the data including 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.53 (2dp) |
Pearson Correlation Coefficient | -0.13 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | No Significant Correlation |
The distribution of the data excluding 2020 only was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.45 (2dp) |
Pearson Correlation Coefficient | 0.16 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | No Significant Correlation |
The distribution of the data excluding both 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.47 (2dp) |
Pearson Correlation Coefficient | 0.16 (2dp) |
Evidence In Favour of a Relationship | Insufficient |
Correlation Strength | No Significant Correlation |
So there is limited evidence in favour of a relationship between UK theme park attendance and the average number of hours of bright sunshine. The evidence there is leans positive once 2020 is removed, but that evidence is too limited to conclude even a weak correlation, and with p-values of 0.45 or higher, it falls well short of significance.
So in conclusion, then, the weather seemingly has less of an effect on UK theme park attendance than you might expect. The strongest evidence for a relationship between UK theme park attendance and any weather metric is presented by average rainfall, which shows some signs of a weak negative correlation, but even that presented insufficient evidence in favour of a significant relationship.
Weather is not the only external factor I explored, however. With our purse strings getting tighter as a result of the cost of living crisis, I thought that the economy would also be an interesting one to explore!
The Economy
With disposable incomes currently being lower across the country as a result of the cost of living crisis and rampant inflation, many have figured that the cost of living crisis may be having an effect on theme park attendance, so I thought that I'd test out some macroeconomic factors too.
In terms of economic indicators, I tested three different ones. The first indicator I tested was annual GDP growth rate, with the figures being gained from this site (
https://www.macrotrends.net/countries/GBR/united-kingdom/gdp-growth-rate). GDP, standing for Gross Domestic Product, is a measure of the UK's economic output, and high GDP growth is often seen as a sign of a healthy economy. Our politicians frequently talk about "growth", anyhow! The second indicator I tested was annual CPI inflation rate, with the figures being gained from this site (
https://www.rateinflation.com/inflation-rate/uk-historical-inflation-rate/). CPI stands for Consumer Price Index, and the rate of CPI inflation is a measure of how much the cost of something such as a weekly shop rises over a given time period. It's the figure used when newsreaders talk about inflation, and high CPI inflation is often seen as a bad sign for the state of the economy. The final indicator I tested was annual unemployment rate, with the figures being gained from this site (
https://www.macrotrends.net/countries/GBR/united-kingdom/unemployment-rate). High unemployment rate is often seen as a sign of an unhealthy economy.
Annual GDP Growth Rate (%)
For our first economic metric, annual GDP growth rate, the distribution of the data including 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.02 (2dp) |
Pearson Correlation Coefficient | 0.47 (2dp) |
Evidence In Favour of a Relationship | Significant |
Correlation Strength | Weak Positive Correlation |
The distribution of the data excluding 2020 only was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.02 (2dp) |
Pearson Correlation Coefficient | -0.47 (2dp) |
Evidence In Favour of a Relationship | Significant |
Correlation Strength | Weak Negative Correlation |
The distribution of the data excluding both 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.05 (2dp) |
Pearson Correlation Coefficient | -0.41 (2dp) |
Evidence In Favour of a Relationship | Marginally Significant |
Correlation Strength | Weak Negative Correlation |
So I think we can conclude that there is some evidence in favour of a relationship between UK theme park attendance and annual GDP growth. All tests yielded at least marginally significant evidence in favour of a relationship, and although the correlation is positive with 2020 included, all tests suggest a weak-to-moderate negative correlation once 2020 is removed. Thus, we can conclude that a relationship may exist, but it might not be the strongest.
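The flip in sign between the test including 2020 and the tests excluding it is the sort of thing a single extreme year can cause. Here is a small sketch with completely made-up numbers (not the real GDP or attendance figures) showing one shock year turning a negative correlation into a positive one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented figures for illustration only; NOT the real GDP or attendance data.
# The last entry in each list mimics a 2020-style year where both collapse together.
gdp_growth = [3.0, 2.5, 2.0, 1.5, 1.0, -10.0]   # annual growth, %
attendance = [9.0, 9.5, 10.0, 10.5, 11.0, 3.0]  # combined attendance, millions

r_with_shock = pearson_r(gdp_growth, attendance)
r_without_shock = pearson_r(gdp_growth[:-1], attendance[:-1])

print(f"with the shock year:    r = {r_with_shock:+.2f}")
print(f"without the shock year: r = {r_without_shock:+.2f}")
```

In the five "normal" years, attendance rises as growth falls, so the coefficient is negative. Adding one year where both growth and attendance plunge together is enough to drag the overall coefficient positive, which is why running the tests with and without 2020 matters.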
Annual CPI Inflation Rate (%)
For our second economic metric, annual CPI inflation rate, the distribution of the data including 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.02 (2dp) |
Pearson Correlation Coefficient | 0.47 (2dp) |
Evidence In Favour of a Relationship | Significant |
Correlation Strength | Weak Positive Correlation |
The distribution of the data excluding 2020 only was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.01 (2dp) |
Pearson Correlation Coefficient | 0.50 (2dp) |
Evidence In Favour of a Relationship | Significant |
Correlation Strength | Moderate Positive Correlation |
The distribution of the data excluding both 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.01 (2dp) |
Pearson Correlation Coefficient | 0.55 (2dp) |
Evidence In Favour of a Relationship | Extremely Significant |
Correlation Strength | Moderate Positive Correlation |
So I think we can conclude that there is pretty significant evidence of a relationship between UK theme park attendance and annual CPI inflation rate. Once 2020 was removed, a moderate positive correlation between the two variables was consistently exhibited, and the evidence in favour of a relationship toed the line between significant and extremely significant, so I think it's fair to suggest that there could well be a link!
Annual Unemployment Rate (%)
For our final economic metric, annual unemployment rate, the distribution of the data including 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.01 (2dp) |
Pearson Correlation Coefficient | 0.51 (2dp) |
Evidence In Favour of a Relationship | Extremely Significant |
Correlation Strength | Moderate Positive Correlation |
The distribution of the data excluding 2020 only was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.00 (2dp) |
Pearson Correlation Coefficient | 0.61 (2dp) |
Evidence In Favour of a Relationship | Extremely Significant |
Correlation Strength | Moderate Positive Correlation |
The distribution of the data excluding both 2020 and 2021 was as follows:
And the values returned after a Pearson correlation coefficient test to test for a relationship were as follows:
P-Value | 0.00 (2dp) |
Pearson Correlation Coefficient | 0.60 (2dp) |
Evidence In Favour of a Relationship | Extremely Significant |
Correlation Strength | Moderate Positive Correlation |
So I think we can conclude that the evidence for a relationship between UK theme park attendance and annual unemployment rate is fairly strong. All tests produced extremely significant evidence in favour of a relationship existing, and once 2020 was removed, the correlation coefficient of around 0.6 was quite comfortably in the realms of a moderate positive correlation. Thus, I think we can conclude that there may be a link between UK theme park attendance and annual unemployment rate!
Now we've analysed the data, I think it's about time we wrapped things up and discussed our findings...
Conclusion
So in conclusion, this analysis yielded some very interesting, and perhaps somewhat unexpected, results, in my view.
The weather is always discussed as a big factor affecting theme park attendance, but overall, the weather metrics seemingly affected attendance a lot less than you might expect within this dataset. The weather metric with the strongest apparent link was average rainfall, and even that presented only very tenuous evidence of a relationship with UK theme park attendance; at best, it showed minor signs of a weak negative correlation, and evidence in favour of a relationship was insufficient. With that being said, much of the limited evidence of relationships that was shown among the weather metrics did point in the general direction I would have expected, with rainfall pointing towards a negative relationship and sunshine leaning towards a positive relationship. I was very surprised at the profound lack of trend when it came to temperature, however; the evidence of a relationship there was pretty much zero, with no real leaning in either direction.
The economy is also discussed, albeit less than weather, but unlike weather, the economic metrics seemingly affected attendance to a surprising degree within this dataset. Both CPI inflation rate and unemployment rate exhibited significant to extremely significant evidence of relationships and moderate correlations, and even GDP growth exhibited significant evidence of a relationship and a weak-to-moderate correlation. Interestingly, the evidence of relationships within the economic factors also pointed in the complete opposite direction to the one you'd initially expect, with the evidence of CPI inflation rate and unemployment rate having moderate positive correlations and GDP growth having a weak-to-moderate negative correlation suggesting that UK theme park attendance is generally higher when the economy is doing worse. That's not an outcome I would initially have expected; maybe there's something in the notion that UK parks often do well out of recessions?
I should note a few things here, however. For starters, correlation does not equal causation, and it should not be treated as concrete proof that x causes y. Just because my data suggests a certain correlation, that does not mean that there's necessarily a chain of causality that works that way in reality. I should also note that these parks do not operate in a vacuum, and these are far from the only factors affecting attendance; there is a wide smorgasbord of intrinsic and extrinsic factors, and it is a phenomenally multi-faceted issue.
Nonetheless, I hope you've found my investigation interesting! If you'd like me to investigate anything else, or if you think I've done something wrong, don't hesitate to tell me!
TL;DR: I performed an investigation into the relationship between UK theme park attendance and various extrinsic factors, with a key focus on the weather and the economy. The weather was found to not be linked to attendance to a statistically significant degree overall, with even the metric with the strongest-seeming relationship, rainfall, only showing tenuous evidence of a relationship and exhibiting signs of a weak-to-insignificant correlation. The economy was found to have a far more significant link, with CPI inflation rate and unemployment rate in particular exhibiting highly significant evidence of relationships and moderate correlations. Interestingly, it was also found that perhaps unexpectedly, UK theme park attendance seems to be higher when economic strength is lower.