Empowering Effects:

Thunberg’s movement on Climate Change


A "hot" problem...


Climate change is a problem that concerns everyone. Recent studies show that the Earth has warmed globally by 1.1°C compared to the pre-industrial mean. This value is predicted to increase by a further 3 to 5 degrees by 2100. In 2009, the Global Humanitarian Forum estimated that climate change caused about 300 000 deaths per year, and that this number could grow to 500 000 deaths a year by 2030.

To move things faster, some public figures are starting to raise their voices to make the world change. One of them first appeared in the media in August 2018, with the first school climate strike in Sweden. This person is none other than Greta Thunberg. Since then, she has repeatedly appeared in the front row of the fight against global warming, urging politicians to act faster.

However, even though her image and her ideas were widely spread in the media, did people really listen to her demands? Did they become more interested in the topic? Did she actually make people more aware that they needed to educate themselves about climate change?

The aim of this data analysis is to show the impact that Greta's media appearances had on the habits of Wikipedia users. Are people now more inclined to look for information on the subject? Did she make a difference?

The chosen data are Wikipedia pageview counts grouped by month. By taking specific events in which Greta appeared in the media, the study aims to show whether she changed the habits of Wikipedia users. The first part of the study looks at ordinary Wikipedia pages: a dataset of climate change related pages is compared to a dataset of the most popular pages. The second part digs deeper by studying whether people are really looking for more scientific information. To that end, it compares scientific Wikipedia articles on climate change to other science related articles.

Methodology

Replicated study

The data analysis performed here follows the same method as the study by Jonathon Penney, "Chilling effects: Online surveillance and Wikipedia use", conducted in 2016. In this study, J. Penney aims to provide evidence of the impact of online government surveillance on what Wikipedia users look up. To measure this impact, the study looks for the possible appearance of chilling effects (defined as the inhibition or discouragement of the exercise of legal rights by the threat of a legal sanction or privacy leaks). To study those effects, it explores how Wikipedia pageviews on topics that raise privacy concerns changed after an "exogenous shock", here the NSA/PRISM surveillance revelations of June 2013. The hypothesis, based on chilling effects theory, is that Internet users are less likely to view articles on those specific topics once they are aware of online monitoring. Although Wikipedia is the main focus of the study, it also looks at how people seek and access information and knowledge online more generally through crowdsourcing.

The method was the following. First, the study uses a dataset retrieved from English Wikipedia, as it is an influential resource for information and knowledge online. Indeed, over 50% of internet users use it as a source of information. Thus, if the government is chilling people from accessing knowledge about specific topics, this should be sufficiently visible in Wikipedia's data. A sample of 48 Wikipedia articles was selected, in a non-random way, based on the DHS keywords listed as related to "terrorism" (the government uses this keyword as a justification for its online surveillance practices), without any assumption that people avoid those topics. The dataset consisted of their pageview counts from January 2012 to August 2014, aggregated on a monthly basis.

Once the data were retrieved, they were analysed as an interrupted time series (interruption: June 2013, the surveillance revelations). The resulting segmented regression analysis allowed the levels and trends of the data to be compared before and after the interruption.
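As an illustration of what a segmented regression estimates, here is a minimal sketch with a single interruption. It is not the code used in Penney's study or in this project: the function name `segmented_regression`, the use of plain least squares via NumPy, and the synthetic series are all assumptions made for the example.

```python
import numpy as np

def segmented_regression(y, interruption):
    """Fit y_t = b0 + b1*t + b2*D_t + b3*(t - interruption)*D_t by least squares.

    D_t is 1 from the interruption index onward and 0 before it.
    b2 is the immediate level change, b3 the change in slope.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= interruption).astype(float)   # D_t
    time_since = (t - interruption) * post     # months elapsed since the interruption
    X = np.column_stack([np.ones_like(t), t, post, time_since])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept", "pre_slope", "level_change", "slope_change"]
    return dict(zip(names, coef))

# Synthetic monthly series (illustrative values, not the study's data):
# slope of 2 before month 12, then a jump of +10 and a slope of 3 afterwards.
t = np.arange(24)
y = np.where(t < 12, 100 + 2 * t, 100 + 2 * t + 10 + (t - 12))
fit = segmented_regression(y, interruption=12)
print({name: round(float(value), 3) for name, value in fit.items()})
```

The `level_change` coefficient corresponds to the immediate jump at the interruption and `slope_change` to the change in long term trend, which are the two quantities compared between groups.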

The empowering effects - Case study

To see the possible empowering effect of Thunberg's interventions, interrupted time series (ITS) analysis is used, as explained previously. The period of interest depends on the events chosen as pivots of the interrupted time series analysis. Values are studied between January 2018 and February 2020. They are not extended further, as the Covid-19 pandemic might have strongly influenced the pageviews. The pivot events are three major actions that Greta Thunberg took part in and that received heavy media coverage.



In the replicated publication, the terrorism articles are compared to a quasi-control group. This group is made of security related articles that would arguably not be affected by the treatment but are similar enough in topic to be used as a control group. To reproduce this method in our context, we need a group of articles that are likely to be affected by the treatment, namely articles that people would consult more because of the media attention around Greta Thunberg and climate change. We also need a second group of articles that is similar to the first, so that any external bias appears in both and therefore cancels out in the comparison. This second group should not be affected by the treatment, namely it should not be related to climate change issues.

It is not straightforward to find a second group to pair with the first corpus that we considered, because that corpus is already quite broad. To mitigate this problem, we restricted ourselves to scientific articles chosen randomly from Wikipedia's index of articles. The first group was selected randomly from a list of scientific articles on topics likely to be related to climate change (biodiversity, energy, meteorology, pesticides, earth). The second group was selected randomly from a list of scientific articles on topics not related to climate change (anatomy, genetics, optics, philosophy, social). This choice has the advantage of showing whether the public reads scientific articles, which would support the hypothesis that people were encouraged to educate themselves about climate change.

In the end, we have 4 groups: two treatment groups, each compared separately to its own comparator, as shown in the table below.


          Treated group                         Control group
Case 1    Common climate-related article        Most popular article
Case 2    Scientific climate-related article    Scientific other-topic-related article

All data were retrieved here.

Results

To verify the claim of an "Empowering Effect", we use the immediate change in pageviews from before to after each event, as well as the long term trend indicated by the regression slopes and levels. The potential trends observed in the treatment group are compared to the data in the comparator group. To assess the statistical significance of those results, the confidence intervals of the regressions are taken into consideration.



Outlier management

It is important to check whether our dataset contains outliers. If it does, they need to be removed to keep the dataset as meaningful as possible. The first step is to use the Cook's D values from the regression on each dataset to detect potential outliers. Since some are observed, the z-score is then used to find the articles responsible, in order to reproduce the method followed by J. Penney. An article with a z-score above 3.0 or below -3.0 may be considered an outlier. For the first study case, "Global warming" appears as a large outlier in the climate related group. Since it is the main article related to our question and is therefore very relevant, it was decided to keep it in the dataset. For the second study case, the outliers in the climate change dataset are "Health", "Life expectancy" and "Humanism", which is logical since these themes are linked to matters other than climate change alone. For the control dataset of scientific articles, "Galileo Galilei" and "Autism" are outliers. We can drop them from the aggregate.
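The z-score rule described above can be sketched as follows. This is only an illustration: the text does not say which quantity the z-score is computed on, so the example assumes one total pageview count per article. The helper `zscore_outliers`, the article names and the counts are all hypothetical.

```python
import numpy as np

def zscore_outliers(totals, threshold=3.0):
    """Return the article names whose z-score lies outside [-threshold, threshold].

    `totals` maps an article name to one number per article
    (assumed here to be its total pageviews over the study period).
    """
    names = list(totals)
    values = np.array([totals[name] for name in names], dtype=float)
    z = (values - values.mean()) / values.std(ddof=1)  # sample standard deviation
    return [name for name, score in zip(names, z) if abs(score) > threshold]

# Made-up counts: 19 articles of similar size and one much larger article.
totals = {f"article_{i:02d}": 1000 + 10 * i for i in range(19)}
totals["big_article"] = 50000

flagged = zscore_outliers(totals)
print(flagged)
```

Note that with the sample standard deviation, a single article can only exceed a z-score of 3.0 if the group contains at least 11 articles, so the rule is not informative on very small groups.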

Global warming article

In the image below, the daily pageviews of the "Global warming" outlier are plotted for the whole period, together with the interruptions, to see whether they are consistent with an "Empowering Effect". The pageviews are averaged over 7 days to filter out the weekly pattern.
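A 7-day average of this kind can be sketched as a simple moving average. The function name `rolling_mean` and the synthetic series are assumptions for illustration; the project may have used a different windowing (for example a centred or trailing window from a dataframe library).

```python
import numpy as np

def rolling_mean(daily, window=7):
    """Moving average over `window` consecutive days.

    Only full windows are kept, so the output has len(daily) - window + 1 points.
    """
    daily = np.asarray(daily, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(daily, kernel, mode="valid")

# Synthetic daily pageviews with a purely weekly pattern (made-up values):
# 100 views on each of five weekdays, 50 on each weekend day, for four weeks.
week = [100, 100, 100, 100, 100, 50, 50]
daily = np.array(week * 4)

smoothed = rolling_mean(daily)
print(len(daily), len(smoothed))
print(smoothed.min(), smoothed.max())
```

Because every 7-day window contains exactly one full week, the weekly pattern disappears and the smoothed series is flat, which is why this window length suits daily pageview data.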



It seems that the first interruption does not lead to a direct increase in pageviews, even though an increase appears a bit later. The second interruption, if anything, leads to a sudden and sharp drop that is not consistent with an "Empowering Effect". Only the third interruption leads to an increase in pageviews that would be consistent with an "Empowering Effect".



Monthly mean

To start the analysis, monthly means were computed as a first result. A more positive change in the monthly mean number of pageviews for the climate change articles than for the control articles would be consistent with an "Empowering Effect".
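The comparison described above amounts to computing, for each group, the mean of the monthly pageviews in each period and the percentage change from one period to the next. A minimal sketch, with a hypothetical helper `period_mean_changes` and made-up numbers, could look like this:

```python
import numpy as np

def period_mean_changes(monthly, breakpoints):
    """Percentage change of the mean monthly pageviews between consecutive periods.

    `breakpoints` holds the indices of the first month of each new period.
    """
    periods = np.split(np.asarray(monthly, dtype=float), breakpoints)
    means = np.array([period.mean() for period in periods])
    return 100.0 * (means[1:] - means[:-1]) / means[:-1]

# Made-up monthly pageviews split into three periods of two months each.
treatment = [100, 100, 110, 110, 121, 121]
control = [200, 200, 180, 180, 180, 180]

treatment_change = period_mean_changes(treatment, [2, 4])
control_change = period_mean_changes(control, [2, 4])
print(treatment_change)
print(control_change)
# The claim is supported at an interruption when this difference is positive.
print(treatment_change - control_change)
```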


For the common articles, the climate articles show a steady increase in the monthly mean number of pageviews across the 3 events. This already seems consistent with an "Empowering Effect" before any comparison with the control group. Compared with the control group, only the first (+12% for treatment, -34% for control) and third (+14% for treatment, -40% for control) interruptions lead to a more positive change in the monthly mean number of pageviews for the treatment group, whereas at the second interruption (+2% for treatment, +59% for control) the change is radically more positive for the control group.

For the scientific articles, the monthly mean decreases by about 2 % between periods 1 and 2 for the treatment group, whereas it decreases by approximately 4.5 % for the control group. This pattern is not repeated at the second interruption, where the drop is comparable for both sets (approximately 10.5 %), though slightly smaller for the climate change related set (closer to 10.2 %). Nor does it hold at the third interruption, where the drop is 7.2 % for the treatment group but 4.3 % for the control group.



The steady increase of the treatment group when the other group has radical differences between periods is consistent with an "Empowering Effect" at least for the first and last interruptions. The control group has a large variance, which seems to make the comparison a bit fragile here. Intuitively, if the claim of an "Empowering Effect" is true, the monthly mean number of pageviews across the 4 periods should increase more/decrease less in the treatment group (articles related to climate change) than in the control. This sort of trend is not observable for the scientific articles, at least not for the 2nd and 3rd interventions. One might argue that the first intervention might have had an impact. This claim is to be verified by further tests.



Linear regression

Finally, a linear regression is performed to study in more depth the overall trend between the chosen periods. If there is a short term "Empowering Effect", the short term change in the number of pageviews after the interruption should be more positive for the treatment group than for the control. If there is a long term "Empowering Effect", the change in the slope of the regression after the interruption should be more positive for the treatment group than for the control. From these changes and the confidence intervals, it should be possible to assess whether an "Empowering Effect" can be observed with confidence.

To correct for the radically different number of pageviews across article groups, the pageviews per month of each group are normalized by the mean of their distribution. The maximum value is not used for the normalization because it would make the scaling depend on the noise level of each dataset, which differs between the sets.
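The normalization can be sketched in a few lines. The function name `normalize_by_mean` and the numbers are illustrative assumptions, not the project's data.

```python
import numpy as np

def normalize_by_mean(monthly):
    """Divide a monthly pageview series by its own mean, so the result averages to 1."""
    monthly = np.asarray(monthly, dtype=float)
    return monthly / monthly.mean()

# Made-up series on very different scales but with the same relative shape.
popular = np.array([900_000, 1_000_000, 1_100_000, 1_000_000])
climate = np.array([9_000, 10_000, 11_000, 10_000])

popular_norm = normalize_by_mean(popular)
climate_norm = normalize_by_mean(climate)
print(popular_norm)
print(climate_norm)
```

After normalization both series are expressed relative to their own average level, so their regressions can be drawn on the same axis. A normalization by the maximum would instead be set by a single month, which can be an isolated spike in a noisy series.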



Common articles

The comparison between the linear regression on the popular articles and the one on the climate change related articles does not clearly indicate an "Empowering Effect". The long term effect is essentially non-existent: the slope of the number of views is close to zero for both corpora in the last period. A short term effect might be argued at the last interruption, since a positive jump is observed in the pageviews of the climate change related articles while a negative jump is observed in the corpus of popular articles. A strong negative jump in the corpus of popular articles is also observable at the first interruption, but it can easily be attributed to noise, which is omnipresent in this set.

The computed R-squared value shows that the popular articles dataset is quite noisy. Indeed, the best linear fit has an R-squared of 0.499, so it explains only about half of the set's variance.
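For reference, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses a hypothetical helper `r_squared` and a made-up four-point series fitted with a straight line. It is not the project's regression.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variance of y explained by the fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up noisy series fitted with a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
slope, intercept = np.polyfit(x, y, deg=1)
r2 = r_squared(y, slope * x + intercept)
print(round(float(r2), 2))
```

A value close to 1 means the line captures most of the month-to-month variation, while a value near 0.5 leaves about half of it unexplained.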




Scientific articles

The linear regressions and their confidence intervals overlap heavily, especially in the first and last periods. It is a good thing that they overlap in the first period, because it means that both corpora are similar enough before the treatment appears. That they overlap so much in the last period is consistent with the claim that there was no long term "Empowering Effect" in this case.

No immediate change can be observed at the interruptions. The confidence intervals overlap quite a bit and it is difficult to assess a long term effect.

It is worth noting that the R-squared values of both linear regressions are similar (near 0.7), suggesting that the quality of the fits is comparable. This is also due to the fact that the variance in both sets is comparable.



It seems far-fetched to read an effect into the linear regression comparison between the popular articles and the climate change related articles in the first study case. Even though a short term effect is not excluded at the last interruption, it is confirmed neither by the other interruptions (where the jump simply does not appear) nor by the other pair of datasets, where we observe effectively no difference between the two. This jump could be an artefact created by the noise in the popular articles dataset. In the case of the scientific corpus, it is not possible to assess any long term "Empowering Effect" with confidence. In fact, both datasets appear almost identical in the linear regression and cannot be distinguished with any reasonable degree of confidence. This seems to imply that there is no significant increase in the reading of scientific articles related to climate change.

It boils down to...

The goal of this work was to try to highlight the potential "Empowering Effect" discussed in the introduction. The comparisons made on the two pairs of datasets have not provided results that clearly indicate such an effect. Although some slight indications of an empowering effect could perhaps be distinguished in the monthly mean analysis, particularly for the first intervention, these indications could not be confirmed by the linear regressions. Indeed, the 'Common' set features large confidence intervals that drown the data in uncertainty, and the scientific articles comparison yielded no significant result whatsoever. The only (short term) effect that could possibly be observed, for the last intervention when comparing the common articles in the monthly mean analysis, is not supported by any other result (apart from the daily pageviews of the single 'Global Warming' article) and therefore seems to be an artefact introduced by the noise in the set of popular articles.

This particular analysis did not demonstrate any "Empowering Effect". However, this should in no way undermine the message conveyed by Miss Thunberg in her attempt to change minds. There is still hope to be had: the night is young, maybe almost as young as Greta herself.






"Change is coming, whether you like it or not. "

Greta Thunberg
Instagram (2019)


This project has been done within the course "Applied Data Analysis" given by Robert West at EPFL.

Our team

Jonathan Haenni

Microengineer

Danny Kohler

Digital Forensic Scientist

Léa Schmidt

Bioengineer