
MOOCs and student data privacy: a sentiment analysis

November 5, 2019


Are you still with me as I plough through the special edition on learning analytics from the journal Distance Education? I hope so. I hope this isn’t too boring for you but at least I am persistent. I am determined to get a good grasp of the strengths and weaknesses of learning analytics.

So this is the fifth post reviewing articles in the journal Distance Education, Vol. 40, No. 3. The other four were:

analytics and learning design at the UKOU
analytics and personality traits in a high school in China
analytics and gamification in an undergraduate course at a Hong Kong university
learning analytics to predict mastery using an online mathematics tutoring system

As always if you find these posts of interest, please read the original articles. Your conclusions will almost certainly be different from mine.

The article

Prinsloo, P. et al. (2019) Student data privacy in MOOCs: a sentiment analysis, Distance Education, Vol. 40, No. 3

The aim of the study

The study aimed to identify whether there is evidence of the use of emotive language in terms and conditions of three MOOC platform providers (edX, Coursera, FutureLearn). The issue is ‘whether students are persuaded to accept terms and conditions through the use of emotive or affective language.’

Methods

Both a quantitative and a qualitative analysis of the documents were conducted.

1. Quantitative sentiment analysis

Three different sentiment lexicons were used to ‘mine’ the texts for emotive words:

the Bing lexicon, which sorts words into positive or negative categories
the NRC emotion lexicon, which associates words with emotions (anger, joy, etc.)
the Loughran and McDonald lexicon, which is based on the financial documents of public companies

This is the learning analytics approach to sentiment analysis.
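To make this concrete, here is a minimal sketch in Python of what lexicon-based word counting amounts to. It is not the authors' code, and the tiny word list is a hypothetical stand-in invented for illustration, not the actual Bing lexicon. The function names are also my own.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; NOT the real Bing word list.
TOY_LEXICON = {
    "protect": "positive",
    "secure": "positive",
    "content": "positive",   # a general lexicon reads this as "happy"
    "terminate": "negative",
    "liable": "negative",
    "breach": "negative",
}

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_sentiment(text, lexicon):
    """Count how many tokens fall under each sentiment label in the lexicon."""
    counts = Counter()
    for word in tokenize(text):
        label = lexicon.get(word)
        if label is not None:
            counts[label] += 1
    return counts

# An invented sentence in the style of a terms-and-conditions clause.
sample = "We protect your content. We may terminate your account after a breach."
result = count_sentiment(sample, TOY_LEXICON)
print(dict(result))
```

Each word is looked up in isolation, so the count knows nothing about how the word is used in the sentence.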

2. Qualitative content analysis

Three researchers used a standard qualitative content analysis method (Lincoln and Guba, 1985) to identify clearly positive or negative terms in the documents, because the researchers found it too difficult to relate examples of phrasing to the emotion categories in the lexicons used for the quantitative analysis.

This might be considered a standard, non-learning analytics approach, dependent on human observation and analysis.

Results

1. Quantitative analysis

The authors report: ‘There appears to be little consistency between the results of the three separate analyses.’ This result holds for the terms and conditions of each of the three platforms. However, two of the lexicons suggested a greater use of positive words.

There are some other significant differences – you should read the article in detail – but in general the quantitative analysis raises all kinds of problems. For instance, the context (an educational platform) results in some words that are neutral within a MOOC context (e.g. content, meaning ‘stuff’ such as a recorded lecture) being considered positive (e.g. content, meaning happy) by a general lexicon such as Bing. In other words, one would need to develop a specific lexicon for terms and conditions in educational contexts.
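The ‘content’ problem can be sketched in a few lines of Python. Everything here is assumed for illustration: the word scores, the list of words treated as neutral in a MOOC context, and the sample clause are all hypothetical, and none of it comes from the article or from the real Bing lexicon.

```python
import re

# Hypothetical general-purpose scores (+1 positive, -1 negative); illustration only.
GENERAL_LEXICON = {
    "content": 1,
    "free": 1,
    "support": 1,
    "terminate": -1,
    "liable": -1,
}

# Hypothetical list of words that are neutral in a MOOC terms-and-conditions context.
MOOC_NEUTRAL_WORDS = {"content"}

def net_score(text, lexicon, neutral_words=frozenset()):
    """Sum the lexicon scores of all tokens, skipping context-neutral words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0) for w in words if w not in neutral_words)

# An invented clause; 'content' here means course material, not happiness.
clause = "You may not copy course content. We may terminate access to content."

general = net_score(clause, GENERAL_LEXICON)
adjusted = net_score(clause, GENERAL_LEXICON, MOOC_NEUTRAL_WORDS)
print(general, adjusted)
```

With the general scores the clause comes out as net positive, purely because ‘content’ is counted twice as a happy word. Once ‘content’ is treated as neutral, the same clause comes out as net negative.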

My interpretation here is that quantitative sentiment analysis needs to use context-specific lexicons, which makes the mining exercise much more laborious if a new context-specific lexicon has to be developed each time. The authors, though, conclude that ‘the use and privacy statements are not only positive but will, most probably, encourage engagement.’

2. Qualitative analysis

Not surprisingly, this is much more difficult to summarise, as the qualitative analysis is rich with specific examples from each of the three platforms’ documents, but the main conclusions are that:

the overall sentiment or affective tone in the documents was negative, but … this may be an inherent characteristic of a legal document
FutureLearn’s documents ‘appear to be more specific and the overall tone more positive than those of the other two providers.’
our analysis did not find … persuasive evidence of overt use of emotive language to ensure user engagement and understanding.

My comments

As Otto says on finding no diamonds in the safe in ‘A Fish Called Wanda’: ‘DIS-A-POINTED!!!’

Prinsloo et al. here are trying to analyse the effects of language in the terms and conditions of MOOC platform providers on student users. In this case, neither using learning analytics nor more conventional qualitative content analysis provided evidence of students being manipulated through the use of emotive language in these agreements.

It is not so much the result that is disappointing as the failure of the methods. Prinsloo et al. are right to worry about terms and conditions of use. Indeed, using straightforward analysis they did identify vagueness or lack of clarity in the wording of these agreements as an issue. What they could not show is that the language of the agreement unduly influenced students to sign up.

In other words, a straightforward reading and critique of terms and conditions of service would have identified the problem. Using learning analytics or sentiment analysis was using a sledgehammer to crack a nut (and missing).

Again, though, one has to question the validity of using quantitative sentiment analysis within an educational context. The meaning of words is highly context-specific. Simple sorting and counting of words out of context is not going to work. It just destroys meaning. AI and learning analytics will need to become much more intelligent and sophisticated if they are going to be useful in these contexts.

Up next

The last post in this series will look at George Siemens’ reflection on these papers and Michael Jacobson’s reflection on the need for a theoretical perspective that takes account of the complexity of educational systems when applying learning analytics. I will then end with my own conclusions drawn from this edition of Distance Education.

Over to you

Before I reveal my own conclusions, I would really like to hear from you. After reading these posts, what is your view of learning analytics?

The best thing to happen to online and distance learning?
Tremendous potential yet to be achieved?
A waste of everyone’s time?
A disaster waiting to happen?
All or none of the above?

Can’t wait for your answers!

1 COMMENT

Geoffrey Cain November 5, 2019 At 2:00 pm

Hey Tony,
Thanks for slogging through all of this for us. I really appreciate the digest!
Geoff
