Drachsler, H. et al. (2016) Is Privacy a Show-stopper for Learning Analytics? A Review of Current Issues and Their Solutions, Learning Analytics Review, no. 6, January 2016, ISSN: 2057-7494
One of the most interesting sessions for me at last week’s EDEN conference in Budapest was a workshop run by Sally Reynolds of ATiT in Brussels and Dai Griffiths of the University of Bolton, UK. They are both participants in a European Commission project called LACE (Learning Analytics Community Exchange).
The LACE web site states:
LACE partners are passionate about the opportunities afforded by current and future views of learning analytics (LA) and educational data mining (EDM) but we were concerned about missed opportunities and failing to realise value. The project aimed to integrate communities working on LA and EDM from schools, workplace and universities by sharing effective solutions to real problems.
There are a number of reviews and case studies of the use of learning analytics available from the web site, which, if you are interested in (or concerned about) the use of learning analytics, are well worth reading.
The EDEN workshop
The EDEN workshop focused on one of the reviews concerned with issues around ethics and privacy in the use of learning analytics, and in particular the use of big data.
I am reasonably familiar with the use of ‘small’ data for learning analytics, such as the use of institutional student data regarding the students in the courses I am teaching, or the analysis of participation in online discussions, both in quantitative and qualitative terms. I am less familiar with the large-scale use of data and especially how data collected via learning management or MOOC registration systems are or could be used to guide teaching and learning.
The focus of the workshop, however, was specifically on the ethical and privacy issues raised in the review quoted above; nevertheless, I learned a great deal about learning analytics in general through the workshop.
What is the concern?
This is best stated in the review article:
Once the Pandora’s Box of data availability has been opened, then individuals lose control of the data about them that have been harvested. They are unable to specify who has access to the data, and for what purpose, and may not be confident that the changes to the education system which result from learning analytics will be desirable. More generally, the lack of transparency in data collection and analysis exacerbates the fear of undermining privacy and personal information rights in society beyond the confines of education. The transport of data from one context to another can result in an unfair and unjustified discrimination against an individual.
In the review article, these concerns are exemplified by case studies covering schools, universities and the workplace. These concerns are summarized under the following headings:
- informed consent and transparency in data collection
- location and interpretation of data
- data management and security
- data ownership
- possibility of error
- role of knowing and obligation to act
There are in fact a number of guidelines regarding data collection and use that could be applied to learning analytics, such as the Nuremberg Code on research ethics and the OECD Privacy Framework (both of which are general), or the JISC code of practice for learning analytics. The main challenge, however, is that some proponents of learning analytics want to approach the issue in ways that are radically different from past data collection methods (like my ‘small’ data analysis). In particular, they propose collecting data indiscriminately, then running it through data analysis algorithms to identify possible applications and interpretations after the fact.
It could be argued that educational organizations have always collected data about students, such as registers of attendance, age, address and student grades. However, new technology, such as data trawling and the ability to combine data from completely different sources, as well as automated analysis, completely changes the game, raising the following questions:
- who determines what data is collected and used within a learning management system?
- who ensures the security of student (or instructor) data?
- who controls access to student data?
- who controls how the data is used?
- who owns the data?
In particular, increasingly student (and instructor) data is being accessed, stored and used not just outside an institution, but even outside a particular country, and hence subject to laws (such as the U.S. Patriot Act) that do not apply in the country from which the data was collected.
Recommendations from the LACE working group
The LACE working group has developed an eight-point checklist called DELICATE, ‘to support a new learner contract, as the basis for a trusted implementation of Learning Analytics.’
For more on DELICATE see:
Drachsler, H. and Greller, W. (2016) Privacy and Learning Analytics – it’s a DELICATE issue, Heerlen NL: The Open University of the Netherlands
Issues raised in the workshop
First, it was pointed out that by today’s standards, most institutional data doesn’t qualify as ‘big data’. In education, big data would be, for example, student information drawn from the whole education system. The strategy would be to collect data about or from all students, then apply analysis that may well result in by-passing or even replacing institutions with alternative services. MOOC platforms possibly come closest to this model, hence their potential for disruption. Nevertheless, even within an institution, it is important to develop policies and practices that take ethics and privacy into account when collecting and using data.
As in many workshops, we were divided into small groups to discuss some of these issues, with a small set of questions to guide the discussion. In my small group of five conference participants, none of the participants was in an institution that had a policy regarding ethics and privacy in the use of learning analytics (or if it existed, they were unaware of it).
There was a concern around our table that increasing amounts of student data around learning are accessible to external organizations (such as LMS software companies and social media organizations such as Facebook). In particular, there was a concern that in reality many technology decisions, such as the choice of an institutional learning platform, are strongly influenced by the CIO, who may not take sufficient account of ethical and privacy concerns when negotiating agreements, or even by students themselves, who are often unaware of the implications of data collection and use by technology providers.
Our table ended by suggesting that every post-secondary institution should establish a small data ethics/privacy committee that would include, if available, someone who is a specialist in data ethics and privacy, and representatives of faculty and students, as well as the CIO, to implement and oversee policy in this area.
This was an excellent workshop that tried to find solutions balancing the need to track learner behaviour against privacy and ethical concerns.
Over to you
Some questions for you:
- is your institution using learning analytics – or considering it?
- if so, does your institution have a policy or process for monitoring data ethics and privacy issues?
- is this really a lot of fuss over nothing?
I’d love to hear from you on this.
No, privacy is not an issue. What matters is the data, and lots of it. The type of data needed can be anonymous, and then there is no issue of privacy.
For example, we need to ask the broad questions like:
What topic do students struggle with the most in Grade 9 math? Why?
Why do students in Colorado struggle with Pre-Calculus while Tennessee students do not? Furthermore, in Tennessee, why does Marshall county consistently have the best scores?
Why do Polynomial related questions require an average of 4.2 attempts to get the correct answer in Oregon but only require 2.3 attempts in California?
As you can see, we don’t need to know if John Doe had a certain grade or know his time on task or number of attempts. It can be anonymous. The data can then eventually predict outcomes with increasing confidence levels.
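To make the commenter’s point concrete, here is a minimal sketch of the kind of anonymous, aggregate analysis described above. The records, field names and figures are invented for illustration – the only point is that no student identifier is ever stored, yet the state-by-state comparison still falls out of the aggregation:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical anonymised attempt records: no names or student IDs,
# only the state, the topic, and the number of attempts needed
# before a correct answer.
records = [
    {"state": "Oregon",     "topic": "Polynomials", "attempts": 5},
    {"state": "Oregon",     "topic": "Polynomials", "attempts": 4},
    {"state": "California", "topic": "Polynomials", "attempts": 2},
    {"state": "California", "topic": "Polynomials", "attempts": 3},
]

def mean_attempts_by_group(records, keys=("state", "topic")):
    """Average attempts per (state, topic) group.

    Only aggregates are returned, so no individual learner
    can be identified from the output.
    """
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r["attempts"])
    return {group: mean(vals) for group, vals in groups.items()}

print(mean_attempts_by_group(records))
# e.g. {('Oregon', 'Polynomials'): 4.5, ('California', 'Polynomials'): 2.5}
```

The same grouping logic would answer the Grade 9 or Pre-Calculus questions above simply by swapping the grouping keys (e.g. grouping by county instead of state), all without any personally identifiable data.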