Linking survey and digital trace data

A recent event explored the potential in linked digital trace data and survey responses.

Author: Irina Andreeva

Publishing date: 11 March 2024

Understanding online behaviours, attitudes and identities is a key challenge for social science in the 21st century. At the same time, digital trace data offer substantial opportunities: researchers can access huge quantities of precise observational data, often relatively quickly, easily and cheaply.

At a recent online event, colleagues from Cardiff University, the National Centre for Social Research (NatCen) and the University of Essex explored the potential of linking digital trace data with survey responses. The research presented at this event was funded by the ESRC as part of the Understanding (Offline/Online) Society project.

The ‘Linking survey and digital trace data’ event was divided into three sessions, each centred on a key methodological question:

Exploring the synergies between digital trace data and survey data;
Considering strategies for optimising informed consent for linking survey and digital trace data; and
Addressing the ethical and legal considerations in collecting, linking, and sharing digital trace data with survey data while preserving its utility.

In the first session, we reported that the motivations driving online behaviour on platforms like X/Twitter differ by social class and political affiliation. Users motivated by self-expression tend to be more active on X/Twitter, while those seeking networking opportunities often have a larger following. These findings point to a close relationship between why people go online and what they do there. Our ongoing research aims to explore these relationships further.

The second session examined consent to data linkage and its association with self-reported X/Twitter use. Drawing on an analysis of socio-demographic characteristics, it presented findings on public attitudes and decision-making around consent.

Qualitative research showed that individuals often rely on shortcuts when making consent decisions, with risk, benefit, trust and control as the key factors. Although participants did not fully understand the consent terms, their decisions remained unchanged after detailed discussion. Participants also varied in their preferences for how information is presented and used.

Experimental evidence highlighted the effectiveness of different approaches in improving consent rates. Offering a small incentive of £2 significantly increased consent rates. Older individuals, and those who leaned Conservative or reported no political affiliation, were less likely to consent, while the results for internet usage patterns differed depending on the sample studied.

The study also explored the association between consent rates and self-reported X/Twitter activity. Individuals who used the platform less frequently were less likely to consent, and consent rates were highest among participants who used X/Twitter to share their own content.

An investigation of the variety of respondents' smartphone activity and their technical skills also found correlations with consent to link X/Twitter data. Those using smartphones for a wider array of purposes were more likely to consent, while activity frequency was associated with privacy concerns. Privacy and security concerns were identified as potential mediators between various factors and data linkage consent.

The third session examined the complexities of working with data and setting up Data Sharing Agreements (DSAs). The focus was on ensuring compliance with specific requirements across organisations, securing access via locked-down devices, and navigating the many opinions of legal, risk and compliance experts.

Lessons learned underscored the difficulty of erasing data from cloud services and the need for robust archival frameworks. Balancing individual privacy with data publication remains a key concern, with methods like statistical disclosure control and de-identification proposed as ways to minimise risks while maintaining broad access to research. The challenge lies in finding a middle ground between full anonymisation and retaining the data's usefulness. The Five Safes Framework was highlighted as a key tool for ensuring safe data practices. With a constantly evolving data landscape and varying definitions of personal data, the discussion emphasised the importance of understanding risks and designing systems based on trust rather than control. By considering both legal obligations and potential harm to individuals, researchers can navigate the complexities of data governance effectively.
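As a rough illustration of one common statistical disclosure control check (not the procedure used in this project), the Python sketch below counts how many records share each combination of quasi-identifiers and flags any combination held by fewer than k records. The function name, field names, records and threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, made-up records; not data from the project.
records = [
    {"age_band": "18-34", "region": "Wales", "tweets_per_week": 12},
    {"age_band": "18-34", "region": "Wales", "tweets_per_week": 3},
    {"age_band": "65+", "region": "Wales", "tweets_per_week": 1},
]

# Flag combinations of age band and region held by fewer than 2 records.
risky = small_groups(records, ["age_band", "region"], k=2)
print(risky)  # {('65+', 'Wales'): 1}
```

Flagged combinations could then be coarsened (for example, by merging age bands) or suppressed before release. How far to go is exactly the trade-off between anonymisation and usefulness described above.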

What does this mean for us in practice? Two lessons stand out. The first is to recognise that things are constantly changing, and the second is that there are some things we cannot know. A fundamental aspect of this project is that one does not know how data will be used in the future. This means we cannot usefully characterise datasets as public or by potential use, and that the intrinsic nature of the data cannot be used as an argument that they are not risky. It is not the data per se that raise ethical issues, but the use to which they are put and the analysis to which they are subjected. This underscores the importance of staying vigilant and adaptable in data governance practices so that emerging challenges can be addressed effectively.