- We deployed a survey asking about people’s willingness to share data so that the appropriateness of the UK COVID-19 Alert level could be monitored
- For one section of the survey, our headline initial findings are that people are more willing to share data when:
- the UK COVID-19 Alert level is higher
- it is anonymous, rather than identifiable (including name and address)
- it is being shared with a Public Health Body or the Local Authority, rather than a Regional Police Force or a Commercial Company
- the data is medical rather than mobility data
Across the globe, the response to Covid-19 prompted a paradigm shift in regard to data sharing and use. We have become accustomed to hearing how various policy decisions are “data driven”, yet these data-driven systems require people to be willing to share often sensitive information.
In the UK, these unprecedented times triggered organisations such as public health bodies, regional police forces and local authorities to share more data with each other, and more swiftly, than ever before. We have also been asked to contribute data in ways that we have not seen previously, most notably through the NHS Covid-19 Contact Tracing app. Throughout the pandemic, we have seen continual conversation and debate around such data sharing, with many wrestling with the complexities of wanting to retain data privacy but also wanting to assist the world in the fight against the Covid-19 virus. These conversations have reintensified as the Department for Digital, Culture, Media and Sport publishes its response to the consultation on its National Data Strategy and the Department of Health and Social Care publishes a draft of “ambitious plans to harness the potential of data in health and care”.
As the UK begins to see the light at the end of the tunnel, with over 31 million citizens having received their second dose of the vaccine (as reported 21st June), we wanted to consider what the public’s willingness to share data may now be in the wake of the Covid-19 pandemic, as we look towards a new, more data-driven future.
We deployed a survey and collected responses from 1,003 UK residents. This sample was nationally representative in regard to age, sex and ethnicity (simplified to five categories) based on data from the Office for National Statistics. In addition to asking about our participants’ demographic information and some questions related to their experiences of Covid-19, we presented our participants with a series of data sharing scenarios. Each scenario defined 4 attributes – (I) what the Covid-19 Alert level was, (II) the type of data, (III) who the data was being shared with, and (IV) whether the data was stored alongside identifying data or not (see the table below for the different options that these attributes were broken down into).
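To make the structure of a scenario concrete, here is a minimal sketch of how the four attributes could be represented and coded against a reference option. The option labels, the use of alert levels 1 to 5, and the choice of reference for attributes (II) to (IV) are assumptions made for illustration; the options table in this post is the authority on what the survey actually used.

```python
from dataclasses import dataclass

# Attribute options; the first entry in each list is treated as the reference.
# ASSUMPTION: labels, alert levels 1-5 and reference choices are illustrative only.
ATTRIBUTES = {
    "alert_level": ["Level 1", "Level 2", "Level 3", "Level 4", "Level 5"],
    "data_type": ["Medical", "Mobility"],
    "recipient": ["Commercial Company", "Public Health Body",
                  "Local Authority", "Regional Police Force"],
    "identifiability": ["Anonymous", "Identifiable"],
}


@dataclass(frozen=True)
class Scenario:
    alert_level: str
    data_type: str
    recipient: str
    identifiability: str

    def __post_init__(self):
        # Reject any value that is not one of the listed options.
        for name, options in ATTRIBUTES.items():
            value = getattr(self, name)
            if value not in options:
                raise ValueError(f"{value!r} is not a valid option for {name}")


def dummy_code(scenario):
    """Return a 0/1 indicator for every non-reference option of every attribute."""
    coded = {}
    for name, options in ATTRIBUTES.items():
        for option in options[1:]:  # skip the reference option
            coded[f"{name}={option}"] = int(getattr(scenario, name) == option)
    return coded


example = Scenario("Level 3", "Mobility", "Local Authority", "Anonymous")
print(dummy_code(example))
```

Coding each option as an indicator relative to a reference is what allows an analysis to talk about "more willing" or "less willing" than the reference option.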
We designed the survey so that the name of the Public Health Body would change based on the participant’s answer to the question “What is your current country of residence?” earlier in the survey. For example, if a participant had answered “Scotland” then the data sharing scenarios would describe the Public Health Body option as “Public Health Scotland”. Also, we wanted to try to gain some insight into the public’s willingness to share data with a commercial company because there are a number of examples of Covid-19 data collection activities that involve a commercial company. Rather than using a real-life company and risk participants being biased by their prior experiences or impressions of that company, we invented a fictional company (“Info-Insight”) to be the commercial company option in the survey.
We presented these data sharing scenarios in pairs, as in the screenshot below, asking our participants to compare the two scenarios and select the one in which they would be more willing to share their data. The pairings were designed to maximise the data collected in as few questions as possible (in our case, 17 data sharing scenario comparisons). In some cases the two data sharing scenarios being compared were very similar, in other cases they were very different. Because it would be difficult to remember what all the different options were, participants were able to hover their mouse over any underlined text and a little box (a tooltip) would appear providing a description (see the table above for these descriptions).
Based on previous research, it seems that the decision to share data or not is multi-dimensional. By using this approach, where our participants are forced to make a series of choices (known as a Choice-based Multiple Conjoint Activity), the overall decision can be deconstructed into its component parts (namely, the attributes) and the effects of these attributes on data-sharing preference can be investigated.
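As a rough illustration of how paired choices can be broken down into attribute effects, the sketch below fits a simple logit model to the difference between the two scenarios in each pair. This is one common way to analyse this kind of data and is an assumption here, since this post does not describe the model that was fitted. The data are simulated, and the two features and their "true" weights are invented purely for the demonstration; they are not survey results.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_paired_logit(diffs, chose_a, steps=500, lr=1.0):
    """Fit weights w so that P(choose A) = sigmoid(w . (x_A - x_B)),
    using plain gradient ascent on the mean log-likelihood."""
    n_features = len(diffs[0])
    w = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for d, y in zip(diffs, chose_a):
            p = sigmoid(sum(wi * di for wi, di in zip(w, d)))
            for j in range(n_features):
                grad[j] += (y - p) * d[j]
        w = [wi + lr * g / len(diffs) for wi, g in zip(w, grad)]
    return w


# Simulated comparisons. Features: [identifiable, mobility], each 0 or 1.
# ASSUMPTION: these weights are made up for the demo, not estimates from the survey.
random.seed(0)
true_w = [-1.5, -0.6]
diffs, chose_a = [], []
for _ in range(1500):
    a = [random.randint(0, 1) for _ in true_w]
    b = [random.randint(0, 1) for _ in true_w]
    d = [ai - bi for ai, bi in zip(a, b)]
    p_a = sigmoid(sum(wi * di for wi, di in zip(true_w, d)))
    diffs.append(d)
    chose_a.append(1 if random.random() < p_a else 0)

weights = fit_paired_logit(diffs, chose_a)
print(dict(zip(["identifiable", "mobility"], weights)))
```

A negative weight means that scenarios containing that option were chosen less often than scenarios containing the reference option, with everything else held equal.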
Some Initial Results
The graph below displays the initial results from analysing the survey’s Choice-based Multiple Conjoint Activity data. Listed down the left-hand side are the four attributes and their options. For each attribute, one of the options is set as the reference, e.g. “(I) Alert level (ref=Level 1)”. The estimates (each shown as a dot, the point estimate, with its associated 95% confidence interval) measure the changes in preference relative to the reference option. To help interpret the estimates, a vertical dashed line down the figure indicates no difference in preference between the option of interest and the reference option. The further away an estimate is from the dashed line, the more willing (if to the right) or unwilling (if to the left) the participants were to share their data in a scenario that included that option in comparison to the reference option. Statistical evidence of the presence or absence of a difference in data-sharing preference is presented via the 95% confidence interval, the two vertical bars to either side of the dot. An interval overlapping with the dashed line indicates little evidence of a difference in choice preference between the option of interest and the reference option. Otherwise, the data suggest a statistically significant difference in choice preference, although not necessarily a large one.
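The reading rule for the graph can be written down as a small sketch. It assumes that the dashed "no difference" line sits at zero on the estimate scale and that the 95% interval is the usual estimate plus or minus 1.96 standard errors; both are assumptions about the plot, and the numbers used below are invented for illustration rather than taken from the survey.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Normal-approximation 95% interval around a point estimate."""
    return estimate - z * std_error, estimate + z * std_error


def interpret(lower, upper, no_difference=0.0):
    """Classify an interval relative to the 'no difference' line."""
    if lower > no_difference:
        return "more willing than the reference option"
    if upper < no_difference:
        return "less willing than the reference option"
    return "little evidence of a difference"


# ASSUMPTION: made-up estimates and standard errors, not results from the survey.
for label, est, se in [("option A", 0.40, 0.10),
                       ("option B", -0.30, 0.10),
                       ("option C", 0.05, 0.10)]:
    low, high = confidence_interval(est, se)
    print(label, interpret(low, high))
```

Note that an interval clear of the line only signals evidence that a difference exists; how far the dot sits from the line is what indicates the size of that difference.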
The results, as displayed in the graph, are:
(I) Participants were more willing to share data as the Covid Alert Level increased.
(II) Participants were less willing to share mobility data compared to medical data.
(III) Participants were more willing to share data with their Public Health Body and Local Authority than with a commercial company (Info-Insight). There was very little difference in how willing participants were to share data with their Regional Police Force compared to a commercial company, as the vertical dashed line sits between the lower and upper bounds of the 95% confidence interval for Regional Police Force.
(IV) Participants were less willing to share identifiable data.
Based on research findings from before the Covid-19 pandemic began, our predictions about the direction of willingness for some attributes have come to fruition: participants were less willing to share identifiable data, and participants were more willing to share with a health-related government body (Public Health) and a service-related government body (Local Authority) than with a commercial company. Further, the finding that participants were more willing to share data as the Alert Level increased illustrates a response to an urgent need for data, but also reflects the desire to retain data privacy when data sharing isn’t absolutely necessary.
The somewhat unexpected result is in regard to the Data Type attribute. As predicted, there is a difference in willingness to share these two data types. It could be that participants were less willing to share Mobility data because it was felt that this was more invasive in regard to day-to-day privacy, or it could be that there was greater willingness to share Medical data because of the nature of a pandemic and the need to understand Covid-19 in order to beat it. But what is interesting is that although the difference in willingness between sharing Medical data and Mobility data is statistically significant, it is small. It may be that this reflects participants’ awareness that in order to stop the spread of Covid-19 both types of data are needed – medical to understand who has Covid-19 and mobility to understand who they may have transferred it to. However, this result could to a degree be an artefact of the survey’s definitions of Medical and Mobility data being ambiguous. Later in the survey we delved deeper into this, asking participants to describe the amount of detail they would be willing to include in the Medical or Mobility data that they were sharing, bearing in mind the Covid-19 alert level, who they were sharing it with, and whether the data would be anonymous or identifiable.
Some Final Thoughts
To summarise, this blogpost really does just scratch the surface of the insights that we could gain from the OMDDAC survey data. Not only can we re-examine the Choice-based Multiple Conjoint Activity data to consider whether the results differ by demographics such as age, gender and ethnicity, but the Choice-based Multiple Conjoint Activity is just one of the three sets of questions about data sharing preferences in the survey. So, stay tuned for more results in part 2!