Linking survey and administrative data offers the possibility of combining the strengths, and mitigating the weaknesses, of both. Such linkage is therefore an extremely promising basis for future empirical research in social science. For ethical and legal reasons, linking administrative data to survey responses will usually require obtaining explicit consent. It is well known that not all respondents give consent. Past research on consent has generated many null and inconsistent findings. A weakness of the existing literature is that little effort has been made to understand the cognitive processes by which respondents decide whether or not to consent. The overall aim of this project was to improve our understanding of how to pursue the twin goals of maximising consent and ensuring that consent is genuinely informed. The ultimate objective is to strengthen the data infrastructure for social science and policy research in the UK. Specific aims were:
1. To understand how respondents process requests for data linkage: which factors influence their understanding of data linkage, which factors influence their decision to consent, and to open the black box of consent decisions to begin to understand how respondents make the decision.
2. To develop and test methods of maximising consent in web surveys, by understanding why web respondents are less likely to give consent than face-to-face respondents.
3. To develop and test methods of maximising consent with requests for linkage to multiple data sets, by understanding how respondents process multiple requests.
4. As a by-product of testing hypotheses about the previous points, to test the effects of different approaches to wording consent questions on informed consent.
Our findings are based on a series of experiments conducted in four surveys using two different studies: the Understanding Society Innovation Panel (IP) and the PopulusLive online access panel (AP). The Innovation Panel is part of Understanding Society: the UK Household Longitudinal Study. It is a probability sample of households in Great Britain used for methodological testing, with a design that mirrors that of the main Understanding Society survey. The experiments were carried out in wave 11 of the Innovation Panel (IP11), fielded in 2018. The Innovation Panel data are available from the UK Data Service (SN: 6849, http://doi.org/10.5255/UKDA-SN-6849-12).
Since the Innovation Panel sample size (around 2,900 respondents) constrained the number of experimental treatment groups we could implement, we fielded a parallel survey with additional experiments, using a different sample. PopulusLive is a non-probability online panel with around 130,000 active sample members, who are recruited through web advertising, word of mouth, and database partners. We used age, gender and education quotas to match the sample composition of the Innovation Panel.
A total of nine experiments were conducted across the two sample sources. Experiments 1 to 5 all used variations of a single consent question, about linkage to tax data (held by HM Revenue and Customs, HMRC). Experiments 6 and 7 also used single consent questions, but respondents were assigned to questions on either tax or health data (held by the National Health Service, NHS) linkage. Experiments 8 and 9 used five different data linkage requests: tax data (held by HMRC), health data (held by the NHS), education data (held by the Department for Education in England, DfE, and equivalent departments in Scotland and Wales), household energy data (held by the Department for Business, Energy and Industrial Strategy, BEIS), and benefit and pensions data (held by the Department for Work and Pensions, DWP).
The experiments, and the survey(s) on which they were conducted, are briefly summarized here:
1. Easy vs. standard wording of consent request (IP and AP). Half the respondents were allocated to the ‘standard’ question wording, used previously in Understanding Society. The balance was allocated to an ‘easy’ version, where the text was rewritten to reduce reading difficulty and to provide all essential information about the linkage in the question text rather than an additional information leaflet.
2. Early vs. late placement of consent question (IP). Half the respondents were asked for consent early in the interview, the other half were asked at the end.
3. Web vs. face-to-face interview (IP). This experiment exploits the random assignment of IP cases to explore mode effects on consent.
4. Default question wording (AP). Experiment 4 tested a default approach to giving consent, asking respondents to “Press ‘next’ to continue” or explicitly opt out, versus the standard opt-in consent procedure.
5. Additional information question wording (AP). This experiment tested the effect of offering additional information, with a version that added a third response option (“I need more information before making a decision”) to the standard ‘yes’ or ‘no’ options.
6. Data linkage domain (AP). Half the respondents were assigned to a question asking for consent to link to HMRC data; the other half were asked for linkage to NHS data.
7. Trust priming (AP). This experiment was crossed with the data linkage domain experiment, and focused on the effect of priming trust on consent. Half the sample saw an additional statement: “HMRC / The NHS is a trusted data holder” on an introductory screen prior to the consent question. This was followed by an icon symbolizing data security: a shield and lock symbol with the heading “Trust”. The balance was not shown the additional statement or icon.
8. Format of multiple consents (AP). For one group, the five consent questions were each presented on a separate page, with respondents consenting to each in turn. For the second group the questions were all presented on one page; however, the respondent still had to answer each consent question individually. For the third group all five data requests were presented on a single page and the respondent answered a single yes/no question, whether they consented to all the linkages or not.
9. Order of multiple consents (AP). One version asked the five consent questions in ascending order of sensitivity of the request (based on previous data), with NHS asked first. The other version reversed the order, with consent to linkage to HMRC data asked first.
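Experiments 6 and 7 were crossed, so each AP respondent falls into one of four cells (domain × trust prime). The sketch below shows one way such a balanced 2×2 random allocation could be done; the seed, the round-robin balancing scheme and the function name `assign_crossed` are illustrative assumptions, since the allocation procedure actually used by the survey software is not described here.

```python
import random
from collections import Counter

def assign_crossed(respondent_ids, seed=2018):
    """Assign each respondent to one cell of a 2x2 crossed design:
    linkage domain (HMRC vs NHS) x trust prime (shown vs not shown).
    The seed and the balancing scheme are illustrative assumptions."""
    rng = random.Random(seed)
    cells = [(domain, prime) for domain in ("HMRC", "NHS") for prime in (True, False)]
    ids = list(respondent_ids)
    rng.shuffle(ids)
    # Deal the shuffled respondents round-robin so each cell gets an equal share
    return {rid: cells[i % len(cells)] for i, rid in enumerate(ids)}

assignment = assign_crossed(range(1000))
cell_sizes = Counter(assignment.values())
print(cell_sizes)  # four cells of 250 respondents each
```

Because every respondent lands in exactly one cell, the domain effect and the trust-priming effect can each be estimated by comparing margins, and their interaction by comparing cells.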
For all of the experiments described above, we examined consent rates. We also tested comprehension of the consent request using a series of knowledge questions about the consent process, measured subjective understanding (how much respondents felt they understood about the request), and ascertained respondents' subjective confidence in the decision they had made.
In addition to the experiments, we used digital audio recordings of the IP11 face-to-face interviews (recorded with respondents’ permission) to explore how interviewers communicate the consent request to respondents, whether and how they provide additional information or attempt to persuade respondents to consent, and whether respondents raise questions when asked for consent to data linkage.
Key Findings
Correlates of consent:
(1) Respondents who have better understanding of the data linkage request (as measured by a set of knowledge questions) are also more likely to consent.
(2) As in previous studies, we find no socio-demographic characteristics that consistently predict consent in all samples. The only consistent predictors are positive attitudes towards data sharing, trust in HMRC, and knowledge of what data HMRC have.
(3) Respondents are less likely to consent to data linkage if the wording of the request is difficult and the question is asked late in the questionnaire. Position has no effect on consent if the wording is easy; wording has no effect on consent if the position is early.
(4) Priming respondents to think about trust in the organisations involved in the data linkage increases consent.
(5) The only socio-demographic characteristic that consistently predicts objective understanding of the linkage request is education. Understanding is positively associated with the number of online data sharing behaviours (e.g., posting text or images on social media, downloading apps, online purchases or banking) and with trust in HMRC.
(6) Easy wording of the consent question increases objective understanding of the linkage request. Position of the consent question in the questionnaire has no effect on understanding.
The consent decision process:
(7) Respondents decide about the consent request in different ways: some use more reflective decision-making strategies, others use less reflective strategies.
(8) Different decision processes are associated with very different levels of consent, comprehension, and confidence in the consent decision.
(9) Placing the consent request earlier in the survey increases the probability of the respondent using a reflective decision-making process.
Effects of mode of data collection on consent:
(10) As in previous studies, respondents are less likely to consent online than with an interviewer.
(11) Web respondents have lower levels of understanding than face-to-face respondents.
(12) There is no difference by mode in respondents’ confidence in their decisions.
(13) Web respondents report higher levels of concern about data security than face-to-face respondents.
(14) Web respondents are less likely to use reflective strategies to make their decision than face-to-face respondents, and instead more likely to make habit-based decisions.
(15) Easier wording of the consent request does not reduce mode effects on rates of consent.
(16) Respondents rarely ask questions and interviewers rarely provide additional information.
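A mode difference in consent rates such as finding (10) is, at its simplest, a comparison of two proportions. The sketch below is a textbook two-proportion z-test, not the project's actual analysis: it ignores survey weights, household clustering and covariates, and the counts in the usage line are hypothetical, not results from the study.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two independent proportions.
    Ignores survey weights and clustering, which a real analysis would need."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p1 - p2, z, p_value

# Hypothetical counts (face-to-face vs web consenters), not results from the study
diff, z, p = two_proportion_z(x1=700, n1=1000, x2=600, n2=1000)
print(f"difference={diff:.3f}, z={z:.2f}, p={p:.2g}")
```

Since IP cases were randomly assigned to mode, a simple comparison like this estimates the mode effect without confounding by sample composition, although standard errors should still reflect the survey design.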
Multiple consent requests:
(17) The format in which a sequence of consent requests is asked does not seem to matter.
(18) The order of multiple consent requests affects
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Record linkage is a complex process and is used to some extent in nearly every organization that works with modern human data records. People create methods for linking records on a case-by-case basis. Some may use basic matching between record 1 and record 2, as seen below:

```python
# Deterministic rule: two records match only if first and last names agree exactly
if (r1.FirstName == r2.FirstName) and (r1.LastName == r2.LastName):
    out.match = True
else:
    out.match = False
```

while others may choose to create more complex decision trees or even machine learning approaches to record linkage.
When people approach record linkage via machine learning (ML), they can match on a variety of fields, typically dependent on the forms used to collect data. While the fields used by ML models vary from organization to organization, several fields appear more frequently than others:
* First Name
* Middle Name
* Last Name
* Date of Birth
* Sex at Birth
* Race
* SSN (maybe)
* Address house
* Address zip
* Address city
* Address county
* Address state
* Phone
* Email
* Date Data Submitted
By comparing two records on all of these fields, ML record linkage models use complex logic to make the "Yes" or "No" decision on whether 2 records reflect the same individual. Record linkage can become difficult when individuals change addresses, adopt a new last name, fill out data erroneously, or have information that closely resembles another individual's (ex: twins).
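Before any model can decide "Yes" or "No", each record pair is usually turned into a per-field comparison vector. The sketch below is a minimal, hypothetical version of that step: the `Record` dataclass, its field names, and the lowercase/strip normalization are assumptions for illustration, and real pipelines typically add fuzzy string similarity rather than exact agreement. The usage line encodes the two Wanda records from the table that follows.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional

@dataclass
class Record:
    first_name: str
    last_name: str
    sex: Optional[str]  # None when the field was left blank
    dob: date
    address: str
    zip_code: str
    state: str

def comparison_vector(r1: Record, r2: Record) -> dict:
    """Per-field agreement: 1 = agree, 0 = disagree, None = missing on either side."""
    vec = {}
    for f in fields(Record):
        a, b = getattr(r1, f.name), getattr(r2, f.name)
        if a is None or b is None:
            vec[f.name] = None
        elif isinstance(a, str):
            # Light normalization so case and stray whitespace don't count as disagreement
            vec[f.name] = int(a.strip().lower() == b.strip().lower())
        else:
            vec[f.name] = int(a == b)
    return vec

wanda_1 = Record("Wanda", "Smith", "F", date(1992, 9, 13), "1768 Walker Rd. Unit 209", "99301", "WA")
wanda_2 = Record("Wanda", "Turner", None, date(1992, 9, 13), "4545 Pennsylvania Ct.", "98682", "WA")
print(comparison_vector(wanda_1, wanda_2))
```

For the Wanda pair, first name, date of birth and state agree while last name, address and ZIP disagree, and sex is missing: exactly the mixed pattern that makes a confident decision hard.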
As described above, record linkage can have many complex elements to it.
Consider a situation where you are manually reviewing 2 records.
These two records only contain basic information on the individuals and you are tasked to decide if Record #1 and Record #2 belong to the same person.
| Record # | First Name | Last Name | Sex | DOB | Address | Address ZIP | Address State | Date Received |
|---|---|---|---|---|---|---|---|---|
| 1 | Wanda | Smith | F | 1992-09-13 | 1768 Walker Rd. Unit 209 | 99301 | WA | 2015-03-01 |
| 2 | Wanda | Turner | | 1992-09-13 | 4545 Pennsylvania Ct. | 98682 | WA | 2021-06-30 |
At a glance, these records are significantly different and you should therefore mark them as different persons. For the purposes of record linkage manual review, you probably made the correct decision. After all, for record linkage, most models prefer False Negatives to False Positives.
When groups validate record linkage models, they often turn to manually reviewed record comparisons as their "gold standard". There are two separate marks of judgement for record linkage that I would like us to consider:
1. Creating a model that simulates a human's decision-making process
2. Creating a model that seeks a deeper record-equality "Truth"
I believe many groups aim for and are content with accomplishing goal #1. That approach is inarguably useful. However, I believe it can be harmful because of the biases it introduces. For example, it is biased against people who adopt new last names upon marriage or civil union (more often people of "Female" Sex at Birth). Models that are biased against non-American names can also produce high validation marks, but are flawed nonetheless. Consider the 2 records displayed earlier. There is a real chance that Wanda adopted a new last name and moved in the 6 years between when the two records were collected.
Without relevant documentation (birth, marriage, ... , housing records), we have no way of knowing whether or not "Wanda Smith" is the same person as "Wanda Turner". It follows that treating manual review as a "gold standard" fails to completely support goal #2.
We hope to create a simulated society that can be used as absolute truth. The simulated society will be built to reflect the population of Washington State. It will have a relational-database-type structure with tables containing all relevant supporting structures for record linkage, such as:
* birth records
* partnership (marriage) records
* moving records
We hope to create a society with representative and diverse names, representative demographic breakdowns, and representative geographic population densities. The structure of the database will allow for "Time Travel" queries that allow a user to capture all data from a specific year in time.
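One common way to support "Time Travel" queries is to give every history table a validity interval and filter on it. The sketch below illustrates that valid-time pattern with Python's built-in `sqlite3`; the table names, columns, and Wanda's name-change and move dates are all hypothetical and do not describe the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, first_name TEXT, dob TEXT);
-- Each history row is valid from valid_from (inclusive) until valid_to (exclusive; NULL = current)
CREATE TABLE name_history (person_id INTEGER, last_name TEXT, valid_from TEXT, valid_to TEXT);
CREATE TABLE residence (person_id INTEGER, zip TEXT, valid_from TEXT, valid_to TEXT);

INSERT INTO person VALUES (1, 'Wanda', '1992-09-13');
INSERT INTO name_history VALUES (1, 'Smith', '1992-09-13', '2017-05-20'),
                                (1, 'Turner', '2017-05-20', NULL);
INSERT INTO residence VALUES (1, '99301', '2010-01-01', '2018-08-01'),
                             (1, '98682', '2018-08-01', NULL);
""")

SNAPSHOT_SQL = """
SELECT p.person_id, p.first_name, n.last_name, r.zip
FROM person p
JOIN name_history n ON n.person_id = p.person_id
 AND n.valid_from <= :as_of AND (n.valid_to IS NULL OR n.valid_to > :as_of)
JOIN residence r ON r.person_id = p.person_id
 AND r.valid_from <= :as_of AND (r.valid_to IS NULL OR r.valid_to > :as_of)
"""

def snapshot(as_of: str):
    """Return each person's last name and ZIP code as they stood on the ISO date `as_of`."""
    return conn.execute(SNAPSHOT_SQL, {"as_of": as_of}).fetchall()

print(snapshot("2015-03-01"))  # [(1, 'Wanda', 'Smith', '99301')]
print(snapshot("2021-06-30"))  # [(1, 'Wanda', 'Turner', '98682')]
```

ISO-8601 date strings sort chronologically as text, which is why plain string comparison is enough for the interval filters here.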
By creating a simulated society, we will have absolute truth in determining whether record1 = record2. This approach will give us an opportunity to assess record linkage models considering goal #2.
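With a simulated society, every generated record carries the true person ID, so a linkage model's predicted pairs can be scored directly against ground truth. A minimal sketch, assuming records are indexed by position and predictions arrive as index pairs (both assumptions for illustration):

```python
from itertools import combinations

def true_pairs(person_ids):
    """All index pairs (i, j), i < j, whose records belong to the same simulated person."""
    by_person = {}
    for idx, pid in enumerate(person_ids):
        by_person.setdefault(pid, []).append(idx)
    return {pair for idxs in by_person.values() for pair in combinations(idxs, 2)}

def precision_recall(predicted_pairs, truth):
    """Score predicted matches against the known truth; pairs are order-insensitive."""
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else float("nan")  # undefined with no predictions
    recall = tp / len(truth) if truth else float("nan")
    return precision, recall

# Records 0 and 1 are "Wanda Smith" and "Wanda Turner"; the simulation knows both are person 7.
person_ids = [7, 7, 8, 9, 9]
truth = true_pairs(person_ids)          # {(0, 1), (3, 4)}
# A reviewer-like model that only links (3, 4) looks perfect on precision but misses Wanda.
print(precision_recall({(4, 3)}, truth))  # (1.0, 0.5)
```

This is the kind of gap that manual-review validation cannot expose: the missed Wanda link only shows up as lost recall when the true answer is known.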
After we wrap up this work, we will work on processes/functions for simulating human error in record filling, as well as functions to help support bias recognition when using our dataset as a test set.
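As an illustration of what a human-error simulator might look like, the sketch below injects single keystroke-style typos (deletion, substitution, or adjacent transposition) into string fields at a chosen rate. The error types, the uniform choice between them, and the function names are assumptions; a realistic simulator would calibrate error rates per field from observed data.

```python
import random
import string

def add_typo(value: str, rng: random.Random) -> str:
    """Apply one keystroke-style error: delete, substitute, or swap adjacent characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    kind = rng.choice(["delete", "substitute", "transpose"])
    if kind == "delete":
        return value[:i] + value[i + 1:]
    if kind == "substitute":
        # May occasionally draw the same letter, leaving the value unchanged
        return value[:i] + rng.choice(string.ascii_lowercase) + value[i + 1:]
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]

def corrupt_record(record: dict, error_rate: float, rng: random.Random) -> dict:
    """Copy `record`, giving each string field a typo with probability `error_rate`."""
    return {
        key: add_typo(val, rng) if isinstance(val, str) and rng.random() < error_rate else val
        for key, val in record.items()
    }

rng = random.Random(42)
print(corrupt_record({"first_name": "Wanda", "last_name": "Smith", "zip": "99301"}, 0.5, rng))
```

Passing an explicit `random.Random` makes each corrupted test set reproducible, which matters if the same noisy records are used to compare several linkage models.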
PJ Gibson - peter.gibson@doh.wa.gov
Salutary Data is a boutique B2B contact and company data provider that's committed to delivering high-quality data for sales intelligence, lead generation, marketing, recruiting/HR, identity resolution, and ML/AI. Our database currently consists of 148MM+ highly curated B2B contacts (US only), along with 4M+ companies, and is updated regularly to ensure we have the most up-to-date information.
We can enrich your in-house data (CRM Enrichment, Lead Enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end users.
What makes Salutary unique?
- We offer our clients a truly unique, one-stop aggregation of best-of-breed quality data sources. Our supplier network consists of numerous established, high-quality suppliers that are rigorously vetted.
- We leverage third-party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
- We're reasonably priced and easy to work with.
Products: API Suite, Web UI, Full and Custom Data Feeds
Services:
- Data Enrichment: We assess the fill-rate gaps and profile your customer file for the purpose of appending fields, updating information, and/or rendering net new “look alike” prospects for your campaigns.
- ABM Match & Append: Send us your domain or other company-related files, and we’ll match your Account Based Marketing targets and provide you with B2B contacts to campaign. Optionally throw in your suppression file to avoid any redundant records.
- Verification (“Cleaning/Hygiene”) Services: Address the 2% per month aging issue on contact records! We will identify duplicate records and contacts no longer at the company, rid your list of email hard bounces, and update/replace titles or phones. This is right up our alley and leverages our existing internal and external processes and systems.
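To put the "2% per month" figure in annual terms, the sketch below assumes the decay compounds, i.e. each month 2% of the records that are still valid go stale. That compounding assumption is ours; the text does not say how the 2% is measured.

```python
def share_still_valid(monthly_decay: float, months: int) -> float:
    """Fraction of records still accurate after `months`, assuming a constant monthly
    decay rate that compounds (each month 2% of the *remaining* valid records go stale)."""
    return (1 - monthly_decay) ** months

stale_after_year = 1 - share_still_valid(0.02, 12)
print(f"{stale_after_year:.1%} of records go stale within a year")  # 21.5% ...
```

Under this assumption roughly a fifth of an untouched contact file is out of date after twelve months; a simple (non-compounding) reading of 2% × 12 would give 24%.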
This work represents an initial effort to establish the feasibility of linking police-reported road casualty data (STATS19) and Incident Reporting System (IRS) data, provided by the Home Office, for road collision incidents attended by fire and rescue services.
The initial feasibility study focused on establishing a method for linking the two datasets, to support the future data strategy as set out in the latest STATS19 review final report. The main purpose of linking the two datasets was to establish whether the IRS data can help us to understand more about post crash care and response times.
The further analysis reviews and amends the linkage methodology, and explores the different trends for road collisions from the two datasets. This analysis was used to identify where patterns diverged as a basis for engagement with STATS19 data providers in 2025.
Any feedback from users of the statistics on the value of this data linking will be valuable in determining whether further work is merited. Feedback on the work to date is welcome by email to the road safety statistics team.
Road safety statistics
Email: roadacc.stats@dft.gov.uk
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
Metadata (data dictionary) and statistical analysis plan (including outcomes definitions for data dictionary) for the ACTORDS 20-year follow-up study. The DOI for the primary study publication is https://doi.org/10.1371/journal.pmed.1004618.

Data and associated documentation for participants who have consented to future re-use of their data are available to other users under the data sharing arrangements provided by the University of Auckland’s Human Health Research Services (HHRS) platform (https://research-hub.auckland.ac.nz/subhub/human-health-research-services-platform). The data dictionary and metadata are published on the University of Auckland’s data repository Figshare, which allocates a DOI and thus makes these details searchable and available indefinitely. Researchers are able to use this information and the provided contact address (dataservices@auckland.ac.nz) to request a de-identified dataset through the HHRS Data Access Committee. Data will be shared with researchers who provide a methodologically sound proposal and have appropriate ethical approval, where necessary, to achieve the research aims in the approved proposal. Data requestors are required to sign a Data Access Agreement that includes a commitment to using the data only for the specified proposal, not to attempt to identify any individual participant, a commitment to secure storage and use of the data, and to destroy or return the data after completion of the project. The HHRS platform reserves the right to charge a fee to cover the costs of making data available, if needed, for data requests that require additional work to prepare.
This report summarises the results of a feasibility study that links the police road casualty (STATS19) data with a subset of ambulance service data provided by the South West Ambulance Service Trust (SWAST), covering the more serious incidents within the South West of England over a four-year period.
This small-scale initial study is anticipated to be the first stage of work to link STATS19 and healthcare data, working with the Pre-hospital Research and Audit Network (PRANA) which has necessary approvals to receive healthcare data including hospital admissions and trauma care for the whole of England.
Feedback on this initial work, or suggestions for future uses of linked police and health data, is welcome to inform future work.
Road safety statistics
Email: roadacc.stats@dft.gov.uk
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Background and significance: Intravascular (IV) catheters are the most invasive medical device in healthcare. Localized priority-setting related to IV catheter quality surveillance is a key objective of recent healthcare reform in Australia. We sought to determine the plausibility of using electronic health record (EHR) data for catheter surveillance by mapping currently available data across state-wide platforms. This work has identified barriers and facilitators to a state-wide EHR surveillance initiative.

Materials and methods: Data variables were generated and mapped from routinely used EHR sources across Queensland, Australia through a systematic search of gray literature and expert consultation with clinical information specialists. EHR systems were eligible for inclusion if they collected data related to IV catheter insertion, care, or outcomes of hospitalized patients. Generated variables were mapped against international recommendations for IV catheter surveillance, with data linkage and data export capacity narratively summarized.

Results: We identified five EHR systems, namely iEMR, MetaVision ICU®, Multiprac, RiskMan, and the Nephrology Registry. Systems were used across jurisdictions and hospital wards. Data linkage was not evident across systems. Extraction processes for catheter data were not standardized, lacking clear and reliable extraction techniques. In combination, EHR systems collected 43/50 international variables recommended for catheter surveillance; however, individual systems collected a median of 24/50 (IQR 22, 30) variables. We did not identify integrated clinical analytic systems (incorporating machine learning) to support clinical decision making or for risk stratification (e.g., catheter-related infection).

Conclusion: Current data linkage across EHR systems limits the development of an IV catheter quality surveillance system to provide timely data related to catheter complications and harm. To facilitate reliable and timely surveillance of catheter outcomes using clinical informatics, substantial work is needed to overcome existing barriers and transform health surveillance.
According to our latest research, the global Identity Linkage Analytics market size reached USD 2.12 billion in 2024, reflecting rapid adoption across sectors driven by the need for advanced identity verification and fraud prevention. The market is demonstrating a robust compound annual growth rate (CAGR) of 17.5% and is forecasted to reach USD 10.55 billion by 2033. This growth is propelled by increasing digitalization, the proliferation of sophisticated cyber threats, and stringent regulatory requirements for identity management and data privacy.
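The growth figures above follow the standard compound annual growth rate relation, end = start × (1 + r)^n. The sketch below recomputes the implied rate from the two reported market sizes; the number of compounding periods n is our assumption, because the report does not state the base year of its CAGR.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    return (end_value / start_value) ** (1 / periods) - 1

def project(start_value: float, rate: float, periods: int) -> float:
    """Value after compounding `rate` for `periods` years."""
    return start_value * (1 + rate) ** periods

for years in (9, 10):
    print(years, f"{cagr(2.12, 10.55, years):.1%}", f"{project(2.12, 0.175, years):.2f}")
```

Treating 2024 to 2033 as nine periods implies about 19.5% a year, while ten periods gives about 17.4%; likewise, 17.5% compounded from USD 2.12 billion reaches about USD 9.05 billion after nine years and about USD 10.63 billion after ten. The published pair of figures therefore appears to assume roughly ten compounding periods.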
One of the most significant growth factors for the Identity Linkage Analytics market is the escalating incidence of identity theft, fraud, and cybercrime globally. As organizations continue to digitize their operations and customer interactions, the volume and complexity of identity data have surged. This makes it increasingly challenging to accurately link disparate identity attributes and detect anomalies in real-time. Identity Linkage Analytics solutions leverage advanced machine learning and artificial intelligence algorithms to connect data points across multiple channels, enabling organizations to create unified customer profiles, detect fraudulent activities, and mitigate risks efficiently. The growing reliance on digital platforms, particularly in sectors such as banking, financial services, healthcare, and retail, is driving demand for robust identity analytics to ensure secure and seamless user experiences.
Another crucial driver is the tightening regulatory landscape governing data privacy and identity management. Laws such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States have compelled organizations to adopt advanced identity analytics to ensure compliance. These regulations require businesses to have comprehensive visibility into customer identities, consent management, and data usage, necessitating the deployment of sophisticated Identity Linkage Analytics platforms. Furthermore, the rise of remote work and online transactions in the post-pandemic era has heightened the need for reliable identity verification and linkage capabilities, further fueling market growth.
Technological advancements and the integration of artificial intelligence and big data analytics are also accelerating the adoption of Identity Linkage Analytics solutions. Modern platforms are increasingly capable of processing massive volumes of structured and unstructured data from diverse sources, including social media, mobile devices, and IoT endpoints. This enables organizations to gain deeper insights into user behavior, enhance customer engagement, and proactively detect potential threats. The convergence of identity analytics with other technologies such as blockchain and biometrics is expected to open new avenues for innovation and growth in the coming years.
From a regional perspective, North America currently dominates the Identity Linkage Analytics market, accounting for the largest revenue share in 2024. The region's leadership is attributed to the high adoption of digital technologies, a mature cybersecurity ecosystem, and the presence of leading solution providers. Europe follows closely, driven by stringent regulatory requirements and a strong focus on data privacy. Meanwhile, the Asia Pacific region is emerging as a high-growth market, with increasing investments in digital infrastructure, rapid expansion of the e-commerce sector, and rising awareness about identity-related risks. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions ramp up their digital transformation initiatives.
The Identity Linkage Analytics market is segmented by component into Software and Services. The software segment commands the largest share of the market, driven by the increasing
This study includes a synthetically generated version of the Ministry of Justice Data First Probation datasets. Synthetic versions of all 43 tables in the MoJ Data First data ecosystem have been created. These versions can be used and joined in the same way as the real datasets. As well as underpinning training, synthetic datasets should enable researchers to explore research questions and to design research proposals before submitting them for approval. The code written during this exploration and design process should then enable initial results to be obtained as soon as data access is granted.
The Ministry of Justice Data First probation dataset provides data on people under the supervision of the probation service in England and Wales from 2014. This is a statutory criminal justice service that supervises high-risk offenders released into the community. The data has been extracted from National Delius (nDelius), the management information system used by His Majesty's Prisons and Probation Service (HMPPS) to manage people on probation.
Information is included on service users' characteristics and offence, and on their pre-sentence reports, sentence requirements, licence conditions and post-sentence supervision; for example, age, gender, ethnicity, offence category, key dates relating to sentence and recalls, activities and programmes required as part of rehabilitation (e.g. drug and alcohol treatment, skills training) and limitations set on their activities (e.g. curfew, location monitoring, drugs testing).
Each record in the dataset gives information about a single person and probation journey. As part of Data First, records have been deidentified and deduplicated, using our probabilistic record linkage package, Splink, so that a unique identifier is assigned to all records believed to relate to the same person, allowing for longitudinal analysis and investigation of repeat interactions with probation. This aims to improve on links already made within probation services. This opens up the potential to better understand probation service users and address questions on, for example, what works to reduce reoffending.
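Splink is built on the Fellegi-Sunter model of probabilistic linkage: each compared field contributes a log2 Bayes factor based on its m probability (chance of agreement if two records are the same person) and u probability (chance of agreement if they are different people), and pairs above a threshold are clustered into one identifier. The sketch below illustrates that idea in plain Python rather than through Splink's API; the fields, m/u values and threshold are hypothetical, not the MoJ's trained parameters, and a full model would also add the prior log-odds of a match.

```python
from math import log2

# Hypothetical m and u probabilities per field (not Splink defaults or MoJ values):
# m = P(field agrees | same person), u = P(field agrees | different people)
PARAMS = {
    "first_name": (0.95, 0.01),
    "surname": (0.90, 0.005),
    "dob": (0.98, 0.001),
    "postcode": (0.70, 0.0005),
}

def match_weight(agreements: dict) -> float:
    """Sum of Fellegi-Sunter log2 Bayes factors; missing fields (None) contribute 0."""
    total = 0.0
    for field, agrees in agreements.items():
        if agrees is None:
            continue
        m, u = PARAMS[field]
        total += log2(m / u) if agrees else log2((1 - m) / (1 - u))
    return total

def cluster(n_records: int, scored_pairs: dict, threshold: float) -> list:
    """Connected components (union-find) over pairs whose weight exceeds threshold."""
    parent = list(range(n_records))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for (i, j), w in scored_pairs.items():
        if w > threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n_records)]

w01 = match_weight({"first_name": True, "surname": True, "dob": True, "postcode": False})
w12 = match_weight({"first_name": True, "surname": False, "dob": False, "postcode": None})
ids = cluster(3, {(0, 1): w01, (1, 2): w12}, threshold=10.0)
print(round(w01, 1), round(w12, 1), ids)
```

Records 0 and 1 agree on name and date of birth despite a postcode change, so they share a cluster ID; record 2 stays separate. Clustering through connected components is what lets one identifier cover all of a person's records, even when some pairs are only linked indirectly.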
The Ministry of Justice Data First linking dataset can be used in combination with this and other Data First datasets to join up administrative records about people from across justice services (courts, prisons and probation) to increase understanding around users' interactions, pathways and outcomes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
BACKGROUND. The UK hosts many of the world's longest running prospective longitudinal birth cohort studies. These projects make repeated observations of their participants and use this data to explore health outcomes and mortality. An alternative method for data collection is record linkage: the linking together of electronic health and administrative records. Applied nationally, this could provide unrivalled opportunities to follow a large number of people in perpetuity. However, public attitudes to the use of data and samples in research are currently unclear. Here we report on an event at which we collected attitudes towards recent opportunities and controversies within health data science.

METHODS. The event was attended by ~250 individuals (cohort members and their guests), who had been invited through the offices of their participating cohort studies. Attendees heard talks describing key research results and answered 15 multiple-choice questions using interactive voting pads.

RESULTS. Our participants showed a high level of trust in researchers and doctors, but less trust in commercial companies. They supported the idea of researchers using information from both neonatal blood spots (Guthrie spots) and health records. Participants said they would be willing to wear devices like a 'fit-bit' and to undergo a brain scan that might predict later mental illness. However, they were less willing to change an aspect of their lifestyle or take a new drug for research purposes. They were very keen to encourage others to take part in research, whether by offering the opportunity to pregnant mothers or by extending invitations to their own children and grandchildren.

CONCLUSIONS. Our participants were broadly supportive of research access to data and samples, albeit less supportive when commercial interests are involved. Public engagement events that facilitate two-way interactions can influence and support future research and public engagement efforts.

Ethical permission for this work was granted by The Psychology Research Ethics Committee (PREC) at the University of Edinburgh (Ref No: 327-1718/3). No identifying data were collected from participating individuals. Videos are publicly available on the CCACE YouTube Channel: https://www.youtube.com/channel/UCaemWVOehYht6pylL9zq4nw
The 1970 British Cohort Study (BCS70) is a longitudinal birth cohort study, following a nationally representative sample of over 17,000 people born in England, Scotland and Wales in a single week of 1970. Cohort members have been surveyed throughout their childhood and adult lives, mapping their individual trajectories and creating a unique resource for researchers. It is one of very few longitudinal studies following people of this generation anywhere in the world.

Since 1970, cohort members have been surveyed at ages 5, 10, 16, 26, 30, 34, 38, 42, 46, and 51. Featuring a range of objective measures and rich self-reported data, BCS70 covers an incredible amount of ground and can be used in research on many topics. Evidence from BCS70 has illuminated important issues for our society across five decades. Key findings include how reading for pleasure matters for children's cognitive development, why grammar schools have not reduced social inequalities, and how childhood experiences can impact on mental health in mid-life. Every day researchers from across the scientific community are using this important study to make new connections and discoveries.

BCS70 is run by the Centre for Longitudinal Studies (CLS), a research centre in the UCL Institute of Education, which is part of University College London. The content of BCS70 studies, including questions, topics and variables, can be explored via the CLOSER Discovery website.

How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
For information on how to access biomedical data from BCS70 that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.

Secure Access datasets
Secure Access versions of BCS70 have more restrictive access conditions than versions available under the standard End User Licence (EUL).
In 2012, consent was sought for data linkage of health administrative records from the Hospital Episode Statistics (HES) to survey data for cohort members in the 1970 British Cohort Study (BCS70). The main aim of this data linkage exercise is to enhance the research potential of the study by combining administrative records with the rich information collected in the surveys. The 1970 British Cohort Study: Linked Health Administrative Datasets (Hospital Episode Statistics), England, 1997-2023: Secure Access contains linked records from the HES database, which holds information about all hospital admissions in England. The following linked HES data are available:
1) Accident and Emergency (A&E)
The A&E dataset details each attendance to an Accident and Emergency care facility in England, between 01-04-2007 and 31-03-2019 (inclusive). It includes major A&E departments, single speciality A&E departments, minor injury units and walk-in centres in England.
2) Admitted Patient Care (APC)
The APC data summarises episodes of care for admitted patients, where the episode occurred between 01-04-1997 and 31-03-2023 (inclusive).
3) Critical Care (CC)
The CC dataset covers records of critical care activity between 01-04-2009 and 31-03-2023 (inclusive).
4) Out Patient (OP)
The OP dataset lists the outpatient appointments between 01-04-2003 and 31-03-2023 (inclusive).
5) Emergency Care Dataset (ECDS)
The ECDS lists the emergency care appointments between 01-04-2020 and 31-03-2023 (inclusive).
6) Consent data
The consents dataset describes consent to linkage, and is current at the time of deposit.
CLS/ NHS Digital Sub-licence agreement
NHS Digital has given CLS permission for onward sharing of the Next Steps/HES dataset via the UKDS Secure Lab. In order to ensure data minimisation, NHS Digital requires that researchers only access the HES variables needed for their approved research project. Therefore, the HES linked data provided by the UKDS to approved researchers will be subject to sub-setting of variables.
The researcher will need to request a specific sub-set of variables from the Next Steps HES data dictionary, which will subsequently be made available within their UKDS Secure Account. Once the researcher has finished their research, the UKDS will delete the tailored dataset for that specific project.
Any party wishing to access the data deposited at the UK Data Service will be required to enter into a Licence agreement with CLS (UCL), in addition to the agreements signed with the UKDS, provided in the application pack.
The Licensee shall acknowledge in any publication, whether printed, electronic or broadcast, based wholly or in part on such materials, both the source of the data and UCL. An example of an appropriate acknowledgement can be found here: https://cls.ucl.ac.uk/data-access-training/citing-our-data/.
The National Child Development Study (NCDS) is a continuing longitudinal study that seeks to follow the lives of all those living in Great Britain who were born in one particular week in 1958. The aim of the study is to improve understanding of the factors affecting human development over the whole lifespan.
The NCDS has its origins in the Perinatal Mortality Survey (PMS) (the original PMS study is held at the UK Data Archive under SN 2137). This study was sponsored by the National Birthday Trust Fund and designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the 17,000 children born in England, Scotland and Wales in that one week. Selected data from the PMS form NCDS sweep 0, held alongside NCDS sweeps 1-3, under SN 5565.
Survey and Biomeasures Data (GN 33004):
To date there have been ten attempts to trace all members of the birth cohort in order to monitor their physical, educational and social development. The first three sweeps were carried out by the National Children's Bureau, in 1965, when respondents were aged 7, in 1969, aged 11, and in 1974, aged 16 (these sweeps form NCDS1-3, held together with NCDS0 under SN 5565). The fourth sweep, also carried out by the National Children's Bureau, was conducted in 1981, when respondents were aged 23 (held under SN 5566). In 1985 the NCDS moved to the Social Statistics Research Unit (SSRU) - now known as the Centre for Longitudinal Studies (CLS). The fifth sweep was carried out in 1991, when respondents were aged 33 (held under SN 5567). For the sixth sweep, conducted in 1999-2000, when respondents were aged 42 (NCDS6, held under SN 5578), fieldwork was combined with the 1999-2000 wave of the 1970 British Cohort Study (BCS70), which was also conducted by CLS (and held under GN 33229). The seventh sweep was conducted in 2004-2005 when the respondents were aged 46 (held under SN 5579), the eighth sweep was conducted in 2008-2009 when respondents were aged 50 (held under SN 6137), the ninth sweep was conducted in 2013 when respondents were aged 55 (held under SN 7669), and the tenth sweep was conducted in 2020-24 when the respondents were aged 60-64 (held under SN 9412).
A Secure Access version of the NCDS is available under SN 9413, containing detailed sensitive variables not available under Safeguarded access (currently only sweep 10 data). Variables include uncommon health conditions (including age at diagnosis), full employment codes and income/finance details, and specific life circumstances (e.g. pregnancy details, year/age of emigration from GB).
Four separate datasets covering responses to NCDS over all sweeps are available. National Child Development Deaths Dataset: Special Licence Access (SN 7717) covers deaths; National Child Development Study Response and Outcomes Dataset (SN 5560) covers all other responses and outcomes; National Child Development Study: Partnership Histories (SN 6940) includes data on live-in relationships; and National Child Development Study: Activity Histories (SN 6942) covers work and non-work activities. Users are advised to order these studies alongside the other waves of NCDS.
From 2002-2004, a Biomedical Survey was completed and is available under Safeguarded Licence (SN 8731) and Special Licence (SL) (SN 5594). Proteomics analyses of blood samples are available under SL SN 9254.
Linked Geographical Data (GN 33497):
A number of geographical variables are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies.
Linked Administrative Data (GN 33396):
A number of linked administrative datasets are available, under more restrictive access conditions, which can be linked to the NCDS EUL and SL access studies. These include a Deaths dataset (SN 7717) available under SL and the Linked Health Administrative Datasets (SN 8697) available under Secure Access.
Multi-omics Data and Risk Scores Data (GN 33592):
Proteomics analyses were run on the blood samples collected from NCDS participants in 2002-2004 and are available under SL SN 9254. Metabolomics analyses were conducted for sweep 10 respondents and are available under SL SN 9411. Polygenic indices are available under SL SN 9439. These derived summary scores combine the estimated effects of many different genes on a specific trait or characteristic, such as a person's risk of Alzheimer's disease, asthma, substance abuse, or mental health disorders. These scores can be combined with existing survey data to offer a more nuanced understanding of how cohort members' outcomes may be shaped.
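In its simplest form, a polygenic index is a weighted sum: for each genetic variant, the number of copies of the effect allele a person carries (0, 1 or 2) is multiplied by that allele's estimated effect size, and the products are summed. The Python sketch below shows only that arithmetic, with made-up variant IDs and effect sizes. It is not how the indices in SN 9439 were derived; published indices generally involve further steps, such as selecting variants and adjusting the weights, that are not shown here.

```python
from typing import Dict

def polygenic_index(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of effect size x effect-allele dosage over variants present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in sorted(shared))

# Hypothetical variant IDs and per-allele effect sizes, for illustration only.
weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}
# Dosage = number of copies (0, 1 or 2) of the effect allele one person carries.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

print(round(polygenic_index(person, weights), 4))  # prints 0.19
```

Variants missing from either input are simply skipped here; a real pipeline would need an explicit rule for missing genotypes.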
Additional Sub-Studies (GN 33562):
In addition to the main NCDS sweeps, further studies have also been conducted on a range of subjects such as parent migration, unemployment, behavioural studies and respondent essays. The full list of NCDS studies available from the UK Data Service can be found on the NCDS series access data webpage.
How to access genetic and/or bio-medical sample data from a range of longitudinal surveys:
For information on how to access biomedical data from NCDS that are not held at the UKDS, see the CLS Genetic data and biological samples webpage.
Further information about the full NCDS series can be found on the Centre for Longitudinal Studies website.
The National Child Development Study: Linked Health Administrative Datasets (Hospital Episode Statistics), England, 1997-2023: Secure Access includes data files from the NHS Digital HES database for those cohort members who provided consent to health data linkage in the Age 50 sweep. The HES database contains information about all hospital admissions in England. The following linked HES data are available:
1) Accident and Emergency (A&E)
The A&E dataset details each attendance to an Accident and Emergency care facility in England, between 01-04-2007 and 31-03-2020 (inclusive). It includes major A&E departments, single speciality A&E departments, minor injury units and walk-in centres in England.
2) Admitted Patient Care (APC)
The APC data summarises episodes of care for admitted patients, where the episode occurred between 01-04-1997 and 31-03-2023 (inclusive).
3) Critical Care (CC)
The CC dataset covers records of critical care activity between 01-04-2009 and 31-03-2023 (inclusive).
4) Out Patient (OP)
The OP dataset lists the outpatient appointments between 01-04-2003 and 31-03-2023 (inclusive).
5) Emergency Care Dataset (ECDS)
The ECDS lists the emergency care appointments between 01-04-2020 and 31-03-2023 (inclusive).
6) Consent data
The consents dataset describes consent to linkage, and is current at the time of deposit.
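Because each extract covers a different window and both end dates are inclusive, it can help to check which extracts could hold a record for a given date before requesting variables. A minimal Python sketch, assuming events are handled as calendar dates; the dictionary and function names are hypothetical, and the windows are copied from the NCDS list above.

```python
from datetime import date, datetime

# Coverage of the NCDS HES extracts as listed above; both ends are inclusive.
COVERAGE = {
    "A&E": ("01-04-2007", "31-03-2020"),
    "APC": ("01-04-1997", "31-03-2023"),
    "CC": ("01-04-2009", "31-03-2023"),
    "OP": ("01-04-2003", "31-03-2023"),
    "ECDS": ("01-04-2020", "31-03-2023"),
}

def parse_ddmmyyyy(text: str) -> date:
    """The catalogue writes dates as DD-MM-YYYY."""
    return datetime.strptime(text, "%d-%m-%Y").date()

def extracts_covering(event: date) -> list:
    """Names of the extracts whose coverage window includes the event date."""
    hits = []
    for name, (start, end) in COVERAGE.items():
        if parse_ddmmyyyy(start) <= event <= parse_ddmmyyyy(end):
            hits.append(name)
    return hits

print(extracts_covering(date(2020, 3, 31)))  # ['A&E', 'APC', 'CC', 'OP']
print(extracts_covering(date(2020, 4, 1)))   # ['APC', 'CC', 'OP', 'ECDS']
```

The two calls show the hand-over at the boundary: 31-03-2020 is the last day in the A&E extract, and the ECDS window starts the following day.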
CLS/ NHS Digital Sub-licence agreement
NHS Digital has given CLS permission for onward sharing of the NCDS/HES dataset via the UKDS Secure Lab. In order to ensure data minimisation, NHS Digital requires that researchers only access the HES variables needed for their approved research project. Therefore, the HES linked data provided by the UKDS to approved researchers will be subject to sub-setting of variables. The researcher will need to request a specific sub-set of variables from the NCDS/HES data dictionary, which will subsequently be made available within their UKDS Secure Account. Once the researcher has finished their research, the UKDS will delete the tailored dataset for that specific project. Any party wishing to access the data deposited at the UK Data Service will be required to enter into a Licence agreement with CLS (UCL), in addition to the agreements signed with the UKDS, provided in the application pack.
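The data-minimisation rule amounts to projecting the extract onto an approved list of variables. A minimal Python sketch of that idea, assuming records arrive as dictionaries; the function and all variable names are hypothetical, and in practice the UKDS prepares the tailored dataset, not the researcher.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]

def subset_variables(records: Iterable[Record], approved: List[str]) -> List[Record]:
    """Return copies of the records containing only the approved variables.

    Raises KeyError if an approved variable is missing, so a typo in the
    request is caught rather than silently producing an incomplete extract.
    """
    subset = []
    for rec in records:
        missing = [v for v in approved if v not in rec]
        if missing:
            raise KeyError(f"approved variables not in extract: {missing}")
        subset.append({v: rec[v] for v in approved})
    return subset

# Hypothetical variable names; real ones come from the NCDS/HES data dictionary.
extract = [
    {"cohort_id": "A001", "admission_date": "2015-06-02", "diagnosis": "dx_1", "postcode": "XX1"},
    {"cohort_id": "A002", "admission_date": "2018-11-20", "diagnosis": "dx_2", "postcode": "XX2"},
]
approved = ["cohort_id", "admission_date", "diagnosis"]
tailored = subset_variables(extract, approved)
print(tailored)
```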
CLS Hospital Episode Statistics data access update July 2025
From March 2027, HES data linked to all four CLS studies will no longer be available via the UK Data Service. For projects ending before March 2027, users should continue to apply via UKDS. However, if access to a wider range of linked Longitudinal Population Studies data is needed, UKLLC might be more suitable. For projects ending after March 2027, users must apply via UKLLC.
Latest edition information
For the third
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This study explores the transformation of women's wage work in New England between 1820 and 1900 with distinct studies of rural outwork, cotton textile manufacturing, boot and shoemaking, domestic service, needle trades, and teaching. Appendices in TRANSFORMING WOMEN'S WORK (Cornell University Press, 1994) discuss in detail how the datasets were constructed. Beginning with either employment or census records, Thomas Dublin employed nominal record linkage to assemble life course data on groups of women workers in New England.
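Nominal record linkage joins records for the same person across sources mainly by name, usually checking that other details, such as age, are consistent. The Python sketch below is a generic illustration of the technique using the standard library's `difflib`, not Dublin's procedure (which the book's appendices describe); the similarity threshold, the two-year birth-year tolerance and the example names are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Person = Tuple[str, int]  # (name as recorded, implied birth year)

def normalise(name: str) -> str:
    # Lower-case and collapse whitespace so "Mary  Smith" and "mary smith" compare equal.
    return " ".join(name.lower().split())

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def best_match(person: Person, candidates: List[Person],
               max_year_gap: int = 2, threshold: float = 0.85) -> Optional[Person]:
    """Most similar candidate whose birth year is plausible, or None if none passes."""
    name, birth_year = person
    best, best_score = None, threshold
    for cand_name, cand_year in candidates:
        if abs(cand_year - birth_year) > max_year_gap:
            continue  # blocking: skip candidates with implausible birth years
        score = similarity(name, cand_name)
        if score >= best_score:
            best, best_score = (cand_name, cand_year), score
    return best

# Hypothetical records: a mill payroll entry matched against census entries.
census = [("Sarah Hall", 1812), ("Lydia Hale", 1830), ("Lidia Hall", 1813)]
print(best_match(("Lydia Hall", 1812), census))  # ('Lidia Hall', 1813)
```

The spelling variant "Lidia" is accepted because the names are 90% similar and the birth years differ by one; "Lydia Hale" is rejected on birth year before its name is even compared.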
Objectives
Comorbidity is prevalent in older working ages and might affect employment exits. This study aimed to 1) assess the associations between comorbidity and different employment exit routes, and 2) examine such associations by gender.
Methods
We used data from employed adults aged 50–62 in the Stockholm Public Health Survey 2002 and 2006, linked to longitudinal administrative income records (N = 10,416). The morbidity measure combined Limiting Longstanding Illness and Common Mental Disorder (captured by the General Health Questionnaire-12, score ≥4) into a categorical variable: 1) No Limiting Longstanding Illness, no Common Mental Disorder, 2) Limiting Longstanding Illness only, 3) Common Mental Disorder only, and 4) comorbid Limiting Longstanding Illness+Common Mental Disorder. Employment status was followed up until 2010, treating early retirement, disability pension and unemployment as employment exits. Competing risk regression analysed the associations between morbidity and employment exit routes, stratifying by gender.
Results
Compared with the "No Limiting Longstanding Illness, no Common Mental Disorder" group, comorbid Limiting Longstanding Illness+Common Mental Disorder was associated with early retirement in men (subdistribution hazard ratio = 1.73, 95% confidence interval: 1.08–2.76), but not in women. For men and women, strong associations with disability pension were observed for Limiting Longstanding Illness only (subdistribution hazard ratio = 11.43, 95% confidence interval: 9.40–13.89) and Limiting Longstanding Illness+Common Mental Disorder (subdistribution hazard ratio = 14.25, 95% confidence interval: 10.91–18.61), and to a lesser extent for Common Mental Disorder only (subdistribution hazard ratio = 2.00, 95% confidence interval: 1.31–3.05). Women were more likely to exit through disability pension than men (subdistribution hazard ratio = 1.96, 95% confidence interval: 1.60–2.39). Common Mental Disorder only was the only morbidity category associated with unemployment (subdistribution hazard ratio = 1.70, 95% confidence interval: 1.36–2.15).
Conclusions
Strong associations were observed between specific morbidity categories and different employment exit routes, and these associations differed by gender. Initiatives to extend working lives should consider older workers’ varied health needs to prevent inequalities in older age.
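Competing-risk regression that reports subdistribution hazard ratios (SHRs) is commonly the Fine and Gray model, in which SHR = exp(β) for a covariate coefficient β. If a 95% interval was computed as exp(β ± 1.96·SE), the standard error and an approximate Wald test can be recovered from the reported figures. A Python sketch under that assumption; rounding in the published values makes the result approximate.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_se_from_ci(lower: float, upper: float) -> float:
    """SE of log(SHR) implied by a 95% Wald interval that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)

def wald_test(shr: float, lower: float, upper: float):
    """Approximate z statistic and two-sided p-value for H0: SHR = 1 (beta = 0)."""
    z = math.log(shr) / log_se_from_ci(lower, upper)
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal Z
    return z, p

# Early retirement in men, as reported above: SHR = 1.73 (95% CI 1.08-2.76).
z, p = wald_test(1.73, 1.08, 2.76)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For the early-retirement estimate in men this gives z of roughly 2.3 and a two-sided p of roughly 0.02, consistent with an interval that excludes 1.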
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective
To investigate public willingness to share sensitive health information for research, health policy and clinical practice.
Methods
A total of 1,003 Australian respondents answered an online, attribute-driven survey in which participants were asked to accept or reject hypothetical choice sets based on their willingness to share their health data for research and frontline medical support as part of an integrated health system. The survey consisted of 5 attributes: Stakeholder access for analysis (Analysing group); Type of information collected; Purpose of data collection; Information governance; and Anticipated benefit. Responses were analysed using logistic regression.
Results
When asked about their preference for sharing their health data, respondents had no preference between data collection for the purposes of clinical practice, health policy or research, and a slight preference for having government organisations manage, govern and curate the integrated datasets from which the analysis was being conducted. The least preferred options were for personal health records to be integrated with insurance records or for their data to be collected by privately owned corporate organisations. Individuals preferred their data to be analysed by a public healthcare provider or government staff and expressed a dislike for any private company involvement.
Conclusions
The findings from this study suggest that Australian consumers prefer to share their health data when there is government oversight, and have concerns about sharing their anonymised health data for clinical practice, health policy or research purposes unless clarity is provided about its intended purpose, limitations of use and restrictions on access. Similar findings have been observed in the limited set of existing international studies using a stated preference approach. This study, supported by national and international research, indicates that establishing and preserving a social license for data linkage in health research will require routine public engagement, because technology continues to evolve and risk tolerance fluctuates. Without more work to understand and address stakeholder concerns, consumers may be reluctant to participate in data-sharing and linkage programmes.
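In an accept/reject design like this one, each hypothetical scenario can be coded as indicator variables for its attribute levels, and a logistic regression then models the probability of acceptance. The Python/NumPy sketch below fits such a model by Newton-Raphson on simulated responses; the two attributes, their coefficients and the sample are invented for illustration and are not the study's coding or results.

```python
import numpy as np

def fit_logit(X: np.ndarray, y: np.ndarray, iters: int = 25) -> np.ndarray:
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must already contain an intercept column; y holds 1 (accept) or 0 (reject).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix X'WX
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated accept/reject answers (not study data) for two hypothetical
# binary attributes: analysed by government staff, linked to insurance records.
rng = np.random.default_rng(0)
n = 5000
government = rng.integers(0, 2, n)
insurance = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), government, insurance])
true_beta = np.array([0.2, 0.8, -1.2])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logit(X, y)
print(np.round(beta_hat, 2), np.round(np.exp(beta_hat[1:]), 2))  # coefficients, odds ratios
```

Exponentiating a coefficient gives the odds ratio for accepting a scenario with that attribute level versus without it, holding the other attribute fixed.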
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Older adults represent a large and growing section of Aotearoa New Zealand's population. Longitudinal research on experiences of later life enables understanding of both the capabilities with which people are ageing, and their determinants. The Health, Work, and Retirement (HWR) study has to date conducted eight biennial longitudinal postal surveys of health and well-being with older people (n = 11,601 respondents; 49.4% of Māori descent). Survey data are linked at the individual level to other modes of data collection, including cognitive assessments, life course history interviews, and national health records. This article describes the HWR study and its potential to support our understanding of ageing in Aotearoa New Zealand. We present an illustrative analysis of data collected to date, using indicators of physical health-related functional ability from n = 10,728 adults aged 55–80 to describe mean trajectories of physical ability with age, by birth cohort and gender. As the original participant cohort recruited in 2006 reaches ages 71–86 in 2022, future directions for the study include expanding its core longitudinal measures to include follow-up assessments of cognitive functioning to understand factors predicting cognitive decline, and linkage to national datasets to identify population-level profiles of risk for conditions such as frailty.
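The simplest descriptive version of "mean trajectories by birth cohort and gender" is a table of cell means by age band within each cohort-gender group. A Python sketch, assuming long-format observations (one row per person per wave); the five-year banding, cohort labels and scores are hypothetical, and the article's own analysis may use a model-based approach instead.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# One hypothetical observation: (birth cohort, gender, age at survey, ability score)
Obs = Tuple[str, str, int, float]

def mean_trajectories(obs: List[Obs], band: int = 5) -> Dict[Tuple[str, str], Dict[int, float]]:
    """Mean score by age band (keyed by the band's lower bound) within each cohort-gender group."""
    cells: Dict[Tuple[str, str, int], List[float]] = defaultdict(list)
    for cohort, gender, age, score in obs:
        cells[(cohort, gender, age // band * band)].append(score)
    trajectories: Dict[Tuple[str, str], Dict[int, float]] = defaultdict(dict)
    for (cohort, gender, age_band), scores in sorted(cells.items()):
        trajectories[(cohort, gender)][age_band] = mean(scores)
    return dict(trajectories)

sample = [
    ("1950-54", "F", 56, 50.0),
    ("1950-54", "F", 58, 52.0),
    ("1950-54", "F", 61, 48.0),
    ("1950-54", "M", 57, 55.0),
]
print(mean_trajectories(sample))
```

Raw cell means like these mix within-person change with differences in who responds at each age, which is one reason longitudinal studies often model trajectories rather than tabulate them.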
Big Data & Society (BD&S) is an open access, peer-reviewed scholarly journal that publishes interdisciplinary work, principally in the social sciences, humanities and computing and their intersections with the arts and natural sciences, about the implications of Big Data for societies. The Journal's key purpose is to provide a space for connecting debates about the emerging field of Big Data practices and how they are reconfiguring academic, social, industry, business, and government relations, expertise, methods, concepts, and knowledge. BD&S moves beyond usual notions of Big Data and treats it as an emerging field of practice that is not defined by but generative of (sometimes) novel data qualities such as high volume and granularity and complex analytics such as data linking and mining. It thus attends to digital content generated through online and offline practices in social, commercial, scientific, and government domains. This includes, for instance, the content generated on the Internet through social media and search engines but also that which is generated in closed networks (commercial or government transactions) and open networks such as digital archives, open government, and crowdsourced data. Critically, rather than settling on a definition, the Journal makes this an object of interdisciplinary inquiries and debates explored through studies of a variety of topics and themes. BD&S seeks contributions that analyze Big Data practices and/or involve empirical engagements and experiments with innovative methods while also reflecting on the consequences for how societies are represented (epistemologies), realized (ontologies) and governed (politics).
Article processing charge (APC)
The article processing charge (APC) for this journal is currently 1500 USD. Authors who do not have funding for open access publishing can request a waiver from the publisher, SAGE, once their Original Research Article is accepted after peer review. For all other content (Commentaries, Editorials, Demos) and Original Research Articles commissioned by the Editor, the APC will be waived.
Abstract & Indexing
- Clarivate Analytics: Social Sciences Citation Index (SSCI)
- Directory of Open Access Journals (DOAJ)
- Google Scholar
- Scopus
Post-crash care is one of the pillars of the safe systems approach to road safety. While the data on road casualties reported to police through the STATS19 system contain detailed information on accident circumstances and the vehicles involved, information on clinical outcomes is more limited.
The Trauma Audit and Research Network (TARN) database (https://www.tarn.ac.uk/Home.aspx) provides an excellent platform for trauma research. TARN data can be made available for research related to trauma care (https://www.tarn.ac.uk/Content.aspx?ca=9), and they provide a source of detailed information on the clinical outcomes for more seriously injured road casualties.
This work represents an initial attempt to establish the feasibility of linking STATS19 and TARN data as the first step in exploring the value of TARN data in improving the evidence base on post-crash care for road casualties and the burdens of road casualties on the NHS.
This work focuses on establishing a method for linking the two datasets. It is expected that a further update, including analysis of the linked data, will be published alongside the publication of Reported Road Casualties Great Britain in September 2022.
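Because police and trauma-registry records share no common identifier, a first linkage pass often relies on agreement between shared fields within tolerances. The Python sketch below shows deterministic linkage on incident date, sex and age with a greedy one-to-one assignment; the field names, tolerances and scoring are assumptions for illustration, not the method used in this work.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

@dataclass(frozen=True)
class Casualty:
    record_id: str
    incident_date: date
    sex: str
    age: int

def link_casualties(police: List[Casualty], trauma: List[Casualty],
                    max_days: int = 1, max_age_gap: int = 1) -> Dict[str, str]:
    """Deterministic one-to-one linkage of police records to trauma records.

    A pair is a candidate when sex agrees exactly and incident date and age
    agree within the tolerances. Candidates are accepted greedily, closest
    first (smallest day gap + age gap), so no record is linked twice.
    """
    candidates = []
    for p in police:
        for t in trauma:
            day_gap = abs((p.incident_date - t.incident_date).days)
            age_gap = abs(p.age - t.age)
            if p.sex == t.sex and day_gap <= max_days and age_gap <= max_age_gap:
                candidates.append((day_gap + age_gap, p.record_id, t.record_id))
    links: Dict[str, str] = {}
    used = set()
    for _, pid, tid in sorted(candidates):
        if pid not in links and tid not in used:
            links[pid] = tid
            used.add(tid)
    return links

# Hypothetical records for illustration.
police = [Casualty("S1", date(2019, 5, 1), "F", 34), Casualty("S2", date(2019, 5, 1), "M", 60)]
trauma = [Casualty("T1", date(2019, 5, 2), "F", 35), Casualty("T2", date(2019, 5, 1), "M", 60),
          Casualty("T3", date(2019, 6, 1), "F", 34)]
print(link_casualties(police, trauma))  # {'S2': 'T2', 'S1': 'T1'}
```

Allowing a day's tolerance accommodates incidents recorded either side of midnight or admissions the day after; wider tolerances raise the risk of false links, which a real exercise would need to assess.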
Feedback on the work to date is welcome by email to the road safety statistics team.
Road safety statistics
Email: roadacc.stats@dft.gov.uk
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Tough Tables (2T) is a dataset designed to evaluate table annotation approaches on the CEA (Cell Entity Annotation) and CTA (Column Type Annotation) tasks. The dataset is compliant with the data format used in SemTab 2019, and it can be used as an additional dataset without any modification. The target knowledge graph is DBpedia 2016-10. Check out the 2T GitHub repository for more details about the dataset generation.
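For CEA, SemTab 2019 is commonly described as scoring systems with precision (correct annotations over submitted annotations) and recall (correct annotations over target cells), combined as F1. A Python sketch of that scoring, assuming the ground truth lists every acceptable entity URI for each target cell; the official evaluator's handling of details such as redirects and non-target cells may differ, and the example cells and entities are hypothetical.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[str, int, int]  # (table id, row index, column index)

def score_cea(ground_truth: Dict[Cell, Set[str]], submission: Dict[Cell, str]) -> Dict[str, float]:
    """Precision, recall and F1 for cell-entity annotations.

    Assumption: annotations for cells outside the target set are ignored.
    """
    annotated = {cell: ent for cell, ent in submission.items() if cell in ground_truth}
    correct = sum(1 for cell, ent in annotated.items() if ent in ground_truth[cell])
    precision = correct / len(annotated) if annotated else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical table "T1" with three target cells in column 0.
gt = {
    ("T1", 1, 0): {"dbr:Rome"},
    ("T1", 2, 0): {"dbr:Paris"},
    ("T1", 3, 0): {"dbr:Oslo"},
}
sub = {("T1", 1, 0): "dbr:Rome", ("T1", 2, 0): "dbr:Lyon"}
print(score_cea(gt, sub))  # precision 0.5, recall about 0.333, F1 about 0.4
```

Leaving a hard cell unannotated lowers recall but not precision, so the two measures reward different strategies on noisy tables like those in 2T.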
New in v2.0: We release the updated version of 2T_WD! The target knowledge graph is Wikidata (online instance) and the dataset complies with the SemTab 2021 data format.
This work is based on the following paper:
Cutrona, V., Bianchi, F., Jimenez-Ruiz, E. and Palmonari, M. (2020). Tough Tables: Carefully Evaluating Entity Linking for Tabular Data. ISWC 2020, LNCS 12507, pp. 1–16.
Note on License: This dataset includes data from the following sources. Refer to each source for license details:
- Wikipedia https://www.wikipedia.org/
- DBpedia https://dbpedia.org/
- Wikidata https://www.wikidata.org/
- SemTab 2019 https://doi.org/10.5281/zenodo.3518539
- GeoDatos https://www.geodatos.net
- The Pudding https://pudding.cool/
- Offices.net https://offices.net/
- DATA.GOV https://www.data.gov/
THIS DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Changelog:
v2.0
New GT for 2T_WD
A few entities have been removed from the CEA GT, because they are no longer represented in WD (e.g., dbr:Devonté points to wd:Q21155080, which does not exist)
Table codes and values differ from the previous version because of the random noise.
Updated ancestor/descendant hierarchies to evaluate CTA.
v1.0
New Wikidata version (2T_WD)
Fix header for tables CTRL_DBP_MUS_rock_bands_labels.csv and CTRL_DBP_MUS_rock_bands_labels_NOISE2.csv (column 2 was reported with id 1 in target - NOTE: the affected column has been removed from the SemTab2020 evaluation)
Remove duplicated entries in tables
Remove rows with wrong values (e.g., the Kazakhstan entity has an empty name "''")
Many rows and noised columns are shuffled/changed due to the random noise generator algorithm
Remove row "Florida","Floorida","New York, NY" from TOUGH_WEB_MISSP_1000_us_cities.csv (and all its NOISE1 variants)
Fix header of tables:
CTRL_WIKI_POL_List_of_current_monarchs_of_sovereign_states.csv
CTRL_WIKI_POL_List_of_current_monarchs_of_sovereign_states_NOISE2.csv
TOUGH_T2D_BUS_29414811_2_4773219892816395776_videogames_developers.csv
TOUGH_T2D_BUS_29414811_2_4773219892816395776_videogames_developers_NOISE2.csv
v0.1-pre
First submission. It contains only tables, without GT and Targets.
https://data.go.kr/ugs/selectPortalPolicyView.do
Linking survey and administrative data offers the possibility of combining the strengths, and mitigating the weaknesses, of both. Such linkage is therefore an extremely promising basis for future empirical research in social science. For ethical and legal reasons, linking administrative data to survey responses will usually require obtaining explicit consent. It is well known that not all respondents give consent. Past research on consent has generated many null and inconsistent findings. A weakness of the existing literature is that little effort has been made to understand the cognitive processes by which respondents decide whether or not to consent. The overall aim of this project was to improve our understanding of how to pursue the twin goals of maximising consent and ensuring that consent is genuinely informed. The ultimate objective is to strengthen the data infrastructure for social science and policy research in the UK. Specific aims were:
1. To understand how respondents process requests for data linkage: which factors influence their understanding of data linkage, which factors influence their decision to consent, and, opening the black box of consent decisions, how respondents actually make the decision.
2. To develop and test methods of maximising consent in web surveys, by understanding why web respondents are less likely to give consent than face-to-face respondents.
3. To develop and test methods of maximising consent with requests for linkage to multiple data sets, by understanding how respondents process multiple requests.
4. As a by-product of testing hypotheses about the previous points, to test the effects of different approaches to wording consent questions on informed consent.
Our findings are based on a series of experiments conducted in four surveys using two different studies: The Understanding Society Innovation Panel (IP) and the PopulusLive online access panel (AP). The Innovation Panel is part of Understanding Society: the UK Household Longitudinal Study. It is a probability sample of households in Great Britain used for methodological testing, with a design that mirrors that of the main Understanding Society survey. The Innovation Panel survey was conducted in wave 11, fielded in 2018. The Innovation Panel data are available from the UK Data Service (SN: 6849, http://doi.org/10.5255/UKDA-SN-6849-12).
Since the Innovation Panel sample size (around 2,900 respondents) constrained the number of experimental treatment groups we could implement, we fielded a parallel survey with additional experiments, using a different sample. PopulusLive is a non-probability online panel with around 130,000 active sample members, who are recruited through web advertising, word of mouth, and database partners. We used age, gender and education quotas to match the sample composition of the Innovation Panel.
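Quota targets of this kind can be derived by allocating the planned number of completes across quota cells in proportion to the reference sample's composition. A Python sketch using largest-remainder rounding so that the cell counts add up exactly; the cell definitions and target shares are hypothetical (the age dimension is omitted for brevity), and the study's actual quota scheme is not described here.

```python
from math import floor
from typing import Dict, Tuple

Cell = Tuple[str, str]  # (gender, education); age bands omitted for brevity

def quota_targets(shares: Dict[Cell, float], n: int) -> Dict[Cell, int]:
    """Split n completes across quota cells in proportion to target shares.

    Largest-remainder rounding: take the floor of each cell's exact share,
    then give the leftover completes to the cells with the largest fractions.
    """
    total = sum(shares.values())
    exact = {cell: n * share / total for cell, share in shares.items()}
    counts = {cell: floor(value) for cell, value in exact.items()}
    leftover = n - sum(counts.values())
    by_fraction = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for cell in by_fraction[:leftover]:
        counts[cell] += 1
    return counts

# Hypothetical reference-sample shares, for illustration only.
shares = {
    ("female", "degree"): 0.20,
    ("female", "no degree"): 0.30,
    ("male", "degree"): 0.15,
    ("male", "no degree"): 0.35,
}
print(quota_targets(shares, 101))
```

Simple rounding of each cell can leave the total one or two short of, or over, the target; the largest-remainder step avoids that.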
A total of nine experiments were conducted across the two sample sources. Experiments 1 to 5 all used variations of a single consent question, about linkage to tax data (held by HM Revenue and Customs, HMRC). Experiments 6 and 7 also used single consent questions, but respondents were either assigned to questions on tax or health data (held by the National Health Service, NHS) linkage. Experiments 8 and 9 used five different data linkage requests: tax data (held by HMRC), health data (held by the NHS), education data (held by the Department for Education in England, DfE, and equivalent departments in Scotland and Wales), household energy data (held by the Department for Business, Energy and Industrial Strategy, BEIS), and benefit and pensions data (held by the Department for Work and Pensions, DWP).
The experiments, and the survey(s) on which they were conducted, are briefly summarized here:
1. Easy vs. standard wording of consent request (IP and AP). Half the respondents were allocated to the ‘standard’ question wording, used previously in Understanding Society. The balance was allocated to an ‘easy’ version, where the text was rewritten to reduce reading difficulty and to provide all essential information about the linkage in the question text rather than an additional information leaflet.
2. Early vs. late placement of consent question (IP). Half the respondents were asked for consent early in the interview, the other half were asked at the end.
3. Web vs. face-to-face interview (IP). This experiment exploits the random assignment of IP cases to explore mode effects on consent.
4. Default question wording (AP). Experiment 4 tested a default approach to giving consent, asking respondents to “Press ‘next’ to continue” or explicitly opt out, versus the standard opt-in consent procedure.
5. Additional information question wording (AP). This experiment tested the effect of offering additional information, with a version that added a third response option (“I need more information before making a decision”) to the standard ‘yes’ or ‘no’ options.
6. Data linkage domain (AP). Half the respondents were assigned to a question asking for consent to link to HMRC data; the other half were asked for linkage to NHS data.
7. Trust priming (AP). This experiment was crossed with the data linkage domain experiment, and focused on the effect of priming trust on consent. Half the sample saw an additional statement: “HMRC / The NHS is a trusted data holder” on an introductory screen prior to the consent question. This was followed by an icon symbolizing data security: a shield and lock symbol with the heading “Trust”. The balance was not shown the additional statement or icon.
8. Format of multiple consents (AP). For one group, the five consent questions were each presented on a separate page, with respondents consenting to each in turn. For the second group the questions were all presented on one page; however, the respondent still had to answer each consent question individually. For the third group all five data requests were presented on a single page and the respondent answered a single yes/no question, whether they consented to all the linkages or not.
9. Order of multiple consents (AP). One version asked the five consent questions in ascending order of sensitivity of the request (based on previous data), with NHS asked first. The other version reversed the order, with consent to linkage to HMRC data asked first.
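Crossing experiments 6 and 7 yields four arms (HMRC or NHS, with or without the trust prime). A minimal Python sketch of balanced random assignment to such a design; the seed, respondent IDs and the shuffle-then-cycle scheme are assumptions, since the source does not say how allocation was implemented.

```python
import random
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Arm = Tuple[str, bool]  # (data holder named in the request, trust prime shown?)

def assign_crossed(respondent_ids: List[str], seed: int = 11) -> Dict[str, Arm]:
    """Randomly allocate respondents to the 2x2 domain-by-priming design.

    Shuffling first and then cycling through the four arms keeps the arm
    sizes within one respondent of each other.
    """
    arms = list(product(["HMRC", "NHS"], [True, False]))
    order = list(respondent_ids)
    random.Random(seed).shuffle(order)
    return {rid: arms[i % len(arms)] for i, rid in enumerate(order)}

groups = assign_crossed([f"r{i:03d}" for i in range(10)])
print(Counter(groups.values()))
```

Because the two factors are fully crossed, half of each domain group sees the prime, so the priming effect can be estimated separately for HMRC and NHS requests.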
For all of the experiments described above, we examined consent rates. We also tested comprehension of the consent request using a series of knowledge questions about the consent process, measured subjective understanding (how much respondents felt they understood about the request), and ascertained respondents’ confidence in the decision they had made.
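The most basic comparison of consent rates between two arms is a two-proportion z-test. A Python sketch with invented counts; the study's own analyses may well use regression models with controls, so treat this as an illustration of the arithmetic only.

```python
import math

def compare_consent_rates(yes_a: int, n_a: int, yes_b: int, n_b: int):
    """Return both consent rates, the pooled two-proportion z, and its two-sided p-value."""
    rate_a, rate_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal Z
    return rate_a, rate_b, z, p_value

# Invented counts: 700 of 1,000 consent in arm A, 650 of 1,000 in arm B.
rate_a, rate_b, z, p_value = compare_consent_rates(700, 1000, 650, 1000)
print(f"{rate_a:.2f} vs {rate_b:.2f}: z = {z:.2f}, p = {p_value:.3f}")
```

With these invented counts, a five-point difference gives z of about 2.4; the same difference in arms of 300 would not reach conventional significance, which matters for a panel the size of the IP.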
In addition to the experiments, we used digital audio-recordings of the IP11 face-to-face interviews (recorded with respondents’ permission) to explore how interviewers communicate the consent request to respondents, whether and how they provide additional information or attempt to persuade respondents to consent, and whether respondents raise questions when asked for consent to data linkage.
Key Findings
Correlates of consent:
(1) Respondents who have better understanding of the data linkage request (as measured by a set of knowledge questions) are also more likely to consent.
(2) As in previous studies, we find no socio-demographic characteristics that consistently predict consent in all samples. The only consistent predictors are positive attitudes towards data sharing, trust in HMRC, and knowledge of what data HMRC have.
(3) Respondents are less likely to consent to data linkage if the wording of the request is difficult and the question is asked late in the questionnaire. Position has no effect on consent if the wording is easy; wording has no effect on consent if the position is early.
(4) Priming respondents to think about trust in the organisations involved in the data linkage increases consent.
(5) The only socio-demographic characteristic that consistently predicts objective understanding of the linkage request is education. Understanding is positively associated with the number of online data sharing behaviours (e.g., posting text or images on social media, downloading apps, online purchases or banking) and with trust in HMRC.
(6) Easy wording of the consent question increases objective understanding of the linkage request. Position of the consent question in the questionnaire has no effect on understanding.
The consent decision process:
(7) Respondents make the consent decision in different ways: some use more reflective decision-making strategies, others less reflective ones.
(8) Different decision processes are associated with very different levels of consent, comprehension, and confidence in the consent decision.
(9) Placing the consent request earlier in the survey increases the probability that the respondent uses a reflective decision-making process.
Effects of mode of data collection on consent:
(10) As in previous studies, respondents are less likely to consent online than with an interviewer.
(11) Web respondents have lower levels of understanding than face-to-face respondents.
(12) There is no difference by mode in respondents’ confidence in their decisions.
(13) Web respondents report higher levels of concern about data security than face-to-face respondents.
(14) Web respondents are less likely than face-to-face respondents to use reflective strategies to make their decision, and more likely to make habit-based decisions.
(15) Easier wording of the consent request does not reduce mode effects on rates of consent.
(16) Respondents rarely ask questions and interviewers rarely provide additional information.
Multiple consent requests:
(17) The format in which a sequence of consent requests is asked does not seem to matter.
(18) The order of multiple consent requests affects