These documents present updated record level data created from the National Energy Efficiency Data-Framework (NEED):
Please forward any feedback to energyefficiency.stats@energysecurity.gov.uk.
These documents present updated record level data created from the National Energy Efficiency Data-Framework (NEED):
Please forward any feedback to energyefficiency.stats@beis.gov.uk.
According to our latest research, the global healthcare data anonymization services market size reached USD 1.42 billion in 2024, reflecting a robust expansion driven by increasing regulatory demands and heightened focus on patient privacy. The market is projected to grow at a CAGR of 15.8% from 2025 to 2033, with the total market value expected to reach USD 5.44 billion by 2033. This impressive growth trajectory is underpinned by the rising adoption of digital health solutions, stringent data protection laws, and the ongoing digitalization of healthcare records worldwide.
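The projection can be checked with the standard compound-growth formula, value = base × (1 + CAGR)^years. The sketch below assumes annual compounding over nine years from the 2024 base to 2033; the report does not state its exact convention, and the function names are illustrative.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound `base` forward by `years` periods at annual growth rate `cagr`."""
    return base * (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that turns `start` into `end` over `years` periods."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Assumption: nine compounding years from the 2024 base (USD 1.42 billion) to 2033.
    print(f"{project_value(1.42, 0.158, 9):.2f}")  # projected value, USD billion
    print(f"{implied_cagr(1.42, 5.44, 9):.2%}")    # rate implied by the quoted 2033 value
```

Under that assumption, 15.8% compounds USD 1.42 billion to roughly USD 5.32 billion, while the quoted USD 5.44 billion implies a rate of about 16.1%; the gap may reflect a different base year or rounding in the source figures.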
The primary growth factor fueling the healthcare data anonymization services market is the proliferation of electronic health records (EHRs) and the expanding use of big data analytics in healthcare. As healthcare providers and organizations increasingly leverage advanced analytics for improving patient outcomes, there is a corresponding surge in data generation. However, these vast datasets often contain sensitive patient information, making data anonymization essential to ensure compliance with regulations such as HIPAA, GDPR, and other regional privacy laws. The increasing frequency of data breaches and cyberattacks has further highlighted the importance of robust anonymization services, prompting healthcare organizations to prioritize investments in data privacy and security solutions. As a result, demand for both software and service-based anonymization solutions continues to rise, contributing significantly to market growth.
Another key driver for the healthcare data anonymization services market is the growing emphasis on research and clinical trials, which require the sharing and analysis of large volumes of patient data. Pharmaceutical and biotechnology companies, as well as research organizations, are increasingly collaborating across borders, necessitating the anonymization of datasets to protect patient identities and comply with international data protection standards. The adoption of cloud-based healthcare solutions has also facilitated the secure and efficient sharing of anonymized data, supporting advancements in personalized medicine and population health management. As organizations seek to balance innovation with compliance, the demand for advanced anonymization technologies that offer high accuracy and scalability is expected to accelerate further.
Technological advancements in artificial intelligence (AI) and machine learning (ML) are also shaping the future of the healthcare data anonymization services market. These technologies are enabling more sophisticated and automated anonymization processes, reducing the risk of re-identification while maintaining data utility for research and analytics. The integration of AI-driven tools into anonymization workflows is helping organizations streamline operations, minimize human error, and achieve greater compliance with evolving regulatory requirements. Additionally, the increasing availability of customizable and interoperable anonymization solutions is making it easier for healthcare organizations of all sizes to adopt and scale these services, thereby broadening the market’s reach and impact.
From a regional perspective, North America continues to dominate the healthcare data anonymization services market, accounting for the largest share in 2024. This leadership position is attributed to the presence of advanced healthcare infrastructure, widespread adoption of EHRs, and strict regulatory frameworks governing patient data privacy. Europe follows closely, driven by the enforcement of the General Data Protection Regulation (GDPR) and a strong culture of data protection. The Asia Pacific region is witnessing the fastest growth, propelled by increasing healthcare digitalization, government initiatives to modernize healthcare systems, and rising awareness of data privacy among patients and providers. Latin America and the Middle East & Africa are also experiencing steady growth, albeit from a smaller base, as healthcare organizations in these regions begin to prioritize data security and compliance.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The main population base for published statistical tables from the 2011 Census in Northern Ireland is the usual resident population as at Census day, 27 March 2011. For 2011 Census purposes, a usual resident of the United Kingdom (UK) is anyone who, on Census day, was in the UK and had stayed or intended to stay in the UK for a period of 12 months or more, or had a permanent UK address and was outside the UK and intended to be outside the UK for less than 12 months.
Against this background, the 2011 Census Microdata Sample of Anonymised Records (SARs) Teaching File comprises a sample of 19,862 records (approximately 1 per cent) relating to people who were usually resident in Northern Ireland at the time of the 2011 Census. For each individual, information is available for seventeen separate characteristics (for example, sex, age and marital status) at varying degrees of detail. Both the size of the sample and the content of the records have been harmonised, wherever possible, with the equivalent SARs teaching file that the Office for National Statistics released simultaneously for England and Wales.
Purpose
The primary purpose of the teaching file, which comprises unit-record level data rather than statistical aggregates, is to serve as an educational tool aimed at:
- encouraging wider use of Census data by offering another way of examining it, for example through building statistical models, beyond the range of standard tabular output released to date;
- providing a broad insight into the detail generally included in a SARs product, along with data formats and associated metadata. This will enable users (arguably those less experienced with SARs products) to ‘play’ with the data and build their knowledge and skills in readiness for accessing the more detailed SARs products that are planned and will be available in, for example, a safe setting; and
- assisting with the teaching of statistics and geography at GCSE and higher levels.
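The 2011 Census usual-resident rule combines two alternative conditions. The sketch below encodes the rule as worded in the text; the function and parameter names are hypothetical, and reducing "had stayed or intended to stay" to a single number of months is a simplification.

```python
def is_usual_resident(
    in_uk_on_census_day: bool,
    months_in_uk: float,
    has_permanent_uk_address: bool,
    months_abroad: float,
) -> bool:
    """Apply the 2011 Census usual-resident rule as described in the text.

    months_in_uk: months the person had stayed, or intended to stay, in the UK.
    months_abroad: months the person intended to be outside the UK.
    """
    if in_uk_on_census_day:
        # First condition: in the UK on Census day and staying 12 months or more.
        return months_in_uk >= 12
    # Second condition: outside the UK, with a permanent UK address,
    # and intending to be away for less than 12 months.
    return has_permanent_uk_address and months_abroad < 12
```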
This data set is a collection of anonymized sample fundraising data sets so that practitioners within our field can practice and share examples using a common data source.
If you have any anonymous data that you would like to include here, let me know: Michael Pawlus (pawlus@usc.edu)
Thanks to everyone who has shared data so far to make this possible.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The file is a coding example of the anonymised data.
Overview
The CKW Group is a distribution system operator that supplies more than 200,000 end customers in Central Switzerland. Since October 2022, CKW has published anonymised and aggregated data from smart meters that measure electricity consumption in the canton of Lucerne. This unique dataset is accessible on the ckw.ch/opendata platform.
Data set A - anonymised smart meter data
Data set B - aggregated smart meter data
Contents of this data set
This data set contains a small sample of CKW data set A, sorted by smart meter ID and stored as parquet files named after the anonymised id field of the corresponding smart meter, for example 027ceb7b8fd77a4b11b3b497e9f0b174.parquet.
The original CKW data are available for download at https://open.data.axpo.com/%24web/index.html#dataset-a as gzip-compressed csv files, split into one file per calendar month. The columns in the csv files are:
id: the anonymized counter ID (text)
timestamp: the UTC time at the beginning of a 15-minute time window to which the consumption refers (ISO-8601 timestamp)
value_kwh: the consumption in kWh in the time window under consideration (float)
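Because each value_kwh covers a 15-minute window starting at timestamp, an hourly total is the sum of four consecutive windows, and dividing a window's energy by 0.25 h gives its average power in kW. A standard-library sketch follows, assuming the timestamps are ISO-8601 strings that may end in "Z" and that an empty value_kwh marks a missing reading (neither detail is confirmed by the source); the function names are hypothetical.

```python
import csv
from collections import defaultdict
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    # Assumption: ISO-8601 strings, possibly ending in "Z". Mapping "Z" to "+00:00"
    # lets datetime.fromisoformat parse them on Python versions before 3.11 too.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def hourly_kwh(lines) -> dict:
    """Sum 15-minute kWh values into hourly totals per meter id.

    `lines` is any iterable of CSV text lines with header id,timestamp,value_kwh.
    Rows with an empty value_kwh are skipped (assumed to mean missing data).
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        if not row["value_kwh"]:
            continue
        start = parse_utc(row["timestamp"])
        # Truncate the window start to the hour it belongs to.
        hour = start.replace(minute=0, second=0, microsecond=0)
        totals[(row["id"], hour)] += float(row["value_kwh"])
    return dict(totals)


def average_kw(value_kwh: float) -> float:
    # Energy over a 15-minute (0.25 h) window divided by its duration gives mean power.
    return value_kwh / 0.25
```

To run it on a downloaded month, open the file as text, for example `with gzip.open("ckw_opendata_smartmeter_dataset_a_202101.csv.gz", "rt", newline="") as f: totals = hourly_kwh(f)`.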
For this archive, the following monthly files were processed:
| File size | Export date | Period | File name |
| --------- | ----------- | ------ | --------- |
| 4.2GiB | 2024-04-20 | 202402 | ckw_opendata_smartmeter_dataset_a_202402.csv.gz |
| 4.5GiB | 2024-03-21 | 202401 | ckw_opendata_smartmeter_dataset_a_202401.csv.gz |
| 4.5GiB | 2024-02-20 | 202312 | ckw_opendata_smartmeter_dataset_a_202312.csv.gz |
| 4.4GiB | 2024-01-20 | 202311 | ckw_opendata_smartmeter_dataset_a_202311.csv.gz |
| 4.5GiB | 2023-12-20 | 202310 | ckw_opendata_smartmeter_dataset_a_202310.csv.gz |
| 4.4GiB | 2023-11-20 | 202309 | ckw_opendata_smartmeter_dataset_a_202309.csv.gz |
| 4.5GiB | 2023-10-20 | 202308 | ckw_opendata_smartmeter_dataset_a_202308.csv.gz |
| 4.6GiB | 2023-09-20 | 202307 | ckw_opendata_smartmeter_dataset_a_202307.csv.gz |
| 4.4GiB | 2023-08-20 | 202306 | ckw_opendata_smartmeter_dataset_a_202306.csv.gz |
| 4.6GiB | 2023-07-20 | 202305 | ckw_opendata_smartmeter_dataset_a_202305.csv.gz |
| 3.3GiB | 2023-06-20 | 202304 | ckw_opendata_smartmeter_dataset_a_202304.csv.gz |
| 4.6GiB | 2023-05-24 | 202303 | ckw_opendata_smartmeter_dataset_a_202303.csv.gz |
| 4.2GiB | 2023-04-20 | 202302 | ckw_opendata_smartmeter_dataset_a_202302.csv.gz |
| 4.7GiB | 2023-03-20 | 202301 | ckw_opendata_smartmeter_dataset_a_202301.csv.gz |
| 4.6GiB | 2023-03-15 | 202212 | ckw_opendata_smartmeter_dataset_a_202212.csv.gz |
| 4.3GiB | 2023-03-15 | 202211 | ckw_opendata_smartmeter_dataset_a_202211.csv.gz |
| 4.4GiB | 2023-03-15 | 202210 | ckw_opendata_smartmeter_dataset_a_202210.csv.gz |
| 4.3GiB | 2023-03-15 | 202209 | ckw_opendata_smartmeter_dataset_a_202209.csv.gz |
| 4.4GiB | 2023-03-15 | 202208 | ckw_opendata_smartmeter_dataset_a_202208.csv.gz |
| 4.4GiB | 2023-03-15 | 202207 | ckw_opendata_smartmeter_dataset_a_202207.csv.gz |
| 4.2GiB | 2023-03-15 | 202206 | ckw_opendata_smartmeter_dataset_a_202206.csv.gz |
| 4.3GiB | 2023-03-15 | 202205 | ckw_opendata_smartmeter_dataset_a_202205.csv.gz |
| 4.2GiB | 2023-03-15 | 202204 | ckw_opendata_smartmeter_dataset_a_202204.csv.gz |
| 4.1GiB | 2023-03-15 | 202203 | ckw_opendata_smartmeter_dataset_a_202203.csv.gz |
| 3.5GiB | 2023-03-15 | 202202 | ckw_opendata_smartmeter_dataset_a_202202.csv.gz |
| 3.7GiB | 2023-03-15 | 202201 | ckw_opendata_smartmeter_dataset_a_202201.csv.gz |
| 3.5GiB | 2023-03-15 | 202112 | ckw_opendata_smartmeter_dataset_a_202112.csv.gz |
| 3.1GiB | 2023-03-15 | 202111 | ckw_opendata_smartmeter_dataset_a_202111.csv.gz |
| 3.0GiB | 2023-03-15 | 202110 | ckw_opendata_smartmeter_dataset_a_202110.csv.gz |
| 2.7GiB | 2023-03-15 | 202109 | ckw_opendata_smartmeter_dataset_a_202109.csv.gz |
| 2.6GiB | 2023-03-15 | 202108 | ckw_opendata_smartmeter_dataset_a_202108.csv.gz |
| 2.4GiB | 2023-03-15 | 202107 | ckw_opendata_smartmeter_dataset_a_202107.csv.gz |
| 2.1GiB | 2023-03-15 | 202106 | ckw_opendata_smartmeter_dataset_a_202106.csv.gz |
| 2.0GiB | 2023-03-15 | 202105 | ckw_opendata_smartmeter_dataset_a_202105.csv.gz |
| 1.7GiB | 2023-03-15 | 202104 | ckw_opendata_smartmeter_dataset_a_202104.csv.gz |
| 1.6GiB | 2023-03-15 | 202103 | ckw_opendata_smartmeter_dataset_a_202103.csv.gz |
| 1.3GiB | 2023-03-15 | 202102 | ckw_opendata_smartmeter_dataset_a_202102.csv.gz |
| 1.3GiB | 2023-03-15 | 202101 | ckw_opendata_smartmeter_dataset_a_202101.csv.gz |
The data were first converted into partitioned parquet files and then organised by id into parquet files, each holding data from a single smart meter.
A small sample of the smart meter data above is archived in the public cloud space of the AISOP project (https://os.zhdk.cloud.switch.ch/swift/v1/aisop_public/ckw/ts/batch_0424/batch_0424.zip) and also in this public record. For access to the complete data, contact the authors of this archive.
It consists of the following parquet files:
| Size | Date | Name |
|------|------|------|
| 1.0M | Mar 4 12:18 | 027ceb7b8fd77a4b11b3b497e9f0b174.parquet |
| 979K | Mar 4 12:18 | 03a4af696ff6a5c049736e9614f18b1b.parquet |
| 1.0M | Mar 4 12:18 | 03654abddf9a1b26f5fbbeea362a96ed.parquet |
| 1.0M | Mar 4 12:18 | 03acebcc4e7d39b6df5c72e01a3c35a6.parquet |
| 1.0M | Mar 4 12:18 | 039e60e1d03c2afd071085bdbd84bb69.parquet |
| 931K | Mar 4 12:18 | 036877a1563f01e6e830298c193071a6.parquet |
| 1.0M | Mar 4 12:18 | 02e45872f30f5a6a33972e8c3ba9c2e5.parquet |
| 662K | Mar 4 12:18 | 03a25f298431549a6bc0b1a58eca1f34.parquet |
| 635K | Mar 4 12:18 | 029a46275625a3cefc1f56b985067d15.parquet |
| 1.0M | Mar 4 12:18 | 0301309d6d1e06c60b4899061deb7abd.parquet |
| 1.0M | Mar 4 12:18 | 0291e323d7b1eb76bf680f6e800c2594.parquet |
| 1.0M | Mar 4 12:18 | 0298e58930c24010bbe2777c01b7644a.parquet |
| 1.0M | Mar 4 12:18 | 0362c5f3685febf367ebea62fbc88590.parquet |
| 1.0M | Mar 4 12:18 | 0390835d05372cb66f6cd4ca662399e8.parquet |
| 1.0M | Mar 4 12:18 | 02f670f059e1f834dfb8ba809c13a210.parquet |
| 987K | Mar 4 12:18 | 02af749aaf8feb59df7e78d5e5d550e0.parquet |
| 996K | Mar 4 12:18 | 0311d3c1d08ee0af3edda4dc260421d1.parquet |
| 1.0M | Mar 4 12:18 | 030a707019326e90b0ee3f35bde666e0.parquet |
| 955K | Mar 4 12:18 | 033441231b277b283191e0e1194d81e2.parquet |
| 995K | Mar 4 12:18 | 0317b0417d1ec91b5c243be854da8a86.parquet |
| 1.0M | Mar 4 12:18 | 02ef4e49b6fb50f62a043fb79118d980.parquet |
| 1.0M | Mar 4 12:18 | 0340ad82e9946be45b5401fc6a215bf3.parquet |
| 974K | Mar 4 12:18 | 03764b3b9a65886c3aacdbc85d952b19.parquet |
| 1.0M | Mar 4 12:18 | 039723cb9e421c5cbe5cff66d06cb4b6.parquet |
| 1.0M | Mar 4 12:18 | 0282f16ed6ef0035dc2313b853ff3f68.parquet |
| 1.0M | Mar 4 12:18 | 032495d70369c6e64ab0c4086583bee2.parquet |
| 900K | Mar 4 12:18 | 02c56641571fc9bc37448ce707c80d3d.parquet |
| 1.0M | Mar 4 12:18 | 027b7b950689c337d311094755697a8f.parquet |
| 1.0M | Mar 4 12:18 | 02af272adccf45b6cdd4a7050c979f9f.parquet |
| 927K | Mar 4 12:18 | 02fc9a3b2b0871d3b6a1e4f8fe415186.parquet |
| 1.0M | Mar 4 12:18 | 03872674e2a78371ce4dfa5921561a8c.parquet |
| 881K | Mar 4 12:18 | 0344a09d90dbfa77481c5140bb376992.parquet |
| 1.0M | Mar 4 12:18 | 0351503e2b529f53bdae15c7fbd56fc0.parquet |
| 1.0M | Mar 4 12:18 | 033fe9c3a9ca39001af68366da98257c.parquet |
| 1.0M | Mar 4 12:18 | 02e70a1c64bd2da7eb0d62be870ae0d6.parquet |
| 1.0M | Mar 4 12:18 | 0296385692c9de5d2320326eaa000453.parquet |
| 962K | Mar 4 12:18 | 035254738f1cc8a31075d9fbe3ec2132.parquet |
| 991K | Mar 4 12:18 | 02e78f0d6a8fb96050053e188bf0f07c.parquet |
| 1.0M | Mar 4 12:18 | 039e4f37ed301110f506f551482d0337.parquet |
| 961K | Mar 4 12:18 | 039e2581430703b39c359dc62924a4eb.parquet |
| 999K | Mar 4 12:18 | 02c6f7e4b559a25d05b595cbb5626270.parquet |
| 1.0M | Mar 4 12:18 | 02dd91468360700a5b9514b109afb504.parquet |
| 938K | Mar 4 12:18 | 02e99c6bb9d3ca833adec796a232bac0.parquet |
| 589K | Mar 4 12:18 | 03aef63e26a0bdbce4a45d7cf6f0c6f8.parquet |
| 1.0M | Mar 4 12:18 | 02d1ca48a66a57b8625754d6a31f53c7.parquet |
| 1.0M | Mar 4 12:18 | 03af9ebf0457e1d451b83fa123f20a12.parquet |
| 1.0M | Mar 4 12:18 | 0289efb0e712486f00f52078d6c64a5b.parquet |
| 1.0M | Mar 4 12:18 | 03466ed913455c281ffeeaa80abdfff6.parquet |
| 1.0M | Mar 4 12:18 | 032d6f4b34da58dba02afdf5dab3e016.parquet |
| 1.0M | Mar 4 12:18 | 03406854f35a4181f4b0778bb5fc010c.parquet |
| 1.0M | Mar 4 12:18 | 0345fc286238bcea5b2b9849738c53a2.parquet |
| 1.0M | Mar 4 12:18 | 029ff5169155b57140821a920ad67c7e.parquet |
| 985K | Mar 4 12:18 | 02e4c9f3518f079ec4e5133acccb2635.parquet |
| 1.0M | Mar 4 12:18 | 03917c4f2aef487dc20238777ac5fdae.parquet |
| 969K | Mar 4 12:18 | 03aae0ab38cebcb160e389b2138f50da.parquet |
| 914K | Mar 4 12:18 | 02bf87b07b64fb5be54f9385880b9dc1.parquet |
| 1.0M | Mar 4 12:18 | 02776685a085c4b785a3885ef81d427a.parquet |
| 947K | Mar 4 12:18 | 02f5a82af5a5ffac2fe7551bf4a0a1aa.parquet |
| 992K | Mar 4 12:18 | 039670174dbc12e1ae217764c96bbeb3.parquet |
| 1.0M | Mar 4 12:18 | 037700bf3e272245329d9385bb458bac.parquet |
| 602K | Mar 4 12:18 | 0388916cdb86b12507548b1366554e16.parquet |
| 939K | Mar 4 12:18 | 02ccbadea8d2d897e0d4af9fb3ed9a8e.parquet |
| 1.0M | Mar 4 12:18 | 02dc3f4fb7aec02ba689ad437d8bc459.parquet |
| 1.0M | Mar 4 12:18 | 02cf12e01cd20d38f51b4223e53d3355.parquet |
| 993K | Mar 4 12:18 | 0371f79d154c00f9e3e39c27bab2b426.parquet |
where each file contains data from a single smart meter.
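The monthly source files follow a regular naming pattern, and organising them per meter amounts to grouping rows by id. A minimal standard-library sketch of both steps follows; the function names are hypothetical, and grouping everything in memory only suits a small extract, not the full monthly files of several GiB each. Writing each group out as `<id>.parquet` would need a Parquet library such as pyarrow (for example `pyarrow.parquet.write_table`), which is left out so the sketch runs without third-party packages.

```python
import csv
import gzip
from collections import defaultdict


def monthly_filenames(start: str = "202101", end: str = "202402") -> list:
    """Build dataset A file names for every month from start to end (YYYYMM, inclusive)."""
    year, month = int(start[:4]), int(start[4:])
    end_year, end_month = int(end[:4]), int(end[4:])
    names = []
    while (year, month) <= (end_year, end_month):
        names.append(f"ckw_opendata_smartmeter_dataset_a_{year}{month:02d}.csv.gz")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def iter_rows(path: str):
    """Yield dict rows (id, timestamp, value_kwh) from one gzip-compressed monthly CSV."""
    with gzip.open(path, "rt", newline="") as f:
        yield from csv.DictReader(f)


def group_by_meter(rows) -> dict:
    """Group rows by meter id: {id: [(timestamp, value_kwh), ...]} with values kept as strings."""
    per_meter = defaultdict(list)
    for row in rows:
        per_meter[row["id"]].append((row["timestamp"], row["value_kwh"]))
    return dict(per_meter)
```

For example, `group_by_meter(row for name in monthly_filenames() for row in iter_rows(name))` would read every downloaded month in turn, assuming the files sit in the working directory.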
Acknowledgement
The AISOP project (https://aisopproject.com/) received funding in the framework of the Joint Programming Platform Smart Energy Systems from the European Union's Horizon 2020 research and innovation programme under grant agreement No 883973 (ERA-Net Smart Energy Systems joint call on digital transformation for green energy transition).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute.
The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:
· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
· Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
· Fostering cultures of open qualitative research: Dataset 3 – Coding Book
The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield, as part of its ‘Enhancing Research Cultures’ scheme 2022-2023.
The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.
ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC BY-NC license.
This dataset comprises one spreadsheet in .xlsx format containing N=91 anonymised survey responses. It includes all responses to the project survey, which was run on Google Forms between 06-Feb-2023 and 30-May-2023. The spreadsheet can be opened with Microsoft Excel, Google Sheets, or open-source equivalents.
The survey responses include a random sample of researchers worldwide undertaking qualitative, mixed-methods, or multi-modal research.
The recruitment of respondents was initially purposive, aiming to gather responses from qualitative researchers at research-intensive universities (targeting the Russell Group). This involved speculative emails and a call for participants on the University of Sheffield ‘Qualitative Open Research Network’ mailing list. As a result, the responses also include a snowball sample of scholars from elsewhere.
The spreadsheet has two tabs/sheets: one labelled ‘SurveyResponses’ contains the anonymised and tidied set of survey responses; the other, labelled ‘VariableMapping’, sets out each field/column in the ‘SurveyResponses’ tab/sheet against the original survey questions and responses it relates to.
The survey responses tab/sheet includes a field/column labelled ‘RespondentID’ (using randomly generated 16-digit alphanumeric keys) which can be used to connect survey responses to interview participants in the accompanying ‘Fostering cultures of open qualitative research: Dataset 2 – Interview transcripts’ files.
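Linking the two datasets through RespondentID is a simple key join. A minimal sketch follows, assuming both sources have been read into lists of dicts (for example with `csv.DictReader` after exporting the ‘SurveyResponses’ sheet to CSV; the spreadsheet itself is .xlsx) and assuming the interview records also carry a RespondentID field. The function name is hypothetical.

```python
def link_by_respondent_id(survey_rows, interview_rows, key: str = "RespondentID") -> list:
    """Pair each interview record with the survey response that shares its key.

    Interview records with no matching survey response are left out.
    """
    # Index survey responses by their random alphanumeric RespondentID.
    survey_by_id = {row[key]: row for row in survey_rows}
    return [
        (survey_by_id[row[key]], row)
        for row in interview_rows
        if row[key] in survey_by_id
    ]
```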
A set of survey questions gathering eligibility criteria and consent details are not included in this dataset; they are listed below. All responses provided in the dataset gave a ‘Yes’ response to all of the questions below, with the exception of one question, marked with an asterisk (*):
· I am aged 18 or over
· I have read the information and consent statement and above.
· I understand how to ask questions and/or raise a query or concern about the survey.
· I agree to take part in the research and for my responses to be part of an open access dataset. These will be anonymised unless I specifically ask to be named.
· I understand that my participation does not create a legally binding agreement or employment relationship with the University of Sheffield
· I understand that I can withdraw from the research at any time.
· I assign the copyright I hold in materials generated as part of this project to The University of Sheffield.
· * I am happy to be contacted after the survey to take part in an interview.
The project was undertaken by two staff:
Co-investigator: Dr. Itzel San Roman Pineda, Postdoctoral Research Assistant. ORCiD ID: 0000-0002-3785-8057, i.sanromanpineda@sheffield.ac.uk
Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard, Research Associate, iHuman Institute, Social Research Institutes, Faculty of Social Science. ORCiD ID: 0000-0003-2460-8638, m.s.hanchard@sheffield.ac.uk
The dataset covers domestic properties in England and Wales. It draws on data from a range of sources, matched at household level and anonymised so that no individual household can be identified from the data.
Two datasets have been published by DECC:
More information on this project is available from the National Energy Efficiency Data-Framework: making data available consultation page.
Information on how the datasets are being used is welcomed by DECC. For enquiries and feedback please contact: Energyefficiency.stats@decc.gsi.gov.uk
The 2001 Census: Special Licence Household Sample of Anonymised Records (SL-HSAR) dataset comprises Sample of Anonymised Records (SARs) data that relate to 29 April 2001. They were created by the Office for National Statistics (ONS) as part of the 2001 Census of Population. All households were asked to complete a form giving information about the household and all individuals living in the household. Completion of the form was compulsory for the entire population. The Census schedule includes questions on housing and tenure, and demographic and socio-economic information for all household members.
The dataset comprises SARs data for 1% of households in England and Wales, including imputed values for households that were not enumerated during the Census. Individual data for households with more than 11 residents have been suppressed. To protect confidentiality, age data have been grouped into 2-year bands and no geographical breakdown is available. A small amount of perturbation has also been applied to the data to protect confidentiality. As with the Individual Licensed SAR (see under SNs 7210 and 7211), separate variables indicate whether or not imputation or perturbation has been applied to any given variable for each case in the sample. Documentation, training and user support for these data are provided by the SARs team at the Cathie Marsh Centre for Census and Survey Research (CCSR). A further release of data, containing additional derived variables, will be made available at a later date.
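Two of the disclosure-control steps described for the SL-HSAR, suppressing individual data for households with more than 11 residents and grouping ages into 2-year bands, can be sketched as follows. This is an illustration under assumptions, not the ONS procedure: the record layout is hypothetical, bands are assumed to start at even ages, and the perturbation step is not modelled.

```python
def two_year_band(age: int) -> str:
    """Map an exact age to a 2-year band; bands starting at even ages are an assumption."""
    low = age - age % 2
    return f"{low}-{low + 1}"


def protect_household(members: list) -> list:
    """Return band-coded member records, or [] when the household has more than 11 residents.

    `members` is a list of dicts with an integer "age" field (hypothetical layout).
    """
    if len(members) > 11:
        # Individual data for households with more than 11 residents are suppressed.
        return []
    # Replace exact age with its 2-year band, keeping the other fields unchanged.
    return [{**m, "age": two_year_band(m["age"])} for m in members]
```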
The Secure Access version replaces the previous Special Licence version that was held under SN 5278, which is no longer available. Prospective users of the Secure Access data will need to fulfil additional requirements, including completion of face-to-face training and agreement to Secure Access' User Agreement and Breaches Penalties Policy, in order to obtain permission to use that version (see 'Access' section below).
Detailed SARs data:
A more detailed version of these data, containing geographical information at the level of Local Authority, is available as a Controlled Access Microdata Sample (CAMS). These can be accessed at all ONS sites. Applications to use these data should be made to ONS; further details can be found on their CAMS web page. The CAMS file includes data for Scotland and Northern Ireland as well as England and Wales.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample data of a subject collected in the APHP INSIDE study
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises two .csv files used within workstream 2 of the Wellcome Trust funded ‘Orphan drugs: High prices, access to medicines and the transformation of biopharmaceutical innovation’ project (219875/Z/19/Z). They appear in various outputs, e.g. publications and presentations.
The deposited data were gathered using the University of Amsterdam Digital Methods Institute’s ‘Twitter Capture and Analysis Toolset’ (DMI-TCAT) before being processed and extracted from Gephi. DMI-TCAT queries Twitter’s STREAM Application Programming Interface (API) using SQL and retrieves data on a pre-set text query. It then sends the returned data to a MySQL database for storage. The tool allows that data to be output in various formats. This process aligns fully with Twitter’s service user terms and conditions. The query for the deposited dataset gathered a 1% random sample of all public tweets posted between 10-Feb-2021 and 10-Mar-2021 containing the text ‘Rare Diseases’ and/or ‘Rare Disease Day’, storing it on a local MySQL database managed by the University of Sheffield School of Sociological Studies (http://dmi-tcat.shef.ac.uk/analysis/index.php), accessible only via a valid VPN such as FortiClient and through a permitted active directory user profile. The dataset was exported raw from the MySQL database as a .gexf file, suitable for social network analysis (SNA). It was then opened using Gephi (0.9.2) data visualisation software and anonymised/pseudonymised in Gephi as per the ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee on 02-Jun-201 (reference: 039187). The deposited dataset comprises two anonymised/pseudonymised social network analysis .csv files extracted from Gephi, one containing node data (Issue-networks as excluded publics – Nodes.csv) and another containing edge data (Issue-networks as excluded publics – Edges.csv). Where participants explicitly provided consent, their original username has been provided. Where they consented on the basis that they not be identifiable, their username has been replaced with an appropriate pseudonym. All other usernames have been anonymised with a randomly generated 16-digit key.
The level of anonymity for each Twitter user is provided in column C of deposited file ‘Issue-networks as excluded publics – Nodes.csv’.
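The three-level username rule (original name, pseudonym, or random 16-character key) can be sketched in plain Python. The authors applied it in Gephi, so this is only an illustration of the rule, not their procedure: the consent codes and function names are hypothetical, and using letters and digits for the key is an assumption, since the text only says "16-digit key". The same mapping is applied to edge endpoints so that the Nodes and Edges files stay consistent.

```python
import secrets
import string

# Assumption: keys are 16 alphanumeric characters.
ALPHABET = string.ascii_letters + string.digits


def random_key(length: int = 16) -> str:
    """Draw a random key from a cryptographically secure generator."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def build_label_map(consent: dict, pseudonyms: dict) -> dict:
    """Map each username to the label published for it.

    consent: {username: level}, level in {"named", "pseudonym", "anonymous"} (hypothetical codes).
    pseudonyms: {username: pseudonym} for users who agreed to a pseudonym.
    """
    labels = {}
    used = set()
    for user, level in consent.items():
        if level == "named":
            label = user
        elif level == "pseudonym":
            label = pseudonyms[user]
        else:
            label = random_key()
            while label in used:  # re-draw on the (very unlikely) collision
                label = random_key()
        used.add(label)
        labels[user] = label
    return labels


def relabel_edges(edges, labels: dict) -> list:
    """Apply the same mapping to (source, target) pairs so nodes and edges stay consistent."""
    return [(labels[source], labels[target]) for source, target in edges]
```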
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 26-Aug-2021 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman institute/School of Sociological Studies. ORDA has full permission to store this dataset and to make it open access for public re-use without restriction under a CC BY license, in line with the Wellcome Trust commitment to making all research data Open Access.
The University of Sheffield are the designated data controller for this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains anonymized Twitter metadata collected for the purpose of data visualization and exploratory analysis. It includes a small, curated sample of 40 tweets, each represented by key engagement metrics and a binary label (heads_tag) for simple grouping.
The dataset was scraped via the Twitter API and adheres to Twitter’s Developer Policy. It does not contain any tweet text or user-identifiable information, only metadata fields that are permitted for redistribution.
This dataset is designed for:
Due to its small size, it’s ideal for:
| Column Name | Description |
|---|---|
| tweet_id | Unique identifier of the tweet |
| user_id | Anonymized user ID |
| timestamp | Date and time the tweet was posted |
| like_count | Number of likes the tweet received |
| retweet_count | Number of times the tweet was retweeted |
| reply_count | Number of replies to the tweet |
| view_count | Number of views on the tweet |
| word_count | Number of words in the tweet (text not included) |
| heads_tag | Binary label (0 or 1) for abstract grouping (not associated with real entities) |
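For the exploratory analysis this dataset targets, a natural first step is comparing engagement metrics across the two heads_tag groups. A minimal sketch using the column names from the table above; the function names and the file name in the usage note are hypothetical, and the engagement-rate definition (likes, retweets and replies divided by views) is an assumption rather than a field of the dataset.

```python
import csv
from collections import defaultdict


def load_rows(path: str) -> list:
    """Read the CSV export into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def mean_by_tag(rows, metric: str) -> dict:
    """Average one numeric column (e.g. "like_count") within each heads_tag group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        tag = int(row["heads_tag"])
        sums[tag] += float(row[metric])
        counts[tag] += 1
    return {tag: sums[tag] / counts[tag] for tag in sums}


def engagement_rate(row) -> float:
    """(likes + retweets + replies) / views, or 0.0 when a tweet has no recorded views."""
    views = float(row["view_count"])
    interactions = sum(float(row[c]) for c in ("like_count", "retweet_count", "reply_count"))
    return interactions / views if views else 0.0
```

For example, `mean_by_tag(load_rows("tweets.csv"), "view_count")` would compare average views between the two groups.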
DRAKO specializes in delivering Anonymous IP Data, focusing on privacy-first approaches to consumer identity and behavior analysis. Our data allows businesses to track user interactions without compromising individual privacy, ensuring compliance with data protection regulations.
Anonymous IP Data is crucial for effective audience targeting, analyzing traffic sources, and measuring campaign performance. By connecting digital audiences through various IPs, we enable a clearer understanding of user journeys across devices and platforms. Beyond IPs, we can also connect these IDs to broader ID types such as Mobile Advertising IDs and CTV IDs.
Key Features:
- IPv4 and IPv6 in hashed format
- Detailed mapping of Anonymous IPs for secure user behavior analysis
- Integration with Mobile IP Data for insights into mobile user interactions
- Comprehensive Identity Data for enhanced audience profiling
- Digital Audience Data to understand demographics and interests
- Identity Linkage Data for connecting user profiles across different channels
Use Cases:
- Audience segmentation and targeting strategies
- Traffic source analysis and optimization
- Digital campaign performance measurement
- User journey mapping across devices
- Compliance-focused marketing solutions
Data Compliance: Our Anonymous IP Data is fully compliant with industry standards for data privacy and security. We prioritize ethical data collection practices, ensuring that user identities remain anonymous while still providing valuable insights.
Data Quality: DRAKO employs rigorous quality assurance protocols to maintain the accuracy and reliability of our Anonymous IP Data. We continuously update our datasets and utilize advanced validation techniques to ensure data integrity.
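The listing mentions IPv4 and IPv6 addresses in hashed format but does not say how they are hashed. One common privacy-preserving approach, sketched here as an assumption rather than DRAKO's method, is to normalise each address with Python's `ipaddress` module and apply a keyed hash (HMAC-SHA256), so that different spellings of the same address map to the same token. The function name is hypothetical.

```python
import hashlib
import hmac
import ipaddress


def hash_ip(ip: str, key: bytes) -> str:
    """Return an HMAC-SHA256 hex digest of a normalised IPv4 or IPv6 address.

    Normalising first means "2001:DB8::1" and "2001:db8:0:0:0:0:0:1" hash identically.
    """
    # ip_address() validates the input and .compressed gives one canonical spelling.
    normalised = ipaddress.ip_address(ip.strip()).compressed
    return hmac.new(key, normalised.encode("ascii"), hashlib.sha256).hexdigest()
```

The secret key matters because there are only 2**32 IPv4 addresses: an unkeyed hash could be reversed by hashing every possible address and comparing the results.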
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Water companies in the UK are responsible for testing the quality of drinking water. This dataset contains the results of samples taken from the taps in domestic households to make sure they meet the standards set out by UK and European legislation. This data shows the location, date, and measured levels of determinands set out by the Drinking Water Inspectorate (DWI).
Key Definitions
Aggregation
Process involving summarizing or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
Anonymisation
Anonymised data is a type of information sanitization in which data anonymisation tools encrypt or remove personally identifiable information from datasets for the purpose of preserving a data subject's privacy.
Dataset
Structured and organized collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
Determinand
A constituent or property of drinking water which can be determined or estimated.
DWI
Drinking Water Inspectorate, an organisation “providing independent reassurance that water supplies in England and Wales are safe and drinking water quality is acceptable to consumers.”
DWI Determinands
Constituents or properties that are tested for when evaluating a sample for its quality as per the guidance of the DWI. For this dataset, only determinands with “point of compliance” as “customer taps” are included.
Granularity
Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours.
ID
Abbreviation for identification; refers to any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.
LSOA
Lower Layer Super Output Area: a small geographic area used for statistical and administrative purposes by the Office for National Statistics. LSOAs are designed to be similar in population size, making them suitable for statistical analysis and reporting. Each LSOA is built from a group of contiguous Output Areas and has on average about 1,500 residents or 650 households, allowing granular data collection useful for analysis, planning and policy-making while protecting privacy.
ONS
Office for National Statistics
Open Data Triage
The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.
Sample
A sample is a representative segment or portion of water taken from a larger whole for the purpose of analysing or testing to ensure compliance with safety and quality standards.
Schema
Structure for organizing and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
Units
Standard measurements used to quantify and compare different physical quantities.
Water Quality
The chemical, physical, biological, and radiological characteristics of water, typically in relation to its suitability for a specific purpose, such as drinking, swimming, or ecological health. It is determined by assessing a variety of parameters, including but not limited to pH, turbidity, microbial content, dissolved oxygen, presence of substances and temperature.
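As the DWI Determinands definition notes, only determinands whose point of compliance is "customer taps" are included in this dataset. The sketch below shows that scoping rule as a filter. The record layout (dicts with `name` and `point_of_compliance` keys) and the example values are hypothetical, chosen only for illustration, and are not taken from the DWI list.

```python
from typing import Dict, List

def customer_tap_determinands(determinands: List[Dict[str, str]]) -> List[str]:
    """Return names of determinands whose point of compliance is customer taps."""
    return [
        d["name"]
        for d in determinands
        # Compare case-insensitively and ignore surrounding whitespace.
        if d.get("point_of_compliance", "").strip().lower() == "customer taps"
    ]

# Illustrative values only; these are not taken from the DWI list.
example = [
    {"name": "Determinand A", "point_of_compliance": "Customer taps"},
    {"name": "Determinand B", "point_of_compliance": "Treatment works"},
    {"name": "Determinand C", "point_of_compliance": "customer taps "},
]
print(customer_tap_determinands(example))  # ['Determinand A', 'Determinand C']
```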
Data History
Data Origin
These samples were taken from customer taps. They were then analysed for water quality, and the results were uploaded to a database. This dataset is an extract from this database.
Data Triage Considerations
Granularity
Should results be shared as averages or as individual results?
We decided to share individual results, the lowest level of granularity.
Anonymisation
It is a requirement that this data cannot be used to identify a single person or household. We discussed many options for aggregating the data to a specific geography to ensure this requirement is met. The following geographical aggregations were discussed:
- Water Supply Zone (WSZ) – Limits interoperability with other datasets
- Postcode – Some postcodes contain very few households and may not offer the necessary anonymisation
- Postal Sector – Deemed not granular enough in highly populated areas
- Rounded Co-ordinates – Not a recognised standard and may cause overlapping areas
- MSOA – Deemed not granular enough
- LSOA – Agreed as a recognised standard appropriate for England and Wales
- Data Zones – Agreed as a recognised standard appropriate for Scotland
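The agreed approach replaces a sample's location with its LSOA. A minimal sketch of that step, assuming a postcode-to-LSOA lookup has already been loaded into a dict (for example from an ONS lookup table like the one listed under Supplementary information). The field names `postcode` and `lsoa`, the postcode normalisation rule, and the choice to drop unmatched samples are all assumptions for illustration, not the publisher's documented method.

```python
from typing import Dict, List, Optional

def normalise_postcode(postcode: str) -> str:
    """Uppercase and remove all whitespace so 'ab1 2cd' and 'AB12CD' match."""
    return "".join(postcode.split()).upper()

def aggregate_to_lsoa(samples: List[Dict[str, str]],
                      lookup: Dict[str, str]) -> List[Dict[str, str]]:
    """Replace each sample's postcode with its LSOA code and drop the postcode.

    Samples whose postcode is not in the lookup are dropped rather than
    published with a finer-grained location.
    """
    out = []
    for sample in samples:
        lsoa: Optional[str] = lookup.get(normalise_postcode(sample["postcode"]))
        if lsoa is None:
            continue  # no LSOA match: do not publish this sample
        record = {k: v for k, v in sample.items() if k != "postcode"}
        record["lsoa"] = lsoa
        out.append(record)
    return out
```

With a hypothetical lookup such as `{"AB12CD": "LSOA-1"}`, a sample recorded at `"ab1 2cd"` is published with `lsoa = "LSOA-1"` and no postcode.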
Data Specifications
Each dataset will cover a calendar year of samples
This dataset will be published annually
Historical datasets will be published as far back as 2016, following the introduction of The Water Supply (Water Quality) Regulations 2016
The determinands included in the dataset are those required to be reported to the Drinking Water Inspectorate.
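The specification above (one dataset per calendar year, going back to 2016) amounts to partitioning samples by the year they were taken. The sketch below assumes each sample record carries a `datetime.date` under a hypothetical `sample_date` key; it is an illustration, not the publisher's pipeline.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List

# Earliest year published, per The Water Supply (Water Quality) Regulations 2016.
FIRST_YEAR = 2016

def split_by_calendar_year(samples: List[Dict]) -> Dict[int, List[Dict]]:
    """Group sample records into one list per calendar year, from FIRST_YEAR onward."""
    by_year: Dict[int, List[Dict]] = defaultdict(list)
    for sample in samples:
        sampled_on: date = sample["sample_date"]  # hypothetical field name
        if sampled_on.year >= FIRST_YEAR:
            by_year[sampled_on.year].append(sample)
    # Return a plain dict with years in ascending order.
    return dict(sorted(by_year.items()))
```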
Context
Many UK water companies provide a search tool on their websites where you can search for water quality in your area by postcode. The search results may identify the water supply zone that supplies the searched postcode. Water supply zones are not aligned with LSOAs, so those results may differ from this dataset.
Some sample results are influenced by internal plumbing and may not be representative of drinking water quality in the wider area.
Some samples are tested on site and others are sent to scientific laboratories.
Data Publish Frequency
Annually
Data Triage Review Frequency
Annually unless otherwise requested
Supplementary information
Below is a curated selection of links for additional reading, which provide a deeper understanding of this dataset.
1. Drinking Water Inspectorate Standards and Regulations: https://www.dwi.gov.uk/drinking-water-standards-and-regulations/
2. LSOA (England and Wales) and Data Zone (Scotland):
3. Description for LSOA boundaries by the ONS: Census 2021 geographies - Office for National Statistics (ons.gov.uk)
4. Postcode to LSOA lookup tables: Postcode to 2021 Census Output Area to Lower Layer Super Output Area to Middle Layer Super Output Area to Local Authority District (August 2023) Lookup in the UK (statistics.gov.uk)
5. Legislation history: Legislation - Drinking Water Inspectorate (dwi.gov.uk)
According to our latest research, the global anonymized trip data exchange market size reached USD 1.25 billion in 2024. The sector is experiencing robust expansion, registering a CAGR of 18.5% from 2025 to 2033, and is forecasted to attain a value of USD 6.16 billion by 2033. The primary growth factor driving this market is the increasing demand for data-driven insights to optimize urban mobility, enhance traffic management, and support smart city initiatives worldwide. As organizations and governments prioritize privacy-compliant data sharing, the anonymized trip data exchange market is positioned for significant advancements and adoption across various verticals.
One of the most prominent growth factors for the anonymized trip data exchange market is the global shift towards smart city development and the digital transformation of urban mobility infrastructure. Governments and municipal authorities are increasingly leveraging anonymized trip data to gain actionable insights into traffic flow, congestion hotspots, and travel behavior. This data-driven approach allows for the optimization of public transit routes, reduction of commute times, and improved allocation of resources. With the proliferation of IoT devices and connected vehicles, the volume and granularity of trip data have surged, further fueling the need for sophisticated platforms that can securely exchange and analyze anonymized information while upholding stringent privacy regulations.
Another critical driver is the rise of mobility-as-a-service (MaaS) platforms and ride-sharing applications, which rely heavily on accurate and real-time trip data to match riders with drivers, predict demand, and optimize pricing strategies. As these services expand into new geographies and diversify their offerings, the necessity for interoperable and anonymized data exchange frameworks becomes even more pronounced. Companies in the transportation and automotive sectors are forming strategic partnerships with data exchange providers to access comprehensive datasets that inform product development, fleet management, and customer experience enhancements. The integration of artificial intelligence and machine learning further amplifies the value of anonymized trip data by enabling predictive analytics and automation across the mobility ecosystem.
The evolving regulatory landscape also plays a pivotal role in shaping the growth trajectory of the anonymized trip data exchange market. With increasing concerns over data privacy and the implementation of frameworks such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States, organizations are compelled to adopt robust anonymization techniques to ensure compliance while facilitating data sharing. This dynamic has accelerated the development of advanced anonymization algorithms and secure data exchange protocols, fostering trust among stakeholders and encouraging broader participation in data-driven initiatives. As privacy-preserving technologies mature, the market is expected to witness greater adoption across both public and private sectors.
From a regional perspective, North America currently leads the global anonymized trip data exchange market, driven by early adoption of smart mobility solutions, a well-established data infrastructure, and proactive regulatory frameworks. Europe follows closely, with significant investments in sustainable transportation projects and cross-border data collaboration. The Asia Pacific region is emerging as a high-growth market, propelled by rapid urbanization, government-led smart city programs, and the proliferation of ride-hailing platforms. Latin America and the Middle East & Africa are also witnessing increasing interest, albeit at a more gradual pace, as urban mobility challenges and digital transformation initiatives gain momentum. Overall, the regional outlook underscores a widespread and accelerating demand for anonymized trip data exchange solutions across the globe.
Example Textile Trade And Industry Anonymous Company Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
oslo-city-bike
License: Norwegian Licence for Open Government Data (NLOD) 2.0. According to the licence, we have full rights to collect, use, modify, and distribute this data, provided we clearly indicate the source (which I do).
The folder oslobysykkel contains all available data from 2019 to 2025, with files named oslobysykkel-YYYY-MM.csv. Why does "oslo" still appear in the file names? Because similar data also exists for Trondheim and Bergen.
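Because each file name encodes its year and month, the monthly files can be discovered and ordered chronologically from the names alone. A minimal sketch, assuming only the oslobysykkel-YYYY-MM.csv pattern stated above; `monthly_files` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

FILE_PATTERN = re.compile(r"^oslobysykkel-(\d{4})-(\d{2})\.csv$")

def monthly_files(names: List[str]) -> List[Tuple[int, int, str]]:
    """Return (year, month, name) for files matching oslobysykkel-YYYY-MM.csv, oldest first."""
    found = []
    for name in names:
        match = FILE_PATTERN.match(name)
        if match:
            year, month = int(match.group(1)), int(match.group(2))
            if 1 <= month <= 12:  # ignore malformed month numbers
                found.append((year, month, name))
    # Tuples sort by year, then month.
    return sorted(found)
```

In practice the input would be something like `os.listdir("oslobysykkel")`; anything that does not match the pattern (notes, other cities' files) is skipped.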
Source: oslobysykkel.no

| Variable | Format | Description |
|---|---|---|
| started_at | Timestamp | Timestamp of when the trip started |
| ended_at | Timestamp | Timestamp of when the trip ended |
| duration | Integer | Duration of trip in seconds |
| start_station_id | String | Unique ID for start station |
| start_station_name | String | Name of start station |
| start_station_description | String | Description of where start station is located |
| start_station_latitude | Decimal degrees in WGS84 | Latitude of start station |
| start_station_longitude | Decimal degrees in WGS84 | Longitude of start station |
| end_station_id | String | Unique ID for end station |
| end_station_name | String | Name of end station |
| end_station_description | String | Description of where end station is located |
| end_station_latitude | Decimal degrees in WGS84 | Latitude of end station |
| end_station_longitude | Decimal degrees in WGS84 | Longitude of end station |
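The schema above is enough to derive a few simple per-trip features. The sketch below computes the gap between `started_at` and `ended_at` (so it can be compared with `duration`, since the schema does not say they are always equal) and the great-circle (haversine) distance between the start and end stations, which is a straight-line proxy, not the route actually ridden. It assumes the timestamps are ISO 8601 strings with a UTC offset (e.g. "2023-06-01 08:00:00+00:00"); the helper names are hypothetical.

```python
import math
from datetime import datetime
from typing import Dict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS84 points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def enrich_trip(row: Dict[str, str]) -> Dict[str, float]:
    """Derive simple features from one CSV row that follows the schema above."""
    # Assumes ISO 8601 timestamps with a UTC offset.
    started = datetime.fromisoformat(row["started_at"])
    ended = datetime.fromisoformat(row["ended_at"])
    return {
        "duration_s": float(row["duration"]),
        "timestamp_gap_s": (ended - started).total_seconds(),
        "distance_km": haversine_km(
            float(row["start_station_latitude"]),
            float(row["start_station_longitude"]),
            float(row["end_station_latitude"]),
            float(row["end_station_longitude"]),
        ),
    }
```

Each row from `csv.DictReader` over a monthly file can be passed straight to `enrich_trip`, since the reader yields dicts keyed by the column names in the table.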
Please note: this dataset and my analysis focus on the new data format; historical data for the period April 2016 to December 2018 (Legacy Trip Data) has a different structure.
I was extremely fascinated by Oslo City Bike's open data, and in the course of a deep analysis I saw broad possibilities. This interest turned into the idea of creating a data-analysis problem book, or even a platform, called 'exercise bike'. I am publishing this dataset to make it convenient for my own further use in the next phases of the project (clustering, forecasting), and so that anyone can take part in analysis and modeling based on this exciting data.
**Autumn's remake of Oslo bike-sharing data analysis** https://colab.research.google.com/drive/1tAxrIWVK5V-ptKLJBdODjy10zHlsppFv?usp=sharing
https://drive.google.com/file/d/17FP9Bd5opoZlw40LRxWtycgJJyXSAdC6/view
Full notebooks with code, visualizations, and commentary will be published soon! This dataset is the backbone of an ongoing project; stay tuned for deeper dives into anomaly detection, station clustering, and interactive learning challenges.
Index of my notebooks:
- Phase 1: Cleaned Data & Core Insights
- Time-Space Dynamics Exploratory
- Clustering and Segmentation
- Demand Forecasting (Time Series)
- Geospatial Analysis (Network Analysis)
Similar dataset: https://www.kaggle.com/code/florestancharlaix/oslo-city-bikes-analysis
Links to works I have found or that have inspired me:
- Exploring Open Data from Oslo City Bike (Jon Olave): visualization of popular routes and seasonality analysis.
- Oslo City Bike Data Wrangling (Karl Tryggvason): predicting bicycle availability at stations, focusing on everyday use (e.g., trips to kindergarten).
- Helsinki City Bikes: Exploratory Data Analysis: analysis of a similar system in Helsinki, useful for comparative studies and methodological ideas.
The idea is to connect this data with other sources. For example, I did this with weather data, integrating temperature, precipitation, and wind speed to explain variations in daily demand: https://meteostat.net/en/place/no/oslo
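Joining weather to demand starts with counting trips per day and matching each day to a daily weather record. A minimal sketch, assuming `started_at` values are ISO 8601 strings and that daily weather has already been loaded into a dict keyed by "YYYY-MM-DD"; the weather field names `tavg`, `prcp` and `wspd` are assumptions modelled on Meteostat's daily columns. The date is taken as written in each timestamp, so if the timestamps are in UTC you may want to convert them to Oslo local time first.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List

def daily_trip_counts(started_at: Iterable[str]) -> Counter:
    """Count trips per calendar day, keyed by 'YYYY-MM-DD'."""
    return Counter(datetime.fromisoformat(ts).date().isoformat() for ts in started_at)

def join_weather(counts: Counter,
                 weather: Dict[str, Dict[str, float]]) -> List[Dict]:
    """Attach daily weather (hypothetical keys such as tavg, prcp, wspd) to daily trip counts.

    Days without a weather record are skipped.
    """
    rows = []
    for day in sorted(counts):
        if day in weather:
            rows.append({"date": day, "trips": counts[day], **weather[day]})
    return rows
```

The resulting rows can be fed into a regression or plotted directly to relate demand to temperature, precipitation and wind speed.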
I also used data from Inside Airbnb (that is where I took the division into neighbourhoods): https://data.insideairbnb.com/norway/oslo/oslo/2025-06-27/visualisations/neighbourhoods.csv
oslo bike-sharing eda feature-engineering geospatial time-series
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"The BBC's Great Debate" was broadcast live in the UK by the BBC on Tuesday 21 June 2016 between 20:00 and 22:00 BST. It generated activity on Twitter under the #BBCDebate hashtag. I collected some of the Tweets tagged with #BBCDebate using a Google Spreadsheet. The raw data was downloaded as an Excel spreadsheet file containing an archive of 38,166 Tweets (38,066 unique Tweets) publicly published with the queried hashtag (#BBCDebate) between 14/06/2016 22:03:18 and 22/06/2016 09:12:32 BST. Due to the expected high volume of Tweets, only users with at least 10 followers were included in the archive. The Tweets contained in the Archive sheet were collected using Martin Hawksey's TAGS 6.0.
Given the relatively large volume of activity expected around #BBCDebate and the public and political nature of the hashtag, I have only shared indicative data. No full Tweets or any other associated metadata have been shared. The dataset contains a metrics summary as well as a table with the column headings created_at, time, geo_coordinates, user_lang and user_followers_count for each Tweet. The geo_coordinates column is anonymised: YES indicates that coordinates were present, and a blank cell indicates that none were. Timestamps should suffice to prove the existence of the Tweets and could be useful for analysing Twitter activity around a real-time media event. No personally identifiable information (PII) or sensitive personal information (SPI) was collected or contained in the dataset. Some basic deduplication and refinement of the collected data was performed.
I have shared the anonymised dataset, including the extra tables, as a sample and as an act of citizen scholarship in order to archive, document and encourage open educational and historical research and analysis. It is hoped that by sharing the data someone else might be able to run different analyses and ideally discover different or more significant insights. For more information, including methodological and limitation issues, please see the references listed below.
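The reduction described above (deduplicate, keep only users with at least 10 followers, publish only a few indicative columns, and replace coordinates with a YES/blank flag) can be sketched as follows. The raw input field names, including `id_str` used for deduplication, are assumptions for illustration; this is not the author's actual TAGS workflow.

```python
from typing import Dict, Iterable, List

MIN_FOLLOWERS = 10  # the archive only kept users with at least 10 followers

def indicative_rows(tweets: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Reduce raw Tweet records to the indicative columns described above."""
    seen_ids = set()
    rows = []
    for tweet in tweets:
        # Basic deduplication on a hypothetical "id_str" field.
        if tweet["id_str"] in seen_ids:
            continue
        seen_ids.add(tweet["id_str"])
        followers = int(tweet["user_followers_count"])
        if followers < MIN_FOLLOWERS:
            continue
        # Only these columns are kept; text, user names, etc. are dropped.
        rows.append({
            "created_at": tweet["created_at"],
            "time": tweet["time"],
            # Anonymise location: "YES" if coordinates were present, blank otherwise.
            "geo_coordinates": "YES" if tweet.get("geo_coordinates") else "",
            "user_lang": tweet.get("user_lang", ""),
            "user_followers_count": followers,
        })
    return rows
```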
These documents present updated record level data created from the National Energy Efficiency Data-Framework (NEED):
Please forward any feedback to energyefficiency.stats@energysecurity.gov.uk.