Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises two .csv format files used within workstream 2 of the Wellcome Trust funded ‘Orphan drugs: High prices, access to medicines and the transformation of biopharmaceutical innovation’ project (219875/Z/19/Z). They appear in various outputs, e.g. publications and presentations.
The deposited data were gathered using the University of Amsterdam Digital Methods Institute’s ‘Twitter Capture and Analysis Toolset’ (DMI-TCAT) before being processed and extracted from Gephi. DMI-TCAT queries Twitter’s STREAM Application Programming Interface (API) using SQL and retrieves data on a pre-set text query. It then sends the returned data for storage on a MySQL database. The tool allows for output of that data in various formats. This process aligns fully with Twitter’s service user terms and conditions. The query for the deposited dataset gathered a 1% random sample of all public tweets posted between 10-Feb-2021 and 10-Mar-2021 containing the text ‘Rare Diseases’ and/or ‘Rare Disease Day’, storing it on a local MySQL database managed by the University of Sheffield School of Sociological Studies (http://dmi-tcat.shef.ac.uk/analysis/index.php), accessible only via a valid VPN such as FortiClient and through a permitted active directory user profile.
The dataset was output raw from the MySQL database as a .gexf format file, suitable for social network analysis (SNA). It was then opened using Gephi (0.9.2) data visualisation software and anonymised/pseudonymised in Gephi as per the ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee on 02-Jun-201 (reference: 039187).
The deposited dataset comprises two anonymised/pseudonymised social network analysis .csv files extracted from Gephi, one containing node data (Issue-networks as excluded publics – Nodes.csv) and another containing edge data (Issue-networks as excluded publics – Edges.csv). Where participants explicitly provided consent, their original username has been provided. Where they provided consent on the basis that they not be identifiable, their username has been replaced with an appropriate pseudonym. All other usernames have been anonymised with a randomly generated 16-digit key.
The level of anonymity for each Twitter user is provided in column C of deposited file ‘Issue-networks as excluded publics – Nodes.csv’.
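As a rough illustration of how the node and edge files might be read together, the sketch below uses only the Python standard library. It assumes the usual Gephi export layout (a node ID in the first column of the node table, and `Source` and `Target` columns in the edge table); the header names and the sample rows are hypothetical and should be checked against the deposited files. The anonymity level is read by position, from column C, as described above.

```python
import csv
import io
from collections import Counter


def load_anonymity_levels(nodes_file, level_column_index=2):
    """Map node ID (assumed to be column A) to the anonymity level in column C."""
    reader = csv.reader(nodes_file)
    next(reader)  # skip the header row
    return {row[0]: row[level_column_index] for row in reader if row}


def load_degree_counts(edges_file):
    """Count how many edge endpoints touch each node ID.

    Assumes Gephi-style 'Source' and 'Target' headers (an assumption).
    """
    degree = Counter()
    for row in csv.DictReader(edges_file):
        degree[row["Source"]] += 1
        degree[row["Target"]] += 1
    return degree


# Made-up rows standing in for the deposited Nodes.csv and Edges.csv files.
sample_nodes = io.StringIO(
    "Id,Label,Anonymity\n"
    "1,example_user,consented\n"
    "2,0000000000000001,anonymised\n"
)
sample_edges = io.StringIO("Source,Target\n1,2\n2,1\n")

levels = load_anonymity_levels(sample_nodes)
degrees = load_degree_counts(sample_edges)
```

With the real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="", encoding="utf-8")`.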
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 26-Aug-2021 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman institute/School of Sociological Studies. ORDA has full permission to store this dataset and to make it open access for public re-use without restriction under a CC BY license, in line with the Wellcome Trust commitment to making all research data Open Access.
The University of Sheffield are the designated data controller for this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General Information
Cebulka (Polish dark web cryptomarket and image board) messages data.
Haitao Shi (The University of Edinburgh, UK); Patrycja Cheba (Jagiellonian University); Leszek Świeca (Kazimierz Wielki University in Bydgoszcz, Poland).
The dataset is part of the research supported by the Polish National Science Centre (Narodowe Centrum Nauki) grant 2021/43/B/HS6/00710.
Project title: “Rhizomatic networks, circulation of meanings and contents, and offline contexts of online drug trade” (2022-2025; PLN 956 620; funding institution: Polish National Science Centre [NCN], call: OPUS 22; Principal Investigator: Piotr Siuda [Kazimierz Wielki University in Bydgoszcz, Poland]).
Data Collection Context
Polish dark web cryptomarket and image board called Cebulka (http://cebulka7uxchnbpvmqapg5pfos4ngaxglsktzvha7a5rigndghvadeyd.onion/index.php).
This dataset was developed within the abovementioned project. The project focuses on studying internet behavior concerning disruptive actions, particularly emphasizing the online narcotics market in Poland. The research seeks to (1) investigate how the open internet, including social media, is used in the drug trade; (2) outline the significance of darknet platforms in the distribution of drugs; and (3) explore the complex exchange of content related to the drug trade between the surface web and the darknet, along with understanding meanings constructed within the drug subculture.
Within this context, Cebulka is identified as a critical digital venue in Poland’s dark web illicit substances scene. Besides serving as a marketplace, it plays a crucial role in shaping the narratives and discussions prevalent in the drug subculture. The dataset has proved to be a valuable tool for performing the analyses needed to achieve the project’s objectives.
Data Content
The data was collected in three periods, i.e., in January 2023, June 2023, and January 2024.
The dataset comprises a sample of messages posted on Cebulka from its inception until January 2024 (including all the messages with drug advertisements). These messages include the initial posts that start each thread and the subsequent posts (replies) within those threads. The dataset is organized into two directories. The “cebulka_adverts” directory contains posts related to drug advertisements (both advertisements and comments). In contrast, the “cebulka_community” directory holds a sample of posts from other parts of the cryptomarket, i.e., those not related directly to trading drugs but rather focusing on discussing illicit substances. The dataset consists of 16,842 posts.
The data has been cleaned and processed using regular expressions in Python. Additionally, all personal information was removed through regular expressions. The data has been hashed to exclude all identifiers related to instant messaging apps and email addresses. Furthermore, all usernames appearing in messages have been eliminated.
The dataset consists of the following files:
Zipped .txt files (“cebulka_adverts.zip” and “cebulka_community.zip”) containing all messages. These files are organized into individual directories that mirror the folder structure found on Cebulka.
Two .csv files that list all the messages, including file names and the content of each post. The first .csv lists messages from “cebulka_adverts.zip,” and the second .csv lists messages from “cebulka_community.zip.”
Ethical Considerations
A set of data handling policies aimed at ensuring safety and ethics has been outlined in the following paper:
Harviainen, J.T., Haasio, A., Ruokolainen, T., Hassan, L., Siuda, P., Hamari, J. (2021). Information Protection in Dark Web Drug Markets Research [in:] Proceedings of the 54th Hawaii International Conference on System Sciences, HICSS 2021, Grand Hyatt Kauai, Hawaii, USA, 4-8 January 2021, Maui, Hawaii, (ed.) Tung X. Bui, Honolulu, HI, pp. 4673-4680.
The primary safeguard was the early-stage hashing of usernames and identifiers from the messages, utilizing automated systems for irreversible hashing. Recognizing that automatic name removal might not catch all identifiers, the data underwent manual review to ensure compliance with research ethics and thorough anonymization.
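A minimal sketch of regex-based hashing of identifiers is shown below. It is not the project's actual pipeline: the email pattern, the salt value, the `<id:...>` placeholder format, and the function names are all assumptions made for illustration. It replaces each email address in a message with a truncated salted SHA-256 digest, so the same address always maps to the same token without the address itself being stored.

```python
import hashlib
import re

# Simplified email pattern (an assumption); real identifiers, such as
# instant messaging handles, would need their own patterns.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def hash_identifier(identifier, salt):
    """Return a placeholder token derived from a salted SHA-256 digest."""
    digest = hashlib.sha256((salt + identifier.lower()).encode("utf-8")).hexdigest()
    return "<id:" + digest[:16] + ">"


def pseudonymise(text, salt):
    """Replace every email address found in the text with its hashed token."""
    return EMAIL_PATTERN.sub(lambda match: hash_identifier(match.group(0), salt), text)


sample = "Contact: seller@example.org or SELLER@example.org"
cleaned = pseudonymise(sample, salt="hypothetical-secret-salt")
tokens = re.findall(r"<id:[0-9a-f]{16}>", cleaned)
```

The salt should be kept secret and discarded after processing if the mapping is meant to be irreversible in practice, since unsalted hashes of short identifiers can be guessed by brute force. As the description notes, automated replacement of this kind still needs a manual review pass.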
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information
The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1144803 compounds with 10915362 bioactivities on 5613 targets (including defined macromolecular targets as well as cell-lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence, obtained by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, making the dataset easy to use generically in applications such as chemogenomics and data-driven drug design.
The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.
Structure and content of the dataset
ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check | Source |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.
Except for the canonical SMILES columns, all columns are filled with the datatype ‘string’. The datatype for the canonical SMILES columns is the smiles-format. We recommend the File Reader node for using the dataset in KNIME. With the help of this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.
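Outside KNIME, the string-typed columns have to be converted by the reader. The sketch below is one possible way to do this with the Python standard library; it assumes a comma-delimited file whose header names match the table header shown above (`Target`, `Activity type`, `Mean C (0)`), and the sample rows and identifiers are invented for illustration.

```python
import csv
import io


def parse_mean(value):
    """Convert a string cell to float; return None for empty or non-numeric cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def rows_with_mean(csv_file, mean_column="Mean C (0)"):
    """Yield (target, activity type, mean) for rows where the mean column is numeric."""
    for row in csv.DictReader(csv_file):
        mean = parse_mean(row.get(mean_column))
        if mean is not None:
            yield row["Target"], row["Activity type"], mean


# Invented rows standing in for the exported CSV file.
sample = io.StringIO(
    "ChEMBL ID,Target,Activity type,Mean C (0)\n"
    "CHEMBL0000001,Example target,IC50,12.5\n"
    "CHEMBL0000002,Example target,Ki,\n"
)
results = list(rows_with_mean(sample))
```

Rows with an empty cell in the chosen column are skipped rather than coerced to zero, so missing values from one source database do not look like measured activities.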
Column content:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Overview
MEDISEG (MEDication Image SEGmentation) is a high-quality, real-world dataset designed for the development and evaluation of pill recognition models. It contains two subsets:
MEDISEG (3-Pills): A controlled dataset featuring three pill types with subtle differences in shape and color.
MEDISEG (32-Pills): A more diverse dataset containing 32 distinct pill classes, reflecting real-world challenges such as occlusions, varied lighting conditions, and multiple medications in a single frame.
Each subset includes COCO-format annotations with instance segmentation masks, bounding boxes, and class labels.
Dataset Structure
The dataset is organized as follows:
MEDISEG/
│── LICENSE
│── metadata.csv
│── 3pills/
│ ├── annotations.json
│ ├── images/
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│── 32pills/
│ ├── annotations.json
│ ├── images/
│ │ ├── image1.jpg
│ │ ├── image2.jpg
LICENSE: The CC BY 4.0 license under which the dataset is distributed.
metadata.csv: Supplementary drug information, including registration numbers, brand names, active ingredients, regulatory classifications, and official URLs.
annotations.json: COCO-format annotation files providing segmentation masks, bounding boxes, and class labels.
images/: High-resolution JPG images of medications.
Acknowledgements
If you use this dataset, please cite the corresponding publication:
@inproceedings{MEDISEG2025,
title = {MEDISEG: A large-scale dataset of medication images with instance segmentation masks for preventing adverse drug events},
author = {Chu, Wai Ip and Hirani, Shashi and Tarroni, Giacomo and Li, Ling},
journal = {Nature Scientific Data},
year = {2025},
url = {https://example.com}
}
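Since the MEDISEG annotations are described as COCO-format, a file such as 3pills/annotations.json can be inspected with the standard library alone. The sketch below assumes the standard COCO top-level keys (`images`, `annotations`, `categories`) and counts annotated instances per class; the class names and sample values are made up and do not come from the dataset.

```python
import json
from collections import Counter


def load_coco(path):
    """Load a COCO-format annotation file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_instances_per_class(coco):
    """Count annotated instances per category name."""
    names = {category["id"]: category["name"] for category in coco["categories"]}
    return Counter(names[ann["category_id"]] for ann in coco["annotations"])


# Made-up miniature annotation structure with the standard COCO keys.
sample_coco = {
    "images": [{"id": 1, "file_name": "image1.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "pill_a"}, {"id": 2, "name": "pill_b"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [5, 5, 20, 20]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [40, 5, 20, 20]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [5, 40, 20, 20]},
    ],
}
counts = count_instances_per_class(sample_coco)
```

For the real data, `count_instances_per_class(load_coco("3pills/annotations.json"))` would give the per-class totals, assuming the path matches the directory layout shown above.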
CSV output from https://github.com/marks/health-insurance-marketplace-analytics/blob/master/flattener/flatten_from_index.py
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Drug Consumptions (UCI)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/obeykhadija/drug-consumptions-uci on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Data Set Information:
The database contains records for 1885 respondents. For each respondent 12 attributes are known: personality measurements, which include NEO-FFI-R (neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness), BIS-11 (impulsivity), and ImpSS (sensation seeking), as well as level of education, age, gender, country of residence and ethnicity. All input attributes are originally categorical and are quantified. After quantification, the values of all input features can be considered real-valued. In addition, participants were questioned concerning their use of 18 legal and illegal drugs (alcohol, amphetamines, amyl nitrite, benzodiazepine, cannabis, chocolate, cocaine, caffeine, crack, ecstasy, heroin, ketamine, legal highs, LSD, methadone, mushrooms, nicotine and volatile substance abuse) and one fictitious drug (Semeron), which was introduced to identify over-claimers. For each drug they had to select one of the answers: never used the drug, used it over a decade ago, or used it in the last decade, year, month, week, or day.
A detailed description of the database and of the data quantification process is presented in E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan and A. N. Gorban, "The Five Factor Model of personality and evaluation of drug consumption risk.," arXiv [Web Link], 2015. The paper solves a binary classification problem for all drugs. For most drugs, sensitivity and specificity are greater than 75%.
Since all of the features have been quantified into real values, please refer to the link to the original dataset for more clarity on the categorical variables. For example, for EScore (extraversion), 9 people scored 55, which corresponds to a quantified (real) value of 2.57309 in the dataset. I have also converted some variables back into their categorical values, which are included in the drug_consumption.csv file.
Feature Attributes for Quantified Data:
1. ID: the record number in the original database. It cannot be related to the participant and can be used for reference only.
2. Age (Real): age of the participant
3. Gender: Male or Female
4. Education: level of education of the participant
5. Country: country of origin of the participant
6. Ethnicity: ethnicity of the participant
7. Nscore (Real): NEO-FFI-R Neuroticism
8. Escore (Real): NEO-FFI-R Extraversion
9. Oscore (Real): NEO-FFI-R Openness to experience
10. Ascore (Real): NEO-FFI-R Agreeableness
11. Cscore (Real): NEO-FFI-R Conscientiousness
12. Impulsive (Real): impulsiveness measured by BIS-11
13. SS (Real): sensation seeking measured by ImpSS
14. Alcohol: alcohol consumption
15. Amphet: amphetamines consumption
16. Amyl: amyl nitrite consumption
17. Benzos: benzodiazepine consumption
18. Caff: caffeine consumption
19. Cannabis: marijuana consumption
20. Choc: chocolate consumption
21. Coke: cocaine consumption
22. Crack: crack cocaine consumption
23. Ecstasy: ecstasy consumption
24. Heroin: heroin consumption
25. Ketamine: ketamine consumption
26. Legalh: legal highs consumption
27. LSD: LSD consumption
28. Meth: methadone consumption
29. Mushroom: magic mushroom consumption
30. Nicotine: nicotine consumption
31. Semer: class of fictitious drug Semeron consumption (i.e. control)
32. VSA: class of volatile substance abuse consumption
Ratings for Drug Use:
- CL0 Never Used
- CL1 Used over a Decade Ago
- CL2 Used in Last Decade
- CL3 Used in Last Year
- CL4 Used in Last Month
- CL5 Used in Last Week
- CL6 Used in Last Day
Elaine Fehrman, Men's Personality Disorder and National Women's Directorate, Rampton Hospital, Retford, Nottinghamshire, DN22 0PD, UK, Elaine.Fehrman@nottshc.nhs.uk
Vincent Egan, Department of Psychiatry and Applied Psychology, University of Nottingham, Nottingham, NG8 1BB, UK, Vincent.Egan@nottingham.ac.uk
Evgeny M. Mirkes, Department of Mathematics, University of Leicester, Leicester, LE1 7RH, UK, em322@le.ac.uk
Problems which can be solved:
- Seven-class classification for each drug separately.
- The problem can be transformed into binary classification by merging some of the classes into one new class. For example, "Never Used" and "Used over a Decade Ago" form the class "Non-user" and all other classes form the class "User".
- The best binarization of classes for each attribute.
- Evaluation of the risk of being a drug consumer for each drug.
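The binarization example given above (CL0 and CL1 as "Non-user", everything else as "User") can be written as a small mapping. This is only a sketch of that one split; the function and constant names are illustrative, and other cut points may suit other drugs better.

```python
# Rating codes CL0..CL6 as listed in the dataset description.
ALL_CLASSES = {f"CL{i}" for i in range(7)}

# "Never Used" (CL0) and "Used over a Decade Ago" (CL1) form the Non-user class.
NON_USER_CLASSES = {"CL0", "CL1"}


def binarize(rating):
    """Map a seven-level usage rating to 'Non-user' or 'User'."""
    if rating not in ALL_CLASSES:
        raise ValueError(f"Unknown rating code: {rating!r}")
    return "Non-user" if rating in NON_USER_CLASSES else "User"


ratings = ["CL0", "CL1", "CL2", "CL6"]
labels = [binarize(rating) for rating in ratings]
```

Changing `NON_USER_CLASSES` is enough to try a different binarization, for example treating everything up to "Used in Last Decade" as non-use.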
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Association
This dataset belongs to the project "PPG Signals and Cholesterol Data: Repository for the Validation of Total Blood Cholesterol Estimation Methods", in which different PPG signals are presented together with cholesterol information for the subjects. The objective is to validate tools or methods for estimating the total blood cholesterol level from the PPG signal.
Dataset Description
This dataset contains files in .csv (comma-separated values) format, corresponding to the PPG signals of 46 subjects. Subject data such as age, sex, and cholesterol are not included in the files presented here. If these data are needed in the records, they can be found in the following dataset within this project: "PPG Signals & Cholesterol Data Subject Files (Format: .csv)". Other data, such as weight, height and whether the subject is on medication, can be found in the Excel document included in the project.
Dataset format
.csv (Comma-separated values)
Other formats available in the project:
.txt (Text file)
.json (JavaScript Object Notation)
.mat (MATLAB file)
TouchGET is an affective touch gesture dataset involving 10 kinds of touch gestures and 12 kinds of discrete emotions. The dataset is grouped into fifteen folders using the following naming convention: ‘subject’, the subject’s index (from 1 to 15), and the subject’s sex (‘M’ or ‘F’). An example of a folder name is ‘subject_1_M’. Each folder contains thirteen subfolders. Twelve of them are named after the 12 types of emotions, and each includes samples of gestures associated with that emotion. The gesture samples are saved as comma-separated value (CSV) files, using the following naming convention: gesture number, touch variant (‘1’ represents gentle and ‘2’ represents rude), and gesture counting index. An example of a file name is ‘B_1_5.csv’. The last subfolder is named ‘rest’ and contains gesture samples that were not selected under any emotion.
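The folder and file naming conventions described above can be parsed with two regular expressions. The sketch below is based only on the two examples given (‘subject_1_M’ and ‘B_1_5.csv’); the patterns, the class name, and the function names are assumptions, and the gesture identifier is kept as an opaque string because the example uses a letter.

```python
import re
from typing import NamedTuple


class GestureFile(NamedTuple):
    gesture: str   # gesture identifier, e.g. 'B'
    variant: str   # 'gentle' or 'rude'
    index: int     # gesture counting index


FOLDER_PATTERN = re.compile(r"^subject_(\d+)_([MF])$")
FILE_PATTERN = re.compile(r"^([^_]+)_([12])_(\d+)\.csv$")
VARIANTS = {"1": "gentle", "2": "rude"}


def parse_subject_folder(name):
    """Return (subject index, sex) from a folder name such as 'subject_1_M'."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unexpected folder name: {name!r}")
    return int(match.group(1)), match.group(2)


def parse_gesture_file(name):
    """Return a GestureFile from a file name such as 'B_1_5.csv'."""
    match = FILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unexpected file name: {name!r}")
    return GestureFile(match.group(1), VARIANTS[match.group(2)], int(match.group(3)))


subject = parse_subject_folder("subject_1_M")
gesture_file = parse_gesture_file("B_1_5.csv")
```

The emotion label is not encoded in the file name; it would come from the name of the subfolder that contains the file.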
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential for understanding how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as the prevalence of disease. However, the data first need to be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross-examination and considerable effort, but it is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, and R markdown files with example code for calculating summary statistics and running regression models, to help accelerate high-throughput analysis of the exposome and secular trends in cancer mortality.
csv Data Record: The curated NHANES datasets and the data dictionaries include 13 .csv files and 1 Excel file. The curated NHANES datasets comprise 10 .csv formatted files, one for each module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The eleventh file is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 4,740 variables in NHANES ("dictionary_nhanes.csv"). The 12th csv file contains the harmonized categories for the categorical variables ("dictionary_harmonized_categories.csv"). The 13th file contains the dictionary of descriptors for the drug codes (“dictionary_drug_codes.csv”). The 14th file is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES datasets (“nhanes_inconsistencies_documentation.xlsx”).
R Data Record: For researchers who want to conduct their analysis in the R programming language, the curated NHANES datasets and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. We provide an .RData file that contains all the aforementioned datasets as R data objects (“w - nhanes_1988_2018.RData”). This .RData file also makes available all the customized R functions that were written to curate the data. We also provide an .R file that shows how we used the customized functions (i.e. our pipeline) to curate the data (“m - nhanes_1988_2018.R”).
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
"Health Index. Ukraine" is a large-scale, empirical, and representative study aimed at collecting quantitative data on the population's health-related knowledge and behaviors, as well as their evaluations of healthcare service quality based on personal experiences.
The data for this study are derived from sociological surveys of the adult population. The initial survey was conducted in 2016 by the Kyiv International Institute of Sociology (KIIS) in collaboration with the Social Indicators Center, at the initiative of and with funding from the International Renaissance Foundation.
The survey covered topics such as health and health-seeking behavior, early disease detection, patient experiences with outpatient and inpatient care (including questions on official and unofficial expenses), ambulance and pediatric services, medication availability, satisfaction with medical care, and perceptions of healthcare reforms.
The survey sample is representative of the 18+ population of Ukraine as a whole, as well as each of the oblasts covered by the study and the city of Kyiv. Temporarily occupied territories of the Autonomous Republic of Crimea, the city of Sevastopol, and certain districts of the Donetsk and Luhansk oblasts, where government authorities temporarily do not exercise their powers, were not included in the study.
Data were collected through face-to-face interviews conducted at respondents' places of residence. In total, 10,178 respondents aged 18 and older were interviewed between May 15 and June 30, 2016.
The data is available in an SAV format (Ukrainian) and a converted CSV format (with a codebook). Field questionnaires (in Ukrainian and Russian) are also included.
The study results, including detailed analytical reports in both Ukrainian and English, as well as selected infographics, are accessible on the project website: https://healthindex.com.ua/.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
"Health Index. Ukraine" is a large-scale, empirical, and representative study aimed at collecting quantitative data on the population's health-related knowledge and behaviors, as well as their evaluations of healthcare service quality based on personal experiences.
In 2017, the second wave of the study was conducted. It was carried out by the Kyiv International Institute of Sociology in collaboration with the Social Indicators Center, at the initiative of, and with funding and support from, the International Renaissance Foundation and the World Bank.
The survey covered a range of topics, including health and health-seeking behavior, early disease detection, patient experiences with outpatient and inpatient care (including questions on official and unofficial expenses), pediatric services, medication availability, satisfaction with medical care, and perceptions of healthcare reforms.
The survey sample is representative of the 18+ population of Ukraine as a whole, as well as for each of the oblasts covered by the study and the city of Kyiv. Temporarily occupied territories of the Autonomous Republic of Crimea, the city of Sevastopol, and certain districts of the Donetsk and Luhansk oblasts, where government authorities temporarily do not exercise their powers, were not included in the study.
Data were collected through face-to-face interviews using tablets (CAPI) conducted at respondents' places of residence. In total, 10,184 respondents aged 18 and older were interviewed between May 18 and June 27, 2017.
The data is available in an SAV format (Ukrainian) and a converted CSV format (with a codebook in Ukrainian). Field questionnaires (in Ukrainian and Russian) and a technical report describing the methodology (in Ukrainian) are also included.
The study results, including detailed analytical reports in both Ukrainian and English, as well as selected infographics, are accessible on the project website: https://healthindex.com.ua/.