13 datasets found

Orphan Drugs - Dataset 1: Twitter issue-networks as excluded publics
orda.shef.ac.uk
txt
Updated Oct 22, 2021
Cite
Orphan Drugs - Dataset 1: Twitter issue-networks as excluded publics [Dataset]. https://orda.shef.ac.uk/articles/dataset/Orphan_Drugs_-_Dataset_1_Twitter_issue-networks_as_excluded_publics/16447326
Available download formats: txt
Unique identifier
https://doi.org/10.15131/shef.data.16447326.v1
Dataset updated
Oct 22, 2021
Dataset provided by
The University of Sheffield
Authors
Matthew Hanchard
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises two .csv format files used within workstream 2 of the Wellcome Trust funded ‘Orphan drugs: High prices, access to medicines and the transformation of biopharmaceutical innovation’ project (219875/Z/19/Z). They appear in various outputs, e.g. publications and presentations.

The deposited data were gathered using the University of Amsterdam Digital Methods Institute’s ‘Twitter Capture and Analysis Toolset’ (DMI-TCAT) before being processed in and extracted from Gephi. DMI-TCAT queries Twitter’s STREAM Application Programming Interface (API) using SQL and retrieves data matching a pre-set text query. It then stores the returned data in a MySQL database, from which the tool can output the data in various formats. This process aligns fully with Twitter’s service user terms and conditions.

The query for the deposited dataset gathered a 1% random sample of all public tweets posted between 10-Feb-2021 and 10-Mar-2021 containing the text ‘Rare Diseases’ and/or ‘Rare Disease Day’, storing it on a local MySQL database managed by the University of Sheffield School of Sociological Studies (http://dmi-tcat.shef.ac.uk/analysis/index.php), accessible only via a valid VPN such as FortiClient and through a permitted active directory user profile. The raw dataset was output from the MySQL database as a .gexf format file, suitable for social network analysis (SNA). It was then opened in Gephi (0.9.2) data visualisation software and anonymised/pseudonymised there as per the ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee on 02-Jun-201 (reference: 039187).

The deposited dataset comprises two anonymised/pseudonymised social network analysis .csv files extracted from Gephi: one containing node data (Issue-networks as excluded publics – Nodes.csv) and another containing edge data (Issue-networks as excluded publics – Edges.csv). Where participants explicitly provided consent, their original username has been provided. Where they provided consent on the basis that they not be identifiable, their username has been replaced with an appropriate pseudonym. All other usernames have been anonymised with a randomly generated 16-digit key. The level of anonymity for each Twitter user is provided in column C of the deposited file ‘Issue-networks as excluded publics – Nodes.csv’.
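
The three levels of anonymity described above can be illustrated with a small sketch. It is not the depositor's procedure (that was carried out inside Gephi); the function names, the level labels and the example usernames are all hypothetical, and collisions between random keys are not handled.

```python
import secrets

def random_key(digits: int = 16) -> str:
    """Return a random key made of `digits` decimal digits."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))

def anonymise(usernames, consented=frozenset(), pseudonyms=None):
    """Map each username to (published label, anonymity level).

    consented:  users who agreed to be named (hypothetical input)
    pseudonyms: users who agreed only if not identifiable, with their
                chosen replacement name (hypothetical input)
    """
    pseudonyms = pseudonyms or {}
    mapping = {}
    for name in usernames:
        if name in mapping:
            continue  # keep one stable label per user
        if name in consented:
            mapping[name] = (name, "original")
        elif name in pseudonyms:
            mapping[name] = (pseudonyms[name], "pseudonym")
        else:
            mapping[name] = (random_key(), "anonymised")
    return mapping

labels = anonymise(
    ["user_a", "user_b", "user_c", "user_a"],
    consented={"user_a"},
    pseudonyms={"user_b": "Participant 1"},
)
```

Keeping the mapping in a dictionary means every edge that mentions the same account receives the same label, which is what a node/edge pair of files needs to stay consistent.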

This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 26-Aug-2021 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman institute/School of Sociological Studies. ORDA has full permission to store this dataset and to make it open access for public re-use without restriction under a CC BY license, in line with the Wellcome Trust commitment to making all research data Open Access.

The University of Sheffield are the designated data controller for this dataset.
Cebulka (Polish dark web cryptomarket and image board) messages data
data.niaid.nih.gov
zenodo.org
Updated Mar 18, 2024
Cite
Cheba, Patrycja (2024). Cebulka (Polish dark web cryptomarket and image board) messages data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10810938
Dataset updated
Mar 18, 2024
Dataset provided by
Świeca, Leszek
Siuda, Piotr
Cheba, Patrycja
Shi, Haitao
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
General Information

Title of Dataset

Cebulka (Polish dark web cryptomarket and image board) messages data.

Data Collectors

Haitao Shi (The University of Edinburgh, UK); Patrycja Cheba (Jagiellonian University); Leszek Świeca (Kazimierz Wielki University in Bydgoszcz, Poland).

Funding Information

The dataset is part of the research supported by the Polish National Science Centre (Narodowe Centrum Nauki) grant 2021/43/B/HS6/00710.

Project title: “Rhizomatic networks, circulation of meanings and contents, and offline contexts of online drug trade” (2022-2025; PLN 956 620; funding institution: Polish National Science Centre [NCN], call: OPUS 22; Principal Investigator: Piotr Siuda [Kazimierz Wielki University in Bydgoszcz, Poland]).

Data Collection Context

Data Source

Polish dark web cryptomarket and image board called Cebulka (http://cebulka7uxchnbpvmqapg5pfos4ngaxglsktzvha7a5rigndghvadeyd.onion/index.php).

Purpose

This dataset was developed within the abovementioned project. The project focuses on studying internet behavior concerning disruptive actions, particularly emphasizing the online narcotics market in Poland. The research seeks to (1) investigate how the open internet, including social media, is used in the drug trade; (2) outline the significance of darknet platforms in the distribution of drugs; and (3) explore the complex exchange of content related to the drug trade between the surface web and the darknet, along with understanding meanings constructed within the drug subculture.

Within this context, Cebulka is identified as a critical digital venue in Poland’s dark web illicit substances scene. Besides serving as a marketplace, it plays a crucial role in shaping the narratives and discussions prevalent in the drug subculture. The dataset has proved to be a valuable tool for performing the analyses needed to achieve the project’s objectives.

Data Content

Data Description

The data was collected in three periods, i.e., in January 2023, June 2023, and January 2024.

The dataset comprises a sample of messages posted on Cebulka from its inception until January 2024 (including all the messages with drug advertisements). These messages include the initial posts that start each thread and the subsequent posts (replies) within those threads. The dataset is organized into two directories. The “cebulka_adverts” directory contains posts related to drug advertisements (both advertisements and comments). In contrast, the “cebulka_community” directory holds a sample of posts from other parts of the cryptomarket, i.e., those not related directly to trading drugs but rather focusing on discussing illicit substances. The dataset consists of 16,842 posts.

Data Cleaning, Processing, and Anonymization

The data has been cleaned and processed using regular expressions in Python. Additionally, all personal information was removed through regular expressions. The data has been hashed to exclude all identifiers related to instant messaging apps and email addresses. Furthermore, all usernames appearing in messages have been eliminated.
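
As a rough illustration of regex-based hashing of identifiers, the sketch below replaces email addresses with a truncated SHA-256 digest. It is only a minimal example under stated assumptions: the pattern `EMAIL_RE`, the `<id:...>` placeholder format and the function names are hypothetical and are not taken from the project's code. A production pipeline would probably also add a secret salt, since unsalted hashes of guessable addresses can be reversed by brute force.

```python
import hashlib
import re

# Hypothetical pattern: a deliberately simple email matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def hash_identifier(match: re.Match) -> str:
    """Replace a matched identifier with a short one-way digest."""
    normalised = match.group(0).lower().encode("utf-8")
    digest = hashlib.sha256(normalised).hexdigest()
    return f"<id:{digest[:16]}>"

def anonymize(text: str) -> str:
    """Hash every email address found in `text`."""
    return EMAIL_RE.sub(hash_identifier, text)

cleaned = anonymize("kontakt: Jan@example.org, odpowiem szybko")
```

Because the address is lower-cased before hashing, the same contact written with different capitalisation maps to the same placeholder, so repeated identifiers remain linkable across messages without being readable.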

File Formats and Variables/Fields

The dataset consists of the following files:

Zipped .txt files (“cebulka_adverts.zip” and “cebulka_community.zip”) containing all messages. These files are organized into individual directories that mirror the folder structure found on Cebulka.

Two .csv files that list all the messages, including file names and the content of each post. The first .csv lists messages from “cebulka_adverts.zip,” and the second .csv lists messages from “cebulka_community.zip.”

Ethical Considerations

Ethics Statement

A set of data handling policies aimed at ensuring safety and ethics has been outlined in the following paper:

Harviainen, J.T., Haasio, A., Ruokolainen, T., Hassan, L., Siuda, P., Hamari, J. (2021). Information Protection in Dark Web Drug Markets Research [in:] Proceedings of the 54th Hawaii International Conference on System Sciences, HICSS 2021, Grand Hyatt Kauai, Hawaii, USA, 4-8 January 2021, Maui, Hawaii, (ed.) Tung X. Bui, Honolulu, HI, pp. 4673-4680.

The primary safeguard was the early-stage hashing of usernames and identifiers from the messages, utilizing automated systems for irreversible hashing. Recognizing that automatic name removal might not catch all identifiers, the data underwent manual review to ensure compliance with research ethics and thorough anonymization.

Data from: A consensus compound/bioactivity dataset for data-driven drug...

zenodo.org
explore.openaire.eu
+1 more

zip

Updated May 13, 2022

Cite

Laura Isigkeit; Apirat Chaikuad; Daniel Merk (2022). A consensus compound/bioactivity dataset for data-driven drug design and chemogenomics [Dataset]. http://doi.org/10.5281/zenodo.6320761

Available download formats: zip

Unique identifier

https://doi.org/10.5281/zenodo.6320761

Dataset updated

May 13, 2022

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Laura Isigkeit; Apirat Chaikuad; Daniel Merk

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Information

The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1144803 compounds with 10915362 bioactivities on 5613 targets (including defined macromolecular targets as well as cell-lines and phenotypic readouts). It also provides simplified information on assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, making the data easy to use generically in multiple applications such as chemogenomics and data-driven drug design.

The consensus dataset provides increased target coverage and contains a higher number of molecules compared to the source databases which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.

Structure and content of the dataset

**Dataset structure**
ChEMBL ID	PubChem ID	IUPHAR ID	Target	Activity type	Assay type	Unit	Mean C (0)	...	Mean PC (0)	...	Mean B (0)	...	Mean I (0)	...	Mean PD (0)	...	Activity check annotation	Ligand names	Canonical SMILES C	...	Structure check	Source

The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.

Except for the canonical SMILES columns, all columns are filled with the datatype ‘string’. The datatype for the canonical SMILES columns is the smiles-format. We recommend the File Reader node for using the dataset in KNIME. With the help of this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.

Column content:

ChEMBL ID, PubChem ID, IUPHAR ID: chemical identifier of the databases
Target: biological target of the molecule expressed as the HGNC gene symbol
Activity type: for example, pIC₅₀
Assay type: Simplification/Classification of the assay into cell-free, cellular, functional and unspecified
Unit: unit of bioactivity measurement
Mean columns of the databases: mean of bioactivity values or activity comments denoted with the frequency of their occurrence in the database, e.g. Mean C = 7.5 *(15) -> the value for this compound-target pair occurs 15 times in ChEMBL database
Activity check annotation: a bioactivity check was performed by comparing values from the different sources and adding an activity check annotation to provide automated activity validation for additional confidence
- no comment: bioactivity values are within one log unit;
- check activity data: bioactivity values are not within one log unit;
- only one data point: only one value was available, no comparison and no range calculated;
- no activity value: no precise numeric activity value was available;
- no log-value could be calculated: no negative decadic logarithm could be calculated, e.g., because the reported unit was not a compound concentration
Ligand names: all unique names contained in the five source databases are listed
Canonical SMILES columns: Molecular structure of the compound from each database
Structure check: To denote matching or differing compound structures in different source databases
- match: molecule structures are the same between different sources;
- no match: the structures differ;
- 1 source: no structure comparison is possible, because the molecule comes from only one source database.
Source: the databases from which the data come
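
The mean-column notation and the activity check annotation can be sketched in a few lines of Python. This is only an illustration, not the authors' KNIME workflow: the function names are hypothetical, "within one log unit" is assumed to mean that the largest and smallest source means differ by at most 1.0, and the "no log-value could be calculated" case is left out because it depends on unit information not modelled here.

```python
import re
from typing import List, Optional, Tuple

# Matches cells such as "7.5 *(15)": a mean value and its frequency.
MEAN_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\*\((\d+)\)\s*$")

def parse_mean(cell: str) -> Optional[Tuple[float, int]]:
    """Return (mean value, frequency), or None if the cell is not numeric."""
    m = MEAN_RE.match(cell)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2))

def activity_check(cells: List[str]) -> str:
    """Annotate one compound-target pair from its per-database mean cells."""
    parsed = [parse_mean(c) for c in cells]
    values = [p[0] for p in parsed if p is not None]
    if not values:
        return "no activity value"
    if len(values) == 1:
        return "only one data point"
    # Assumption: compare the spread of the source means to one log unit.
    if max(values) - min(values) <= 1.0:
        return "no comment"
    return "check activity data"

example = activity_check(["7.5 *(15)", "7.9 *(2)", ""])
```

Here the two numeric sources differ by 0.4 log units, so the pair would receive the "no comment" annotation under the stated assumption.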

MEDISEG
city.figshare.com
application/x-gzip
Updated Mar 14, 2025
Cite
William Chu (2025). MEDISEG [Dataset]. http://doi.org/10.25383/city.28574786.v1
Available download formats: application/x-gzip
Unique identifier
https://doi.org/10.25383/city.28574786.v1
Dataset updated
Mar 14, 2025
Dataset provided by
City, University of London
Authors
William Chu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Overview

MEDISEG (MEDication Image SEGmentation) is a high-quality, real-world dataset designed for the development and evaluation of pill recognition models. It contains two subsets:

MEDISEG (3-Pills): A controlled dataset featuring three pill types with subtle differences in shape and color.
MEDISEG (32-Pills): A more diverse dataset containing 32 distinct pill classes, reflecting real-world challenges such as occlusions, varied lighting conditions, and multiple medications in a single frame.

Each subset includes COCO-format annotations with instance segmentation masks, bounding boxes, and class labels.

Dataset Structure

The dataset is organized as follows:

MEDISEG/
│── LICENSE
│── metadata.csv
│── 3pills/
│   ├── annotations.json
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│── 32pills/
│   ├── annotations.json
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg

LICENSE: The CC BY 4.0 license under which the dataset is distributed.
metadata.csv: Supplementary drug information, including registration numbers, brand names, active ingredients, regulatory classifications, and official URLs.
annotations.json: COCO-format annotation files providing segmentation masks, bounding boxes, and class labels.
images/: High-resolution JPG images of medications.

Acknowledgements

If you use this dataset, please cite the corresponding publication:

@inproceedings{MEDISEG2025,
  title = {MEDISEG: A large-scale dataset of medication images with instance segmentation masks for preventing adverse drug events},
  author = {Chu, Wai Ip and Hirani, Shashi and Tarroni, Giacomo and Li, Ling},
  journal = {Nature Scientific Data},
  year = {2025},
  url = {https://example.com}
}
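
Since the annotations are described as COCO-format, a file such as annotations.json can in principle be inspected with the standard library alone. The sketch below counts annotated instances per class. The tiny in-memory `sample` and its class names (`pill_a`, `pill_b`) are invented for illustration, and the path in the final comment is an assumption based on the directory layout described above.

```python
import json
from collections import Counter

def instances_per_class(coco: dict) -> Counter:
    """Count annotated instances for each category name in a COCO dict."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(names[ann["category_id"]] for ann in coco["annotations"])

# Hypothetical miniature COCO structure (not real MEDISEG content).
sample = {
    "images": [{"id": 1, "file_name": "image1.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "pill_a"}, {"id": 2, "name": "pill_b"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [5, 5, 40, 20]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [60, 8, 38, 22]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [120, 30, 25, 25]},
    ],
}

counts = instances_per_class(sample)

# With the real data one would load the file first, for example:
# with open("MEDISEG/3pills/annotations.json", encoding="utf-8") as f:
#     counts = instances_per_class(json.load(f))
```

A per-class count like this is a quick way to check class balance before training a recognition model.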
Sample of Drugs from QHP drug.json files
healthdata.demo.socrata.com
csv, xlsx, xml
Updated Apr 16, 2016
Cite
(2016). Sample of Drugs from QHP drug.json files [Dataset]. https://healthdata.demo.socrata.com/CMS-Insurance-Plans/Sample-of-Drugs-from-QHP-drug-json-files/jaa8-k3k2
Available download formats: csv, xlsx, xml
Dataset updated
Apr 16, 2016
Description
CSV output from https://github.com/marks/health-insurance-marketplace-analytics/blob/master/flattener/flatten_from_index.py
‘Drug Consumptions (UCI)’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Drug Consumptions (UCI)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-drug-consumptions-uci-58a9/20dcfc96/?iid=052-359&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Drug Consumptions (UCI)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/obeykhadija/drug-consumptions-uci on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Data Set Information:

The database contains records for 1885 respondents. For each respondent 12 attributes are known: personality measurements, which include NEO-FFI-R (neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness), BIS-11 (impulsivity), and ImpSS (sensation seeking), as well as level of education, age, gender, country of residence and ethnicity. All input attributes are originally categorical and are quantified. After quantification, values of all input features can be considered as real-valued. In addition, participants were questioned concerning their use of 18 legal and illegal drugs (alcohol, amphetamines, amyl nitrite, benzodiazepine, cannabis, chocolate, cocaine, caffeine, crack, ecstasy, heroin, ketamine, legal highs, LSD, methadone, mushrooms, nicotine and volatile substance abuse) and one fictitious drug (Semeron), which was introduced to identify over-claimers. For each drug they had to select one of the answers: never used the drug, used it over a decade ago, or in the last decade, year, month, week, or day.

A detailed description of the database and of the data quantification process is presented in E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan and A. N. Gorban, "The Five Factor Model of personality and evaluation of drug consumption risk," arXiv [Web Link], 2015. The paper solves a binary classification problem for all drugs. For most drugs, sensitivity and specificity are greater than 75%.

Since all of the features have been quantified into real values, please refer to the link to the original dataset for more clarity on the categorical variables. For example, for EScore (extraversion), 9 people scored 55, which corresponds to a quantified (real) value of 2.57309 in the dataset. I have also converted some variables back into their categorical values, which are included in the drug_consumption.csv file.

Content

Feature Attributes for Quantified Data:
1. ID: the record number in the original database. It cannot be related to the participant and can be used for reference only.
2. Age (Real): age of the participant
3. Gender: Male or Female
4. Education: level of education of the participant
5. Country: country of origin of the participant
6. Ethnicity: ethnicity of the participant
7. Nscore (Real): NEO-FFI-R Neuroticism
8. Escore (Real): NEO-FFI-R Extraversion
9. Oscore (Real): NEO-FFI-R Openness to experience
10. Ascore (Real): NEO-FFI-R Agreeableness
11. Cscore (Real): NEO-FFI-R Conscientiousness
12. Impulsive (Real): impulsiveness measured by BIS-11
13. SS (Real): sensation seeking measured by ImpSS
14. Alcohol: alcohol consumption
15. Amphet: amphetamines consumption
16. Amyl: amyl nitrite consumption
17. Benzos: benzodiazepine consumption
18. Caff: caffeine consumption
19. Cannabis: marijuana consumption
20. Choc: chocolate consumption
21. Coke: cocaine consumption
22. Crack: crack cocaine consumption
23. Ecstasy: ecstasy consumption
24. Heroin: heroin consumption
25. Ketamine: ketamine consumption
26. Legalh: legal highs consumption
27. LSD: LSD consumption
28. Meth: methadone consumption
29. Mushroom: magic mushroom consumption
30. Nicotine: nicotine consumption
31. Semer: class of fictitious drug Semeron consumption (i.e. control)
32. VSA: class of volatile substance abuse consumption

Ratings for Drug Use:
- CL0 Never Used
- CL1 Used over a Decade Ago
- CL2 Used in Last Decade
- CL3 Used in Last Year
- CL4 Used in Last Month
- CL5 Used in Last Week
- CL6 Used in Last Day

Acknowledgements

Elaine Fehrman, Men's Personality Disorder and National Women's Directorate, Rampton Hospital, Retford, Nottinghamshire, DN22 0PD, UK, Elaine.Fehrman@nottshc.nhs.uk

Vincent Egan, Department of Psychiatry and Applied Psychology, University of Nottingham, Nottingham, NG8 1BB, UK, Vincent.Egan@nottingham.ac.uk

Evgeny M. Mirkes Department of Mathematics, University of Leicester, Leicester, LE1 7RH, UK, em322@le.ac.uk

Inspiration

Problems which can be solved:
- Seven-class classification for each drug separately.
- The problem can be transformed to binary classification by merging some of the classes into one new class. For example, "Never Used" and "Used over a Decade Ago" form the class "Non-user" and all other classes form the class "User".
- The best binarization of classes for each attribute.
- Evaluation of the risk of being a drug consumer for each drug.
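
The example binarization mentioned above ("Never Used" and "Used over a Decade Ago" as "Non-user", everything else as "User") can be written as a small mapping over the CL0 to CL6 codes. This is a minimal sketch; the function and constant names are hypothetical, and other cut points are equally possible.

```python
# CL0 = Never Used, CL1 = Used over a Decade Ago (per the rating list).
NON_USER = {"CL0", "CL1"}
ALL_CLASSES = {f"CL{i}" for i in range(7)}

def binarize(rating: str) -> str:
    """Collapse a seven-class usage rating into 'User' or 'Non-user'."""
    if rating not in ALL_CLASSES:
        raise ValueError(f"unknown rating: {rating!r}")
    return "Non-user" if rating in NON_USER else "User"

ratings = ["CL0", "CL1", "CL2", "CL6"]
labels = [binarize(r) for r in ratings]
```

Moving a code in or out of `NON_USER` is all it takes to try a different binarization for a given drug.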

--- Original source retains full ownership of the source dataset ---
Data from: Dataset on drug use in 2020 (COVID-19 lockdown) in Spain and...
zenodo.org
ekoizpen-zientifikoa.ehu.eus
+3 more
bin, csv, txt
Updated Mar 19, 2024
Cite
Andrea Estévez Danta; Lubertus Bijlsma; Ricardo Capela; Rafael Cela; Alberto Celma; Félix Hernández; Unax Lertxundi; João Matias; Rosa Montes; GORKA ORIVE; Ailette Prieto; Miguel M. Santos; Rosario Rodil; José Benito Quintana (2024). Dataset on drug use in 2020 (COVID-19 lockdown) in Spain and Portugal by wastewater-based epidemiology [Dataset]. http://doi.org/10.5281/zenodo.10829752
Available download formats: txt, csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.10829752
Dataset updated
Mar 19, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Andrea Estévez Danta; Lubertus Bijlsma; Ricardo Capela; Rafael Cela; Alberto Celma; Félix Hernández; Unax Lertxundi; João Matias; Rosa Montes; GORKA ORIVE; Ailette Prieto; Miguel M. Santos; Rosario Rodil; José Benito Quintana
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the metadata associated with this publication:

A. Estévez-Danta, L. Bijlsma, R. Capela, R. Cela, A. Celma, F. Hernández, U. Lertxundi, J. Matias, R. Montes, G. Orive, A. Prieto, M.M. Santos, R. Rodil, J.B. Quintana

Use of illicit drugs, alcohol and tobacco in Spain and Portugal during the COVID-19 crisis in 2020 as measured by wastewater-based epidemiology

Science of the Total Environment, 2022, 836, 155697

https://doi.org/10.1016/j.scitotenv.2022.155697

The data is deposited in ZENODO:

https://zenodo.org/doi/10.5281/zenodo.10829752

If you reuse the data, please cite the publication and ZENODO deposit mentioned above

Explanation of the different sheets of the Excel file (All_Data_STOTEN_2022_155697) or different individual CSV files (named as below):

WWTP_details: explanation of wastewater treatment plants (WWTPs) sampled, flow rates, etc.

Concentrations: concentrations measured in the samples

PNDL: population normalized daily loads calculated per each sample

Consumption: estimated drug use (see the publication for correction factors)

EF: enantiomeric fraction, expressed as fraction of the R-enantiomer for the samples analyzed
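
For orientation, the two derived quantities named above can be sketched with the definitions commonly used in wastewater-based epidemiology. These formulas and units are assumptions, not taken from the deposited files: the load is taken as concentration times daily flow divided by the population served, and the enantiomeric fraction as R / (R + S). The function names are hypothetical; the publication remains the reference for the actual calculations and correction factors.

```python
def pndl_mg_per_day_per_1000(conc_ng_per_l: float,
                             flow_l_per_day: float,
                             population: int) -> float:
    """Population-normalized daily load in mg/day per 1000 inhabitants.

    Assumed units: concentration in ng/L, flow in L/day.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    load_mg_per_day = conc_ng_per_l * flow_l_per_day * 1e-6  # ng -> mg
    return load_mg_per_day / population * 1000.0

def enantiomeric_fraction_r(r_conc: float, s_conc: float) -> float:
    """Fraction of the R-enantiomer, R / (R + S)."""
    total = r_conc + s_conc
    if total <= 0:
        raise ValueError("total concentration must be positive")
    return r_conc / total

# Hypothetical sample: 500 ng/L, 20 million L/day, 100,000 inhabitants.
load = pndl_mg_per_day_per_1000(500.0, 2.0e7, 100_000)
ef = enantiomeric_fraction_r(30.0, 10.0)
```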

Abbreviations

AMP Amphetamine

MAMP Methamphetamine

MDMA 3,4-Methylenedioxymethamphetamine

BE Benzoylecgonine

COC Cocaine

THC-COOH 11-Nor-9-carboxy-Δ9-tetrahydrocannabinol

THC Δ9-Tetrahydrocannabinol

COT Cotinine

OH-COT Trans-3'-Hydroxycotinine

NIC Nicotine

EtS Ethyl sulfate
PPG Signals Subject Files (Format: .csv) (Only PPG signal data)
figshare.com
csv
Updated Oct 1, 2024
+ more versions
Cite
Deletedd Deletedddd; Jhon Freddy Esquivel Aguirre (2024). PPG Signals Subject Files (Format: .csv) (Only PPG signal data) [Dataset]. http://doi.org/10.6084/m9.figshare.27132288.v1
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.27132288.v1
Dataset updated
Oct 1, 2024
Dataset provided by
figshare
Authors
Deletedd Deletedddd; Jhon Freddy Esquivel Aguirre
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Association

This dataset belongs to the project "PPG Signals and Cholesterol Data: Repository for the Validation of Total Blood Cholesterol Estimation Methods", where different PPG signals are presented together with cholesterol information of the subjects. This is done with the objective of validating tools or methods for estimating the total blood cholesterol level from the PPG signal.

Dataset Description

This dataset contains files in .csv (Comma-separated values) format, corresponding to the PPG signal of 46 subjects. Subject data such as age, sex, and cholesterol are not found in the files presented here. If these data are needed in the records, they can be located in the following dataset within this project: "PPG Signals & Cholesterol Data Subject Files (Format: .csv)". Other data such as weight, height and whether the subject is on medication can be found in the Excel document included in the project.

Dataset format

.csv (Comma-separated values)

Other formats available in the project:
.txt (Text file)
.json (JavaScript Object Notation)
.mat (MATLAB file)
Touch Gesture and Emotion dataset of Tianjin University (TouchGET)
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Li, Yunkai; Meng, Qinghao (2023). Touch Gesture and Emotion dataset of Tianjin University (TouchGET) [Dataset]. http://doi.org/10.7910/DVN/Z9IRNM
Unique identifier
https://doi.org/10.7910/DVN/Z9IRNM
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Li, Yunkai; Meng, Qinghao
Description
TouchGET is an affective touch gesture dataset involving 10 kinds of touch gestures and 12 kinds of discrete emotions. The dataset is grouped into fifteen folders using the following naming convention: ‘subject’, the subject’s index (from 1 to 15), and the subject’s sex (‘M’ or ‘F’). An example of a folder name is ‘subject_1_M’. Each folder contains thirteen subfolders. Twelve of them are named after the 12 types of emotions, and each includes samples of gestures associated with that emotion. The gesture samples are saved as comma-separated value (CSV) files, using the following naming convention: gesture number, touch variant (‘1’ represents gentle and ‘2’ rude), and gesture counting index. An example of a file name is ‘B_1_5.csv’. The last subfolder is named ‘rest’ and contains gesture samples that were not selected under any emotion.
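
The folder and file naming conventions can be parsed with two regular expressions. The sketch below is a minimal illustration built only from the two example names given in the description; the function names are hypothetical, and the gesture identifier is treated as an opaque label because the example uses a letter (‘B’).

```python
import re
from typing import Tuple

FOLDER_RE = re.compile(r"^subject_(\d+)_([MF])$")
FILE_RE = re.compile(r"^([A-Za-z0-9]+)_([12])_(\d+)\.csv$")
TOUCH_VARIANTS = {"1": "gentle", "2": "rude"}

def parse_folder(name: str) -> Tuple[int, str]:
    """Return (subject index, sex) from a name such as 'subject_1_M'."""
    m = FOLDER_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    return int(m.group(1)), m.group(2)

def parse_file(name: str) -> Tuple[str, str, int]:
    """Return (gesture label, touch variant, counting index) from 'B_1_5.csv'."""
    m = FILE_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return m.group(1), TOUCH_VARIANTS[m.group(2)], int(m.group(3))

subject = parse_folder("subject_1_M")
sample = parse_file("B_1_5.csv")
```

Raising on unexpected names, rather than skipping them silently, makes it easier to notice files that do not follow the documented convention.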
NHANES 1988-2018
figshare.com
application/gzip
Updated Feb 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v2
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.21743372.v2
Dataset updated
Feb 18, 2025
Dataset provided by
figshare
Authors
Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential to help understand how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as prevalence of disease. However, these data first need to be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross examination and considerable effort but is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 134,310 participants and 4,740 variables.

The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, and R markdown files with example code for calculating summary statistics and running regression models to help accelerate high-throughput analysis of the exposome and secular trends on cancer mortality.

csv Data Record: The curated NHANES datasets and the data dictionaries include 13 .csv files and 1 Excel file. The curated NHANES datasets comprise 10 .csv formatted files, one for each module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The eleventh file is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 4,740 variables in NHANES ("dictionary_nhanes.csv"). The 12th csv file contains the harmonized categories for the categorical variables ("dictionary_harmonized_categories.csv"). The 13th file contains the dictionary of descriptors for the drug codes (“dictionary_drug_codes.csv”). The 14th file is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES datasets (“nhanes_inconsistencies_documentation.xlsx”).

R Data Record: For researchers who want to conduct their analysis in the R programming language, the curated NHANES datasets and the data dictionaries can be downloaded as a .zip file which includes an .RData file and an .R file. We provide an .RData file that contains all the aforementioned datasets as R data objects (“w - nhanes_1988_2018.RData”). This .RData file also makes available all R scripts for the customized functions that were written to curate the data. We also provide an .R file that shows how we used the customized functions (i.e. our pipeline) to curate the data (“m - nhanes_1988_2018.R”).
Health Index. Ukraine (2016) - Dataset - B2FIND
b2find.dkrz.de
Updated Jan 11, 2025
(2025). Health Index. Ukraine (2016) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/d73871c3-22d1-5732-a31d-6837e8b959f0
Dataset updated
Jan 11, 2025
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Area covered
Ukraine
Description
"Health Index. Ukraine" is a large-scale, empirical, and representative study aimed at collecting quantitative data on the population's health-related knowledge and behaviors, as well as their evaluations of healthcare service quality based on personal experiences.
The data for this study are derived from sociological surveys of the adult population. The initial survey was conducted in 2016 by the Kyiv International Institute of Sociology (KIIS) in collaboration with the Social Indicators Center, at the initiative of and with funding from the International Renaissance Foundation.
The survey covered topics such as health and health-seeking behavior, early disease detection, patient experiences with outpatient and inpatient care (including questions on official and unofficial expenses), ambulance and pediatric services, medication availability, satisfaction with medical care, and perceptions of healthcare reforms.
The survey sample is representative of the 18+ population of Ukraine as a whole, as well as each of the oblasts covered by the study and the city of Kyiv. Temporarily occupied territories of the Autonomous Republic of Crimea, the city of Sevastopol, and certain districts of the Donetsk and Luhansk oblasts, where government authorities temporarily do not exercise their powers, were not included in the study.
Data were collected through face-to-face interviews conducted at respondents' places of residence. In total, 10,178 respondents aged 18 and older were interviewed between May 15 and June 30, 2016.
The data are available in SAV format (Ukrainian) and in a converted CSV format (with a codebook). Field questionnaires (in Ukrainian and Russian) are also included.
The study results, including detailed analytical reports in both Ukrainian and English, as well as selected infographics, are accessible on the project website: https://healthindex.com.ua/.
Health Index. Ukraine (2017) - Dataset - B2FIND
b2find.dkrz.de
Updated Jan 15, 2025
(2025). Health Index. Ukraine (2017) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/161c987f-682c-5c07-b204-00624aefae5f
Dataset updated
Jan 15, 2025
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Area covered
Ukraine
Description
"Health Index. Ukraine" is a large-scale, empirical, and representative study aimed at collecting quantitative data on the population's health-related knowledge and behaviors, as well as their evaluations of healthcare service quality based on personal experiences.
In 2017, the second wave of the study was conducted. It was carried out by the Kyiv International Institute of Sociology in collaboration with the Social Indicators Center, at the initiative and with the funding and support of the International Renaissance Foundation and the World Bank.
The survey covered a range of topics, including health and health-seeking behavior, early disease detection, patient experiences with outpatient and inpatient care (including questions on official and unofficial expenses), pediatric services, medication availability, satisfaction with medical care, and perceptions of healthcare reforms.
The survey sample is representative of the 18+ population of Ukraine as a whole, as well as for each of the oblasts covered by the study and the city of Kyiv. Temporarily occupied territories of the Autonomous Republic of Crimea, the city of Sevastopol, and certain districts of the Donetsk and Luhansk oblasts, where government authorities temporarily do not exercise their powers, were not included in the study.
Data were collected through face-to-face interviews using tablets (CAPI) conducted at respondents' places of residence. In total, 10,184 respondents aged 18 and older were interviewed between May 18 and June 27, 2017.
The data are available in SAV format (Ukrainian) and in a converted CSV format (with a codebook in Ukrainian). Field questionnaires (in Ukrainian and Russian) and a technical report describing the methodology (in Ukrainian) are also included.
The study results, including detailed analytical reports in both Ukrainian and English, as well as selected infographics, are accessible on the project website: https://healthindex.com.ua/.
Naturalistic Neuroimaging Database
openneuro.org
Updated Jul 21, 2020
Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2020). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v1.1.1
Unique identifier
https://doi.org/10.18112/openneuro.ds002837.v1.1.1
Dataset updated
Jul 21, 2020
Dataset provided by
OpenNeuro: https://openneuro.org/
Authors
Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Overview

The Naturalistic Neuroimaging Database (NNDb v1.0) contains datasets from 86 human participants who completed the NIH Toolbox and then watched one of 10 full-length movies during functional magnetic resonance imaging (fMRI). The participants were all right-handed native English speakers with no history of neurological or psychiatric illness, no hearing impairment, and unimpaired or corrected vision, and none were taking medication. Each movie was stopped at 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10-minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.

The NIH Toolbox data files are

nih_demographics.csv

nih_data.csv

nih_scores.csv

The stimuli can be found and purchased using the following EAN and ASIN numbers

500 Days of Summer: EAN = 5039036043359; ASIN = B002KKBMSW

Citizenfour: EAN = 5050968002313; ASIN = B00YP65NEI

12 Years a Slave: EAN = 5030305517229; ASIN = B00HR23CCM

Back to the Future: EAN = 5050582401288; ASIN = B000BVK82I

Little Miss Sunshine: EAN = 5039036029667; ASIN = B000JU9OJ4

The Prestige: EAN = 7321902106472; ASIN = B000K7LQS8

Pulp Fiction: EAN = 5060223762043; ASIN = B004UGAMY4

The Shawshank Redemption: EAN = 5037115299635; ASIN = B001CWLFKE

Split: EAN = 5902115603099; ASIN = B071J24232

The Usual Suspects: EAN = 5039036033497; ASIN = B0010YXNGI

Data is organized as follows

The sub-

The derivatives sub-

Some initial stimulus annotations can be found in the stimuli folder.

The mriqc derivatives folder contains the MRIQC no-reference image quality metrics for the NNDb anatomical and functional data.

Notes

Subjects 3-6, 10, 11, 24, 28, 29, 31, 39, 41, 72, 83-85 did not have the original IMA files to format into BIDS, so they were manually created (functionals) or copied in from other subjects (anatomicals). These will be updated once access to UCL facilities is restored after the COVID-19 lockdown.

If you plan to use the raw data with the stimuli / annotations, please be aware that some temporal interpolation is necessary. See our manuscript for details and GitHub for an example script to do this. Or just email one of us.
Orphan Drugs - Dataset 1: Twitter issue-networks as excluded publics [Dataset]. https://orda.shef.ac.uk/articles/dataset/Orphan_Drugs_-_Dataset_1_Twitter_issue-networks_as_excluded_publics/16447326

Orphan Drugs - Dataset 1: Twitter issue-networks as excluded publics

Available download formats: txt

Unique identifier

https://doi.org/10.15131/shef.data.16447326.v1

Dataset updated

Oct 22, 2021

Dataset provided by

The University of Sheffield

Authors

Matthew Hanchard

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset comprises two .csv format files used within workstream 2 of the Wellcome Trust funded ‘Orphan drugs: High prices, access to medicines and the transformation of biopharmaceutical innovation’ project (219875/Z/19/Z). They appear in various outputs, e.g. publications and presentations.

The deposited data were gathered using the University of Amsterdam Digital Methods Institute’s ‘Twitter Capture and Analysis Toolset’ (DMI-TCAT) before being processed and extracted from Gephi. DMI-TCAT queries Twitter’s STREAM Application Programming Interface (API) using SQL and retrieves data on a pre-set text query. It then sends the returned data for storage on a MySQL database. The tool allows for output of that data in various formats. This process aligns fully with Twitter’s service user terms and conditions. The query for the deposited dataset gathered a 1% random sample of all public tweets posted between 10-Feb-2021 and 10-Mar-2021 containing the text ‘Rare Diseases’ and/or ‘Rare Disease Day’, storing it on a local MySQL database managed by the University of Sheffield School of Sociological Studies (http://dmi-tcat.shef.ac.uk/analysis/index.php), accessible only via a valid VPN such as FortiClient and through a permitted active directory user profile. The dataset was output from the MySQL database as a raw .gexf format file, suitable for social network analysis (SNA). It was then opened using Gephi (0.9.2) data visualisation software and anonymised/pseudonymised in Gephi as per the ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee on 02-Jun-201 (reference: 039187). The deposited dataset comprises two anonymised/pseudonymised social network analysis .csv files extracted from Gephi, one containing node data (Issue-networks as excluded publics – Nodes.csv) and another containing edge data (Issue-networks as excluded publics – Edges.csv). Where participants explicitly provided consent, their original username has been provided. Where they provided consent on the basis that they not be identifiable, their username has been replaced with an appropriate pseudonym. All other usernames have been anonymised with a randomly generated 16-digit key.
The level of anonymity for each Twitter user is provided in column C of deposited file ‘Issue-networks as excluded publics – Nodes.csv’.
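As a rough illustration of how the node and edge files might be read together, the sketch below tallies the anonymity level from the third column (column C) of a node table and builds a directed adjacency map from an edge table. The header names (`Id`, `Label`, `Anonymity`, `Source`, `Target`), the level labels, and the sample rows are all hypothetical; the deposited files should be inspected for their actual headers before adapting this.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical miniature exports; headers, level labels, and rows are assumptions.
NODES_CSV = (
    "Id,Label,Anonymity\n"
    "n1,example_user,original\n"
    "n2,pseudonym_a,pseudonym\n"
    "n3,1234567890123456,anonymised\n"
)
EDGES_CSV = "Source,Target\nn1,n2\nn1,n3\nn2,n3\n"


def anonymity_counts(nodes_text):
    """Count nodes per anonymity level, read by position from column C."""
    reader = csv.reader(io.StringIO(nodes_text))
    next(reader)  # skip the header row
    return Counter(row[2] for row in reader)


def adjacency(edges_text):
    """Map each source node to the set of nodes it points at."""
    graph = defaultdict(set)
    for row in csv.DictReader(io.StringIO(edges_text)):
        graph[row["Source"]].add(row["Target"])
    return graph


levels = anonymity_counts(NODES_CSV)
graph = adjacency(EDGES_CSV)
print(dict(levels))
print({source: sorted(targets) for source, targets in graph.items()})
```

Reading column C by position rather than by name mirrors the description above and avoids guessing that column's header.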

This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 26-Aug-2021 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman institute/School of Sociological Studies. ORDA has full permission to store this dataset and to make it open access for public re-use without restriction under a CC BY license, in line with the Wellcome Trust commitment to making all research data Open Access.

The University of Sheffield is the designated data controller for this dataset.
