13 datasets found
  1. Orphan Drugs - Dataset 1: Twitter issue-networks as excluded publics

    • orda.shef.ac.uk
    txt
    Updated Oct 22, 2021
    Cite
    Orphan Drugs - Dataset 1: Twitter issue-networks as excluded publics [Dataset]. https://orda.shef.ac.uk/articles/dataset/Orphan_Drugs_-_Dataset_1_Twitter_issue-networks_as_excluded_publics/16447326
    Explore at:
    Available download formats: txt
    Dataset updated
    Oct 22, 2021
    Dataset provided by
    The University of Sheffield
    Authors
    Matthew Hanchard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises two .csv files used within workstream 2 of the Wellcome Trust-funded ‘Orphan drugs: High prices, access to medicines and the transformation of biopharmaceutical innovation’ project (219875/Z/19/Z). They appear in various outputs, e.g. publications and presentations.

    The deposited data were gathered using the University of Amsterdam Digital Methods Institute’s ‘Twitter Capture and Analysis Toolset’ (DMI-TCAT) before being processed in, and extracted from, Gephi. DMI-TCAT queries Twitter’s STREAM Application Programming Interface (API) using SQL and retrieves data matching a pre-set text query. It then sends the returned data for storage in a MySQL database, from which the data can be output in various formats. This process aligns fully with Twitter’s service user terms and conditions.

    The query for the deposited dataset gathered a 1% random sample of all public tweets posted between 10-Feb-2021 and 10-Mar-2021 containing the text ‘Rare Diseases’ and/or ‘Rare Disease Day’, storing it in a local MySQL database managed by the University of Sheffield School of Sociological Studies (http://dmi-tcat.shef.ac.uk/analysis/index.php), accessible only via a valid VPN such as FortiClient and through a permitted active directory user profile. The raw dataset was output from the MySQL database as a .gexf format file, suitable for social network analysis (SNA). It was then opened in Gephi (0.9.2) data visualisation software and anonymised/pseudonymised there as per the ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee on 02-Jun-201 (reference: 039187).

    The deposited dataset comprises two anonymised/pseudonymised social network analysis .csv files extracted from Gephi, one containing node data (Issue-networks as excluded publics – Nodes.csv) and another containing edge data (Issue-networks as excluded publics – Edges.csv). Where participants explicitly provided consent, their original username has been provided. Where they provided consent on the basis that they not be identifiable, their username has been replaced with an appropriate pseudonym. All other usernames have been anonymised with a randomly generated 16-digit key. The level of anonymity for each Twitter user is provided in column C of the deposited file ‘Issue-networks as excluded publics – Nodes.csv’.
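
    The three levels of anonymity described above (original username, pseudonym, random 16-digit key) could be applied along the following lines. This is only an illustrative sketch: the consent labels "named", "pseudonym" and "none", the function names, and the sample users are assumptions, not taken from the deposited files or from the Gephi workflow the author used.

```python
import secrets


def random_key(n_digits: int = 16) -> str:
    """Return a random string of n_digits decimal digits."""
    return "".join(secrets.choice("0123456789") for _ in range(n_digits))


def relabel_users(users):
    """Map each username to the label that would be published.

    users: iterable of (username, consent, pseudonym) tuples, where consent
    is one of the hypothetical values "named", "pseudonym" or "none".
    """
    labels = {}
    for username, consent, pseudonym in users:
        if consent == "named":
            # Explicit consent: keep the original username.
            labels[username] = username
        elif consent == "pseudonym":
            # Consent on condition of non-identifiability: use the pseudonym.
            labels[username] = pseudonym
        else:
            # Everyone else: random 16-digit key.
            labels[username] = random_key()
    return labels


# Hypothetical sample users, for illustration only.
sample = [
    ("alice", "named", None),
    ("bob", "pseudonym", "Participant A"),
    ("carol", "none", None),
]
labels = relabel_users(sample)
```

    Using the `secrets` module rather than `random` is a design choice made here so that keys are not reproducible from a seed; the source does not say how its keys were generated.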

    This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 26-Aug-2021 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman institute/School of Sociological Studies. ORDA has full permission to store this dataset and to make it open access for public re-use without restriction under a CC BY license, in line with the Wellcome Trust commitment to making all research data Open Access.

    The University of Sheffield are the designated data controller for this dataset.

  2. Cebulka (Polish dark web cryptomarket and image board) messages data

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 18, 2024
    Cite
    Cheba, Patrycja (2024). Cebulka (Polish dark web cryptomarket and image board) messages data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10810938
    Explore at:
    Dataset updated
    Mar 18, 2024
    Dataset provided by
    Świeca, Leszek
    Siuda, Piotr
    Cheba, Patrycja
    Shi, Haitao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General Information

    1. Title of Dataset

    Cebulka (Polish dark web cryptomarket and image board) messages data.

    2. Data Collectors

    Haitao Shi (The University of Edinburgh, UK); Patrycja Cheba (Jagiellonian University); Leszek Świeca (Kazimierz Wielki University in Bydgoszcz, Poland).

    3. Funding Information

    The dataset is part of the research supported by the Polish National Science Centre (Narodowe Centrum Nauki) grant 2021/43/B/HS6/00710.

    Project title: “Rhizomatic networks, circulation of meanings and contents, and offline contexts of online drug trade” (2022-2025; PLN 956 620; funding institution: Polish National Science Centre [NCN], call: OPUS 22; Principal Investigator: Piotr Siuda [Kazimierz Wielki University in Bydgoszcz, Poland]).

    Data Collection Context

    1. Data Source

    Polish dark web cryptomarket and image board called Cebulka (http://cebulka7uxchnbpvmqapg5pfos4ngaxglsktzvha7a5rigndghvadeyd.onion/index.php).

    2. Purpose

    This dataset was developed within the abovementioned project. The project focuses on studying internet behavior concerning disruptive actions, particularly emphasizing the online narcotics market in Poland. The research seeks to (1) investigate how the open internet, including social media, is used in the drug trade; (2) outline the significance of darknet platforms in the distribution of drugs; and (3) explore the complex exchange of content related to the drug trade between the surface web and the darknet, along with understanding meanings constructed within the drug subculture.

    Within this context, Cebulka is identified as a critical digital venue in Poland’s dark web illicit substances scene. Besides serving as a marketplace, it plays a crucial role in shaping the narratives and discussions prevalent in the drug subculture. The dataset has proved to be a valuable tool for performing the analyses needed to achieve the project’s objectives.

    Data Content

    1. Data Description

    The data was collected in three periods, i.e., in January 2023, June 2023, and January 2024.

    The dataset comprises a sample of messages posted on Cebulka from its inception until January 2024 (including all the messages with drug advertisements). These messages include the initial posts that start each thread and the subsequent posts (replies) within those threads. The dataset is organized into two directories. The “cebulka_adverts” directory contains posts related to drug advertisements (both advertisements and comments). In contrast, the “cebulka_community” directory holds a sample of posts from other parts of the cryptomarket, i.e., those not related directly to trading drugs but rather focusing on discussing illicit substances. The dataset consists of 16,842 posts.

    2. Data Cleaning, Processing, and Anonymization

    The data has been cleaned and processed using regular expressions in Python. Additionally, all personal information was removed through regular expressions. The data has been hashed to exclude all identifiers related to instant messaging apps and email addresses. Furthermore, all usernames appearing in messages have been eliminated.

    3. File Formats and Variables/Fields

    The dataset consists of the following files:

    Zipped .txt files (“cebulka_adverts.zip” and “cebulka_community.zip”) containing all messages. These files are organized into individual directories that mirror the folder structure found on Cebulka.

    Two .csv files that list all the messages, including file names and the content of each post. The first .csv lists messages from “cebulka_adverts.zip,” and the second .csv lists messages from “cebulka_community.zip.”

    Ethical Considerations

    1. Ethics Statement

    A set of data handling policies aimed at ensuring safety and ethics has been outlined in the following paper:

    Harviainen, J.T., Haasio, A., Ruokolainen, T., Hassan, L., Siuda, P., Hamari, J. (2021). Information Protection in Dark Web Drug Markets Research [in:] Proceedings of the 54th Hawaii International Conference on System Sciences, HICSS 2021, Grand Hyatt Kauai, Hawaii, USA, 4-8 January 2021, Maui, Hawaii, (ed.) Tung X. Bui, Honolulu, HI, pp. 4673-4680.

    The primary safeguard was the early-stage hashing of usernames and identifiers from the messages, utilizing automated systems for irreversible hashing. Recognizing that automatic name removal might not catch all identifiers, the data underwent manual review to ensure compliance with research ethics and thorough anonymization.
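
    The description mentions regular expressions in Python and irreversible hashing of identifiers, but not the actual patterns or hash function. The sketch below shows one way such a step might look for email addresses; the pattern, the salt value, the `<id:...>` placeholder format and the function names are all assumptions made for illustration, not the collectors' pipeline.

```python
import hashlib
import re

# Assumed, deliberately simple email pattern; real data would need more cases
# (instant messaging handles, obfuscated addresses, and so on).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def hash_token(token: str, salt: str = "project-secret") -> str:
    """Replace an identifier with a short salted SHA-256 digest."""
    digest = hashlib.sha256((salt + token.lower()).encode("utf-8")).hexdigest()
    return f"<id:{digest[:12]}>"


def scrub(text: str) -> str:
    """Substitute every matched email address with its hashed placeholder."""
    return EMAIL_RE.sub(lambda m: hash_token(m.group(0)), text)


# Hypothetical message, for illustration only.
message = "Kontakt: Dealer@Example.com lub dealer@example.com"
cleaned = scrub(message)
```

    Because the token is lower-cased before hashing, both spellings of the address map to the same placeholder, which keeps repeated mentions linkable without exposing the address. As the ethics statement notes, automated removal may miss identifiers, so a manual review would still be needed.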

  3. Data from: A consensus compound/bioactivity dataset for data-driven drug...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    zip
    Updated May 13, 2022
    Cite
    Laura Isigkeit; Apirat Chaikuad; Daniel Merk (2022). A consensus compound/bioactivity dataset for data-driven drug design and chemogenomics [Dataset]. http://doi.org/10.5281/zenodo.6320761
    Explore at:
    Available download formats: zip
    Dataset updated
    May 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Laura Isigkeit; Apirat Chaikuad; Daniel Merk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Information

    The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. Analysis of their coverage of compound entries and biological targets revealed considerable differences, however, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1144803 compounds with 10915362 bioactivities on 5613 targets (including defined macromolecular targets as well as cell-lines and phenotypic readouts). It also provides simplified information on assay types underlying the bioactivity data and on bioactivity confidence by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, making the result easy to use across applications such as chemogenomics and data-driven drug design.

    The consensus dataset provides increased target coverage and contains a higher number of molecules than the source databases, which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve the robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks with flags for divergent data from different sources may help data selection and further accurate curation.

    Structure and content of the dataset

    Dataset structure

    ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check | Source

    The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV-file and a compressed CSV-file.

    Except for the canonical SMILES columns, all columns are filled with the datatype ‘string’. The datatype for the canonical SMILES columns is the smiles-format. We recommend the File Reader node for using the dataset in KNIME. With the help of this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.

    Column content:

    • ChEMBL ID, PubChem ID, IUPHAR ID: chemical identifier of the databases
    • Target: biological target of the molecule expressed as the HGNC gene symbol
    • Activity type: for example, pIC50
    • Assay type: Simplification/Classification of the assay into cell-free, cellular, functional and unspecified
    • Unit: unit of bioactivity measurement
    • Mean columns of the databases: mean of bioactivity values or activity comments denoted with the frequency of their occurrence in the database, e.g. Mean C = 7.5 *(15) -> the value for this compound-target pair occurs 15 times in ChEMBL database
    • Activity check annotation: a bioactivity check was performed by comparing values from the different sources and adding an activity check annotation to provide automated activity validation for additional confidence
      • no comment: bioactivity values are within one log unit;
      • check activity data: bioactivity values are not within one log unit;
      • only one data point: only one value was available, no comparison and no range calculated;
      • no activity value: no precise numeric activity value was available;
      • no log-value could be calculated: no negative decadic logarithm could be calculated, e.g., because the reported unit was not a compound concentration
    • Ligand names: all unique names contained in the five source databases are listed
    • Canonical SMILES columns: Molecular structure of the compound from each database
    • Structure check: To denote matching or differing compound structures in different source databases
      • match: molecule structures are the same between different sources;
      • no match: the structures differ;
      • 1 source: no structure comparison is possible, because the molecule comes from only one source database.
    • Source: the databases from which the data come
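
    The "Mean" columns and the activity check annotation described above can be illustrated with a small sketch. It assumes that a mean cell looks like the quoted example `7.5 *(15)` and that "within one log unit" means the spread between the largest and smallest value is at most 1; neither the exact cell syntax nor the boundary handling is confirmed by the description, and the function names are hypothetical. The "no log-value could be calculated" case is not modelled.

```python
import re

# Assumed cell layout: "<mean> *(<count>)", e.g. "7.5 *(15)".
MEAN_CELL_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\*?\s*\((\d+)\)\s*$")


def parse_mean_cell(cell: str):
    """Split a mean cell into (mean value, number of occurrences).

    Returns None for cells that do not match, such as activity comments.
    """
    m = MEAN_CELL_RE.match(cell)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2))


def activity_check(values):
    """Annotate a list of per-source log-scale values for one compound-target pair."""
    if not values:
        return "no activity value"
    if len(values) == 1:
        return "only one data point"
    spread = max(values) - min(values)
    # Assumption: a spread of at most one log unit counts as agreement.
    return "no comment" if spread <= 1.0 else "check activity data"
```

    For instance, `parse_mean_cell("7.5 *(15)")` yields the value 7.5 with a frequency of 15, and two sources reporting 5.0 and 7.2 for the same pair would be flagged with "check activity data".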

  4. MEDISEG

    • city.figshare.com
    application/x-gzip
    Updated Mar 14, 2025
    Cite
    William Chu (2025). MEDISEG [Dataset]. http://doi.org/10.25383/city.28574786.v1
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    Mar 14, 2025
    Dataset provided by
    City, University of London
    Authors
    William Chu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Overview

    MEDISEG (MEDication Image SEGmentation) is a high-quality, real-world dataset designed for the development and evaluation of pill recognition models. It contains two subsets:

    • MEDISEG (3-Pills): A controlled dataset featuring three pill types with subtle differences in shape and color.
    • MEDISEG (32-Pills): A more diverse dataset containing 32 distinct pill classes, reflecting real-world challenges such as occlusions, varied lighting conditions, and multiple medications in a single frame.

    Each subset includes COCO-format annotations with instance segmentation masks, bounding boxes, and class labels.

    Dataset Structure

    The dataset is organized as follows:

    MEDISEG/
    │── LICENSE
    │── metadata.csv
    │── 3pills/
    │   ├── annotations.json
    │   ├── images/
    │   │   ├── image1.jpg
    │   │   ├── image2.jpg
    │── 32pills/
    │   ├── annotations.json
    │   ├── images/
    │   │   ├── image1.jpg
    │   │   ├── image2.jpg

    • LICENSE: The CC BY 4.0 license under which the dataset is distributed.
    • metadata.csv: Supplementary drug information, including registration numbers, brand names, active ingredients, regulatory classifications, and official URLs.
    • annotations.json: COCO-format annotation files providing segmentation masks, bounding boxes, and class labels.
    • images/: High-resolution JPG images of medications.

    Acknowledgements

    If you use this dataset, please cite the corresponding publication:

    @inproceedings{MEDISEG2025,
      title = {MEDISEG: A large-scale dataset of medication images with instance segmentation masks for preventing adverse drug events},
      author = {Chu, Wai Ip and Hirani, Shashi and Tarroni, Giacomo and Li, Ling},
      journal = {Nature Scientific Data},
      year = {2025},
      url = {https://example.com}
    }
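
    Since the annotations are described as COCO-format, a first look at a subset might count images and instances per class. The sketch below relies only on the standard COCO keys (`images`, `annotations`, `categories`, `category_id`); the in-memory example, its class names and the helper names are made up for illustration, and the path in the comment simply follows the directory listing above.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO-format annotation file into a dictionary."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_coco(coco):
    """Count images and annotated instances per class name."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return {"images": len(coco["images"]), "instances_per_class": dict(counts)}


# Tiny stand-in for an annotations.json file (hypothetical class names).
example = {
    "images": [{"id": 1, "file_name": "image1.jpg"},
               {"id": 2, "file_name": "image2.jpg"}],
    "categories": [{"id": 1, "name": "pill_a"}, {"id": 2, "name": "pill_b"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [5, 5, 20, 20]},
        {"id": 11, "image_id": 1, "category_id": 2, "bbox": [40, 8, 18, 18]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [12, 30, 22, 21]},
    ],
}
summary = summarize_coco(example)

# With the real archive unpacked, one would presumably call:
# summary = summarize_coco(load_coco("MEDISEG/3pills/annotations.json"))
```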

  5. Sample of Drugs from QHP drug.json files

    • healthdata.demo.socrata.com
    csv, xlsx, xml
    Updated Apr 16, 2016
    Cite
    (2016). Sample of Drugs from QHP drug.json files [Dataset]. https://healthdata.demo.socrata.com/CMS-Insurance-Plans/Sample-of-Drugs-from-QHP-drug-json-files/jaa8-k3k2
    Explore at:
    Available download formats: csv, xlsx, xml
    Dataset updated
    Apr 16, 2016
    Description
  6. ‘Drug Consumptions (UCI)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Drug Consumptions (UCI)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-drug-consumptions-uci-58a9/20dcfc96/?iid=052-359&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Drug Consumptions (UCI)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/obeykhadija/drug-consumptions-uci on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Data Set Information:

    The database contains records for 1885 respondents. For each respondent, 12 attributes are known: personality measurements, which include NEO-FFI-R (neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness), BIS-11 (impulsivity), and ImpSS (sensation seeking), as well as level of education, age, gender, country of residence and ethnicity. All input attributes are originally categorical and are quantified. After quantification, the values of all input features can be considered real-valued. In addition, participants were questioned concerning their use of 18 legal and illegal drugs (alcohol, amphetamines, amyl nitrite, benzodiazepine, cannabis, chocolate, cocaine, caffeine, crack, ecstasy, heroin, ketamine, legal highs, LSD, methadone, mushrooms, nicotine and volatile substance abuse) and one fictitious drug (Semeron), which was introduced to identify over-claimers. For each drug, they had to select one of the answers: never used the drug, used it over a decade ago, or in the last decade, year, month, week, or day.

    A detailed description of the database and the process of data quantification is presented in E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan and A. N. Gorban, "The Five Factor Model of personality and evaluation of drug consumption risk," arXiv [Web Link], 2015. The paper solves a binary classification problem for all drugs. For most drugs, sensitivity and specificity are greater than 75%.

    Since all of the features have been quantified into real values, please refer to the link to the original dataset for more clarity on the categorical variables. For example, for EScore (extraversion), 9 people scored 55, which corresponds to a quantified (real) value of 2.57309 in the dataset. I have also converted some variables back into their categorical values, which are included in the drug_consumption.csv file.

    Content

    Feature Attributes for Quantified Data:
    1. ID: the number of the record in the original database. It cannot be related to the participant and can be used for reference only.
    2. Age (Real): age of the participant
    3. Gender: Male or Female
    4. Education: level of education of the participant
    5. Country: country of origin of the participant
    6. Ethnicity: ethnicity of the participant
    7. Nscore (Real): NEO-FFI-R Neuroticism
    8. Escore (Real): NEO-FFI-R Extraversion
    9. Oscore (Real): NEO-FFI-R Openness to experience
    10. Ascore (Real): NEO-FFI-R Agreeableness
    11. Cscore (Real): NEO-FFI-R Conscientiousness
    12. Impulsive (Real): impulsiveness measured by BIS-11
    13. SS (Real): sensation seeking measured by ImpSS
    14. Alcohol: alcohol consumption
    15. Amphet: amphetamines consumption
    16. Amyl: amyl nitrite consumption
    17. Benzos: benzodiazepine consumption
    18. Caff: caffeine consumption
    19. Cannabis: marijuana consumption
    20. Choc: chocolate consumption
    21. Coke: cocaine consumption
    22. Crack: crack cocaine consumption
    23. Ecstasy: ecstasy consumption
    24. Heroin: heroin consumption
    25. Ketamine: ketamine consumption
    26. Legalh: legal highs consumption
    27. LSD: LSD consumption
    28. Meth: methadone consumption
    29. Mushroom: magic mushroom consumption
    30. Nicotine: nicotine consumption
    31. Semer: class of fictitious drug Semeron consumption (i.e. control)
    32. VSA: class of volatile substance abuse consumption

    Ratings for Drug Use:
    • CL0 Never Used
    • CL1 Used over a Decade Ago
    • CL2 Used in Last Decade
    • CL3 Used in Last Year
    • CL4 Used in Last Month
    • CL5 Used in Last Week
    • CL6 Used in Last Day

    Acknowledgements

    1. Elaine Fehrman, Men's Personality Disorder and National Women's Directorate, Rampton Hospital, Retford, Nottinghamshire, DN22 0PD, UK, Elaine.Fehrman@nottshc.nhs.uk

    2. Vincent Egan, Department of Psychiatry and Applied Psychology, University of Nottingham, Nottingham, NG8 1BB, UK, Vincent.Egan@nottingham.ac.uk

    3. Evgeny M. Mirkes Department of Mathematics, University of Leicester, Leicester, LE1 7RH, UK, em322@le.ac.uk

    Inspiration

    Problems which can be solved:
    • Seven-class classification for each drug separately.
    • The problem can be transformed into binary classification by merging some of the classes into one new class. For example, "Never Used" and "Used over a Decade Ago" form the class "Non-user" and all other classes form the class "User".
    • Finding the best binarization of classes for each attribute.
    • Evaluation of the risk of being a drug consumer, for each drug.
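
    The binarization example given above ("Never Used" and "Used over a Decade Ago" form "Non-user", everything else forms "User") can be written down directly. The sketch below assumes the ratings are stored as the codes CL0 to CL6 listed earlier; the function and constant names are hypothetical, and other splits would simply change the `NON_USER` set.

```python
# Rating codes as listed in the description.
USE_CLASSES = {
    "CL0": "Never Used",
    "CL1": "Used over a Decade Ago",
    "CL2": "Used in Last Decade",
    "CL3": "Used in Last Year",
    "CL4": "Used in Last Month",
    "CL5": "Used in Last Week",
    "CL6": "Used in Last Day",
}

# The split from the example: CL0 and CL1 count as non-use.
NON_USER = {"CL0", "CL1"}


def binarize(rating: str) -> str:
    """Collapse a seven-class rating into "Non-user" or "User"."""
    if rating not in USE_CLASSES:
        raise ValueError(f"unknown rating code: {rating!r}")
    return "Non-user" if rating in NON_USER else "User"


# Hypothetical column of ratings for one drug.
ratings = ["CL0", "CL1", "CL2", "CL6"]
binary = [binarize(r) for r in ratings]
```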

    --- Original source retains full ownership of the source dataset ---

  7. Data from: Dataset on drug use in 2020 (COVID-19 lockdown) in Spain and...

    • zenodo.org
    • ekoizpen-zientifikoa.ehu.eus
    • +3more
    bin, csv, txt
    Updated Mar 19, 2024
    Cite
    Andrea Estévez Danta; Lubertus Bijlsma; Ricardo Capela; Rafael Cela; Alberto Celma; Félix Hernández; Unax Lertxundi; João Matias; Rosa Montes; Gorka Orive; Ailette Prieto; Miguel M. Santos; Rosario Rodil; José Benito Quintana (2024). Dataset on drug use in 2020 (COVID-19 lockdown) in Spain and Portugal by wastewater-based epidemiology [Dataset]. http://doi.org/10.5281/zenodo.10829752
    Explore at:
    Available download formats: txt, csv, bin
    Dataset updated
    Mar 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrea Estévez Danta; Lubertus Bijlsma; Ricardo Capela; Rafael Cela; Alberto Celma; Félix Hernández; Unax Lertxundi; João Matias; Rosa Montes; Gorka Orive; Ailette Prieto; Miguel M. Santos; Rosario Rodil; José Benito Quintana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    This dataset contains the metadata associated with this publication:
    A. Estévez-Danta, L. Bijlsma, R. Capela, R. Cela, A. Celma, F. Hernández, U. Lertxundi, J. Matias, R. Montes, G. Orive, A. Prieto, M.M. Santos, R. Rodil, J.B. Quintana
    Use of illicit drugs, alcohol and tobacco in Spain and Portugal during the COVID-19 crisis in 2020 as measured by wastewater-based epidemiology
    Science of the Total Environment, 2022, 836, 155697
    The data is deposited in ZENODO:
    If you reuse the data, please cite the publication and ZENODO deposit mentioned above.
    Explanation of the different sheets of the Excel file (All_Data_STOTEN_2022_155697) or different individual CSV files (named as below):
    • WWTP_details: explanation of the wastewater treatment plants (WWTPs) sampled, flow rates, etc.
    • Concentrations: concentrations measured in the samples
    • PNDL: population normalized daily loads calculated for each sample
    • Consumption: estimated drug use (see the publication for correction factors)
    • EF: enantiomeric fraction, expressed as fraction of the R-enantiomer for the samples analyzed
    Abbreviations
    • AMP Amphetamine
    • MAMP Methamphetamine
    • MDMA 3,4-Methylenedioxymethamphetamine
    • BE Benzoylecgonine
    • COC Cocaine
    • THC-COOH 11-Nor-9-carboxy-Δ9-tetrahydrocannabinol
    • THC Δ9-Tetrahydrocannabinol
    • COT Cotinine
    • OH-COT Trans-3'-Hydroxycotinine
    • NIC Nicotine
    • EtS Ethyl sulfate
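
    The relation between the Concentrations, WWTP_details and PNDL sheets can be sketched with the usual wastewater-based epidemiology calculation: concentration times daily flow, divided by the population served. The units assumed here (ng/L, m³/day, result in mg/day per 1000 inhabitants) and the function name are illustrative assumptions; the description above does not state the units used in the files, and the further step to Consumption needs the correction factors given in the publication.

```python
def population_normalized_daily_load(conc_ng_per_l, flow_m3_per_day, population):
    """Return the load in mg/day per 1000 inhabitants.

    Assumed units: concentration in ng/L, flow in m^3/day,
    population as the number of inhabitants served by the WWTP.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    litres_per_day = flow_m3_per_day * 1000.0                 # m^3 -> L
    load_mg_per_day = conc_ng_per_l * litres_per_day / 1e6    # ng -> mg
    return load_mg_per_day / (population / 1000.0)


# Hypothetical sample: 500 ng/L, 20,000 m^3/day, 100,000 inhabitants.
pndl = population_normalized_daily_load(500.0, 20_000.0, 100_000)
```

    With these made-up inputs the daily load is 10,000 mg/day, which corresponds to 100 mg/day per 1000 inhabitants.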
  8. PPG Signals Subject Files (Format: .csv) (Only PPG signal data)

    • figshare.com
    csv
    Updated Oct 1, 2024
    + more versions
    Cite
    Deletedd Deletedddd; Jhon Freddy Esquivel Aguirre (2024). PPG Signals Subject Files (Format: .csv) (Only PPG signal data) [Dataset]. http://doi.org/10.6084/m9.figshare.27132288.v1
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    figshare
    Authors
    Deletedd Deletedddd; Jhon Freddy Esquivel Aguirre
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Association

    This dataset belongs to the project "PPG Signals and Cholesterol Data: Repository for the Validation of Total Blood Cholesterol Estimation Methods" where different PPG signals are presented together with cholesterol information of the subjects. This is done with the objective of validating tools or methods for estimating the total blood cholesterol level from the PPG signal.

    Dataset Description

    This dataset contains files in .csv (Comma-separated values) format, corresponding to the PPG signal of 46 subjects. Subject data such as age, sex, and cholesterol are not found in the files presented here. If these data are needed in the records, they can be located in the following dataset within this project "PPG Signals & Cholesterol Data Subject Files (Format: .csv)". Other data such as weight, height and whether the subject is on medication can be found in the excel document included in the project.

    Dataset format

    .csv (Comma-separated values)

    Other formats available in the project:
    • .txt (Text file)
    • .json (JavaScript Object Notation)
    • .mat (MATLAB file)

  9. Touch Gesture and Emotion dataset of Tianjin University (TouchGET)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Li, Yunkai; Meng, Qinghao (2023). Touch Gesture and Emotion dataset of Tianjin University (TouchGET) [Dataset]. http://doi.org/10.7910/DVN/Z9IRNM
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Li, Yunkai; Meng, Qinghao
    Description

    The TouchGET is an affective touch gesture dataset involving 10 kinds of touch gestures and 12 kinds of discrete emotions. The dataset is grouped into fifteen folders named using the following convention: ‘subject’, the subject’s index (from 1 to 15), and the subject’s sex (‘M’ or ‘F’). An example of a folder name is ‘subject_1_M’. Each folder contains thirteen subfolders. Twelve are named after the 12 types of emotions, and each includes samples of gestures associated with that emotion. The gesture samples are saved as comma-separated value (CSV) files, using the following naming convention: gesture number, touch variant (‘1’ represents gentle and ‘2’ rude), and gesture counting index. An example of a file name is ‘B_1_5.csv’. The last subfolder is named ‘rest’ and contains gesture samples that were not selected under any emotion.
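
    The folder and file naming conventions described above lend themselves to a small parser. The sketch below is built only from the two quoted examples (‘subject_1_M’ and ‘B_1_5.csv’); the regular expressions, the assumption that the gesture identifier is alphanumeric, and all names in the code are illustrative guesses rather than part of the dataset's documentation.

```python
import re
from typing import NamedTuple


class Subject(NamedTuple):
    index: int
    sex: str


class GestureFile(NamedTuple):
    gesture: str
    variant: str
    count: int


SUBJECT_RE = re.compile(r"^subject_(\d+)_([MF])$")
# Assumption: the gesture identifier is alphanumeric, as in "B_1_5.csv".
GESTURE_RE = re.compile(r"^([A-Za-z0-9]+)_([12])_(\d+)\.csv$")
VARIANTS = {"1": "gentle", "2": "rude"}


def parse_subject_folder(name: str) -> Subject:
    """Parse a folder name such as 'subject_1_M'."""
    m = SUBJECT_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    return Subject(int(m.group(1)), m.group(2))


def parse_gesture_file(name: str) -> GestureFile:
    """Parse a file name such as 'B_1_5.csv'."""
    m = GESTURE_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return GestureFile(m.group(1), VARIANTS[m.group(2)], int(m.group(3)))
```

    Raising on unexpected names, rather than skipping them silently, is a choice made here so that deviations from the documented convention surface early.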

  10. NHANES 1988-2018

    • figshare.com
    application/gzip
    Updated Feb 18, 2025
    Cite
    Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v2
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    figshare
    Authors
    Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential to help understand how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as the prevalence of disease. However, these data must first be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross-examination and considerable effort but is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 134,310 participants and 4,740 variables.

    The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, and R markdown files with example code for calculating summary statistics and running regression models, to help accelerate high-throughput analysis of the exposome and secular trends on cancer mortality.

    csv Data Record: The curated NHANES datasets and the data dictionaries comprise 13 .csv files and 1 Excel file. The curated NHANES datasets consist of 10 .csv files, one for each module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The 11th file is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 4,740 variables in NHANES ("dictionary_nhanes.csv"). The 12th file contains the harmonized categories for the categorical variables ("dictionary_harmonized_categories.csv"). The 13th file contains the dictionary of descriptors for the drug codes (“dictionary_drug_codes.csv”). The 14th file is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES datasets (“nhanes_inconsistencies_documentation.xlsx”).

    R Data Record: For researchers who want to conduct their analysis in the R programming language, the curated NHANES datasets and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. We provide an .RData file that contains all the aforementioned datasets as R data objects (“w - nhanes_1988_2018.RData”). This .RData file also contains all R scripts for the customized functions that were written to curate the data. We also provide an .R file that shows how we used the customized functions (i.e., our pipeline) to curate the data (“m - nhanes_1988_2018.R”).
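
As a rough illustration of how the data dictionary could be used to browse variables by module, here is a minimal Python sketch using only the standard library. The column headers (`variable_name`, `module`) and the sample rows are assumptions invented for this sketch; the real headers in "dictionary_nhanes.csv" may differ and should be checked before use.

```python
import csv
import io
from collections import defaultdict

# Illustrative stand-in for "dictionary_nhanes.csv". The header names and the
# rows below are placeholders made up for this sketch, not real file contents.
SAMPLE = """variable_name,description,module
EXAMPLE_VAR_1,Placeholder demographic variable,demographics
EXAMPLE_VAR_2,Placeholder dietary variable,dietary
EXAMPLE_VAR_3,Another placeholder dietary variable,dietary
"""


def variables_by_module(handle):
    """Group variable names by module from a dictionary-style CSV."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["module"]].append(row["variable_name"])
    return dict(grouped)


groups = variables_by_module(io.StringIO(SAMPLE))
for module, names in groups.items():
    print(module, len(names))
```

With the downloaded file, the same function could be called on `open("dictionary_nhanes.csv", newline="")` once the column names have been adjusted to match the actual header row.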

  11. Health Index. Ukraine (2016) - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Jan 11, 2025
    Cite
    (2025). Health Index. Ukraine (2016) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/d73871c3-22d1-5732-a31d-6837e8b959f0
    Dataset updated
    Jan 11, 2025
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Area covered
    Ukraine
    Description

    "Health Index. Ukraine" is a large-scale, empirical, and representative study aimed at collecting quantitative data on the population's health-related knowledge and behaviors, as well as their evaluations of healthcare service quality based on personal experiences.
    The data for this study are derived from sociological surveys of the adult population. The initial survey was conducted in 2016 by the Kyiv International Institute of Sociology (KIIS) in collaboration with the Social Indicators Center, at the initiative of and with funding from the International Renaissance Foundation.
    The survey covered topics such as health and health-seeking behavior, early disease detection, patient experiences with outpatient and inpatient care (including questions on official and unofficial expenses), ambulance and pediatric services, medication availability, satisfaction with medical care, and perceptions of healthcare reforms.
    The survey sample is representative of the 18+ population of Ukraine as a whole, as well as each of the oblasts covered by the study and the city of Kyiv. Temporarily occupied territories of the Autonomous Republic of Crimea, the city of Sevastopol, and certain districts of the Donetsk and Luhansk oblasts, where government authorities temporarily do not exercise their powers, were not included in the study.
    Data were collected through face-to-face interviews conducted at respondents' places of residence. In total, 10,178 respondents aged 18 and older were interviewed between May 15 and June 30, 2016.
    The data are available in SAV format (in Ukrainian) and in a converted CSV format (with a codebook). Field questionnaires (in Ukrainian and Russian) are also included.
    The study results, including detailed analytical reports in both Ukrainian and English, as well as selected infographics, are accessible on the project website: https://healthindex.com.ua/.

  12. Health Index. Ukraine (2017) - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Jan 15, 2025
    Cite
    (2025). Health Index. Ukraine (2017) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/161c987f-682c-5c07-b204-00624aefae5f
    Dataset updated
    Jan 15, 2025
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Area covered
    Ukraine
    Description

    "Health Index. Ukraine" is a large-scale, empirical, and representative study aimed at collecting quantitative data on the population's health-related knowledge and behaviors, as well as their evaluations of healthcare service quality based on personal experiences.
    The second wave of the study was conducted in 2017 by the Kyiv International Institute of Sociology in collaboration with the Social Indicators Center, at the initiative of and with funding and support from the International Renaissance Foundation and the World Bank.
    The survey covered a range of topics, including health and health-seeking behavior, early disease detection, patient experiences with outpatient and inpatient care (including questions on official and unofficial expenses), pediatric services, medication availability, satisfaction with medical care, and perceptions of healthcare reforms.
    The survey sample is representative of the 18+ population of Ukraine as a whole, as well as for each of the oblasts covered by the study and the city of Kyiv. Temporarily occupied territories of the Autonomous Republic of Crimea, the city of Sevastopol, and certain districts of the Donetsk and Luhansk oblasts, where government authorities temporarily do not exercise their powers, were not included in the study.
    Data were collected through face-to-face interviews using tablets (CAPI) conducted at respondents' places of residence. In total, 10,184 respondents aged 18 and older were interviewed between May 18 and June 27, 2017.
    The data are available in SAV format (in Ukrainian) and in a converted CSV format (with a codebook in Ukrainian). Field questionnaires (in Ukrainian and Russian) and a technical report describing the methodology (in Ukrainian) are also included.
    The study results, including detailed analytical reports in both Ukrainian and English, as well as selected infographics, are accessible on the project website: https://healthindex.com.ua/.

  13. Naturalistic Neuroimaging Database

    • openneuro.org
    Updated Jul 21, 2020
    Cite
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper (2020). Naturalistic Neuroimaging Database [Dataset]. http://doi.org/10.18112/openneuro.ds002837.v1.1.1
    Dataset updated
    Jul 21, 2020
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Sarah Aliko; Jiawen Huang; Florin Gheorghiu; Stefanie Meliss; Jeremy I Skipper
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Overview

    • The Naturalistic Neuroimaging Database (NNDb v1.0) contains datasets from 86 human participants who completed the NIH Toolbox and then watched one of 10 full-length movies during functional magnetic resonance imaging (fMRI). The participants were all right-handed native English speakers with no history of neurological or psychiatric illness, no hearing impairments, unimpaired or corrected vision, and taking no medication. Each movie was stopped at 40-50 minute intervals or when participants asked for a break, resulting in 2-6 runs of BOLD-fMRI. A 10-minute high-resolution defaced T1-weighted anatomical MRI scan (MPRAGE) is also provided.

    The NIH Toolbox data files are

    • nih_demographics.csv
    • nih_data.csv
    • nih_scores.csv

    The stimuli can be found and purchased using the following EAN and ASIN numbers

    • 500 Days of Summer: EAN = 5039036043359; ASIN = B002KKBMSW
    • Citizenfour: EAN = 5050968002313; ASIN = B00YP65NEI
    • 12 Years a Slave: EAN = 5030305517229; ASIN = B00HR23CCM
    • Back to the Future: EAN = 5050582401288; ASIN = B000BVK82I
    • Little Miss Sunshine: EAN = 5039036029667; ASIN = B000JU9OJ4
    • The Prestige: EAN = 7321902106472; ASIN = B000K7LQS8
    • Pulp Fiction: EAN = 5060223762043; ASIN = B004UGAMY4
    • The Shawshank Redemption: EAN = 5037115299635; ASIN = B001CWLFKE
    • Split: EAN = 5902115603099; ASIN = B071J24232
    • The Usual Suspects: EAN = 5039036033497; ASIN = B0010YXNGI

    Data is organized as follows

    • The sub-
    • The derivatives sub-
    • Some initial stimulus annotations can be found in the stimuli folder.
    • The mriqc derivatives folder contains the MRIQC no-reference image quality metrics for the NNDb anatomical and functional data.

    Notes

    • Subjects 3-6, 10, 11, 24, 28, 29, 31, 39, 41, 72, 83-85 did not have the original IMA files to format into BIDS, so they were manually created (functionals) or copied in from other subjects (anatomicals). These will be updated once access to UCL facilities is restored after the COVID-19 lockdown.
    • If you plan to use the raw data with the stimuli / annotations, please be aware that some temporal interpolation is necessary. See our manuscript for details and GitHub for an example script to do this. Or just email one of us.
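
The subject list in the first note can be expanded programmatically, for example to set those subjects aside in an analysis. This is a minimal sketch, assuming subjects are numbered 1 to 86 (the overview gives the participant count but not the numbering scheme); the helper name `expand_ranges` is hypothetical.

```python
def expand_ranges(spec: str) -> list[int]:
    """Expand a string such as '3-6, 10' into [3, 4, 5, 6, 10]."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            ids.extend(range(int(low), int(high) + 1))
        else:
            ids.append(int(part))
    return ids


# Subject list copied from the note above.
AFFECTED = expand_ranges("3-6, 10, 11, 24, 28, 29, 31, 39, 41, 72, 83-85")

# Assumption: participants are numbered consecutively from 1 to 86.
N_PARTICIPANTS = 86
affected_set = set(AFFECTED)
unaffected = [s for s in range(1, N_PARTICIPANTS + 1) if s not in affected_set]

print(len(AFFECTED), "affected;", len(unaffected), "unaffected")
```

If the actual subject labels are not consecutive integers, the range for `unaffected` would need to be replaced with the labels found in the dataset.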
