57 datasets found
  1. UCI ML Drug Review dataset

    • kaggle.com
    Updated Dec 13, 2018
    Cite
    Jessica Li (2018). UCI ML Drug Review dataset [Dataset]. https://www.kaggle.com/jessicali9530/kuc-hackathon-winter-2018/home
    Dataset updated
    Dec 13, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jessica Li
    Description

    This dataset was used for the Winter 2018 Kaggle University Club Hackathon and is now publicly available. See Acknowledgments section for citation and licensing. Note: The types of data and recommendation-based solutions provided by the contestants are purely for NLP learning purposes. They are not suitable for real-world drug recommendation solutions.

    Welcome to the Kaggle University Club Hackathon!

    If you are interested in joining Kaggle University Club, please e-mail Jessica Li at lijessica@google.com

    This Hackathon is open to all undergraduate, master, and PhD students who are part of the Kaggle University Club program. The Hackathon provides students with a chance to build capacity via hands-on ML, learn from one another, and engage in a self-defined project that is meaningful to their careers.

    Teams must register via Google Form to be eligible for the Hackathon. The Hackathon starts on Monday, November 12, 2018 and ends on Monday, December 10, 2018. Teams have one month to work on a team submission. Teams must do all work within the Kernel editor and set Kernel(s) to public at all times.

    Prompt

    The freestyle format of hackathons has time and again stimulated groundbreaking and innovative data insights and technologies. The Kaggle University Club Hackathon recreates this environment virtually on our platform. We challenge you to build a meaningful project around the UCI Machine Learning - Drug Review Dataset. Teams are free to let their creativity run and propose methods to analyze this dataset and form interesting machine learning models.

    Machine learning has permeated nearly all fields and disciplines of study. One hot topic is using natural language processing and sentiment analysis to identify, extract, and make use of subjective information. The UCI ML Drug Review dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating system reflecting overall patient satisfaction. The data was obtained by crawling online pharmaceutical review sites. This data was published in a study on sentiment analysis of drug experience over multiple facets, e.g., sentiments learned on specific aspects such as effectiveness and side effects (see the acknowledgments section to learn more).

    The sky's the limit here in terms of what your team can do! Teams are free to add supplementary datasets in conjunction with the drug review dataset in their Kernel. Discussion is highly encouraged within the forum and Slack so everyone can learn from their peers.

    Here are just a couple ideas as to what you could do with the data:

    • Classification: Can you predict the patient's condition based on the review?
    • Regression: Can you predict the rating of the drug based on the review?
    • Sentiment analysis: What elements of a review make it more helpful to others? Which patients tend to have more negative reviews? Can you determine if a review is positive, neutral, or negative?
    • Data visualizations: What kind of drugs are there? What sorts of conditions do these patients have?
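
    As a minimal sketch of the sentiment idea above, the 10-star rating could be bucketed into positive, neutral, and negative labels before training a text classifier. The thresholds (4 and 7), the function name, and the sample reviews are assumptions made for illustration; none of them come from the dataset or the hackathon rules.

```python
def rating_to_sentiment(rating: float, low: float = 4.0, high: float = 7.0) -> str:
    """Map a 1-10 star rating to a coarse label (thresholds are hypothetical)."""
    if rating <= low:
        return "negative"
    if rating >= high:
        return "positive"
    return "neutral"


# Hypothetical (review text, rating) pairs, not taken from the dataset.
reviews = [
    ("Worked well, no side effects", 9),
    ("Did nothing for me", 2),
    ("Okay but made me drowsy", 5),
]
labels = [rating_to_sentiment(rating) for _, rating in reviews]
print(labels)
```

    Labels derived this way could serve as targets for the classification question, while the raw rating remains the target for the regression question.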

    Top Submissions

    There is no one correct answer to this Hackathon, and teams are free to define the direction of their own project. That being said, there are certain core elements generally found across all outstanding Kernels on the Kaggle platform. The best Kernels are:

    1. Complex: How many domains of analysis and topics does this Kernel cover? Does it attempt machine learning methods? Does the Kernel offer a variety of unique analyses and interesting conclusions or solutions?
    2. Original: What is the subject matter of this Kernel? Does it have a well-defined and interesting project scope, narrative or problem? Could the results make an impact? Is it thought provoking?
    3. Approachable: How easy is it to understand this Kernel? Are all thought processes clear? Is the code clean, with useful comments? Are visualizations and processes articulated and self-explanatory?

    Teams with top submissions have a chance to receive exclusive Kaggle University Club swag and be featured on our official blog and across social media.

    IMPORTANT: Teams must set all Kernels to public at all times. This is so we can track each team's progression, but more importantly it encourages collaboration, productive discussion, and healthy inspiration to all teams. It is not so that teams can simply copycat good ideas. If a team's Kernel isn't their own organic work, it will not be considered a top submission. Teams must come up with a project on their own.

    Submission Styling

    The final Kernel submission for the Hackathon must contain the following information:

    • All team members added as collaborators to the Kernel
    • Somewhere at the top of your Kernel, find a space to put down all team member names, university name, club name, and team name (as specified whe...
  2. Medication Ratings and Conditions Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Medication Ratings and Conditions Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/eddbc84d-5fd5-421c-b98b-02677066e4c4
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Health Information Systems & Technology
    Description

    This dataset provides detailed information on various drugs used for a multitude of medical conditions such as acne, cancer, and heart disease. It includes essential details about drug efficacy based on user ratings and experiences, as well as specific information on side effects. The dataset aims to offer insights into how different medications are perceived by users concerning their effectiveness, considering both positive and adverse effects.

    Columns

    • drug_name: The name of the drug.
    • medical_condition: The name of the medical condition the drug is used for. Examples include Pain (defined as "An unpleasant sensory and emotional experience associated with actual or potential tissue damage or described in terms of such damage") and Cold Symptoms (also known as Common Cold or Coryza, characterised by congestion of the nasal mucous membrane, watery nasal rhinorrhoea, and general malaise, with a typical duration of 3–5 days).
    • medical_condition_description: A detailed description of the medical condition.
    • activity: An indicator of recent site visitor activity relative to other medications listed, based on data gathered from drugs.com.
    • rx_otc: Indicates the drug's classification:
      • Rx: Prescription Needed.
      • OTC: Over-the-counter, meaning it can be purchased without a medical prescription.
      • Rx/OTC: Can be either prescription or over-the-counter.
    • pregnancy_category: Classifies the drug's risk to a foetus during pregnancy:
      • A: Adequate and well-controlled studies show no risk in the first trimester (and no evidence of risk later).
      • B: Animal reproduction studies show no foetal risk, but no adequate human studies exist.
      • C: Animal studies show adverse foetal effects, no adequate human studies, but potential benefits may warrant use despite risks.
      • D: Positive evidence of human foetal risk, but potential benefits may warrant use despite risks.
      • X: Studies show foetal abnormalities and human foetal risk, risks clearly outweigh potential benefits.
      • N: FDA has not classified the drug.
    • csa: Controlled Substances Act (CSA) Schedule:
      • M: Multiple schedules, depends on dosage/strength.
      • U: Schedule unknown.
      • N: Not subject to CSA.
      • 1: High abuse potential, no accepted medical use, lack of accepted safety.
      • 2: High abuse potential, accepted medical use (with severe restrictions), abuse may lead to severe dependence.
      • 3: Lower abuse potential than 1 & 2, accepted medical use, abuse may lead to moderate/low physical or high psychological dependence.
      • 4: Low abuse potential relative to 3, accepted medical use, abuse may lead to limited physical or psychological dependence.
      • 5: Low abuse potential relative to 4, accepted medical use, abuse may lead to limited physical or psychological dependence.
    • alcohol: Indicates interaction with alcohol (X = Interacts with Alcohol).
    • rating: User-assigned rating (1 = not effective, 10 = most effective), reflecting effectiveness, side effects, and ease of use. Ratings range from 0.00 to 10.00.
    • no_of_reviews: The number of reviews received for the drug. There are 2912 unique values for this column.
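
    The coded columns above can be expanded into readable values with simple lookup tables. The sketch below uses the column names and codes listed in the description; the helper name `decode_row`, the shortened label wording, and the sample row are assumptions for illustration.

```python
# Code-to-label tables transcribed from the column descriptions above.
RX_OTC = {
    "Rx": "Prescription needed",
    "OTC": "Over-the-counter",
    "Rx/OTC": "Prescription or over-the-counter",
}
CSA_SCHEDULE = {
    "M": "Multiple schedules",
    "U": "Schedule unknown",
    "N": "Not subject to CSA",
    "1": "Schedule 1",
    "2": "Schedule 2",
    "3": "Schedule 3",
    "4": "Schedule 4",
    "5": "Schedule 5",
}


def decode_row(row: dict) -> dict:
    """Translate the coded fields of one record (hypothetical helper)."""
    return {
        "rx_otc": RX_OTC.get(row.get("rx_otc"), "Unrecognised code"),
        "csa": CSA_SCHEDULE.get(str(row.get("csa")), "Unrecognised code"),
        "interacts_with_alcohol": row.get("alcohol") == "X",
    }


# Hypothetical record, not taken from the dataset.
print(decode_row({"rx_otc": "Rx/OTC", "csa": "N", "alcohol": "X"}))
```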

    Distribution

    The dataset is typically provided in a CSV file format. While specific total row/record counts are not explicitly stated, the presence of 2912 unique review counts and a wide range of ratings suggest a substantial number of entries. The data appears to be structured in a tabular manner.

    Usage

    This dataset is ideal for:

    • Analysing drug efficacy based on real-world user feedback.
    • Researching user experiences with various medications.
    • Developing applications related to health information systems.
    • Performing Natural Language Processing (NLP) on drug descriptions and reviews to extract insights.
    • Understanding the landscape of prescription (Rx) versus over-the-counter (OTC) medications.

    Coverage

    The dataset's coverage is global, making it relevant for a worldwide audience. It was listed on 11th June 2025. There are no specific notes on demographic scope or data availability for certain groups or years explicitly mentioned.

    License

    CC0

    Who Can Use It

    This dataset is suitable for:

    • Healthcare Professionals: To gain insights into patient experiences and drug effectiveness.
    • Researchers: For studies on pharmacology, public health, and patient outcomes.
    • Data Analysts: To identify trends and patterns in drug usage and side effects.
    • Software Developers: For building health-related applications, AI models, or recommendation systems.
    • Patients/Consumers: To inform decisions about medications based on aggregated user experiences.

    Dataset Name Suggestions

    • Drug Efficacy and User Experience Data
    • Medication Ratings and Conditions Dataset
  3. MID: Medicines Information Dataset

    • data.mendeley.com
    Updated Nov 26, 2024
    + more versions
    Cite
    Hezam Gawbah (2024). MID: Medicines Information Dataset [Dataset]. http://doi.org/10.17632/2vk5khfn6v.3
    Dataset updated
    Nov 26, 2024
    Authors
    Hezam Gawbah
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Numerous studies on medicines are conducted every day. To address shortcomings of models for medicines information generation, prediction, and classification, the authors introduce a large textual dataset of medicines information, which they named ‘MID’.

    • Value of the data
      • The dataset comprises extensive medicines information, featuring over 192k rows distributed across 22 diverse therapeutic classes.
      • The dataset can benefit the classification of therapeutic classes and is robust for the prediction and generation of medicines information such as indications or interactions, enhancing efficiency in clinical trial management and facilitating a detailed analysis of the risks affecting participants in clinical trials.
      • The dataset includes the name, link, contains, introduction, uses, benefits, side effects, how to use, how the drug works, quick tips, chemical class, habit forming, therapeutic class, action class, safety advice to alcohol, safety advice to pregnancy, safety advice to breastfeeding, safety advice to driving, safety advice to kidney, and safety advice to the liver.
      • The dataset is big data, making it a suitable corpus for implementing both classical and deep learning models.
      • The dataset provides a useful resource for medical researchers, healthcare professionals, drug manufacturers, data scientists, and enthusiasts interested in exploring the world of medicines and healthcare products preclinically for drug development and design.

    • MID.xlsx provides the raw data, including medicine information. The data were collected to accelerate work on medicines and save experimental effort by helping to predict, generate, or classify medicine information preclinically.

    • Therapeutic_class_counts.xlsx summarizes the distribution of medicines per therapeutic class.

  4. Data from: A dataset quantifying polypharmacy in the United States

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Oct 16, 2018
    Cite
    Katie J. Quinn; Nigam H. Shah (2018). A dataset quantifying polypharmacy in the United States [Dataset]. http://doi.org/10.5061/dryad.sm847
    Available download formats: zip
    Dataset updated
    Oct 16, 2018
    Dataset provided by
    Stanford Center for Biomedical Informatics Research, Stanford, USA
    Authors
    Katie J. Quinn; Nigam H. Shah
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    United States
    Description

    Polypharmacy is increasingly common in the United States, and contributes to the substantial burden of drug-related morbidity. Yet real-world polypharmacy patterns remain poorly characterized. We have counted the incidence of multi-drug combinations observed in four billion patient-months of outpatient prescription drug claims from 2007-2014 in the Truven Health MarketScan® Databases. Prescriptions are grouped into discrete windows of concomitant drug exposure, which are used to count exposure incidences for combinations of up to five drug ingredients or ATC drug classes. Among patients taking any prescription drug, half are exposed to two or more drugs, and 5% are exposed to 8 or more. The most common multi-drug combinations treat manifestations of metabolic syndrome. Patients are exposed to unique drug combinations in 10% of all exposure windows. Our analysis of multi-drug exposure incidences provides a detailed summary of polypharmacy in a large US cohort, which can prioritize common drug combinations for future safety and efficacy studies.
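
    The counting step described above can be illustrated with a small sketch. It assumes each exposure window is already available as a list of drug ingredient names and counts every combination of up to five ingredients per window. This is one possible reading of the description, not the authors' pipeline; the function name and sample windows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_combinations(windows, max_size=5):
    """Count how many exposure windows contain each drug combination."""
    counts = Counter()
    for drugs in windows:
        unique = sorted(set(drugs))  # canonical order, duplicates removed
        for size in range(1, min(max_size, len(unique)) + 1):
            counts.update(combinations(unique, size))
    return counts


# Hypothetical exposure windows, not taken from the dataset.
windows = [
    ["metformin", "lisinopril"],
    ["metformin", "lisinopril", "atorvastatin"],
    ["metformin"],
]
counts = count_combinations(windows)
print(counts[("lisinopril", "metformin")])
```

    Sorting the ingredients before forming combinations ensures that the same set of drugs always maps to the same tuple key.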

  5. ‘wholesale vs retail drugs’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘wholesale vs retail drugs’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-wholesale-vs-retail-drugs-fa99/a7cfa1ba/?iid=054-496&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘wholesale vs retail drugs’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ramjasmaurya/wholesale-vs-retail-drugs-price-and-purity on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    The dataset has 3 files, all with full details of illegal drugs sold in and around the world. File 1 is in .xlsv format, file 2 is in .xlsv format, and file 3 is in .csv format.

    All consist of columns related to price and drug purity according to their wholesale and retail price. Thanks, have a GREAT DAY OR NIGHT. KEEP UPVOTING.

    --- Original source retains full ownership of the source dataset ---

  6. DrugBank - Open Data Drug and Drug Target Database

    • researchdata.edu.au
    Updated May 2, 2013
    Cite
    QFAB (2013). DrugBank - Open Data Drug and Drug Target Database [Dataset]. https://researchdata.edu.au/drugbank-open-drug-target-database/14044
    Dataset updated
    May 2, 2013
    Dataset provided by
    QFAB
    Description

    The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains 6712 drug entries including 1448 FDA-approved small molecule drugs, 131 FDA-approved biotech (protein/peptide) drugs, 85 nutraceuticals and 5080 experimental drugs. Additionally, 4227 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries. Each DrugCard entry contains more than 150 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. DrugBank is supported by David Wishart, Departments of Computing Science & Biological Sciences, University of Alberta. DrugBank is also supported by The Metabolomics Innovation Centre, a Genome Canada-funded core facility serving the scientific community and industry with world-class expertise and cutting-edge technologies in metabolomics.

  7. Table_1_Use of National Database of Health Insurance Claims and Specific...

    • frontiersin.figshare.com
    pdf
    Updated Jun 21, 2023
    Cite
    Haruka Shida; Kazuhiro Kajiyama; Sono Sawada; Chieko Ishiguro; Mikiko Kubo; Ryota Kimura; Mai Hirano; Noriyuki Komiyama; Toyotaka Iguchi; Yukio Oniyama; Yoshiaki Uyama (2023). Table_1_Use of National Database of Health Insurance Claims and Specific Health Checkups for examining practical utilization and safety signal of a drug to support regulatory assessment on postmarketing drug safety in Japan.pdf [Dataset]. http://doi.org/10.3389/fmed.2023.1096992.s001
    Available download formats: pdf
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    Frontiers
    Authors
    Haruka Shida; Kazuhiro Kajiyama; Sono Sawada; Chieko Ishiguro; Mikiko Kubo; Ryota Kimura; Mai Hirano; Noriyuki Komiyama; Toyotaka Iguchi; Yukio Oniyama; Yoshiaki Uyama
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Japan
    Description

    The Pharmaceuticals and Medical Devices Agency (PMDA) has conducted many pharmacoepidemiological studies for postmarketing drug safety assessments based on real-world data from medical information databases. One of these databases is the National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB), containing health insurance claims of almost all Japanese individuals (over 100 million) since April 2009. This article describes the PMDA’s regulatory experiences in utilizing the NDB for postmarketing drug safety assessment, especially focusing on the recent cases of use of the NDB to examine the practical utilization and safety signal of a drug. The studies helped support regulatory decision-making for postmarketing drug safety, such as considering a revision of prescribing information of a drug, confirming the appropriateness of safety measures, and checking safety signals in real-world situations. Different characteristics between the NDB and the MID-NET® (another database in Japan) were also discussed for appropriate selection of data source for drug safety assessment. Accumulated experiences of pharmacoepidemiological studies based on real-world data for postmarketing drug safety assessment will contribute to evolving regulatory decision-making based on real-world data in Japan.

  8. VA Drug Pricing Database

    • data.va.gov
    • datahub.va.gov
    • +1 more
    application/rdfxml +5
    Updated Sep 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2019). VA Drug Pricing Database [Dataset]. https://www.data.va.gov/dataset/VA-Drug-Pricing-Database/pu94-4asd
    Available download formats: csv, application/rssxml, application/rdfxml, json, tsv, xml
    Dataset updated
    Sep 12, 2019
    Description

    The VA Drug Pricing database contains the current prices for pharmaceuticals purchased by the federal government. These listed prices are based on the Federal Supply Schedule (FSS). This database is mandated by Public Law 102-585, the Veterans Health Care Act of 1992, which sets the maximum amount that a drug may be bought for by the Veterans Health Administration (VHA). The source of this information is contained in printed contracts or data files supplied by the drug manufacturers, representing the pricing agreements between VHA and the manufacturers. Price data is input by the National Acquisition Center (NAC) into the database administered by the Pharmacy Benefits Management Strategic Health Care Group. Information from this database is published on the World Wide Web at the following site: http://www.pbm.va.gov. The users of this database include pharmaceutical manufacturers, drug wholesalers, Office of Inspector General (OIG) and those who purchase pharmaceuticals for the VHA and other government agencies.

  9. Global Real World Evidence Solutions Market By Data Source (Electronic...

    • verifiedmarketresearch.com
    Updated Jul 16, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Real World Evidence Solutions Market By Data Source (Electronic Health Records, Claims Data, Registries, Medical Devices), By Therapeutic Area (Oncology, Cardiovascular Diseases, Neurology, Rare Diseases), By Application (Drug Development, Clinical Decision Support, Epidemiological Studies, Post-Marketing Surveillance), By Geographic Scope and Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/real-world-evidence-solutions-market/
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Real World Evidence Solutions Market size was valued at USD 1.30 Billion in 2024 and is projected to reach USD 3.71 Billion by 2031, growing at a CAGR of 13.92% during the forecast period 2024-2031.
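
    For readers who want to check a growth figure like the one above, a compound annual growth rate can be computed from a start value, an end value, and a number of compounding periods. In the sketch below the period count of seven years (2024 to 2031) is an assumption; the listing does not state how the quoted rate was derived, so the result need not match it.

```python
def cagr(start_value: float, end_value: float, periods: float) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per period)."""
    return (end_value / start_value) ** (1 / periods) - 1


# Values in USD billions from the description; periods=7 is an assumption.
rate = cagr(1.30, 3.71, periods=7)
print(f"{rate:.2%}")
```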

    Global Real World Evidence Solutions Market Drivers

    The market drivers for the Real World Evidence Solutions Market can be influenced by various factors. These may include:

    • Growing Need for Evidence-Based Healthcare: Real-world evidence (RWE) is becoming more and more important in healthcare decision-making, according to stakeholders such as payers, providers, and regulators. In addition to traditional clinical trial data, RWE solutions offer important insights into the efficacy, safety, and value of healthcare interventions in real-world situations.
    • Growing Use of RWE by Pharmaceutical Companies: RWE solutions are being used by pharmaceutical companies to assist with market entry, post-marketing surveillance, and drug development initiatives. Pharmaceutical businesses can find new indications for their current medications, improve clinical trial designs, and convince payers and providers of the worth of their products with the use of RWE.
    • Increasing Priority for Value-Based Healthcare: The emphasis on proving the cost- and benefit-effectiveness of healthcare interventions in real-world settings is growing as value-based healthcare models gain traction. To assist value-based decision-making, RWE solutions are essential in evaluating the economic effect and real-world consequences of healthcare interventions.
    • Technological and Data Analytics Advancements: RWE solutions are becoming more capable due to advances in machine learning, artificial intelligence, and big data analytics. With the use of these technologies, healthcare stakeholders can obtain actionable insights from the analysis of vast and varied datasets, including patient-generated data, claims data, and electronic health records.
    • Regulatory Support for RWE Integration: RWE is being progressively integrated into regulatory decision-making processes by regulatory organisations including the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA). The FDA's Real-World Evidence Programme and the EMA's Adaptive Pathways and PRIority MEdicines (PRIME) programme are two examples of initiatives that are making it easier to incorporate RWE into regulatory submissions and drug development.
    • Increasing Emphasis on Patient-Centric Healthcare: The value of patient-reported outcomes and real-world experiences in healthcare decision-making is becoming more widely acknowledged. RWE technologies facilitate the collection and examination of patient-centered data, offering valuable insights into treatment efficacy, patient inclinations, and quality of life consequences.
    • Extension of RWE Use Cases: RWE solutions are being used in medication development, post-market surveillance, health economics and outcomes research (HEOR), comparative effectiveness research, and market access, among other healthcare fields. The necessity for a variety of RWE solutions catered to the needs of different stakeholders is being driven by the expansion of RWE use cases.

  10. Counts of Invasive drug resistant Streptococcus pneumoniae disease reported...

    • zenodo.org
    • data.niaid.nih.gov
    json, xml, zip
    Updated Jun 3, 2024
    + more versions
    Cite
    Willem Van Panhuis; Anne Cross; Donald Burke (2024). Counts of Invasive drug resistant Streptococcus pneumoniae disease reported in UNITED STATES OF AMERICA: 2001-2010 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.406618009
    Available download formats: zip, xml, json
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    Project Tycho
    Authors
    Willem Van Panhuis; Anne Cross; Donald Burke
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Dec 30, 2001 - Jan 2, 2010
    Area covered
    United States
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    • Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
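
    The two recommended steps can be sketched as follows. The attribute name "PartOfCumulativeCountSeries" comes from the description above; the field names `start` and `count`, the boolean encoding of the attribute, the weekly interval length, and the sample rows are assumptions for illustration.

```python
from datetime import date, timedelta


def split_series(rows):
    """Separate cumulative rows from fixed-interval rows."""
    cumulative = [r for r in rows if r["PartOfCumulativeCountSeries"]]
    fixed = [r for r in rows if not r["PartOfCumulativeCountSeries"]]
    return cumulative, fixed


def fill_missing_weeks(fixed_rows):
    """Add a None count for every 7-day interval absent from a weekly series."""
    by_start = {r["start"]: r["count"] for r in fixed_rows}
    if not by_start:
        return []
    current, last = min(by_start), max(by_start)
    filled = []
    while current <= last:
        # Reported zeros are kept as 0; unreported intervals become None.
        filled.append((current, by_start.get(current)))
        current += timedelta(days=7)
    return filled


# Hypothetical rows, not taken from a Project Tycho file.
rows = [
    {"start": date(2001, 12, 30), "count": 3, "PartOfCumulativeCountSeries": False},
    {"start": date(2002, 1, 13), "count": 0, "PartOfCumulativeCountSeries": False},
    {"start": date(2001, 12, 30), "count": 9, "PartOfCumulativeCountSeries": True},
]
cumulative, fixed = split_series(rows)
print(fill_missing_weeks(fixed))
```

    Keeping `None` distinct from `0` preserves the difference between an interval with no report and an interval with a reported count of zero.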

  11. ‘💉 Opioid Overdose Deaths’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘💉 Opioid Overdose Deaths’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-opioid-overdose-deaths-2a74/19bc33fa/?iid=008-729&v=presentation
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘💉 Opioid Overdose Deaths’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/opioid-overdose-deathse on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Opioid addiction and death rates in the U.S. and abroad have reached "epidemic" levels. The CDC's data reflects the incredible spike in overdoses caused by drugs containing opioids.

    The United States is experiencing an epidemic of drug overdose (poisoning) deaths. Since 2000, the rate of deaths from drug overdoses has increased 137%, including a 200% increase in the rate of overdose deaths involving opioids (opioid pain relievers and heroin). Source: CDC

    In-the-News:

    This data was compiled using the CDC's WONDER database. Opioid overdose deaths are defined as: deaths in which the underlying cause was drug overdose, and the ICD-10 code used was any of the following: T40.0 (Opium), T40.1 (Heroin), T40.2 (Other opioids), T40.3 (Methadone), T40.4 (Other synthetic narcotics), T40.6 (Other and unspecified narcotics).

    Age-adjusted rate of drug overdose deaths and drug overdose deaths involving opioids
    Figure: Opioid Death Rate (http://i.imgur.com/ObpzUKq.gif)
    Source: CDC

    What are opioids?
    Opioids are substances that act on opioid receptors to produce morphine-like effects. Opioids are most often used medically to relieve pain. Opioids include opiates, an older term that refers to such drugs derived from opium, including morphine itself. Other opioids are semi-synthetic and synthetic drugs such as hydrocodone, oxycodone and fentanyl; antagonist drugs such as naloxone; and endogenous peptides such as the endorphins. The terms opiate and narcotic are sometimes encountered as synonyms for opioid. Source: Wikipedia

    contributors-wanted See comment in Discussion

    Footnotes

    • The crude rate is per 100,000.
    • Certain totals are hidden due to suppression constraints. More Information: http://wonder.cdc.gov/wonder/help/faq.html#Privacy.
    • The population figures are bridged-race estimates. The exceptions are years 2000 and 2010, for which Census counts are used.
    • v1.1: Added Opioid Prescriptions Dispensed by US Retailers in that year (millions).

    Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. Multiple Cause of Death 1999-2014 on CDC WONDER Online Database, released 2015. Data are from the Multiple Cause of Death Files, 1999-2014, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/mcd-icd10.html on Oct 19, 2016 2:06:38 PM.

    Citation for Opioid Prescription Data: IMS Health, Vector One: National, years 1991-1996, Data Extracted 2011. IMS Health, National Prescription Audit, years 1997-2013, Data Extracted 2014. Accessed at NIDA article linked (Figure 1) on Oct 23, 2016.

    Data Use Restrictions:
    The Public Health Service Act (42 U.S.C. 242m(d)) provides that the data collected by the National Center for Health Statistics (NCHS) may be used only for the purpose for which they were obtained; any effort to determine the identity of any reported cases, or to use the information for any purpose other than for health statistical reporting and analysis, is against the law. Therefore users will:
    Use these data for health statistical reporting and analysis only.
    For sub-national geography, do not present or publish death counts of 9 or fewer or death rates based on counts of nine or fewer (in figures, graphs, maps, tables, etc.).
    Make no attempt to learn the identity of any person or establishment included in these data.
    Make no disclosure or other use of the identity of any person or establishment discovered inadvertently and advise the NCHS Confidentiality Officer of any such discovery.
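
    The small-count restriction above, combined with the footnote that the crude rate is per 100,000, can be expressed as a simple check before publishing sub-national figures. This is a sketch only: the function name and return convention are hypothetical, and it is no substitute for the NCHS terms themselves.

```python
SUPPRESSION_THRESHOLD = 9  # counts of 9 or fewer must not be presented or published


def publishable(deaths, population):
    """Return (deaths, crude rate per 100,000), or (None, None) if suppressed."""
    if deaths <= SUPPRESSION_THRESHOLD:
        # Neither the count nor a rate based on it may be shown.
        return None, None
    return deaths, deaths * 100_000 / population


small = publishable(9, 50_000)
large = publishable(10, 100_000)
```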

    Eve Powell-Griner, Confidentiality Officer
    National Center for Health Statistics
    3311 Toledo Road, Rm 7116
    Hyattsville, MD 20782
    Telephone 301-458-4257 Fax 301-458-4021

    This dataset was created by Health and contains around 800 samples along with Crude Rate, Crude Rate Lower 95% Confidence Interval, technical information and other features such as: - Year - Deaths - and more.

    How to use this dataset

    • Analyze Crude Rate Upper 95% Confidence Interval in relation to Prescriptions Dispensed By Us Retailers In That Year (millions)
    • Study the influence of State on Crude Rate
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit Health

    --- Original source retains full ownership of the source dataset ---

  12. Dataset 2: Interrupted time-series results

    • borealisdata.ca
    Updated Mar 16, 2023
    Cite
    The Global Strategy Lab (2023). Dataset 2: Interrupted time-series results [Dataset]. http://doi.org/10.5683/SP2/PNNQNO
    Explore at:
    Dataset updated
    Mar 16, 2023
    Dataset provided by
    Borealis
    Authors
    The Global Strategy Lab
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    All results of the primary interrupted time-series analyses evaluating targeted and total border closures that met the following criteria: 1) at least seven days of data are available before and after the intervention point, 2) for multiple-intervention time series, at least seven days have passed since the last intervention point, and 3) for multiple sequential targeted border closures, the second (or third) intervention is observed to indicate an increase of at least 20% of the world’s population being targeted by the new border closures.

  13. Summary of sacubitril/valsartan AE line listing.

    • figshare.com
    xls
    Updated Feb 8, 2024
    Cite
    Eszter Palffy; David John Lewis (2024). Summary of sacubitril/valsartan AE line listing. [Dataset]. http://doi.org/10.1371/journal.pone.0295226.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Eszter Palffy; David John Lewis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Patient Support Programmes (PSPs) are used by the pharmaceutical industry to provide education and support to consumers to overcome the challenges they face managing their condition and treatment. Whilst there is an increasing number of PSPs, limited information is available on whether these programmes contribute to safety signals. PSPs do not have a scientific hypothesis, nor are they governed by a protocol. However, by their nature, PSPs inevitably generate adverse event (AE) reports. The main goal of the research was to gather all Novartis-initiated PSPs for sacubitril/valsartan, followed by research in the company safety database to identify all AE reports emanating from these PSPs. Core data sheets (CDS) were reviewed to assess if these PSPs contributed to any new, regulatory-authority approved, validated signals. Overall, AEs entered into the safety database from PSPs confirmed no contribution to CDS updates. Detailed review of real-world data revealed tablet splitting or taking one higher dose tablet a day instead of twice daily. This research, and subsequent analyses, revealed that PSPs did not impact safety label changes for sacubitril/valsartan. It revealed an important finding concerning drug utilisation i.e. splitting of sacubitril/valsartan tablets to reduce cost. This finding suggests that PSPs may contribute important real-world data on patterns of medication usage. There remains a paucity of literature available on this topic, hence further research is required to assess if it would be worth designing PSPs for collecting data on drug utilisation and (lack of) efficacy. Such information from PSPs could be important for all stakeholders.

  14. Drug Discovery Informatics Market Analysis North America, Europe, Asia, Rest...

    • technavio.com
    Updated Feb 23, 2022
    Cite
    Technavio (2022). Drug Discovery Informatics Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, Germany, UK, China, France - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/drug-discovery-informatics-market-industry-analysis
    Explore at:
    Dataset updated
    Feb 23, 2022
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    France, United Kingdom, Germany, United States, Global
    Description

    Drug Discovery Informatics Market Size 2024-2028

    The drug discovery informatics market size is forecast to increase by USD 7.29 billion, at a CAGR of 18.17% between 2023 and 2028.

    The market is experiencing significant growth, driven by the increasing R&D investments in the pharmaceutical and biopharmaceutical sectors. The escalating number of clinical trials necessitates advanced informatics solutions to manage and analyze vast amounts of data, thereby fueling market expansion. However, the high setup cost of drug discovery informatics remains a formidable challenge for market entrants, necessitating strategic partnerships and cost optimization measures. Companies seeking to capitalize on this market's potential must address this challenge while staying abreast of evolving technological trends, such as artificial intelligence and machine learning, to streamline drug discovery processes and gain a competitive edge.

    What will be the Size of the Drug Discovery Informatics Market during the forecast period?

    Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.
    The market is characterized by its continuous and evolving nature, driven by advancements in technology and the increasing complexity of research in the pharmaceutical industry. Drug discovery informatics encompasses various applications, including drug repurposing algorithms, data visualization tools, drug discovery workflows, drug metabolism prediction, and knowledge graph technology. These entities are integrated into comprehensive systems to streamline the drug discovery process. Drug repurposing algorithms leverage historical data to identify new therapeutic applications for existing drugs, while data visualization tools enable researchers to explore large datasets and identify trends. Drug discovery workflows integrate various techniques, such as high-throughput screening data, pharmacophore modeling, and molecular dynamics simulations, to optimize lead compounds. Knowledge graph technology facilitates the integration and analysis of disparate data sources, providing a more holistic understanding of biological systems. Drug metabolism prediction models help researchers assess the potential toxicity and pharmacokinetic properties of compounds, reducing the risk of costly failures in later stages of development.

    The integration of artificial intelligence applications, such as machine learning algorithms and natural language processing, enhances the capabilities of drug discovery informatics platforms. These technologies enable the analysis of large, complex datasets and the identification of novel patterns and insights. The application of drug discovery informatics extends across various sectors, including biotechnology, pharmaceuticals, and academia, as researchers seek to accelerate the development of new therapeutics and improve the efficiency of the drug discovery process. The ongoing unfolding of market activities and evolving patterns in drug discovery informatics reflect the dynamic nature of this field, as researchers continue to push the boundaries of scientific discovery.

    How is this Drug Discovery Informatics Industry segmented?

    The drug discovery informatics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    • Application: Discovery informatics, Development informatics
    • Solution: Software, Services
    • Geography: North America (US), Europe (France, Germany, UK), APAC (China), Rest of World (ROW)

    By Application Insights

    The discovery informatics segment is estimated to witness significant growth during the forecast period.

    The drug discovery process is a complex and data-intensive endeavor, involving the identification and validation of potential lead compounds for therapeutic applications. This process encompasses various stages, from target identification to preclinical development. At the forefront of this process, researchers employ diverse technologies to generate leads, such as high-throughput screening, molecular modeling, medicinal chemistry, and structural biology. High-throughput screening enables the rapid identification of compounds that interact with specific targets, while molecular modeling and virtual screening techniques facilitate the prediction of compound-target interactions and the optimization of lead structures. ADMET prediction models and in vitro assays help assess the pharmacokinetic properties and toxicity of potential leads, ensuring their safety and efficacy. Compound library management systems enable the organization and retrieval of vast collections of chemical compounds, while structure-activity relationship (SAR) and quantitative structure-activity relationship (QSAR) studies provide insights i

  15. Global Real-world Evidence (RWE) Solutions Market Size, Growth & Trends...

    • meditechinsights.com
    Updated Mar 22, 2022
    Cite
    Medi-Tech Insights - Medi-Tech (2022). Global Real-world Evidence (RWE) Solutions Market Size, Growth & Trends Report Segmented by Component (Services, Data Sets), Application (Drug Development & Approvals, Medical Device Development & Approvals), End-user, & Regional Forecast to 2030 [Dataset]. https://meditechinsights.com/real-world-evidence-solutions-market/
    Explore at:
    Dataset updated
    Mar 22, 2022
    Dataset authored and provided by
    Medi-Tech Insights - Medi-Tech
    License

    https://meditechinsights.com/privacy-policy/

    Description

    The real-world evidence (RWE) solutions market is expected to expand at a CAGR of ~10% during the forecast period. Key factors driving this growth include increasing regulatory support for RWE adoption, the rising incidence of chronic diseases, increased investment from pharmaceutical companies, the growing focus on personalized medicine and targeted therapies, the widespread adoption of […]

  16. World pharmaceutical sales 2020-2024 by region

    • statista.com
    • ai-chatbox.pro
    Updated Jun 20, 2025
    Cite
    Statista (2025). World pharmaceutical sales 2020-2024 by region [Dataset]. https://www.statista.com/statistics/272181/world-pharmaceutical-sales-by-region/
    Explore at:
    Dataset updated
    Jun 20, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    World
    Description

    This statistic describes global pharmaceutical sales from 2020 to 2024, sorted by regional submarkets. For 2024, total pharmaceutical sales in the United States were estimated to reach around *** billion U.S. dollars.

    World pharmaceutical sales by region

    The pharmaceutical industry is best known for manufacturing pharmaceutical drugs which aim to diagnose, cure, treat, or prevent diseases. The pharmaceutical sector represents a huge industry, with the global market being worth around *** trillion U.S. dollars. Among the best-known global pharmaceutical companies are Pfizer, Merck and Johnson & Johnson from the U.S., Novartis and Roche from Switzerland, and Sanofi from France. Accordingly, North America and Europe are still among the largest global submarkets for pharmaceuticals. In 2024, the United States was still the largest single pharmaceutical market, generating more than *** billion U.S. dollars of revenue. Europe was responsible for generating around *** billion U.S. dollars. These two markets, together with Japan, Canada and Australia, form the so-called established (or developed) markets. The rest of the global pharmaceutical revenue is mainly from emerging markets, which include countries like China, Russia, Brazil and India. In fact, these emerging markets show the fastest increase in pharmaceutical sales. Latin America is the world region with the highest predicted compound annual growth rate until 2028.

  17. GlobalEssentialMedicinesDatabase.xlsx

    • figshare.com
    xlsx
    Updated Mar 7, 2019
    Cite
    Nav Persaud; Maggie Jiang; Roha Shaikh; Anjli Bali; Efosa Oronsaye; Hannah Woods; Gregory Drozdzal; Yathavan Rajakulasingam; Darshanand Maraj; Sapna Wadhawan; Norman Umali; Ri Wang; Marcy McCall; Jeffrey K Aronson; Annette Plüddemann; Lorenzo Moja; Nicola Magrini; Carl Heneghan (2019). GlobalEssentialMedicinesDatabase.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.7814246.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Mar 7, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nav Persaud; Maggie Jiang; Roha Shaikh; Anjli Bali; Efosa Oronsaye; Hannah Woods; Gregory Drozdzal; Yathavan Rajakulasingam; Darshanand Maraj; Sapna Wadhawan; Norman Umali; Ri Wang; Marcy McCall; Jeffrey K Aronson; Annette Plüddemann; Lorenzo Moja; Nicola Magrini; Carl Heneghan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Global Essential Medicines Database

    In June of 2017, we searched the WHO Essential Medicines and Health Products Information Portal, an online repository that contains hundreds of publications on medicines and health products related to WHO priorities, and a full section dedicated to national essential medicines lists (EMLs). A WHO information specialist actively searched for updated versions of national EMLs, including national formularies, reimbursement lists, and lists based on standard treatment guidelines.

    We included all national EMLs that were posted on the WHO’s NEMLs Repository irrespective of publication date and language. When we found more than one national EML from the same country, we used the most recent. We excluded documents that were not EMLs, such as prescribing guidelines. We also included the 20th edition of the WHO Model EML (2017) in this database.

    From each EML we abstracted medicines using International Nonproprietary Names (INNs). For medicines whose names were not in English we used the Anatomical Therapeutic Chemical (ATC) classification system, if available, or translated the names with the help of Google Translate. We listed each medicine individually, whether it was part of a combination product or not. We treated as the same medicine bases and their salts (e.g. promethazine hydrochloride and promethazine) as well as different compounds of the same vitamin or mineral (e.g. ferrous fumarate and ferrous sulfate). We excluded diagnostic agents, antiseptics, disinfectants, and saline solutions.

    In this database "1" and "0" indicate the presence or absence of the medicine respectively on an EML.
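
    Because each cell is a 1/0 presence flag, simple comparisons across lists reduce to sums and filters. The sketch below assumes a medicines-by-lists layout; the sample rows, list names, and helper names are hypothetical and are not taken from the spreadsheet.

```python
# Hypothetical excerpt: keys are medicines, inner keys are EMLs; 1 = listed, 0 = not listed.
eml_matrix = {
    "amoxicillin": {"WHO": 1, "Country A": 1, "Country B": 1},
    "promethazine": {"WHO": 1, "Country A": 0, "Country B": 1},
    "ferrous salt": {"WHO": 1, "Country A": 0, "Country B": 0},
}


def listing_counts(matrix):
    """Number of EMLs on which each medicine appears."""
    return {medicine: sum(flags.values()) for medicine, flags in matrix.items()}


def missing_from(matrix, reference, target):
    """Medicines present on the reference list but absent from the target list."""
    return sorted(
        medicine
        for medicine, flags in matrix.items()
        if flags[reference] == 1 and flags[target] == 0
    )


counts = listing_counts(eml_matrix)
gaps = missing_from(eml_matrix, "WHO", "Country A")
```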

  18. Prescription Drugs Introduced to the Market

    • kaggle.com
    Updated Sep 17, 2020
    Cite
    Rishi Damarla (2020). Prescription Drugs Introduced to the Market [Dataset]. https://www.kaggle.com/rishidamarla/prescription-drugs-introduced-to-the-market/code
    Explore at:
    Dataset updated
    Sep 17, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rishi Damarla
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Many drugs are introduced to the market for commercial and household use each year. Thus it is important to know the characteristics of these drugs.

    Content

    In this dataset you'll find info from hundreds of drugs that were introduced in 2019.

    Acknowledgements

    This data comes from https://data.world/chhs/e54d331c-65d3-4c6e-b4ba-390bd7024248.

  19. NRM2018 PET Grand Challenge Dataset

    • openneuro.org
    Updated Jun 1, 2021
    + more versions
    Cite
    Mattia Veronese; Gaia Rizzo; Martin Belzunce; Julia Schubert; Barbara Santangelo; Ayla Mansur; Alex Whittington; Joel Dunn; Graham Searle; Andrew Reader; Roger Gunn (2021). NRM2018 PET Grand Challenge Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds001705.v1.0.1
    Explore at:
    Dataset updated
    Jun 1, 2021
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Mattia Veronese; Gaia Rizzo; Martin Belzunce; Julia Schubert; Barbara Santangelo; Ayla Mansur; Alex Whittington; Joel Dunn; Graham Searle; Andrew Reader; Roger Gunn
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    == Introduction ==

    For many years PET centres around the world have developed and optimised their own analysis pipelines, including a mixture of in-house and independent software, and have implemented different modelling choices for PET image processing and data quantification. As a result, many different methods and tools are available for PET image analysis.

    == Aim of the dataset ==

    This dataset aims to provide a normative tool to assess the performance and consistency of PET modelling approaches on the same data for which the ground truth is known. It was created and released for the NRM2018 PET Grand Challenge. The challenge aimed at evaluating the performances of different PET analysis tools to identify areas and magnitude of receptor binding changes in a PET radioligand neurotransmission study.

    The present dataset refers to 5 simulated human subjects scanned twice. For each subject the first PET scan (ses-baseline) represents baseline conditions; the second scan (ses-displaced) represents the scan after a pharmacological challenge in which the tracer binding has been displaced in certain regions of interest. A total of 10 dynamic scans are provided in the current dataset.

    The nature of the neuroreceptor tracer used for the simulation (hereafter referred to as [11C]LondonPride) is intended to be as general as possible. Any similarity to real PET tracer uptake is purely coincidental. Each simulated scan consists of a 90-minute dynamic PET acquisition after bolus tracer injection, as obtained with a Siemens Biograph mMR PET/MR scanner. The data were simulated including attenuation, random and scatter effects, the decay of the radiotracer, and the geometry and resolution of the scanner. PET data can be considered motion-free as no motion or motion-related artifacts are included in the simulated dataset. The data were binned into 23 frames: 4×15 s, 4×60 s, 2×150 s, 10×300 s and 3×600 s. Each frame was reconstructed with the MLEM algorithm with 100 iterations. The reconstructed images available in the dataset are already decay corrected.

    All provided PET images are already normalised in standard MNI space (182x218x182 – 1mm).

    == Data simulation process ==

    For the simulation of each of the 10 scans (5 patients, 2 scans each), time activity curves (TACs) for each voxel of the phantom were generated from the kinetic parameters using the 2TCM equations. The TACs had a resolution of 1 sec and included the effect of the radiotracer decay, which was simulated with a half-life of 20.34 min (11C half-life). Each voxel TAC was binned with the following framing: 4×15 s, 4×60 s, 2×150 s, 10×300 s and 3×600 s by using the mean activity value for each time frame. After this process, the dynamic phantom for each scan is ready to be used in the simulation of each scan. The phantoms had the same resolution as the parametric maps (1×1×1 mm^3).
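
    The framing step described above (a 1-second TAC averaged over 4×15 s, 4×60 s, 2×150 s, 10×300 s and 3×600 s frames) can be illustrated with a short sketch. The framing and the 20.34 min half-life come from the text; the decay-only curve stands in for a real 2TCM voxel TAC, and the function names are hypothetical, so this is not the authors' simulation code.

```python
import math

HALF_LIFE_S = 20.34 * 60  # 11C half-life quoted in the text, converted to seconds
FRAMING = [(4, 15), (4, 60), (2, 150), (10, 300), (3, 600)]  # (frame count, duration in s)


def frame_edges(framing):
    """Cumulative frame boundaries in seconds, starting at 0."""
    edges = [0]
    for n_frames, duration in framing:
        for _ in range(n_frames):
            edges.append(edges[-1] + duration)
    return edges


def bin_tac(tac, edges):
    """Mean activity of a 1-second-resolution TAC within each frame [start, end)."""
    return [sum(tac[a:b]) / (b - a) for a, b in zip(edges[:-1], edges[1:])]


edges = frame_edges(FRAMING)

# Placeholder curve: pure radioactive decay sampled every second (no tracer kinetics).
tac = [math.exp(-math.log(2) * t / HALF_LIFE_S) for t in range(edges[-1])]
frames = bin_tac(tac, edges)
```

    The edges confirm the figures quoted in the description: 23 frames covering 5400 s, that is, the 90-minute acquisition.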

    Each scan was simulated with a total of 3×10e8 counts and by modelling the different physical effects of a PET acquisition. For each frame of a scan, the phantom was smoothed with a 2.5 mm FWHM kernel (lower than the spatial resolution of the mMR scanner since the phantom was already low resolution) and projected into a span 11 sinogram using the mMR scanner geometry. Then the resulting sinograms were multiplied by the attenuation factors, obtained from an attenuation map generated from the CT image of the patient, and by the normalization factors of the mMR scanner. Next, Poisson noise was introduced by simulating a random process for every sinogram bin, obtaining the sinogram with true events. A uniform sinogram multiplied by the normalization factors was used for the randoms and a smoothed version of the emission sinogram for the scatters, which were scaled in order to have 20% of randoms and 25% of scatters of the total counts. Poisson noise was introduced to randoms and scatters and added to the trues sinogram. Finally, each frame was individually reconstructed using the MLEM algorithm with 100 iterations, a 2.5 mm PSF and the standard mMR voxel size (2.09x2.09x2.03 mm3). The reconstructed images were corrected for the activity decay and resampled into the original MNI space. For the simulation and reconstruction, an in-house reconstruction framework was used (Belzunce and Reader 2017).

    == Simulated Drug ==

    The pharmacological challenge given to the subjects before the second scan (ses-displaced) is based, as is the tracer, on a simulated drug. Any similarity with existing drugs is purely coincidental. The drug has competitive binding to the radiotracer target and has no secondary affinities. The drug is simulated as being given as a single oral bolus 30 min prior to the scan.

    == Additional data in the folder ==

    Along with the raw data, some additional derivative data are provided: six regions of displacement that are helpful for quantification and analysis. The six regions of displacement were manually generated (using ITKSnap) and applied consistently to all the subjects to generate displaced 𝑘3 parametric maps. Based on the neuroreceptor theory (Innis, Cunningham et al. 2007), any change in 𝑘3 would produce an equivalent change in BPnd. The volumes of the regions ranged from 343mm3 to 2275mm3, and the regions were selected to lie in areas of higher tracer uptake at baseline. None of the displacement ROIs has a purely geometrical (e.g. cube or sphere) or anatomical shape. The regions have been created to represent different sizes and different levels of tracer displacement according to the following values:

    +------+---------------+------------------+
    | ROI  | Volume (mm^3) | Displacement (%) |
    +------+---------------+------------------+
    | ROI1 | 2555          | 27               |
    | ROI2 | 2275          | 27               |
    | ROI3 | 1152          | 21               |
    | ROI4 | 493           | 18               |
    | ROI5 | 343           | 18               |
    | ROI6 | 418           | 18               |
    +------+---------------+------------------+
    

    The ROIs are not symmetrically spatially distributed across the brain. A definition of the ROI name can be found in the accompanying dseg.tsv file.

    == References ==

    - Belzunce, M. A. and A. J. Reader (2017). "Assessment of the impact of modeling axial compression on PET image reconstruction." Medical physics 44(10): 5172-5186.
    - Innis, R. B., V. J. Cunningham, J. Delforge, M. Fujita, A. Gjedde, R. N. Gunn, J. Holden, S. Houle, S. C. Huang, M. Ichise, H. Iida, H. Ito, Y. Kimura, R. A. Koeppe, G. M. Knudsen, J. Knuuti, A. A. Lammertsma, M. Laruelle, J. Logan, R. P. Maguire, M. A. Mintun, E. D. Morris, R. Parsey, J. C. Price, M. Slifstein, V. Sossi, T. Suhara, J. R. Votaw, D. F. Wong and R. E. Carson (2007). "Consensus nomenclature for in vivo imaging of reversibly binding radioligands." J Cereb Blood Flow Metab 27(9): 1533-1539.

    == Appendix: Current Folder Contents ==

    ├── CHANGES
    ├── LICENSE
    ├── README
    ├── dataset_description.json
    ├── derivatives
    │   └── masks
    │       ├── dseg.tsv
    │       ├── sub-000101
    │       │   ├── ses-baseline
    │       │   │   └── sub-000101_ses-baseline_label-displacementROI_dseg.nii.gz
    │       │   └── ses-displaced
    │       │       └── sub-000101_ses-displaced_label-displacementROI_dseg.nii.gz
    │       ├── sub-000102
    │       │   ├── ses-baseline
    │       │   │   └── sub-000102_ses-baseline_label-displacementROI_dseg.nii.gz
    │       │   └── ses-displaced
    │       │       └── sub-000102_ses-displaced_label-displacementROI_dseg.nii.gz
    │       ├── sub-000103
    │       │   ├── ses-baseline
    │       │   │   └── sub-000103_ses-baseline_label-displacementROI_dseg.nii.gz
    │       │   └── ses-displaced
    │       │       └── sub-000103_ses-displaced_label-displacementROI_dseg.nii.gz
    │       ├── sub-000104
    │       │   ├── ses-baseline
    │       │   │   └── sub-000104_ses-baseline_label-displacementROI_dseg.nii.gz
    │       │   └── ses-displaced
    │       │       └── sub-000104_ses-displaced_label-displacementROI_dseg.nii.gz
    │       └── sub-000105
    │           ├── ses-baseline
    │           │   └── sub-000105_ses-baseline_label-displacementROI_dseg.nii.gz
    │           └── ses-displaced
    │               └── sub-000105_ses-displaced_label-displacementROI_dseg.nii.gz
    ├── participants.json
    ├── participants.tsv
    ├── sub-000101
    │   ├── ses-baseline
    │   │   ├── anat
    │   │   │   ├── sub-000101_ses-baseline_acq-T1w.json
    │   │   │   └── sub-000101_ses-baseline_acq-T1w.nii.gz
    │   │   └── pet
    │   │       ├── sub-000101_ses-baseline_rec-MLEM_pet.json
    │   │       └── sub-000101_ses-baseline_rec-MLEM_pet.nii.gz
    │   └── ses-displaced
    │       ├── anat
    │       │   ├── sub-000101_ses-displaced_acq-T1w.json
    │       │   └── sub-000101_ses-displaced_acq-T1w.nii.gz
    │       └── pet
    │           ├── sub-000101_ses-displaced_rec-MLEM_pet.json
    │           └── sub-000101_ses-displaced_rec-MLEM_pet.nii.gz
    ├── sub-000102
    │   ├── ses-baseline
    │   │   ├── anat
    │   │   │   ├── sub-000102_ses-baseline_acq-T1w.json
    │   │   │   └── sub-000102_ses-baseline_acq-T1w.nii.gz
    │   │   └── pet
    │   │       ├── sub-000102_ses-baseline_rec-MLEM_pet.json
    │   │       └── sub-000102_ses-baseline_rec-MLEM_pet.nii.gz
    │   └── ses-displaced
    │       ├── anat
    │       │   ├── sub-000102_ses-displaced_acq-T1w.json
    │       │   └──

  20. Drug Indications (Drug Engineering with AI)

    • kaggle.com
    Updated Feb 21, 2020
    Cite
    Deepak Deepu (2020). Drug Indications (Drug Engineering with AI) [Dataset]. https://www.kaggle.com/datasets/deepakdeepu8978/drug-indications-drug-engineering-with-ai/code
    Explore at:
    Dataset updated
    Feb 21, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Deepak Deepu
    Description

    Context

    In health care, two exciting uses of artificial intelligence, in the clinic for patient care and in the laboratory for drug discovery, are remarkably different applications. That perhaps explains why, though it is still early days for both, they are developing at different rates. It is now possible to generate a novel drug on your own laptop: previously this would have taken millions of dollars, and now all you need is an Internet connection and a laptop. A startup used AI to design a drug in 21 days, which is unprecedented, since the whole R&D and preclinical trial process to create a drug generally takes at least two years. This is called virtual screening, which is the technical term used in the pharmaceutical industry. We can now apply models such as deep learning, deep reinforcement learning, and self mapping.

    Content

    The opportunity is equally compelling in drug discovery, particularly in areas of high unmet need such as rare and hard-to-treat cancers and neurodegenerative conditions. Artificial intelligence can ingest and reason over information from the scientific literature and databases, as well as patient-level data, to identify potential approaches to treat diseases by proposing a drug target, designing a molecule, and defining patients in which to test that molecule to drive greater clinical success.

    This dataset consists of the physical and chemical properties of drugs along with their names. It is a lightly cleaned-up version of the non-proprietary version of the Drug Information Database. Some duplicate rows were removed, and column headers were renamed for brevity.

    The data is available from Feb 18, 2020.

    Column Description and Abbreviations

    • AMA: American Medical Association
    • BAN: British Approved Name
    • BT: broader term
    • CAS number or CAS#: Chemical Abstracts Service Registry Number
    • ChEBI: Chemical Entities of Biological Interest
    • CTD: Comparative Toxicogenomics Database
    • CUI: Concept Unique Identifier [UMLS]
    • DB: database
    • DID: Drug-Indication Database
    • eVOC: electronic VOCabularies [Merck internal system]
    • FDA: U.S. Food and Drug Administration
    • GN: generic [drug] name
    • GO: Gene Ontology
    • InChI: International Chemical Identifier
    • MedDRA: Medical Dictionary for Regulatory Activities
    • MeSH PA: Medical Subject Headings Pharmacological Action [relations]
    • MeSH: Medical Subject Headings
    • NDFRT: U.S. National Drug File Reference Terminology
    • NLM: U.S. National Library of Medicine
    • NLP: natural language processing
    • NT: narrower term
    • OBO: Open Biological & Biomedical Ontologies
    • OTC: over-the-counter [drugs]
    • PDR: Physicians’ Desk Reference
    • PT: preferred term
    • SNOMEDCT: Systematized NOmenclature of MEDicine Clinical Terminology
    • TR: terminological reduction
    • UMLS: Unified Medical Language System
    • UNII: UNique Ingredient Identifier
    • USAN TC: United States Adopted Names Therapeutic Claim
    • USAN: United States Adopted Names
    • USP: United States Pharmacopeia
    • UTS: UMLS Terminology Services
    • WHO-ATC: World Health Organization Anatomical Therapeutic Chemical [classification]
    • WHO-DD: World Health Organization Drug Dictionary

    Acknowledgements

    One interesting article: Toward a comprehensive drug ontology: extraction of drug-indication relations from diverse information sources

    Inspiration

    Despite the potential of artificial intelligence to identify new targets for disease faster, at lower cost, and with lower failure rates, adoption of this technology is still low. Trust has a significant role to play in that :)

Cite
Jessica Li (2018). UCI ML Drug Review dataset [Dataset]. https://www.kaggle.com/jessicali9530/kuc-hackathon-winter-2018/home

UCI ML Drug Review dataset

Over 200,000 patient drug reviews

Dataset updated
Dec 13, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Jessica Li
Description

This dataset was used for the Winter 2018 Kaggle University Club Hackathon and is now publicly available. See the Acknowledgments section for citation and licensing. Note: The types of data and recommendation-based solutions provided by the contestants are purely for NLP learning purposes. They are not suitable for real-world drug recommendation solutions.

Welcome to the Kaggle University Club Hackathon!

If you are interested in joining Kaggle University Club, please e-mail Jessica Li at lijessica@google.com

This Hackathon is open to all undergraduate, master's, and PhD students who are part of the Kaggle University Club program. The Hackathon provides students with a chance to build capacity via hands-on ML, learn from one another, and engage in a self-defined project that is meaningful to their careers.

Teams must register via Google Form to be eligible for the Hackathon. The Hackathon starts on Monday, November 12, 2018 and ends on Monday, December 10, 2018. Teams have one month to work on a team submission. Teams must do all work within the Kernel editor and set Kernel(s) to public at all times.

Prompt

The freestyle format of hackathons has time and again stimulated groundbreaking and innovative data insights and technologies. The Kaggle University Club Hackathon recreates this environment virtually on our platform. We challenge you to build a meaningful project around the UCI Machine Learning - Drug Review Dataset. Teams are free to let their creativity run and propose methods to analyze this dataset and form interesting machine learning models.

Machine learning has permeated nearly all fields and disciplines of study. One hot topic is using natural language processing and sentiment analysis to identify, extract, and make use of subjective information. The UCI ML Drug Review dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating system reflecting overall patient satisfaction. The data was obtained by crawling online pharmaceutical review sites. This data was published in a study on sentiment analysis of drug experience over multiple facets, e.g., sentiments learned on specific aspects such as effectiveness and side effects (see the acknowledgments section to learn more).

The sky's the limit here in terms of what your team can do! Teams are free to add supplementary datasets in conjunction with the drug review dataset in their Kernel. Discussion is highly encouraged within the forum and Slack so everyone can learn from their peers.

Here are just a couple ideas as to what you could do with the data:

  • Classification: Can you predict the patient's condition based on the review?
  • Regression: Can you predict the rating of the drug based on the review?
  • Sentiment analysis: What elements of a review make it more helpful to others? Which patients tend to have more negative reviews? Can you determine if a review is positive, neutral, or negative?
  • Data visualizations: What kind of drugs are there? What sorts of conditions do these patients have?
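
As a starting point for the sentiment question above, one option is to derive positive, neutral, and negative labels from the 10-star rating. The sketch below is only an illustration: the cut-offs (7 and above positive, 4 and below negative) and the function name `rating_to_sentiment` are assumptions, and the three example reviews are made up rather than taken from the dataset.

```python
from collections import Counter


def rating_to_sentiment(rating):
    """Map a 1-10 star rating to a coarse sentiment label.

    The thresholds are assumptions chosen for illustration,
    not something the dataset description specifies.
    """
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    if rating >= 7:
        return "positive"
    if rating <= 4:
        return "negative"
    return "neutral"


# Made-up (review text, rating) pairs, for illustration only.
reviews = [
    ("No side effects and it worked within a week.", 9),
    ("Helped a little, but the headaches were bad.", 5),
    ("Made my symptoms worse.", 2),
]

labels = [rating_to_sentiment(rating) for _, rating in reviews]
print(Counter(labels))
```

Labels produced this way could serve as targets for a text classifier trained on the review text; moving the cut-offs changes the class balance, so it is worth checking the label counts before training.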

Top Submissions

There is no one correct answer to this Hackathon, and teams are free to define the direction of their own project. That being said, there are certain core elements generally found across all outstanding Kernels on the Kaggle platform. The best Kernels are:

  1. Complex: How many domains of analysis and topics does this Kernel cover? Does it attempt machine learning methods? Does the Kernel offer a variety of unique analyses and interesting conclusions or solutions?
  2. Original: What is the subject matter of this Kernel? Does it have a well-defined and interesting project scope, narrative or problem? Could the results make an impact? Is it thought provoking?
  3. Approachable: How easy is it to understand this Kernel? Are all thought processes clear? Is the code clean, with useful comments? Are visualizations and processes articulated and self-explanatory?

Teams with top submissions have a chance to receive exclusive Kaggle University Club swag and be featured on our official blog and across social media.

IMPORTANT: Teams must set all Kernels to public at all times. This is so we can track each team's progression, but more importantly it encourages collaboration, productive discussion, and healthy inspiration to all teams. It is not so that teams can simply copycat good ideas. If a team's Kernel isn't their own organic work, it will not be considered a top submission. Teams must come up with a project on their own.

Submission Styling

The final Kernel submission for the Hackathon must contain the following information:

  • All team members added as collaborators to the Kernel
  • Somewhere at the top of your Kernel, find a space to put down all team member names, university name, club name, and team name (as specified whe...