If you are interested in joining Kaggle University Club, please e-mail Jessica Li at lijessica@google.com.
This Hackathon is open to all undergraduate, master's, and PhD students who are part of the Kaggle University Club program. The Hackathon provides students with a chance to build capacity via hands-on ML, learn from one another, and engage in a self-defined project that is meaningful to their careers.
Teams must register via Google Form to be eligible for the Hackathon. The Hackathon starts on Monday, November 12, 2018 and ends on Monday, December 10, 2018. Teams have one month to work on a team submission. Teams must do all work within the Kernel editor and set Kernel(s) to public at all times.
The freestyle format of hackathons has time and again stimulated groundbreaking and innovative data insights and technologies. The Kaggle University Club Hackathon recreates this environment virtually on our platform. We challenge you to build a meaningful project around the UCI Machine Learning - Drug Review Dataset. Teams are free to let their creativity run and propose methods to analyze this dataset and form interesting machine learning models.
Machine learning has permeated nearly all fields and disciplines of study. One hot topic is using natural language processing and sentiment analysis to identify, extract, and make use of subjective information. The UCI ML Drug Review dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating system reflecting overall patient satisfaction. The data was obtained by crawling online pharmaceutical review sites. This data was published in a study on sentiment analysis of drug experience over multiple facets, e.g., sentiments learned on specific aspects such as effectiveness and side effects (see the acknowledgments section to learn more).
The sky's the limit here in terms of what your team can do! Teams are free to add supplementary datasets in conjunction with the drug review dataset in their Kernel. Discussion is highly encouraged within the forum and Slack so everyone can learn from their peers.
Here are just a couple ideas as to what you could do with the data:
There is no one correct answer to this Hackathon, and teams are free to define the direction of their own project. That being said, there are certain core elements generally found across all outstanding Kernels on the Kaggle platform. The best Kernels are:
Teams with top submissions have a chance to receive exclusive Kaggle University Club swag and be featured on our official blog and across social media.
IMPORTANT: Teams must set all Kernels to public at all times. This is so we can track each team's progression, but more importantly it encourages collaboration, productive discussion, and healthy inspiration to all teams. It is not so that teams can simply copycat good ideas. If a team's Kernel isn't their own organic work, it will not be considered a top submission. Teams must come up with a project on their own.
The final Kernel submission for the Hackathon must contain the following information:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed information on various drugs used for a multitude of medical conditions such as acne, cancer, and heart disease. It includes essential details about drug efficacy based on user ratings and experiences, as well as specific information on side effects. The dataset aims to offer insights into how different medications are perceived by users concerning their effectiveness, considering both positive and adverse effects.
The dataset is typically provided in CSV format. While total row/record counts are not explicitly stated, the presence of 2912 unique review counts and a wide range of ratings suggests a substantial number of entries. The data appears to be structured in a tabular manner.
This dataset is ideal for:
* Analysing drug efficacy based on real-world user feedback.
* Researching user experiences with various medications.
* Developing applications related to health information systems.
* Performing Natural Language Processing (NLP) on drug descriptions and reviews to extract insights.
* Understanding the landscape of prescription (Rx) versus over-the-counter (OTC) medications.
The dataset's coverage is global, making it relevant for a worldwide audience. It was listed on 11th June 2025. There are no specific notes on demographic scope or data availability for certain groups or years explicitly mentioned.
CC0
This dataset is suitable for:
* Healthcare Professionals: To gain insights into patient experiences and drug effectiveness.
* Researchers: For studies on pharmacology, public health, and patient outcomes.
* Data Analysts: To identify trends and patterns in drug usage and side effects.
* Software Developers: For building health-related applications, AI models, or recommendation systems.
* Patients/Consumers: To inform decisions about medications based on aggregated user experiences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Numerous studies on medicines are conducted every day. To address shortcomings of models for generating, predicting, and classifying medicines information, the authors introduce a large textual dataset of medicines information. Accordingly, the authors named the medicines information dataset ‘MID’.
• Value of the data
- The dataset comprises extensive medicines information, featuring over 192k rows distributed across 22 diverse therapeutic classes.
- The dataset can benefit the classification of therapeutic classes and is robust for the prediction and generation of medicines information such as indications or interactions, enhancing efficiency in clinical trial management and facilitating a detailed analysis of the risks affecting participants in clinical trials.
- The dataset includes the name, link, contains, introduction, uses, benefits, side effects, how to use, how the drug works, quick tips, chemical class, habit forming, therapeutic class, action class, safety advice to alcohol, safety advice to pregnancy, safety advice to breastfeeding, safety advice to driving, safety advice to kidney, and safety advice to the liver.
- The dataset is big data, making it a suitable corpus for implementing both classical and deep learning models.
- The dataset provides a useful resource for medical researchers, healthcare professionals, drug manufacturers, data scientists, and enthusiasts interested in exploring the world of medicines and healthcare products preclinically for drug development and design.
• MID.xlsx provides the raw data, including medicine information. The data were collected to accelerate work on medicines and save experimental effort by helping to predict, generate, or classify medicine information preclinically.
• Therapeutic_class_counts.xlsx summarizes the distribution of medicines per therapeutic class.
https://spdx.org/licenses/CC0-1.0.html
Polypharmacy is increasingly common in the United States, and contributes to the substantial burden of drug-related morbidity. Yet real-world polypharmacy patterns remain poorly characterized. We have counted the incidence of multi-drug combinations observed in four billion patient-months of outpatient prescription drug claims from 2007-2014 in the Truven Health MarketScan® Databases. Prescriptions are grouped into discrete windows of concomitant drug exposure, which are used to count exposure incidences for combinations of up to five drug ingredients or ATC drug classes. Among patients taking any prescription drug, half are exposed to two or more drugs, and 5% are exposed to 8 or more. The most common multi-drug combinations treat manifestations of metabolic syndrome. Patients are exposed to unique drug combinations in 10% of all exposure windows. Our analysis of multi-drug exposure incidences provides a detailed summary of polypharmacy in a large US cohort, which can prioritize common drug combinations for future safety and efficacy studies.
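The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name, the example windows, and the choice to count every subset of drugs within a window are assumptions made for the sketch.

```python
from collections import Counter
from itertools import combinations


def count_combinations(windows, max_size=5):
    """Count how often each drug combination (up to max_size drugs) occurs.

    `windows` is an iterable of sets, each holding the drug ingredients a
    patient is exposed to during one window of concomitant exposure.
    """
    counts = Counter()
    for drugs in windows:
        ordered = sorted(drugs)  # canonical order so (a, b) and (b, a) match
        for k in range(1, min(max_size, len(ordered)) + 1):
            counts.update(combinations(ordered, k))
    return counts


# Hypothetical exposure windows, invented for illustration only.
windows = [
    {"metformin", "lisinopril", "atorvastatin"},
    {"metformin", "lisinopril"},
    {"atorvastatin"},
]
counts = count_combinations(windows)
print(counts[("lisinopril", "metformin")])
```

Sorting each window before enumerating gives every combination a single canonical key, so the same set of drugs is never counted under two different orderings.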
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘wholesale vs retail drugs’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ramjasmaurya/wholesale-vs-retail-drugs-price-and-purity on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The dataset has 3 files, all with full details of illegal drugs sold in and around the world. File 1 is in .xlsv format, file 2 is in .xlsv format, and file 3 is in .csv format.
All consist of columns related to price and drug purity according to their wholesale and retail price.
--- Original source retains full ownership of the source dataset ---
The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains 6712 drug entries including 1448 FDA-approved small molecule drugs, 131 FDA-approved biotech (protein/peptide) drugs, 85 nutraceuticals and 5080 experimental drugs. Additionally, 4227 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries. Each DrugCard entry contains more than 150 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. DrugBank is supported by David Wishart, Departments of Computing Science & Biological Sciences, University of Alberta. DrugBank is also supported by The Metabolomics Innovation Centre, a Genome Canada-funded core facility serving the scientific community and industry with world-class expertise and cutting-edge technologies in metabolomics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Pharmaceuticals and Medical Devices Agency (PMDA) has conducted many pharmacoepidemiological studies for postmarketing drug safety assessments based on real-world data from medical information databases. One of these databases is the National Database of Health Insurance Claims and Specific Health Checkups of Japan (NDB), containing health insurance claims of almost all Japanese individuals (over 100 million) since April 2009. This article describes the PMDA’s regulatory experiences in utilizing the NDB for postmarketing drug safety assessment, especially focusing on the recent cases of use of the NDB to examine the practical utilization and safety signal of a drug. The studies helped support regulatory decision-making for postmarketing drug safety, such as considering a revision of prescribing information of a drug, confirming the appropriateness of safety measures, and checking safety signals in real-world situations. Different characteristics between the NDB and the MID-NET® (another database in Japan) were also discussed for appropriate selection of data source for drug safety assessment. Accumulated experiences of pharmacoepidemiological studies based on real-world data for postmarketing drug safety assessment will contribute to evolving regulatory decision-making based on real-world data in Japan.
The VA Drug Pricing database contains the current prices for pharmaceuticals purchased by the federal government. These listed prices are based on the Federal Supply Schedule (FSS). This database is mandated by Public Law 102-585, the Veterans Health Care Act of 1992, which sets the maximum amount that a drug may be bought for by the Veterans Health Administration (VHA). The source of this information is contained in printed contracts or data files supplied by the drug manufacturers, representing the pricing agreements between VHA and the manufacturers. Price data is input by the National Acquisition Center (NAC) into the database administered by the Pharmacy Benefits Management Strategic Health Care Group. Information from this database is published on the World Wide Web at the following site: http://www.pbm.va.gov. The users of this database include pharmaceutical manufacturers, drug wholesalers, Office of Inspector General (OIG) and those who purchase pharmaceuticals for the VHA and other government agencies.
https://www.verifiedmarketresearch.com/privacy-policy/
Real World Evidence Solutions Market size was valued at USD 1.30 Billion in 2024 and is projected to reach USD 3.71 Billion by 2031, growing at a CAGR of 13.92% during the forecast period 2024-2031.
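Figures like these are linked by the standard compound annual growth rate (CAGR) formula. The sketch below shows that arithmetic in Python; the function names are illustrative, and the number of compounding periods is left as a parameter because the source does not state which base year it compounds from.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate implied by growing start_value to end_value."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


def project(start_value, rate, periods):
    """Value reached after compounding start_value at `rate` for `periods` years."""
    return start_value * (1 + rate) ** periods


# Market figures quoted above, in USD billions. The period count is an assumption.
for periods in (7, 8):
    print(periods, round(cagr(1.30, 3.71, periods) * 100, 2))
```

Running the loop with different period counts is a quick way to check which compounding window a quoted growth rate is consistent with.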
Global Real World Evidence Solutions Market Drivers
The market drivers for the Real World Evidence Solutions Market can be influenced by various factors. These may include:
Growing Need for Evidence-Based Healthcare: Real-world evidence (RWE) is becoming more and more important in healthcare decision-making, according to stakeholders such as payers, providers, and regulators. In addition to traditional clinical trial data, RWE solutions offer important insights into the efficacy, safety, and value of healthcare interventions in real-world situations.
Growing Use of RWE by Pharmaceutical Companies: RWE solutions are being used by pharmaceutical companies to assist with market entry, post-marketing surveillance, and drug development initiatives. Pharmaceutical businesses can find new indications for their current medications, improve clinical trial designs, and convince payers and providers of the worth of their products with the use of RWE.
Increasing Priority for Value-Based Healthcare: The emphasis on proving the cost- and benefit-effectiveness of healthcare interventions in real-world settings is growing as value-based healthcare models gain traction. To assist value-based decision-making, RWE solutions are essential in evaluating the economic effect and real-world consequences of healthcare interventions.
Technological and Data Analytics Advancements: RWE solutions are becoming more capable due to advances in machine learning, artificial intelligence, and big data analytics. With the use of these technologies, healthcare stakeholders can obtain actionable insights from the analysis of vast and varied datasets, including patient-generated data, claims data, and electronic health records.
Regulatory Support for RWE Integration: RWE is being progressively integrated into regulatory decision-making processes by regulatory organisations including the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA). The FDA's Real-World Evidence Programme and the EMA's Adaptive Pathways and PRIority MEdicines (PRIME) programme are two examples of initiatives that are making it easier to incorporate RWE into regulatory submissions and drug development.
Increasing Emphasis on Patient-Centric Healthcare: The value of patient-reported outcomes and real-world experiences in healthcare decision-making is becoming more widely acknowledged. RWE technologies facilitate the collection and examination of patient-centered data, offering valuable insights into treatment efficacy, patient inclinations, and quality of life consequences.
Extension of RWE Use Cases: RWE solutions are being used in medication development, post-market surveillance, health economics and outcomes research (HEOR), comparative effectiveness research, and market access, among other healthcare fields. The necessity for a variety of RWE solutions catered to the needs of different stakeholders is being driven by the expansion of RWE use cases.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘💉 Opioid Overdose Deaths’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/opioid-overdose-deathse on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Opioid addiction and death rates in the U.S. and abroad have reached "epidemic" levels. The CDC's data reflects the incredible spike in overdoses caused by drugs containing opioids.
The United States is experiencing an epidemic of drug overdose (poisoning) deaths. Since 2000, the rate of deaths from drug overdoses has increased 137%, including a 200% increase in the rate of overdose deaths involving opioids (opioid pain relievers and heroin). Source: CDC
In the News:
- STAT: 26 overdoses in just hours: Inside a community on the front lines of the opioid epidemic
- NPR: Organ Donations Spike In The Wake Of The Opioid Epidemic, Deadly Opioid Overwhelms First Responders And Crime Labs in Ohio
- Scientific American: Wave of Overdoses with Little-Known Drug Raises Alarm Amid Opioid Crisis
- Washington Post: A 7-year-old told her bus driver she couldn’t wake her parents. Police found them dead at home.
- Wall Street Journal: For Small-Town Cops, Opioid Scourge Hits Close to Home
- Food & Drug Administration: FDA launches competition to spur innovative technologies to help reduce opioid overdose deaths
This data was compiled using the CDC's WONDER database. Opioid overdose deaths are defined as: deaths in which the underlying cause was drug overdose, and the ICD-10 code used was any of the following: T40.0 (Opium), T40.1 (Heroin), T40.2 (Other opioids), T40.3 (Methadone), T40.4 (Other synthetic narcotics), T40.6 (Other and unspecified narcotics).
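The definition above translates directly into a filter. The following Python sketch is an assumption-laden illustration: the function name and the record layout (a flag for the underlying cause plus a list of ICD-10 codes) are hypothetical, and only the code list itself comes from the text.

```python
# ICD-10 codes listed in the definition above.
OPIOID_ICD10_CODES = {
    "T40.0": "Opium",
    "T40.1": "Heroin",
    "T40.2": "Other opioids",
    "T40.3": "Methadone",
    "T40.4": "Other synthetic narcotics",
    "T40.6": "Other and unspecified narcotics",
}


def is_opioid_overdose_death(underlying_cause_is_overdose, icd10_codes):
    """Return True when a death record meets the opioid overdose definition.

    Both conditions must hold: the underlying cause was drug overdose, and
    at least one of the record's ICD-10 codes is in the opioid list.
    """
    has_opioid_code = any(code in OPIOID_ICD10_CODES for code in icd10_codes)
    return bool(underlying_cause_is_overdose and has_opioid_code)


# Hypothetical records, invented for illustration only.
print(is_opioid_overdose_death(True, ["T40.1"]))
print(is_opioid_overdose_death(True, ["T40.5"]))
```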
Age-adjusted rate of drug overdose deaths and drug overdose deaths involving opioids
[Figure: Opioid Death Rate, http://i.imgur.com/ObpzUKq.gif]
Source: CDC
What are opioids?
Opioids are substances that act on opioid receptors to produce morphine-like effects. Opioids are most often used medically to relieve pain. Opioids include opiates, an older term that refers to such drugs derived from opium, including morphine itself. Other opioids are semi-synthetic and synthetic drugs such as hydrocodone, oxycodone and fentanyl; antagonist drugs such as naloxone; and endogenous peptides such as the endorphins. The terms opiate and narcotic are sometimes encountered as synonyms for opioid. Source: Wikipedia
Contributors wanted: see comment in Discussion.
Footnotes
- The crude rate is per 100,000.
- Certain totals are hidden due to suppression constraints. More Information: http://wonder.cdc.gov/wonder/help/faq.html#Privacy.
- The population figures are bridged-race estimates. The exceptions are years 2000 and 2010, for which Census counts are used.
- v1.1: Added Opioid Prescriptions Dispensed by US Retailers in that year (millions).
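The crude rate mentioned in the footnotes is deaths divided by population, scaled to 100,000 people. A minimal Python sketch follows; the function name is illustrative, and returning `None` for a suppressed (hidden) death count is an assumption about how a user might choose to represent such values.

```python
def crude_rate(deaths, population, per=100_000):
    """Crude death rate per `per` people; None when the death count is suppressed."""
    if deaths is None:  # suppressed totals carry no usable count
        return None
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * per / population


# Hypothetical figures, invented for illustration only.
print(crude_rate(50, 1_000_000))
print(crude_rate(None, 1_000_000))
```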
Citation: Centers for Disease Control and Prevention, National Center for Health Statistics. Multiple Cause of Death 1999-2014 on CDC WONDER Online Database, released 2015. Data are from the Multiple Cause of Death Files, 1999-2014, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/mcd-icd10.html on Oct 19, 2016 2:06:38 PM.
Citation for Opioid Prescription Data: IMS Health, Vector One: National, years 1991-1996, Data Extracted 2011. IMS Health, National Prescription Audit, years 1997-2013, Data Extracted 2014. Accessed at NIDA article linked (Figure 1) on Oct 23, 2016.
Data Use Restrictions:
The Public Health Service Act (42 U.S.C. 242m(d)) provides that the data collected by the National Center for Health Statistics (NCHS) may be used only for the purpose for which they were obtained; any effort to determine the identity of any reported cases, or to use the information for any purpose other than for health statistical reporting and analysis, is against the law. Therefore users will:
Use these data for health statistical reporting and analysis only.
For sub-national geography, do not present or publish death counts of 9 or fewer or death rates based on counts of nine or fewer (in figures, graphs, maps, tables, etc.).
Make no attempt to learn the identity of any person or establishment included in these data.
Make no disclosure or other use of the identity of any person or establishment discovered inadvertently and advise the NCHS Confidentiality Officer of any such discovery.
Eve Powell-Griner, Confidentiality Officer
National Center for Health Statistics
3311 Toledo Road, Rm 7116
Hyattsville, MD 20782
Telephone 301-458-4257 Fax 301-458-4021
This dataset was created by Health and contains around 800 samples along with Crude Rate, Crude Rate Lower 95% Confidence Interval, technical information and other features such as:
- Year
- Deaths
- and more.
- Analyze Crude Rate Upper 95% Confidence Interval in relation to Prescriptions Dispensed By Us Retailers In That Year (millions)
- Study the influence of State on Crude Rate
If you use this dataset in your research, please credit Health
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All results of the primary interrupted time-series analyses evaluating targeted and total border closures that met the following criteria: 1) at least seven days of data are available before and after the intervention point, 2) for multiple intervention time series, at least seven days have passed since the last intervention point, and 3) for multiple sequential targeted border closures, the second (or third) intervention is observed to indicate an increase of at least 20% of the world’s population being targeted by the new border closures.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Patient Support Programmes (PSPs) are used by the pharmaceutical industry to provide education and support to consumers to overcome the challenges they face managing their condition and treatment. Whilst there is an increasing number of PSPs, limited information is available on whether these programmes contribute to safety signals. PSPs do not have a scientific hypothesis, nor are they governed by a protocol. However, by their nature, PSPs inevitably generate adverse event (AE) reports. The main goal of the research was to gather all Novartis-initiated PSPs for sacubitril/valsartan, followed by research in the company safety database to identify all AE reports emanating from these PSPs. Core data sheets (CDS) were reviewed to assess if these PSPs contributed to any new, regulatory-authority approved, validated signals. Overall, AEs entered into the safety database from PSPs confirmed no contribution to CDS updates. Detailed review of real-world data revealed tablet splitting or taking one higher dose tablet a day instead of twice daily. This research, and subsequent analyses, revealed that PSPs did not impact safety label changes for sacubitril/valsartan. It revealed an important finding concerning drug utilisation i.e. splitting of sacubitril/valsartan tablets to reduce cost. This finding suggests that PSPs may contribute important real-world data on patterns of medication usage. There remains a paucity of literature available on this topic, hence further research is required to assess if it would be worth designing PSPs for collecting data on drug utilisation and (lack of) efficacy. Such information from PSPs could be important for all stakeholders.
Drug Discovery Informatics Market Size 2024-2028
The drug discovery informatics market size is forecast to increase by USD 7.29 billion, at a CAGR of 18.17% between 2023 and 2028.
The market is experiencing significant growth, driven by the increasing R&D investments in the pharmaceutical and biopharmaceutical sectors. The escalating number of clinical trials necessitates advanced informatics solutions to manage and analyze vast amounts of data, thereby fueling market expansion. However, the high setup cost of drug discovery informatics remains a formidable challenge for market entrants, necessitating strategic partnerships and cost optimization measures. Companies seeking to capitalize on this market's potential must address this challenge while staying abreast of evolving technological trends, such as artificial intelligence and machine learning, to streamline drug discovery processes and gain a competitive edge.
What will be the Size of the Drug Discovery Informatics Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2018-2022 and forecasts 2024-2028 - in the full report.
The market is characterized by its continuous and evolving nature, driven by advancements in technology and the increasing complexity of research in the pharmaceutical industry. Drug discovery informatics encompasses various applications, including drug repurposing algorithms, data visualization tools, drug discovery workflows, drug metabolism prediction, and knowledge graph technology. These entities are integrated into comprehensive systems to streamline the drug discovery process. Drug repurposing algorithms leverage historical data to identify new therapeutic applications for existing drugs, while data visualization tools enable researchers to explore large datasets and identify trends. Drug discovery workflows integrate various techniques, such as high-throughput screening data, pharmacophore modeling, and molecular dynamics simulations, to optimize lead compounds.
Knowledge graph technology facilitates the integration and analysis of disparate data sources, providing a more holistic understanding of biological systems. Drug metabolism prediction models help researchers assess the potential toxicity and pharmacokinetic properties of compounds, reducing the risk of costly failures in later stages of development. The integration of artificial intelligence applications, such as machine learning algorithms and natural language processing, enhances the capabilities of drug discovery informatics platforms. These technologies enable the analysis of large, complex datasets and the identification of novel patterns and insights. The application of drug discovery informatics extends across various sectors, including biotechnology, pharmaceuticals, and academia, as researchers seek to accelerate the development of new therapeutics and improve the efficiency of the drug discovery process.
The ongoing unfolding of market activities and evolving patterns in drug discovery informatics reflect the dynamic nature of this field, as researchers continue to push the boundaries of scientific discovery.
How is this Drug Discovery Informatics Industry segmented?
The drug discovery informatics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Application: Discovery informatics, Development informatics
Solution: Software, Services
Geography: North America (US), Europe (France, Germany, UK), APAC (China), Rest of World (ROW)
By Application Insights
The discovery informatics segment is estimated to witness significant growth during the forecast period. The drug discovery process is a complex and data-intensive endeavor, involving the identification and validation of potential lead compounds for therapeutic applications. This process encompasses various stages, from target identification to preclinical development. At the forefront of this process, researchers employ diverse technologies to generate leads, such as high-throughput screening, molecular modeling, medicinal chemistry, and structural biology. High-throughput screening enables the rapid identification of compounds that interact with specific targets, while molecular modeling and virtual screening techniques facilitate the prediction of compound-target interactions and the optimization of lead structures. ADMET prediction models and in vitro assays help assess the pharmacokinetic properties and toxicity of potential leads, ensuring their safety and efficacy. Compound library management systems enable the organization and retrieval of vast collections of chemical compounds, while structure-activity relationship (SAR) and quantitative structure-activity relationship (QSAR) studies provide insights i
https://meditechinsights.com/privacy-policy/
The real-world evidence (RWE) solutions market is expected to expand at a CAGR of ~10% during the forecast period. Key factors driving this growth include increasing regulatory support for RWE adoption, the rising incidence of chronic diseases, increased investment from pharmaceutical companies, the growing focus on personalized medicine and targeted therapies, the widespread adoption of […]
This statistic describes global pharmaceutical sales from 2020 to 2024, sorted by regional submarkets. For 2024, total pharmaceutical sales in the United States were estimated to reach around *** billion U.S. dollars.
World pharmaceutical sales by region
The pharmaceutical industry is best known for manufacturing pharmaceutical drugs which aim to diagnose, cure, treat, or prevent diseases. The pharmaceutical sector represents a huge industry, with the global market being worth around *** trillion U.S. dollars. Among the best-known global pharmaceutical companies are Pfizer, Merck and Johnson & Johnson from the U.S., Novartis and Roche from Switzerland, and Sanofi from France. Accordingly, North America and Europe are still among the largest global submarkets for pharmaceuticals. In 2024, the United States was still the largest single pharmaceutical market, generating more than *** billion U.S. dollars of revenue. Europe was responsible for generating around *** billion U.S. dollars. These two markets, together with Japan, Canada and Australia, form the so-called established (or developed) markets. The rest of the global pharmaceutical revenue is mainly from emerging markets, which include countries like China, Russia, Brazil and India. In fact, these emerging markets show the fastest increase in pharmaceutical sales. Latin America is the world region with the highest predicted compound annual growth rate until 2028.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Global Essential Medicines Database
In June 2017, we searched the WHO Essential Medicines and Health Products Information Portal, an online repository that contains hundreds of publications on medicines and health products related to WHO priorities, and a full section dedicated to national essential medicines lists (EMLs). A WHO information specialist actively searched for updated versions of national EMLs, including national formularies, reimbursement lists, and lists based on standard treatment guidelines.
We included all national EMLs that were posted on the WHO’s NEMLs Repository irrespective of publication date and language. When we found more than one national EML from the same country, we used the most recent. We excluded documents that were not EMLs, such as prescribing guidelines. We also included the 20th edition of the WHO Model EML (2017) in this database.
From each EML we abstracted medicines using International Nonproprietary Names (INNs). For medicines whose names were not in English, we used the Anatomical Therapeutic Chemical (ATC) classification system, if available, or translated the names with the help of Google Translate. We listed each medicine individually, whether it was part of a combination product or not. We treated bases and their salts (e.g. promethazine hydrochloride and promethazine) as the same medicine, as well as different compounds of the same vitamin or mineral (e.g. ferrous fumarate and ferrous sulfate). We excluded diagnostic agents, antiseptics, disinfectants, and saline solutions.
In this database "1" and "0" indicate the presence or absence of the medicine respectively on an EML.
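The 1/0 encoding described above can be queried with very little code. The following is a minimal sketch in Python; the medicine names, list names, and flag values are invented purely for illustration and are not taken from the database, and the helper names are hypothetical.

```python
# Hypothetical presence/absence matrix: medicine -> {list name: 1 or 0}.
# All names and flags below are invented for illustration only.
eml = {
    "medicine_a": {"Country A": 1, "Country B": 1, "WHO Model": 1},
    "medicine_b": {"Country A": 0, "Country B": 1, "WHO Model": 1},
    "medicine_c": {"Country A": 1, "Country B": 0, "WHO Model": 1},
}

def medicines_per_list(matrix):
    """Count how many medicines are flagged 1 on each list."""
    counts = {}
    for flags in matrix.values():
        for list_name, flag in flags.items():
            counts[list_name] = counts.get(list_name, 0) + flag
    return counts

def lists_including(matrix, medicine):
    """Return the sorted names of the lists on which a medicine is present."""
    return sorted(name for name, flag in matrix[medicine].items() if flag == 1)

print(medicines_per_list(eml))
print(lists_including(eml, "medicine_b"))
```

Because presence is stored as 1 and absence as 0, summing a column gives the size of a list and summing a row gives the number of lists that include a medicine.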
https://creativecommons.org/publicdomain/zero/1.0/
Many drugs are introduced to the market for commercial and household use each year. Thus it is important to know the characteristics of these drugs.
In this dataset you'll find information on hundreds of drugs that were introduced in 2019.
This data comes from https://data.world/chhs/e54d331c-65d3-4c6e-b4ba-390bd7024248.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
== Introduction ==
For many years PET centres around the world have developed and optimised their own analysis pipelines, including a mixture of in-house and independent software, and have implemented different modelling choices for PET image processing and data quantification. As a result, many different methods and tools are available for PET image analysis.
== Aim of the dataset ==
This dataset aims to provide a normative tool to assess the performance and consistency of PET modelling approaches on the same data for which the ground truth is known. It was created and released for the NRM2018 PET Grand Challenge. The challenge aimed at evaluating the performances of different PET analysis tools to identify areas and magnitude of receptor binding changes in a PET radioligand neurotransmission study.
The present dataset refers to 5 simulated human subjects scanned twice. For each subject the first PET scan (ses-baseline) represents baseline conditions; the second scan (ses-displaced) represents the scan after a pharmacological challenge in which the tracer binding has been displaced in certain regions of interest. A total of 10 dynamic scans are provided in the current dataset.
The neuroreceptor tracer used for the simulation (hereafter referred to as [11C]LondonPride) is intended to be as general as possible. Any similarity to real PET tracer uptake is purely coincidental. Each simulated scan consists of a 90-minute dynamic PET acquisition after bolus tracer injection, as obtained with a Siemens Biograph mMR PET/MR scanner. The data were simulated including attenuation, randoms and scatter effects, the decay of the radiotracer, and the geometry and resolution of the scanner. PET data can be considered motion-free, as no motion or motion-related artifacts are included in the simulated dataset. The data were binned into 23 frames: 4×15 s, 4×60 s, 2×150 s, 10×300 s and 3×600 s. Each frame was reconstructed with the MLEM algorithm with 100 iterations. The reconstructed images available in the dataset are already decay corrected.
All provided PET images are already normalised in standard MNI space (182x218x182 – 1mm).
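As a quick consistency check on the framing quoted above, the frame start and end times can be derived from the (count × duration) list. This is a small Python sketch; the function and variable names are hypothetical and are not part of the dataset's tooling.

```python
# Framing from the text: 4x15 s, 4x60 s, 2x150 s, 10x300 s, 3x600 s.
FRAMING = [(4, 15), (4, 60), (2, 150), (10, 300), (3, 600)]  # (frames, seconds)

def frame_edges(framing):
    """Return parallel lists of frame start and end times in seconds."""
    starts, ends = [], []
    t = 0
    for n_frames, duration in framing:
        for _ in range(n_frames):
            starts.append(t)
            t += duration
            ends.append(t)
    return starts, ends

starts, ends = frame_edges(FRAMING)
print(len(starts), ends[-1] / 60)  # 23 90.0
```

The schedule yields 23 frames covering 5400 s, which matches the stated 90-minute acquisition.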
== Data simulation process ==
For the simulation of each of the 10 scans (5 patients, 2 scans each), time activity curves (TACs) for each voxel of the phantom were generated from the kinetic parameters using the 2TCM equations. The TACs had a resolution of 1 sec and included the effect of the radiotracer decay, which was simulated with a half-life of 20.34 min (11C half-life). Each voxel TAC was binned with the following framing: 4×15 s, 4×60 s, 2×150 s, 10×300 s and 3×600 s by using the mean activity value for each time frame. After this process, the dynamic phantom for each scan is ready to be used in the simulation of each scan. The phantoms had the same resolution as the parametric maps (1×1×1 mm^3).
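The decay and binning steps described above can be sketched as follows. This is an illustrative Python example, not the authors' code: the constant input curve is a hypothetical stand-in for a 2TCM-generated TAC, and sample i is assumed to correspond to time i seconds.

```python
import math

HALF_LIFE_S = 20.34 * 60  # 11C half-life from the text, in seconds
FRAMING = [(4, 15), (4, 60), (2, 150), (10, 300), (3, 600)]  # (frames, seconds)

def apply_decay(tac):
    """Multiply a 1-s-resolution TAC by the radioactive decay factor."""
    lam = math.log(2) / HALF_LIFE_S
    return [a * math.exp(-lam * t) for t, a in enumerate(tac)]

def bin_tac(tac, framing):
    """Bin a 1-s-resolution TAC using the mean activity in each frame."""
    binned, start = [], 0
    for n_frames, duration in framing:
        for _ in range(n_frames):
            frame = tac[start:start + duration]
            binned.append(sum(frame) / len(frame))
            start += duration
    return binned

# Hypothetical decay-free curve: constant activity for 90 minutes.
flat = [1.0] * 5400
frames = bin_tac(apply_decay(flat), FRAMING)
print(len(frames))
```

With a constant input, the binned values fall monotonically from frame to frame, which makes the effect of decay on late frames easy to see.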
Each scan was simulated with a total of 3×10e8 counts and by modelling the different physical effects of a PET acquisition. For each frame of a scan, the phantom was smoothed with a 2.5 mm FWHM kernel (lower than the spatial resolution of the mMR scanner since the phantom was already low resolution) and projected into a span 11 sinogram using the mMR scanner geometry. Then the resulting sinograms were multiplied by the attenuation factors, obtained from an attenuation map generated from the CT image of the patient, and by the normalization factors of the mMR scanner. Next, Poisson noise was introduced by simulating a random process for every sinogram bin, obtaining the sinogram with true events. A uniform sinogram multiplied by the normalization factors was used for the randoms and a smoothed version of the emission sinogram for the scatters, which were scaled in order to have 20% of randoms and 25% of scatters of the total counts. Poisson noise was introduced to randoms and scatters and added to the trues sinogram. Finally, each frame was individually reconstructed using the MLEM algorithm with 100 iterations, a 2.5 mm PSF and the standard mMR voxel size (2.09x2.09x2.03 mm3). The reconstructed images were corrected for the activity decay and resampled into the original MNI space. For the simulation and reconstruction, an in-house reconstruction framework was used (Belzunce and Reader 2017).
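The scaling of randoms and scatters to fixed fractions of the total counts can be illustrated with a short Python sketch. It assumes that trues make up the remainder of the total; the function names and the toy sinogram are hypothetical, and the total is left as a parameter rather than fixed to the figure quoted above.

```python
def count_budget(total, randoms_frac=0.20, scatter_frac=0.25):
    """Split a total count budget into trues, randoms and scatters.

    Assumption: trues are whatever remains after randoms and scatters.
    """
    randoms = total * randoms_frac
    scatters = total * scatter_frac
    trues = total - randoms - scatters
    return {"trues": trues, "randoms": randoms, "scatters": scatters}

def scale_to(sinogram, target):
    """Rescale a (flattened) sinogram so that its bins sum to target."""
    current = sum(sinogram)
    return [value * target / current for value in sinogram]

budget = count_budget(1000.0)                      # toy total, not the real one
uniform = scale_to([1.0] * 8, budget["randoms"])   # toy uniform randoms sinogram
print(budget, sum(uniform))
```

In the pipeline described above, Poisson noise would then be applied to each scaled sinogram before the components are added together.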
== Simulated Drug ==
The pharmacological challenge given to the subjects before the second scan (ses-displaced) is based, as is the tracer, on a simulated drug. Any similarity with existing drugs is purely coincidental. The drug has competitive binding to the radiotracer target and has no secondary affinities. The drug is simulated as being given as a single oral bolus 30 min prior to the scan.
== Additional data in the folder ==
Along with the raw data, some additional derivative data are provided: six regions of displacement that are helpful for quantification and analysis. The six regions of displacement were manually generated (using ITKSnap) and applied consistently to all the subjects to generate displaced 𝑘3 parametric maps. Based on neuroreceptor theory (Innis, Cunningham et al. 2007), any change in 𝑘3 would produce an equivalent change in BPnd. The volumes of the regions ranged from 343mm3 to 2275mm3, and the regions were selected to be in areas of higher tracer uptake at baseline. None of the displacement ROIs has a purely geometrical (e.g. cube or sphere) or anatomical shape. The regions were created to represent different sizes and different levels of tracer displacement according to the following values:
+------+---------------+------------------+
| ROI  | Volume (mm^3) | Displacement (%) |
+------+---------------+------------------+
| ROI1 | 2555          | 27               |
| ROI2 | 2275          | 27               |
| ROI3 | 1152          | 21               |
| ROI4 | 493           | 18               |
| ROI5 | 343           | 18               |
| ROI6 | 418           | 18               |
+------+---------------+------------------+
The ROIs are not symmetrically distributed across the brain. A definition of the ROI names can be found in the accompanying dseg.tsv file.
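The statement that a change in 𝑘3 produces an equivalent change in BPnd follows from BPnd = 𝑘3/𝑘4 in the two-tissue compartment model. The Python sketch below illustrates this with the displacement percentages from the table; the baseline 𝑘3 and 𝑘4 values are hypothetical, and treating displacement as a proportional reduction of 𝑘3 is an assumption.

```python
# Displacement levels from the table above, in percent.
DISPLACEMENT_PCT = {"ROI1": 27, "ROI2": 27, "ROI3": 21,
                    "ROI4": 18, "ROI5": 18, "ROI6": 18}

def displaced_k3(k3, pct):
    """Reduce k3 by pct percent (assumed meaning of 'displacement')."""
    return k3 * (1 - pct / 100)

def bp_nd(k3, k4):
    """Non-displaceable binding potential in the 2TCM: BPnd = k3 / k4."""
    return k3 / k4

def bp_nd_change_pct(k3, k4, pct):
    """Percent reduction in BPnd after k3 is displaced by pct percent."""
    baseline = bp_nd(k3, k4)
    displaced = bp_nd(displaced_k3(k3, pct), k4)
    return 100 * (baseline - displaced) / baseline

k3, k4 = 0.10, 0.05  # hypothetical baseline rate constants (1/min)
for roi, pct in DISPLACEMENT_PCT.items():
    print(roi, round(bp_nd_change_pct(k3, k4, pct), 6))
```

Because 𝑘4 is unchanged, the percent change in BPnd equals the percent change applied to 𝑘3, regardless of the baseline values chosen.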
== References ==
- Belzunce, M. A. and A. J. Reader (2017). "Assessment of the impact of modeling axial compression on PET image reconstruction." Medical Physics 44(10): 5172-5186.
- Innis, R. B., V. J. Cunningham, J. Delforge, M. Fujita, A. Gjedde, R. N. Gunn, J. Holden, S. Houle, S. C. Huang, M. Ichise, H. Iida, H. Ito, Y. Kimura, R. A. Koeppe, G. M. Knudsen, J. Knuuti, A. A. Lammertsma, M. Laruelle, J. Logan, R. P. Maguire, M. A. Mintun, E. D. Morris, R. Parsey, J. C. Price, M. Slifstein, V. Sossi, T. Suhara, J. R. Votaw, D. F. Wong and R. E. Carson (2007). "Consensus nomenclature for in vivo imaging of reversibly binding radioligands." J Cereb Blood Flow Metab 27(9): 1533-1539.
== Appendix: Current Folder Contents ==
├── CHANGES
├── LICENSE
├── README
├── dataset_description.json
├── derivatives
│   └── masks
│       ├── dseg.tsv
│       ├── sub-000101
│       │   ├── ses-baseline
│       │   │   └── sub-000101_ses-baseline_label-displacementROI_dseg.nii.gz
│       │   └── ses-displaced
│       │       └── sub-000101_ses-displaced_label-displacementROI_dseg.nii.gz
│       ├── sub-000102
│       │   ├── ses-baseline
│       │   │   └── sub-000102_ses-baseline_label-displacementROI_dseg.nii.gz
│       │   └── ses-displaced
│       │       └── sub-000102_ses-displaced_label-displacementROI_dseg.nii.gz
│       ├── sub-000103
│       │   ├── ses-baseline
│       │   │   └── sub-000103_ses-baseline_label-displacementROI_dseg.nii.gz
│       │   └── ses-displaced
│       │       └── sub-000103_ses-displaced_label-displacementROI_dseg.nii.gz
│       ├── sub-000104
│       │   ├── ses-baseline
│       │   │   └── sub-000104_ses-baseline_label-displacementROI_dseg.nii.gz
│       │   └── ses-displaced
│       │       └── sub-000104_ses-displaced_label-displacementROI_dseg.nii.gz
│       └── sub-000105
│           ├── ses-baseline
│           │   └── sub-000105_ses-baseline_label-displacementROI_dseg.nii.gz
│           └── ses-displaced
│               └── sub-000105_ses-displaced_label-displacementROI_dseg.nii.gz
├── participants.json
├── participants.tsv
├── sub-000101
│   ├── ses-baseline
│   │   ├── anat
│   │   │   ├── sub-000101_ses-baseline_acq-T1w.json
│   │   │   └── sub-000101_ses-baseline_acq-T1w.nii.gz
│   │   └── pet
│   │       ├── sub-000101_ses-baseline_rec-MLEM_pet.json
│   │       └── sub-000101_ses-baseline_rec-MLEM_pet.nii.gz
│   └── ses-displaced
│       ├── anat
│       │   ├── sub-000101_ses-displaced_acq-T1w.json
│       │   └── sub-000101_ses-displaced_acq-T1w.nii.gz
│       └── pet
│           ├── sub-000101_ses-displaced_rec-MLEM_pet.json
│           └── sub-000101_ses-displaced_rec-MLEM_pet.nii.gz
├── sub-000102
│   ├── ses-baseline
│   │   ├── anat
│   │   │   ├── sub-000102_ses-baseline_acq-T1w.json
│   │   │   └── sub-000102_ses-baseline_acq-T1w.nii.gz
│   │   └── pet
│   │       ├── sub-000102_ses-baseline_rec-MLEM_pet.json
│   │       └── sub-000102_ses-baseline_rec-MLEM_pet.nii.gz
│   └── ses-displaced
│       ├── anat
│       │   ├── sub-000102_ses-displaced_acq-T1w.json
│       │   └──
In health care, two exciting uses of artificial intelligence, in the clinic for patient care and in the laboratory for drug discovery, are remarkably different applications. That perhaps explains why, though it is still early days for both, they are developing at different rates. It is now possible to generate a novel drug candidate on your own laptop: previously this would have taken millions of dollars, and now all you need is an Internet connection and a laptop. First of all, a startup called Insilico Medicine used AI to design a drug in 21 days, which is unprecedented. The whole R&D and preclinical process to create a drug generally takes at least two years; here it took 21 days. The technical term for this in the pharmaceutical industry is virtual screening, and the models used for it include deep learning, deep reinforcement learning, and self-organizing maps.
The opportunity is equally compelling in drug discovery, particularly in areas of high unmet need such as rare and hard-to-treat cancers and neurodegenerative conditions. Artificial intelligence can ingest and reason over information from the scientific literature and databases, as well as patient-level data, to identify potential approaches to treat diseases by proposing a drug target, designing a molecule, and defining patients in which to test that molecule to drive greater clinical success.
This dataset consists of the physical and chemical properties of drugs, together with their names. It is a lightly cleaned-up version of the non-proprietary version of the Drug Information Database. Some duplicate rows were removed, and column headers were renamed for brevity.
The data is available from Feb 18, 2020.
AMA: American Medical Association
BAN: British Approved Name
BT: broader term
CAS number or CAS#: Chemical Abstracts Service Registry Number
ChEBI: Chemical Entities of Biological Interest
CTD: Comparative Toxicogenomics Database
CUI: Concept Unique Identifier [UMLS]
DB: database
DID: Drug-Indication Database
eVOC: electronic VOCabularies [Merck internal system]
FDA: U.S. Food and Drug Administration
GN: generic [drug] name
GO: Gene Ontology
InChI: International Chemical Identifier
MedDRA: Medical Dictionary for Regulatory Activities
MeSH PA: Medical Subject Headings Pharmacological Action [relations]
MeSH: Medical Subject Headings
NDFRT: U.S. National Drug File Reference Terminology
NLM: U.S. National Library of Medicine
NLP: natural language processing
NT: narrower term
OBO: Open Biological & Biomedical Ontologies
OTC: over-the-counter [drugs]
PDR: Physicians’ Desk Reference
PT: preferred term
SNOMEDCT: Systematized NOmenclature of MEDicine Clinical Terminology
TR: terminological reduction
UMLS: Unified Medical Language System
UNII: UNique Ingredient Identifier
USAN TC: United States Adopted Names Therapeutic Claim
USAN: United States Adopted Names
USP: United States Pharmacopeia
UTS: UMLS Terminology Services
WHO-ATC: World Health Organization Anatomic-Therapeutic-Chemical [classification]
WHO-DD: World Health Organization Drug Dictionary
One interesting article: "Toward a comprehensive drug ontology: extraction of drug-indication relations from diverse information sources"
Despite the potential of artificial intelligence to identify new targets for disease faster, at lower cost, and with lower failure rates, adoption of this technology is still low. Trust has a significant role to play in that :)
If you are interested in joining Kaggle University Club, please e-mail Jessica Li at lijessica@google.com
This Hackathon is open to all undergraduate, master, and PhD students who are part of the Kaggle University Club program. The Hackathon provides students with a chance to build capacity via hands-on ML, learn from one another, and engage in a self-defined project that is meaningful to their careers.
Teams must register via Google Form to be eligible for the Hackathon. The Hackathon starts on Monday, November 12, 2018 and ends on Monday, December 10, 2018. Teams have one month to work on a team submission. Teams must do all work within the Kernel editor and set Kernel(s) to public at all times.
The freestyle format of hackathons has time and again stimulated groundbreaking and innovative data insights and technologies. The Kaggle University Club Hackathon recreates this environment virtually on our platform. We challenge you to build a meaningful project around the UCI Machine Learning - Drug Review Dataset. Teams are free to let their creativity run and propose methods to analyze this dataset and form interesting machine learning models.
Machine learning has permeated nearly all fields and disciplines of study. One hot topic is using natural language processing and sentiment analysis to identify, extract, and make use of subjective information. The UCI ML Drug Review dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating system reflecting overall patient satisfaction. The data was obtained by crawling online pharmaceutical review sites. This data was published in a study on sentiment analysis of drug experience over multiple facets, ex. sentiments learned on specific aspects such as effectiveness and side effects (see the acknowledgments section to learn more).
The sky's the limit here in terms of what your team can do! Teams are free to add supplementary datasets in conjunction with the drug review dataset in their Kernel. Discussion is highly encouraged within the forum and Slack so everyone can learn from their peers.
Here are just a couple ideas as to what you could do with the data:
There is no one correct answer to this Hackathon, and teams are free to define the direction of their own project. That being said, there are certain core elements generally found across all outstanding Kernels on the Kaggle platform. The best Kernels are:
Teams with top submissions have a chance to receive exclusive Kaggle University Club swag and be featured on our official blog and across social media.
IMPORTANT: Teams must set all Kernels to public at all times. This is so we can track each team's progression, but more importantly it encourages collaboration, productive discussion, and healthy inspiration to all teams. It is not so that teams can simply copycat good ideas. If a team's Kernel isn't their own organic work, it will not be considered a top submission. Teams must come up with a project on their own.
The final Kernel submission for the Hackathon must contain the following information: