Facebook
TwitterOpen Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
By US Open Data Portal, data.gov [source]
This Electronic Health Information Legal Epidemiology dataset offers an extensive collection of legal and epidemiological data that can be used to understand the complexities of electronic health information. It contains a detailed balance of variables, including legal requirements, enforcement mechanisms, proprietary tools, access restrictions, privacy and security implications, data rights and responsibilities, user accounts and authentication systems. This powerful set provides researchers with real-world insights into the functioning of EHI law in order to assess its impact on patient safety and public health outcomes. With such data it is possible to gain a better understanding of current policies regarding the regulation of electronic health information as well as their potential for improvement in safeguarding patient confidentiality. Use this dataset to explore how these laws impact our healthcare system by exploring patterns across different groups over time or analyze changes leading up to new versions or updates. Make exciting discoveries with this comprehensive dataset!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
Start by familiarizing yourself with the different columns of the dataset. Examine each column closely and look up any unfamiliar terminology to get a better understanding of what the columns are referencing.
Once you understand the data and what it is intended to represent, think about how you might want to use it in your analysis. You may want to create a research question, or narrower focus for your project surrounding legal epidemiology of electronic health information that can be answered with this data set.
After creating your research plan, begin manipulating and cleaning up the data as needed in order to prepare it for analysis or visualization as specified in your project plan or research question/model design steps you have outlined .
4 .Next, perform exploratory data analysis (EDA) on relevant subsets of data from specific countries if needed on specific subsets based on targets of interests (e.g gender). Filter out irrelevant information necessary for drawing meaningful insights; analyze patterns and trends observed in your filtered datasets ; compare areas which have differing rates e-health related rules and regulations tying decisions made by elected officials strongly driven by demographics , socioeconomics factors ,ideology etc.. . Look out for correlations using statistical information as needed throughout all stages in process from filtering out dis-informative subgroups from full population set til generating visualizations(graphs/ diagrams) depicting valid insight leveraging descriptive / predictive models properly validate against reference datasets when available always keep openness principal during gathering info especially when needs requires contact external sources such validating multiple sources work best provide strong seals establishing validity accuracy facts statement representing humans case scenarios digital support suitably localized supporting local languages culture respectively while keeping secure datasets private visible limited particular users duly authorized access 5 Finally create concrete summaries reporting discoveries create share findings preferably infographics showcasing evidence observances providing overall assessment main conclusions protocols developed so far broader community indirectly related interested professionals able benefit those results ideas complete transparently freely adapted locally ported increase overall global society level enhancing potentiality range impact derive conditions allowing wider adoption increased usage diffusion capture wide spread change movement affect global e-health legal domain clear manner
- Studying how technology affects public health policies and practice - Using the data, researchers can look at the various types of legal regulations related to electronic health information to examine any relations between technology and public health decisions in certain areas or regions.
- Evaluating trends in legal epidemiology – With this data, policymakers can identify patterns that help measure the evolution of electronic health information regulations over time and investigate why such rules are changing within different states or countries.
- Analysing possible impacts on healthcare costs – Looking at changes in laws, regulations, and standards relate...
Facebook
TwitterData consist of CMS Medicare data files which are restricted access and cannot be released publicly. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. EPA cannot release CBI, or data protected by copyright, patent, or otherwise subject to trade secret restrictions. Request for access to CBI data may be directed to the dataset owner by an authorized person by contacting the party listed. It can be accessed through the following means: CMS Medicare data are available from: https://www.cms.gov/data-research/files-for-order/data-disclosures-and-data-use-agreements-duas/limited-data-set-lds with the requirement of a signed Data Use Agreement. . Weather data are available at https://prism.oregonstate.edu/. Format: The data that support the findings of this study are available from the Centers for Medicare and Medicaid Services (CMS). Restrictions apply to the availability of these data, which were provided under a Data Use Agreement specific to this study. Data are available from: https://www.cms.gov/data-research/files-for-order/data-disclosures-and-data-use-agreements-duas/limited-data-set-lds with the requirement of a signed Data Use Agreement. Data do not contain personally identifiable information but contain are classified as Limited Data Set files and their distribution require an agreement and between CMS and the requester and approval by CMS. Weather data are available at https://prism.oregonstate.edu/. Because the data do not contain identifiable private information and were not obtained through interaction or intervention with individuals, the Institutional Review Board for the University of North Carolina and the US Environmental Protection Agency Human Research Protocol Officer determined that use of this data does not constitute human subjects research. This dataset is associated with the following publication: Wade, T., and C. Herbert. Weather conditions and legionellosis: a nationwide case-crossover study among Medicare recipients. EPIDEMIOLOGY AND INFECTION. Cambridge University Press, Cambridge, UK, 152: E125, (2024).
Facebook
TwitterOpen Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This Public Health Portfolio (Directly Funded Research - Programme and Training Awards) dataset contains NIHR directly funded research awards where the funding is allocated to an award holder or host organisation to carry out a specific piece of research or complete a training award. The NIHR also invests significantly in centres of excellence, collaborations, services and facilities to support research in England. Collectively these form NIHR infrastructure support. NIHR infrastructure supported projects are available in the Public Health Portfolio (Infrastructure Support) dataset which you can find here.NIHR directly funded research awards (Programmes and Training Awards) that were funded between January 2006 and the present extraction date are eligible for inclusion in this dataset. An agreed inclusion/exclusion criteria is used to categorise awards as public health awards (see below). Following inclusion in the dataset, public health awards are second level coded to one of the four Public Health Outcomes Framework domains. These domains are: (1) wider determinants (2) health improvement (3) health protection (4) healthcare and premature mortality.More information on the Public Health Outcomes Framework domains can be found here.This dataset is updated quarterly to include new NIHR awards categorised as public health awards. Please note that for those Public Health Research Programme projects showing an Award Budget of £0.00, the project is undertaken by an on-call team for example, PHIRST, Public Health Review Team, or Knowledge Mobilisation Team, as part of an ongoing programme of work.Inclusion CriteriaThe NIHR Public Health Overview project team worked with colleagues across NIHR public health research to define the inclusion criteria for NIHR public health research. NIHR directly funded research awards are categorised as public health if they are determined to be ‘investigations of interventions in, or studies of, populations that are anticipated to have an effect on health or on health inequity at a population level.’ This definition of public health is intentionally broad to capture the wide range of NIHR public health research across prevention, health improvement, health protection, and healthcare services (both within and outside of NHS settings). This dataset does not reflect the NIHR’s total investment in public health research. The intention is to showcase a subset of the wider NIHR public health portfolio. This dataset includes NIHR directly funded research awards categorised as public health awards. This dataset does not include public health awards or projects funded by any of the three NIHR Research Schools or NIHR Health Protection Research Units.DisclaimersUsers of this dataset should acknowledge the broad definition of public health that has been used to develop the inclusion criteria for this dataset. Please note that this dataset is currently subject to a limited data quality review. We are working to improve our data collection methodologies. Please also note that some awards may also appear in other NIHR curated datasets. Further InformationFurther information on the individual awards shown in the dataset can be found on the NIHR’s Funding & Awards website here. Further information on individual NIHR Research Programme’s decision making processes for funding health and social care research can be found here.Further information on NIHR’s investment in public health research can be found as follows:The NIHR is one of the main funders of public health research in the UK. Public health research falls within the remit of a range of NIHR Directly Funded Research (Programmes and Training Awards), and NIHR Infrastructure Support. NIHR School for Public Health here.NIHR Public Health Policy Research Unit here. NIHR Health Protection Research Units here.NIHR Public Health Research Programme Health Determinants Research Collaborations (HDRC) here.NIHR Public Health Research Programme Public Health Intervention Responsive Studies Teams (PHIRST) here.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Shows ACE project hexagons with no documented rare species occurrences (as of 12/2009), likely representing areas of limited survey data. Biological index values in these areas may be artificially low due to lack of data.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project includes three datasets: the first dataset compiles dataset metadata commonalities that were identified from 48 Canadian restricted health data sources. The second dataset compiles access process metadata commonalities extracted from the same 48 data sources. The third dataset maps metadata commonalities of the first dataset to existing metadata standards including DataCite, DDI, DCAT, and DATS. This mapping exercise was completed to determine whether metadata used by restricted data sources aligned with existing standards for research data.
Facebook
TwitterThe Healthcare Cost and Utilization Project (HCUP) Nationwide Emergency Department Sample (NEDS) is the largest all-payer emergency department (ED) database in the United States. yielding national estimates of hospital-owned ED visits. Unweighted, it contains data from over 30 million ED visits each year. Weighted, it estimates roughly 145 million ED visits nationally. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality, HCUP data inform decision making at the national, State, and community levels.
Sampled from the HCUP State Inpatient Databases (SID) and State Emergency Department Databases (SEDD), the HCUP NEDS can be used to create national and regional estimates of ED care. The SID contain information on patients initially seen in the ED and subsequently admitted to the same hospital. The SEDD capture information on ED visits that do not result in an admission (i.e., treat-and-release visits and transfers to another hospital). Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality, HCUP data inform decision making at the national, State, and community levels.
The NEDS contain information about geographic characteristics, hospital characteristics, patient characteristics, and the nature of visits (e.g., common reasons for ED visits, including injuries). The NEDS contains clinical and resource use information included in a typical discharge abstract, with safeguards to protect the privacy of individual patients, physicians, and hospitals (as required by data sources). It includes ED charge information for over 85% of patients, regardless of expected payer, including but not limited to Medicare, Medicaid, private insurance, self-pay, or those billed as ‘no charge’. The NEDS excludes data elements that could directly or indirectly identify individuals, hospitals, or states.Restricted access data files are available with a data use agreement and brief online security training.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
(rev. 2021-04-01) Request dataset here: https://champshealth.org/data-access/ Version 4.2 of the standard, limited CHAMPS dataset included these files: case demographics (including MITS measurements), verbal autopsy, DeCoDe results, TAC results, and lab results. Sixteen new columns were added to this dataset; one column was renamed. This dataset is updated monthly. The version of this dataset changes quarterly when a revised ICD-10 mapping document is uploaded.
Facebook
Twitterhttps://www.caida.org/about/legal/aua/https://www.caida.org/about/legal/aua/
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
CAIDA's Spoofer project provides information about deployed Source Address Validation (SAV) policy of ASes in the Internet. The Spoofer API is public data; The restricted dataset includes information that we do not provide through the public API, including the results of traceroute and tracespoof measurements. The dataset is provided in database format.
Facebook
Twitterhttps://www.icpsr.umich.edu/web/ICPSR/studies/36231/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/36231/terms
The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). The study sampled over 150,000 mailing addresses across the United States to create a national sample of people who use or do not use tobacco. 45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete an interview after parental consent. At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population (CNP) at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Unit (PSU)s and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the CNP at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort. At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the CNP at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This "second replenishment sample" was combined for estimation and analysis purposes with the Wave 7 adult and youth respondents from the Wave 4 Cohorts who were at least age 15 and in the CNP at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Restricted-Use Files User Guide that provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts. Dataset 0002 (DS0002) contains the data from the State Design Data. This file contains 7 variables and 82,139 cases. The state identifier in the State Design file reflects the participant's state of residence at the time of selection and recruitment for the PATH Study. Dataset 1011 (DS1011) contains the data from the Wave 1 Adult Questionnaire. This data file contains 2,021 variables and 32,320 cases. Each of the cases represents a single, completed interview. Dataset 1012 (DS1012) contains the data from the Wave 1 Youth and Parent Questionnaire. This file contains 1,431 variables and 13,651 cases. Dataset 1411 (DS1411) contains the Wave 1 State Identifier data for Adults and has 5 variables and 32,320 cases. Dataset 1412 (DS1412) contains the Wave 1 State Identifier data for Youth (and Parents) and has 5 variables and 13,651 cases. The same 5 variables are in each State Identifier dataset, including PERSONID for linking the State Identifier to the questionnaire and biomarker data and 3 variables designating the state (state Federal Information Processing System (FIPS), state abbreviation, and full name of the state). The State Identifier values in these datasets represent participants' state of residence at the time of Wave 1, which is also their state of residence at the time of recruitment. Dataset 1611 (DS1611) contains the Tobacco Universal Product Code (UPC) data from Wave 1. This data file contains 32 variables and 8,601 cases. This file contains UPC values on the packages of tobacco products used or in the possession of adult respondents at the time of Wave 1. The UPC values can be used to identify and validate the specific products used by respondents and augment the analyses of the characteristics of tobacco products used
Facebook
TwitterThe Denominator File combines Medicare beneficiary entitlement status information from administrative enrollment records with third-party payer information and GHP enrollment information. The Denominator File contains data on all Medicare beneficiaries enrolled and or entitled in a given year. It is an abbreviated version of the Enrollment Data Base (EDB) (selected data elements). It does not contain data on all beneficiaries ever entitled to Medicare. The file contains data only for beneficiaries who were entitled during the year of the data. These data are available annually in May of the current year for the prior year.
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Limited dataset providing physician's name, company, city, state, zip, and website. Dataset is useful for someone looking to build a more complete lead generation list with web scrapping or data entry to supplement current information. List was developed from manual data entry and web search.
Lead Generation
doctors,medicine,physicians
1775
$29.99
Facebook
Twitterhttps://www.icpsr.umich.edu/web/ICPSR/studies/38008/termshttps://www.icpsr.umich.edu/web/ICPSR/studies/38008/terms
The PATH Study was launched in 2011 to inform the Food and Drug Administration's regulatory activities under the Family Smoking Prevention and Tobacco Control Act (TCA). The PATH Study is a collaboration between the National Institute on Drug Abuse (NIDA), National Institutes of Health (NIH), and the Center for Tobacco Products (CTP), Food and Drug Administration (FDA). For Wave 1 (baseline), the study sampled over 150,000 mailing addresses across the United States to create a national sample of people who do and do not use tobacco. 45,971 adults and youth constitute the first (baseline) wave, Wave 1, of data collected by this longitudinal cohort study. These 45,971 adults and youth along with 7,207 "shadow youth" (youth ages 9 to 11 sampled at Wave 1) make up the 53,178 participants that constitute the Wave 1 Cohort. Respondents are asked to complete an interview at each follow-up wave. Youth who turn 18 by the current wave of data collection are considered "aged-up adults" and are invited to complete the Adult Interview. Additionally, "shadow youth" are considered "aged-up youth" upon turning 12 years old, when they are asked to complete the Youth Interview after parental consent. At Wave 4, a probability sample of 14,098 adults, youth, and shadow youth ages 10 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 4. This sample was recruited from residential addresses not selected for Wave 1 in the same sampled Primary Sampling Units (PSUs) and segments using similar within-household sampling procedures. This "replenishment sample" was combined for estimation and analysis purposes with Wave 4 adult and youth respondents from the Wave 1 Cohort who were in the civilian, noninstitutionalized population at the time of Wave 4. This combined set of Wave 4 participants, 52,731 participants in total, forms the Wave 4 Cohort. At Wave 7, a probability sample of 14,863 adults, youth, and shadow youth ages 9 to 11 was selected from the civilian, noninstitutionalized population at the time of Wave 7. This sample was recruited from residential addresses not selected for Wave 1 or Wave 4 in the same sampled PSUs and segments using similar within-household sampling procedures. This second replenishment sample was combined for estimation and analysis purposes with Wave 7 adult and youth respondents from the Wave 4 Cohort who were at least age 15 and in the civilian, noninstitutionalized population at the time of Wave 7. This combined set of Wave 7 participants, 46,169 participants in total, forms the Wave 7 Cohort. Please refer to the Restricted-Use Files User Guide that provides further details about children designated as "shadow youth" and the formation of the Wave 1, Wave 4, and Wave 7 Cohorts. Dataset 0001 (DS0001) contains the data from the Public-Use File Master Linkage File (PUF-MLF). This file contains 93 variables and 82,139 cases. The file provides a master list of every person's unique identification number and what type of respondent they were in each wave for data that are available in the Public-Use Files and Special Collection Public-Use Files. Dataset 0002 (DS0002) contains the data from the Restricted-Use File Master Linkage File (RUF-MLF). This file contains 202 variables and 82,139 cases. The file provides a master list of every person's unique identification number and what type of respondent they were in each wave for data that are available in the Restricted-Use Files, Special Collection Restricted-Use Files, and Biomarker Restricted-Use Files.
Facebook
TwitterThe National Center for Advancing Translational Sciences (NCATS) has systematically compiled clinical, laboratory and diagnostic data from electronic health records to support COVID-19 research efforts via the National COVID Cohort Collaborative (N3C) Data Enclave. As of August 2, 2022, the repository contains information from over 15 million patients (including 5.8 million COVID-19 positive patients) across the United States.
The N3C Data Enclave is organized into 3 levels of data with varying access restrictions:
Facebook
TwitterThe National Electronic Health Records Survey (NEHRS) is an annual survey of non-federally employed, office-based physicians practicing in the United States (excluding those in the specialties of anesthesiology, radiology, and pathology). NEHRS began in 2008 and was originally designed as an annual mail supplement to the National Ambulatory Medical Care Survey (NAMCS). Since 2012, NEHRS has been administered as a survey independent of NAMCS. Data from NEHRS can be used to produce state and national estimates of EHR adoption and capabilities, burden associated with EHRs, and progress physicians have made towards meeting the policy goals of the HITECH Act. In more recent years, survey questions have also asked about Promoting Interoperability programs, sponsored by the Centers for Medicare and Medicaid Services (CMS).
Restricted file data dictionaries are available.
Facebook
TwitterRestricted Data Files Available at the Data Centers Researchers and users with approved research projects can access restricted data files that have not been publicly released for reasons of confidentiality at the AHRQ Data Center in Rockville, Maryland. Qualified researchers can also access restricted data files through the U.S. Census Research Data Center (RDC) network (http://www.census.gov/ces/dataproducts/index.html -- Scroll down the page and click on the Agency for Health Care Research and Quality (AHRQ) link.) For information on the RDC research proposal process and the data sets available, read AHRQ-Census Bureau agreement on access to restricted MEPS data.
Facebook
TwitterNo description provided
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was primarily designed for the Helsinki Tomography Challenge 2022 (HTC2022), but it can be used for generic algorithm research and development in 2D CT reconstruction.
The dataset contains 2D tomographic measurements, i.e., sinograms and the affiliated metadata containing measurement geometry and other specifications. The sinograms have already been pre-processed with background and flat-field corrections, and compensated for a slightly misaligned center of rotation in the cone-beam computed tomography scanner. The log-transforms from intensity measurements to attenuation data have also been already computed. The data has been stored as MATLAB structs and saved in .mat file format.
The purpose of HTC2022 was to develop algorithms for limited angle tomography. The challenge data consists of tomographic measurements of two sets of plastic phantoms with a diameter of 7 cm and with holes of differing shapes cut into them. The first set is the teaching data, containing five training phantoms. The second set consists of 21 test phantoms used in the challenge to test algorithm performance. The test phantom data was released after the competition period ended.
The training phantoms were designed to facilitate algorithm development and benchmarking for the challenge itself. Four of the training phantoms contain holes. These are labeled ta, tb, tc, and td. A fifth training phantom is a solid disc with no holes. We encourage subsampling these datasets to create limited data sinograms and comparing the reconstruction results to the ground truth obtainable from the full-data sinograms. Note that the phantoms are not all identically centered.
The teaching data includes the following files for each phantom:
Also included in the teaching dataset is a MATLAB example script for how to work with the CT data.
The challenge test data is arranged into seven different difficulty levels, labeled 1-7, with each level containing three different phantoms, labeled A-C. As the difficulty level increases, the number of holes increases and their shapes become increasingly complex. Furthermore, the view angle is reduced as the difficulty level increases, starting with a 90 degree field of view at level 1, and reducing by 10 degrees at each increasing level of difficulty. The view-angles in the challenge data will not all begin from 0 degrees.
The test data includes the following files for each phantom:
Also included in the test dataset is a collage in .PNG format, showing all the ground truth segmentation images and the photographs of the phantoms together.
As the orientation of CT reconstructions can depend on the tools used, we have included the example reconstructions for each of the phantoms to demonstrate how the reconstructions obtained from the sinograms and the specified geometry should be oriented. The reconstructions have been computed using the filtered back-projection algorithm (FBP) provided by the ASTRA Toolbox.
We have also included segmentation examples of the reconstructions to demonstrate the desired format for the final competition entries. The segmentation images for obtained by the following steps:
1) Set all negative pixel values in the reconstruction to zero.
2) Determine a threshold level using Otsu's method.
3) Globally threshold the image using the threshold level.
4) Perform a morphological closing on the image using a disc with a radius of 3 pixels.
The competitors were not obliged to follow the above procedure, and were encouraged to explore various segmentation techniques for the limited angle reconstructions.
For getting started with the data, we recommend the following MATLAB toolboxes:
HelTomo - Helsinki Tomography Toolbox
https://github.com/Diagonalizable/HelTomo/
The ASTRA Toolbox
https://www.astra-toolbox.com/
Spot – A Linear-Operator Toolbox
https://www.cs.ubc.ca/labs/scl/spot/
Using the above toolboxes for the Challenge was by no means compulsory: the metadata for each dataset contains a full specification of the measurement geometry, and the competitors were free to use any and all computational tools they want to in computing the reconstructions and segmentations.
All measurements were conducted at the Industrial Mathematics Computed Tomography Laboratory at the University of Helsinki.
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset, titled TikTok Viral Trends 2025, provides a curated snapshot of 50 trending TikTok videos from September 2025, capturing the platform's dynamic content landscape. Sourced from real-time web analyses and social media insights (e.g., X posts, trend reports from reputable sources like Ramdam, NapoleonCat, and Tokchart), it focuses on viral videos across diverse categories such as Entertainment, Music, Comedy, Lifestyle, Beauty, Sustainability, and Technology. The dataset is designed for data scientists, researchers, and enthusiasts interested in analyzing social media trends, predicting virality, or exploring multimodal machine learning applications (e.g., NLP, time-series, or clustering). It stands out from existing Kaggle datasets by offering fresh, 2025-specific data with rich metadata, including engagement metrics, hashtags, and sound/trend associations.
tiktok_data.csv).post:72, web:65).The dataset contains the following 12 columns:
- video_id: Unique identifier for each video or trend (integer or hashtag-based).
- author: Creator username or group (anonymized as "Unknown" where not specified).
- description: Brief summary of the video content or trend, derived from source context.
- upload_date: Approximate or exact posting date (YYYY-MM-DD).
- views: Reported view count (e.g., millions, billions for hashtag aggregates; "N/A" if unavailable).
- likes: Reported like count (e.g., thousands, millions; "N/A" if unavailable).
- shares: Share count (often "N/A" due to limited public data).
- comments: Comment count (often "N/A" due to limited public data).
- hashtags: Key hashtags associated with the video or trend (e.g., #Kpop, #Viral).
- category: Inferred content category (e.g., Entertainment, Music, Comedy, Lifestyle, Sustainability, Tech).
- sound_or_trend: Associated audio track or challenge name driving the trend (e.g., "Soda Pop dance", "JUMP").
- source: Citation of data origin (e.g., post:72 for X post ID, web:65 for web source ID).
#Perfume reaching 39.3B views.This dataset is ideal for a variety of machine learning and data analysis tasks on Kaggle, including but not limited to:
- Virality Prediction: Use views, likes, and hashtags to train regression or classification models (e.g., XGBoost, neural networks) to predict video success.
- Trend Analysis: Apply clustering (e.g., K-means) or topic modeling (e.g., LDA) to identify emerging content themes or regional differences.
- NLP Applications: Analyze descriptions and hashtags with BERT or word embeddings to study sentiment, cultural trends, or influencer impact.
- Time-Series Forecasting: Leverage upload_date and engagement metrics for temporal analysis of trend lifecycles.
- Recommendation Systems: Build content recommendation models based on category, sound, or hashtag similarities.
- Social Media Ethics: Explore AI-driven trends (e.g., deepfake Identity Swaps) for studies on misinformation or content authenticity.
#Ominous). Exact metrics may vary slightly due to real-time fluctuations.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BrisT1D Dataset features the data collected in a longitudinal study in which 24 young adults with Type 1 Diabetes (T1D) in the UK were given smartwatches to use alongside their T1D self-management. During the six-month study, participants donated the data collected by their T1D devices and smartwatches and were involved in monthly interviews or focus groups. The anonymised transcripts of these study sessions are included along with the device data, in both an anonymised raw state and a processed state. This is the Restricted-Access parts of the BrisT1D Dataset. It includes the raw state device data. The processed state device data, study forms, transcripts, and demographic data can be found in the BrisT1D-Open Dataset (https://doi.org/10.5523/bris.33z5jc8fa6tob21ptrugzqog08), available through data.bris (University of Bristol Data Repository) under Open Access.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This work was conducted by the Diverse Rotations Improve Valuable Ecosystem Services (DRIVES) project, based in the USDA-ARS Sustainable Agricultural Systems Lab in Beltsville, MD. The DRIVES team compiled a database of 20-plus long-term cropping systems experiments in North America in order to conduct cross-site research. This repository contains all scripts from our first research paper from the DRIVES database: "Rotational complexity increases cropping system output under poorer growing conditions," published in One Earth (in press). This analysis uses crop yield and experimental design data from the DRIVES database and public data sources for crop prices and inflation. This repository includes limited datasets derived from public sources or lacking connection to site IDs. We do not have permission to share the full primary dataset, but can provide data upon request with permission from site contacts.The scripts show all data setup, analysis, and visualization steps used to investigate how crop rotation diversity (defined by rotation length and the number of species) impacts productivity of whole rotations and component crops under varying growing conditions. We used Bayesian multilevel modeling fit to data from 20 long-term cropping systems datasets in North America (434 site-years, 36,000 observations). Rotation- and crop-level productivity were quantified as dollar output, using price coefficients derived from National Agriculture Statistics Service (NASS) price data (included in repository). Growing condtions were quantified using an Environmental Index calculated from site-year average output. Bayesian multilevel models were implemented using the 'brms' R package, which is a wrapper for Stan. Descriptions of all files are included in README.pdf.
Facebook
TwitterOpen Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
By US Open Data Portal, data.gov [source]
This Electronic Health Information Legal Epidemiology dataset offers an extensive collection of legal and epidemiological data that can be used to understand the complexities of electronic health information. It contains a detailed balance of variables, including legal requirements, enforcement mechanisms, proprietary tools, access restrictions, privacy and security implications, data rights and responsibilities, user accounts and authentication systems. This powerful set provides researchers with real-world insights into the functioning of EHI law in order to assess its impact on patient safety and public health outcomes. With such data it is possible to gain a better understanding of current policies regarding the regulation of electronic health information as well as their potential for improvement in safeguarding patient confidentiality. Use this dataset to explore how these laws impact our healthcare system by exploring patterns across different groups over time or analyze changes leading up to new versions or updates. Make exciting discoveries with this comprehensive dataset!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
Start by familiarizing yourself with the different columns of the dataset. Examine each column closely and look up any unfamiliar terminology to get a better understanding of what the columns are referencing.
Once you understand the data and what it is intended to represent, think about how you might want to use it in your analysis. You may want to create a research question, or narrower focus for your project surrounding legal epidemiology of electronic health information that can be answered with this data set.
After creating your research plan, begin manipulating and cleaning up the data as needed in order to prepare it for analysis or visualization as specified in your project plan or research question/model design steps you have outlined .
4 .Next, perform exploratory data analysis (EDA) on relevant subsets of data from specific countries if needed on specific subsets based on targets of interests (e.g gender). Filter out irrelevant information necessary for drawing meaningful insights; analyze patterns and trends observed in your filtered datasets ; compare areas which have differing rates e-health related rules and regulations tying decisions made by elected officials strongly driven by demographics , socioeconomics factors ,ideology etc.. . Look out for correlations using statistical information as needed throughout all stages in process from filtering out dis-informative subgroups from full population set til generating visualizations(graphs/ diagrams) depicting valid insight leveraging descriptive / predictive models properly validate against reference datasets when available always keep openness principal during gathering info especially when needs requires contact external sources such validating multiple sources work best provide strong seals establishing validity accuracy facts statement representing humans case scenarios digital support suitably localized supporting local languages culture respectively while keeping secure datasets private visible limited particular users duly authorized access 5 Finally create concrete summaries reporting discoveries create share findings preferably infographics showcasing evidence observances providing overall assessment main conclusions protocols developed so far broader community indirectly related interested professionals able benefit those results ideas complete transparently freely adapted locally ported increase overall global society level enhancing potentiality range impact derive conditions allowing wider adoption increased usage diffusion capture wide spread change movement affect global e-health legal domain clear manner
- Studying how technology affects public health policies and practice - Using the data, researchers can look at the various types of legal regulations related to electronic health information to examine any relations between technology and public health decisions in certain areas or regions.
- Evaluating trends in legal epidemiology – With this data, policymakers can identify patterns that help measure the evolution of electronic health information regulations over time and investigate why such rules are changing within different states or countries.
- Analysing possible impacts on healthcare costs – Looking at changes in laws, regulations, and standards relate...