Millennials were the largest generation group in the United States in 2024, with an estimated population of ***** million. Born between 1981 and 1996, Millennials recently surpassed Baby Boomers as the biggest group, and they will continue to be a major part of the population for many years.
The rise of Generation Alpha
Generation Alpha is the most recent generation to have been named, and many of its members will not be able to remember a time before smartphones and social media. As of 2024, the oldest Generation Alpha members were still only entering adolescence. However, the group already makes up around ***** percent of the U.S. population, and they are said to be the most racially and ethnically diverse of all the generation groups.
Boomers vs. Millennials
The number of Baby Boomers, whose generation was defined by the boom in births following the Second World War, has fallen by around ***** million since 2010. However, they remain the second-largest generation group, and aging Boomers are contributing to steady increases in the median age of the population. Meanwhile, the Millennial generation continues to grow, and one reason for this is the increasing number of young immigrants arriving in the United States.
In 2024, Millennials were the largest generation group in the United States, making up about 21.81 percent of the population. However, Generation Z was not far behind, accounting for around 20.81 percent of the population that year.
In 2024, there were approximately ** million millennials in the United Kingdom, making it the largest generational cohort at that time. Millennials surpassed the Baby Boomer generation as the largest generation for the first time in 2019. The two youngest generations, Gen Z and Gen Alpha, numbered approximately **** million and **** million respectively. Gen X are, as of the most recent year, the second-largest generation in the UK at ** million people. The population born before the end of the Second World War in mid-1945 was just over **** million in this year.
Post-War Baby Boom
The baby boomer generation was the largest generation for much of this period due to the spike in births that happened after the Second World War. In 1947, for example, there were over *** million live births in the United Kingdom, compared with just ******* live births thirty years later in 1977. Members of this generation are typically the parents of millennials, and were the driving force behind the countercultural movement of the 1960s, due to their large numbers relative to older generations at the time. The next generational cohort after Boomers is Generation X, born between 1965 and 1980. This generation had fewer members than the Boomer generation for most of its existence, and only became larger than it in 2021.
Millennials and Gen Z
As of 2024, the most common single year of age in the United Kingdom was 33, with approximately ******* people this age. Furthermore, people aged between 30 and 34 were the most numerous age group in this year, at almost *** million people. As of 2024, people in this age group were Millennials, the large generation who came of age in the late 1990s and early 2000s. Many members of this generation entered the workforce following the 2008 financial crash, and suffered through high levels of unemployment during the early 2010s.
The generation that followed Millennials, Generation Z, have also experienced tough socio-economic conditions recently, with key formative years dominated by the COVID-19 pandemic, climate change, and an increasingly unstable geopolitical situation.
optimum-benchmark/top-text-generation-models dataset hosted on Hugging Face and contributed by the HF Datasets community
The dataset consists of the percentage of power generated from nuclear energy, by country. Though it seems short, it contains interesting information about the countries in the nuclear club, especially France, which has the largest share of nuclear reactors in its power generation capacity and is one of the largest electricity producers in Europe. France is, however, planning to reduce its dependence on nuclear power.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Video-As-Prompt: Unified Semantic Control for Video Generation
🔥 News
Oct 24, 2025: 📖 We release the first unified semantic video generation model, Video-As-Prompt (VAP)!
Oct 24, 2025: 🤗 We release VAP-Data, the largest semantic-controlled video generation dataset, with more than 100K samples!
Oct 24, 2025: 👋 We present the technical report of Video-As-Prompt; please check out the details and spark some discussion!
… See the full description on the dataset page: https://huggingface.co/datasets/BianYx/VAP-Data.
Source: BP, World Energy Statistics 2017, June 2017.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description:
The "Global News Articles" dataset was acquired through the NewsAPI, a powerful tool that provides access to a vast collection of news articles from various sources around the world. The dataset contains a curated selection of news articles covering a wide range of topics, including politics, business, technology, health, and more.
Context:
In today's fast-paced world, staying informed about global events is essential. This dataset aims to provide researchers, journalists, and analysts with a comprehensive source of news articles for analysis and insight generation. By leveraging the NewsAPI, we have gathered a diverse set of articles to facilitate research, trend analysis, sentiment analysis, and other data-driven tasks.
Inspiration:
The inspiration behind creating this dataset stems from the growing need for reliable and easily accessible news data for analytical purposes. With the proliferation of digital media and the abundance of news sources available online, there is a wealth of information waiting to be tapped into. This dataset serves as a valuable resource for anyone interested in studying trends, patterns, and developments in the global news landscape.
Sources:
The primary source of the data is the NewsAPI, which aggregates news articles from thousands of sources worldwide. The dataset includes articles from reputable news outlets, blogs, and online publications. Only the title, content, and headlines features have been extracted from the articles to provide concise yet informative data for analysis.
Acquisition of Data through NewsAPI:
By leveraging the capabilities of NewsAPI, we have curated a valuable dataset that provides insights into global news trends, enabling informed decision-making and analysis in diverse fields.
➡️ You can choose from multiple data formats, delivery frequency options, and delivery methods;
➡️ You can select raw or clean and AI-enriched datasets;
➡️ Multiple APIs designed for effortless search and enrichment (accessible using a user-friendly self-service tool);
➡️ Fresh data: daily updates, easy change tracking with dedicated data fields, and a constant flow of new data;
➡️ You get all necessary resources for evaluating our data: a free consultation, a data sample, or free credits for testing our APIs.
Coresignal's employee data enables you to create and improve innovative data-driven solutions and extract actionable business insights. These datasets are popular among companies from different industries, including HR and sales technology and investment.
Employee Data use cases:
✅ Source best-fit talent for your recruitment needs
Coresignal's Employee Data can help source the best-fit talent for your recruitment needs by providing the most up-to-date information on qualified candidates globally.
✅ Fuel your lead generation pipeline
Enhance lead generation with 712M+ up-to-date employee records from the largest professional network. Our Employee Data can help you develop a qualified list of potential clients and enrich your own database.
✅ Analyze talent for investment opportunities
Employee Data can help you generate actionable signals and identify new investment opportunities earlier than competitors or perform deeper analysis of companies you're interested in.
➡️ Why 400+ data-powered businesses choose Coresignal:
https://researchintelo.com/privacy-and-policy
According to our latest research, the Synthetic Dataset Generation market size was valued at $1.2 billion in 2024 and is projected to reach $8.7 billion by 2033, expanding at an impressive CAGR of 24.6% during 2024–2033. The primary driving force behind this global expansion is the escalating demand for high-quality, diverse, and bias-free datasets to fuel advanced artificial intelligence (AI) and machine learning (ML) models. As organizations across industries face increasing challenges in acquiring large-scale, annotated, and privacy-compliant real-world data, synthetic dataset generation has emerged as a transformative solution. This technology not only accelerates the development and deployment of AI systems but also addresses critical data privacy, security, and cost constraints, making it indispensable in today’s data-centric economy.
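The growth rate quoted above can be sanity-checked with the standard compound annual growth rate formula. The following is only an illustrative arithmetic sketch: the function name `cagr` is hypothetical, and it assumes that 2024 to 2033 is treated as nine annual compounding periods, which the report does not state explicitly.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate between two values over `periods` years."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Figures quoted in the report: $1.2 billion (2024) and $8.7 billion (2033).
# Assumption: 2024 -> 2033 counts as nine annual compounding periods.
implied_rate = cagr(1.2, 8.7, 2033 - 2024)
print(f"Implied CAGR: {implied_rate:.1%}")
```

Under that assumption the implied rate rounds to the 24.6% stated in the report, so the two market-size figures and the growth rate are mutually consistent.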
North America currently holds the largest share of the global synthetic dataset generation market, accounting for over 38% of the total market value in 2024. The region’s dominance is primarily attributed to its mature technology ecosystem, robust investment in AI research, and the early adoption of synthetic data solutions by leading enterprises and tech giants. The presence of major synthetic data vendors, a strong network of academic research institutions, and proactive regulatory guidance on data privacy have collectively accelerated market growth in North America. Furthermore, favorable government policies and funding initiatives aimed at advancing AI innovation continue to foster a thriving environment for synthetic dataset generation, particularly in sectors such as healthcare, finance, and autonomous vehicles.
Asia Pacific is the fastest-growing region in the synthetic dataset generation market, projected to register a remarkable CAGR of 29.3% from 2024 to 2033. This exceptional growth is driven by increasing digital transformation initiatives, rapid adoption of AI-powered solutions, and significant investments by both public and private sectors. Countries like China, Japan, South Korea, and India are aggressively expanding their AI capabilities, leading to a surge in demand for synthetic data to support machine learning and computer vision applications. The region is witnessing heightened interest from global technology vendors, who are establishing partnerships and R&D centers to tap into the burgeoning opportunities. The proliferation of smart devices, e-commerce, and fintech innovations further amplifies the need for scalable and secure synthetic datasets.
Emerging economies in Latin America, the Middle East, and Africa are gradually embracing synthetic dataset generation, though adoption remains at an early stage due to infrastructural and regulatory challenges. Localized demand is primarily concentrated in industries such as government, BFSI, and telecommunications, where data privacy and localization policies are stringent. While these regions hold significant potential for future growth, market expansion is currently restrained by limited technical expertise, slower digital infrastructure development, and the need for tailored synthetic data solutions that address unique regional requirements. Nonetheless, increasing awareness, pilot projects, and supportive policy reforms are expected to accelerate adoption in the coming years.
| Attributes | Details |
| --- | --- |
| Report Title | Synthetic Dataset Generation Market Research Report 2033 |
| By Component | Software, Services |
| By Data Type | Text, Image, Video, Audio, Tabular, Others |
| By Application | Machine Learning, Computer Vision, Natural Language Processing, Data Augmentation, Robotics, Autonomous Vehicles, Healthcare, Finance, Retail, Others |
| By Deployment Mode | On-Premises, Cloud |
| By End |
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
🔍 Dataset Note: DropletVideo-1M is the premium subset of DropletVideo-10M, filtered with aesthetic score > 4.51 and image quality score > 7.51.
✈️ Introduction
The challenge of spatiotemporal consistency has long existed in the field of video generation. We have released the open-source dataset DropletVideo-10M, the world's largest video generation dataset with spatiotemporal consistency. It… See the full description on the dataset page: https://huggingface.co/datasets/DropletX/DropletVideo-10M.
https://creativecommons.org/publicdomain/zero/1.0/
AD Datasets. We select 6 public AD datasets by reviewing EEG-based AD detection papers published between 2018 and 2024. They are AD-Auditory, ADFSU, ADFTD, ADSZ, APAVA, and BrainLat.
Data Preprocessing
Artifacts Removal. Some datasets have already undergone preprocessing steps during data collection, such as artifact removal and filtering. We perform a secondary preprocessing to align all datasets uniformly for training. All the fine-tuning datasets are guaranteed to be artifact-free.
Channel Alignment. We align all datasets to a standard set of 19 channels, which include Fp1, Fp2, F7, F3, Fz, F4, F8, T3/T7, C3, Cz, C4, T4/T8, T5/P7, P3, Pz, P4, T6/P8, O1, and O2, based on the international 10-20 system. For datasets with fewer than 19 channels, we interpolate the missing channels using the MNE EEG processing package. For datasets with more than 19 channels, we select the 19 channels based on the channel name and discard the others. In cases where datasets use different channel montages, such as the Biosemi headcaps with 32, 64, 128 channels, we select the 19 closest channels by calculating the Euclidean distance between their 3D coordinates. The channel alignment allows us to pre-train the models on different datasets with any backbone encoder and perform unified fine-tuning on all AD datasets in one run.
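The nearest-channel selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_closest_channels` and the coordinates are made up, and it assumes NumPy is available and that electrode positions are given as 3D points in a shared coordinate frame.

```python
import numpy as np


def select_closest_channels(source_xyz, target_xyz):
    """For each target electrode, return the index of the nearest source electrode."""
    source_xyz = np.asarray(source_xyz, dtype=float)  # shape (n_source, 3)
    target_xyz = np.asarray(target_xyz, dtype=float)  # shape (n_target, 3)
    # Pairwise Euclidean distances, shape (n_target, n_source).
    diffs = target_xyz[:, None, :] - source_xyz[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    return distances.argmin(axis=1)


# Hypothetical coordinates: four source electrodes, two target positions.
source = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.9, 0.1, 0.0]]
target = [[0.0, 0.9, 0.1], [1.0, 0.0, 0.1]]
picked = select_closest_channels(source, target)
print(picked)
```

Note that this simple version may assign the same source channel to two different targets; whether the original pipeline prevents that is not stated in the description.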
Frequency Alignment. In addition to channel alignment, we resample all datasets to a uniform sampling frequency of 128Hz, which is commonly used and preserves the key frequency bands (delta δ, theta θ, alpha α, beta β, gamma γ), while also reducing noise.
Sample Segmentation. For deep learning training, we segment the EEG trials within each subject into 1-second samples, which results in 128 timestamps per sample, as the sampling frequency is aligned to 128Hz.
Frequency Filtering. We then apply frequency filtering to each sample, ranging from 0.5Hz to 45Hz, to remove frequency bands that do not correspond to brain activities.
Standard Normalization. After frequency filtering, we perform standard normalization on each sample, applied individually to each channel, to ensure that the data is centered and scaled consistently.
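Two of the steps above, sample segmentation and per-channel standard normalization, can be sketched with NumPy. This is a hedged illustration under stated assumptions: the function names are hypothetical, windows are assumed to be non-overlapping, any trailing partial window is dropped, and the frequency filtering step that sits between the two in the description is omitted here.

```python
import numpy as np

SAMPLING_RATE_HZ = 128  # uniform sampling frequency stated in the description


def segment_trial(trial, window_seconds=1.0, sampling_rate=SAMPLING_RATE_HZ):
    """Split a (channels, timestamps) trial into non-overlapping windows.

    Returns an array of shape (n_samples, channels, window_length).
    Assumption: a trailing partial window is discarded.
    """
    trial = np.asarray(trial, dtype=float)
    window = int(round(window_seconds * sampling_rate))
    n_samples = trial.shape[1] // window
    trimmed = trial[:, : n_samples * window]
    return trimmed.reshape(trial.shape[0], n_samples, window).transpose(1, 0, 2)


def normalize_samples(samples, eps=1e-8):
    """Z-score each channel of each sample independently."""
    mean = samples.mean(axis=-1, keepdims=True)
    std = samples.std(axis=-1, keepdims=True)
    return (samples - mean) / (std + eps)


# Synthetic 19-channel trial, a little over 10 seconds long (not real EEG).
rng = np.random.default_rng(0)
trial = rng.normal(loc=5.0, scale=3.0, size=(19, 128 * 10 + 50))
samples = normalize_samples(segment_trial(trial))
print(samples.shape)
```

Each resulting sample has 128 timestamps per channel, matching the 1-second window at 128Hz described above.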
➡️ You can choose from multiple data formats, delivery frequency options, and delivery methods;
➡️ Extensive datasets with job postings data from 5 leading B2B data sources;
➡️ Jobs API designed for effortless search and enrichment (accessible using a user-friendly self-service tool);
➡️ Fresh data: daily updates, easy change tracking with dedicated data fields, and a constant flow of new data;
➡️ You get all necessary resources for evaluating our data: a free consultation, a data sample, or free credits for testing the API.
✅ For HR tech
Job posting data can provide insights into the demand for different types of jobs and skills, as well as trends in job postings over time. With access to historical data, companies can develop predictive models.
✅ For Investors
Explore expansion trends, analyze hiring practices, and predict company or industry growth rates, enabling the extraction of actionable strategic and operational insights. At a larger scale of analysis, Job Postings Data can be leveraged to forecast market trends and predict the growth of specific industries.
✅ For Lead generation
Coresignal’s Job Postings Data is ideal for lead generation and determining purchasing intent. In B2B sales, job postings can help identify the best time to approach a prospective client.
➡️ Why 400+ data-powered businesses choose Coresignal:
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.
https://dataintelo.com/privacy-and-policy
According to our latest research, the synthetic lab data generation market size reached USD 1.42 billion globally in 2024, reflecting a robust momentum in the adoption of synthetic data solutions across healthcare and life sciences. The market is anticipated to grow at a compelling CAGR of 26.7% from 2025 to 2033, with the global market expected to reach USD 13.11 billion by the end of the forecast period. This remarkable growth is primarily driven by increasing regulatory pressures on data privacy, the need for high-quality and diverse datasets for AI and machine learning applications, and the surging demand for advanced research and diagnostics in the healthcare sector. As per our latest research, the synthetic lab data generation market is rapidly transforming the landscape of healthcare research and development by providing scalable, privacy-compliant, and realistic datasets that accelerate innovation while minimizing risk.
One of the most significant growth factors propelling the synthetic lab data generation market is the intensifying focus on data privacy and security, especially in the healthcare sector. With stringent regulations such as HIPAA, GDPR, and other data protection laws being enforced globally, organizations are facing mounting challenges in accessing and sharing real patient data for research, development, and training purposes. Synthetic lab data offers a viable solution by generating artificial, yet statistically accurate, datasets that mirror real-world data without exposing sensitive patient information. This capability not only ensures compliance with regulatory frameworks but also enables seamless data sharing across organizations, research institutions, and even geographical boundaries, thereby fostering collaborative innovation and expediting the pace of scientific discovery.
Another key driver for the synthetic lab data generation market is the escalating demand for high-fidelity data to fuel artificial intelligence and machine learning models in healthcare. The accuracy and efficacy of AI-driven solutions, particularly in diagnostics, drug discovery, and personalized medicine, are heavily reliant on the availability of large, diverse, and well-annotated datasets. However, acquiring such datasets from real-world sources is often fraught with challenges related to data scarcity, imbalance, and privacy concerns. Synthetic lab data generation tools bridge this gap by creating vast volumes of tailored datasets that can be customized to represent rare diseases, specific demographics, or unique clinical scenarios. This not only enhances the robustness and generalizability of AI models but also accelerates the development and deployment of next-generation healthcare solutions.
In addition to privacy and AI enablement, the synthetic lab data generation market is benefiting from the growing emphasis on cost efficiency and operational agility in healthcare research and diagnostics. Traditional data collection methods are time-consuming, expensive, and frequently limited by logistical and ethical constraints. Synthetic data generation, on the other hand, significantly reduces the time and cost associated with data acquisition, annotation, and preprocessing. This enables pharmaceutical companies, hospitals, and research institutes to conduct large-scale studies, simulate clinical trials, and train medical professionals without the need for extensive real-world data collection. The ability to rapidly generate high-quality synthetic datasets is emerging as a strategic advantage for organizations seeking to accelerate innovation, improve patient outcomes, and stay ahead in the competitive healthcare landscape.
Regionally, North America continues to dominate the synthetic lab data generation market, accounting for the largest revenue share in 2024, followed by Europe and the Asia Pacific. The region’s leadership can be attributed to the presence of major technology vendors, advanced healthcare infrastructure, and a proactive regulatory environment that encourages the adoption of privacy-preserving technologies. Meanwhile, the Asia Pacific region is witnessing the fastest growth, driven by increasing investments in healthcare digitization, a burgeoning pharmaceutical sector, and rising awareness about data privacy. Europe remains a key market, supported by strong research funding and a robust regulatory framework. The Middle East & Africa and Latin America are also showing promising growth, albeit from a smaller base, as healthcare moderni
http://opendatacommons.org/licenses/dbcl/1.0/
The dataset consists of three files, one each for GAFAM, FAANG, and MATANA, dumped in binary (pickle) format.
1. Use numpy to load each dataset
```python
import numpy as np

# Path as it appears in the Kaggle environment
path = '/kaggle/input/gafamfaang-and-matana-stock-values-economics/gafam_stock.pkl'
# allow_pickle=True lets np.load read the pickled object
d_GAFAM = np.load(path, allow_pickle=True)
d_GAFAM
```
2. Get a DataFrame using the company names as keys
```python
# Keys are company names
print(d_GAFAM.keys())
Google = list(d_GAFAM.keys())[0]
print(Google)
# Values are the stock prices for that company
d_GAFAM[Google]
```
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files contain the analysis files used to create the tables and figures found in "The Role of Immigrant Generation and Mentors in Educational Attainment." The abstract for the paper is found below:
Social capital, including engagement with mentors, facilitates educational attainment. However, engagement with mentors differs significantly across groups of adolescents with different backgrounds, including immigrant background. We investigate how immigrant generation predicts adolescents’ engagement with mentors and different types of mentors (i.e., school-based and non-school-based), the association of mentors with educational attainment, and these estimates’ heterogeneity based on the immigrant generation. We analyzed nationally representative Add Health data from N=11,242 adolescents using school-fixed effects linear probability models. Results show that adolescents from immigrant generations 1 and 2 were less likely than those from generation 3+ to have a mentor, but there were no significant differences in engaging with school-based mentors. Mentors predicted educational attainment; school-based mentor effects were larger than non-school-based mentor effects. The associations between mentors and college attendance and graduation were largest for 1st-generation immigrants. Our findings indicate the importance of structures supporting relationship-building and mentorship in schools and wider communities.
T1DiabetesGranada
A longitudinal multi-modal dataset of type 1 diabetes mellitus
Documented by:
Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4
Background
Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices is being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem for the generation of these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a longitudinal dataset, open under specific permission, that not only provides continuous glucose levels but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.
Data Records
The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.
Patient_info.csv
Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Sex – Sex of the patient. Values: F (for female), M (for male).
Birth_year – Year of birth of the patient. Format: YYYY.
Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.
Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.
Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.
Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.
Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.
Number_of_diagnostics – Number of diagnoses made for the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.
Glucose_measurements.csv
Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.
Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.
Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
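The column layout above can be read with Python's standard library. The sketch below is illustrative only: the sample rows are invented, the helper name `read_glucose_rows` is hypothetical, and it assumes the file has a header row using exactly the variable names listed above.

```python
import csv
import io
from datetime import datetime

# Hypothetical rows following the column layout described above; not real data.
SAMPLE_CSV = """Patient_ID,Measurement_date,Measurement_time,Measurement
LIB190001,2020-01-01,08:00:00,110
LIB190001,2020-01-01,08:15:00,124
LIB190002,2020-01-01,09:30:00,95
"""


def read_glucose_rows(handle):
    """Yield (patient_id, timestamp, mg_dl) tuples from a Glucose_measurements-style CSV."""
    for row in csv.DictReader(handle):
        # Combine the separate date and time columns into one datetime.
        timestamp = datetime.strptime(
            f"{row['Measurement_date']} {row['Measurement_time']}",
            "%Y-%m-%d %H:%M:%S",
        )
        yield row["Patient_ID"], timestamp, float(row["Measurement"])


rows = list(read_glucose_rows(io.StringIO(SAMPLE_CSV)))
print(rows[0])
```

Because the real file holds more than 22.6 million records, iterating row by row as here (or reading in chunks) is likely to be gentler on memory than loading everything at once.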
Biochemical_parameters.csv
Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.
Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.
Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.
Diagnostics.csv
Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:
Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.
Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).
Technical Validation
Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. The manufacturer, Abbott Diabetes Care, Inc. (Alameda, CA, USA), has conducted validation studies of these devices, concluding that the measurements made by their sensors are comparable to those of YSI analyzer devices (Xylem Inc.), the gold standard, with 99.9% of results falling within zones A and B of the consensus error grid. In addition, other studies external to the company concluded that the accuracy of the measurements is adequate.
Moreover, it was checked that, in most cases, the blood glucose level measurements per patient in the Glucose_measurements.csv file were continuous (i.e. a sample at least every 15 minutes), as they should be.
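A continuity check of this kind could look like the sketch below. It is not the authors' validation code, which is not shown here; the function name `find_gaps` and the sample timestamps are hypothetical, and the 15-minute threshold is taken from the sentence above.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=15)  # expected maximum spacing between samples


def find_gaps(timestamps, max_gap=MAX_GAP):
    """Return (previous, current) pairs whose spacing exceeds max_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical readings for one patient: the last one arrives an hour late.
readings = [
    datetime(2020, 1, 1, 8, 0),
    datetime(2020, 1, 1, 8, 15),
    datetime(2020, 1, 1, 8, 30),
    datetime(2020, 1, 1, 9, 30),
]
gaps = find_gaps(readings)
print(gaps)
```

Running the function per patient and counting the returned pairs gives a rough measure of how often the series is interrupted.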
Usage Notes
For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.
The files that compose the dataset are comma-delimited CSV files and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code, graphics and statistics that may help in better understanding the dataset is available in UsageNotes.zip.
Graphs_and_stats.ipynb
The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, removing a list of patients from a dataset file and keeping only a list of patients in a dataset file.
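The three helper functions mentioned above might look roughly like the following sketch. These are not the functions from Graphs_and_stats.ipynb; the names, signatures and the use of a full birth date are hypothetical, and rows are represented as dictionaries keyed by column name.

```python
from datetime import date


def age_on(birth_date, reference_date):
    """Age in completed years on reference_date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred that year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def drop_patients(rows, patient_ids):
    """Return the rows whose Patient_ID is NOT in patient_ids."""
    excluded = set(patient_ids)
    return [row for row in rows if row["Patient_ID"] not in excluded]


def keep_patients(rows, patient_ids):
    """Return only the rows whose Patient_ID is in patient_ids."""
    wanted = set(patient_ids)
    return [row for row in rows if row["Patient_ID"] in wanted]


# Made-up rows for illustration.
rows = [
    {"Patient_ID": "LIB190001"},
    {"Patient_ID": "LIB190002"},
    {"Patient_ID": "LIB190003"},
]
print(drop_patients(rows, ["LIB190002"]))
print(keep_patients(rows, ["LIB190002"]))
print(age_on(date(1990, 6, 15), date(2020, 6, 14)))
```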
Code Availability
The dataset was generated using custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation, data transformation, and variable extraction.
Original_patient_info_curation.ipynb
This Jupyter Notebook preprocesses the original file with patient data. Mainly, irrelevant rows and columns are removed, and the sex variable is recoded.
Glucose_measurements_curation.ipynb
This Jupyter Notebook preprocesses the original file with the continuous glucose level measurements of the patients. Principally, rows without information and duplicated rows are removed, and the timestamp variable is split into two new variables, measurement date and measurement time.
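The curation steps described for the glucose measurements (dropping empty rows, dropping duplicates, splitting the timestamp) can be sketched as below. This is only an illustration, not the code in Glucose_measurements_curation.ipynb: the input layout, the timestamp format and the output column names Measurement_date, Measurement_time and Measurement are assumptions.

```python
from datetime import datetime


def curate(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Drop empty and duplicated rows, then split the timestamp in two."""
    seen = set()
    cleaned = []
    for patient_id, timestamp, value in rows:
        if not timestamp or not value:
            continue  # row without information
        key = (patient_id, timestamp, value)
        if key in seen:
            continue  # duplicated row
        seen.add(key)
        parsed = datetime.strptime(timestamp, fmt)
        cleaned.append(
            {
                "Patient_ID": patient_id,
                "Measurement_date": parsed.date().isoformat(),
                "Measurement_time": parsed.time().isoformat(),
                "Measurement": value,
            }
        )
    return cleaned


# Made-up raw rows: (patient, timestamp, glucose value as text).
raw = [
    ("LIB190001", "2020-01-01 08:00:00", "101"),
    ("LIB190001", "2020-01-01 08:00:00", "101"),  # duplicate
    ("LIB190001", "", ""),                        # empty
    ("LIB190001", "2020-01-01 08:15:00", "110"),
]
result = curate(raw)
print(result)
```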
Biochemical_parameters_curation.ipynb
This Jupyter Notebook preprocesses the original file containing the results of the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed, and the variable with the name of the measured biochemical parameter is translated.
Diagnostic_curation.ipynb
This Jupyter Notebook preprocesses the original file containing the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.
Get_patient_info_variables.ipynb
This Jupyter Notebook implements the feature extraction process from the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of the mentioned files, and the next three add the extracted features to the resulting new file.
Data Usage Agreement
The conditions for use are as follows:
You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.
You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.
You will require
MTG is a multilingual multiway text generation benchmark suite. It is the first-proposed multilingual multiway text generation dataset with the largest human-annotated data (400k). It includes four generation tasks (story generation, question generation, title generation and text summarization) across five languages (English, German, French, Spanish and Chinese).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Abstract
Inequalities in the labor market among Chilean social workers were examined, analyzing whether they differ from the trends observed in other professions. Two samples from the National Socioeconomic Characterization Survey (CASEN) database of the Ministry of Social Development of Chile (2015) were used. The contingency coefficient determined the strength of the association of economic income and contractual condition with the variables gender, generation and ethnicity. The results indicated that the proportion of social workers in the top decile of national income varies according to generation and ethnic group, and the proportion of those with permanent work varies according to gender and generation. In both cases, generation has the strongest association, with more pronounced inequalities observed among social workers than among other professionals. There is a debate about the reproduction of inequalities in social work - associated with neoliberalism - and the ethical-political challenges that this implies.