100+ datasets found
  1. U.S. population by generation 2024

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). U.S. population by generation 2024 [Dataset]. https://www.statista.com/statistics/797321/us-population-by-generation/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    Millennials were the largest generation group in the United States in 2024, with an estimated population of ***** million. Born between 1981 and 1996, Millennials recently surpassed Baby Boomers as the biggest group, and they will continue to be a major part of the population for many years.

    The rise of Generation Alpha

    Generation Alpha is the most recent generation to have been named, and many of its members will not be able to remember a time before smartphones and social media. As of 2024, the oldest Generation Alpha members were only just reaching adolescence. However, the group already makes up around ***** percent of the U.S. population, and it is said to be the most racially and ethnically diverse of all the generation groups.

    Boomers vs. Millennials

    The number of Baby Boomers, whose generation was defined by the boom in births following the Second World War, has fallen by around ***** million since 2010. However, they remain the second-largest generation group, and aging Boomers are contributing to steady increases in the median age of the population. Meanwhile, the Millennial generation continues to grow, and one reason for this is the increasing number of young immigrants arriving in the United States.

  2. U.S. population share by generation 2024

    • statista.com
    Cite
    Statista, U.S. population share by generation 2024 [Dataset]. https://www.statista.com/statistics/296974/us-population-share-by-generation/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2024, Millennials were the largest generation group in the United States, making up about 21.81 percent of the population. Generation Z was not far behind, accounting for around 20.81 percent of the population that year.

  3. Population of the UK 1990-2024, by generation

    • statista.com
    Updated Nov 28, 2025
    Cite
    Statista (2025). Population of the UK 1990-2024, by generation [Dataset]. https://www.statista.com/statistics/528577/uk-population-by-generation/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United Kingdom
    Description

    In 2024, there were approximately ** million millennials in the United Kingdom, making them the largest generational cohort at that time. Millennials surpassed the Baby Boomer generation as the largest generation for the first time in 2019. The two youngest generations, Gen Z and Gen Alpha, numbered approximately **** million and **** million, respectively. Gen X is, as of the most recent year, the second-largest generation in the UK at ** million people. The population born before the end of the Second World War in mid-1945 was just over **** million in this year.

    Post-War Baby Boom

    The baby boomer generation was the largest generation for much of this period due to the spike in births that happened after the Second World War. In 1947, for example, there were over *** million live births in the United Kingdom, compared with just ******* live births thirty years later in 1977. Members of this generation are typically the parents of millennials, and were the driving force behind the countercultural movement of the 1960s, due to their large numbers relative to older generations at the time.

    The next generational cohort after Boomers is Generation X, born between 1965 and 1980. This generation had fewer members than the Boomer generation for most of its existence, and only became larger than it in 2021.

    Millennials and Gen Z

    As of 2024, the most common single year of age in the United Kingdom was 33, with approximately ******* people this age. Furthermore, people aged between 30 and 34 were the most numerous age group in this year, at almost *** million people. As of 2024, people in this age group were Millennials, the large generation who came of age in the late 1990s and early 2000s. Many members of this generation entered the workforce following the 2008 financial crash, and suffered through high levels of unemployment during the early 2010s.

    The generation that followed Millennials, Generation Z, has also experienced tough socio-economic conditions recently, with key formative years dominated by the COVID-19 pandemic, climate change, and an increasingly unstable geopolitical situation.

  4. top-text-generation-models

    • huggingface.co
    Updated Feb 20, 2025
    Cite
    Optimum-Benchmark (2025). top-text-generation-models [Dataset]. https://huggingface.co/datasets/optimum-benchmark/top-text-generation-models
    Available in Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    Optimum-Benchmark
    Description

    optimum-benchmark/top-text-generation-models dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. Nuclear Power Generation By Countries

    • kaggle.com
    zip
    Updated Aug 15, 2023
    Cite
    Tariq Mahmood (2023). Nuclear Power Generation By Countries [Dataset]. https://www.kaggle.com/datasets/tariqbashir/nuclear-power-generation-by-countries
    Available download formats: zip (1225 bytes)
    Authors
    Tariq Mahmood
    Description

    The dataset consists of the percentage of power generated from nuclear sources, by country. Though it seems short, it contains interesting information about the countries in the nuclear club, especially France, which has the largest share of nuclear reactors in its power generation capacity and is one of the largest electricity producers in Europe. France is, however, planning to reduce its dependence on nuclear power.

  6. VAP-Data

    • huggingface.co
    Updated Oct 27, 2025
    Cite
    Yuxuan BIAN (2025). VAP-Data [Dataset]. https://huggingface.co/datasets/BianYx/VAP-Data
    Authors
    Yuxuan BIAN
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Video-As-Prompt: Unified Semantic Control for Video Generation

    🔥 News

    • Oct 24, 2025: 📖 We release the first unified semantic video generation model, Video-As-Prompt (VAP)!
    • Oct 24, 2025: 🤗 We release VAP-Data, the largest semantic-controlled video generation dataset, with more than 100K samples!
    • Oct 24, 2025: 👋 We present the technical report of Video-As-Prompt, please check out the details and spark some discussion!

    … See the full description on the dataset page: https://huggingface.co/datasets/BianYx/VAP-Data.

  7. Top 20 Countries Wind Power Generation Capacity

    • datasource.kapsarc.org
    • data.kapsarc.org
    • +1 more
    Updated Dec 26, 2017
    Cite
    (2017). Top 20 Countries Wind Power Generation Capacity [Dataset]. https://datasource.kapsarc.org/explore/dataset/top-20-countries-wind-power-generation-capacity/
    Description

    Source: BP, World Energy Statistics 2017, June 2017.

  8. Top Global News Headlines

    • kaggle.com
    zip
    Updated Mar 17, 2024
    Cite
    Sayam Kumar (2024). Top Global News Headlines [Dataset]. https://www.kaggle.com/datasets/sayamkumar/top-global-news-headlines
    Available download formats: zip (482669821 bytes)
    Authors
    Sayam Kumar
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Description:

    The "Global News Articles" dataset was acquired through the NewsAPI, a powerful tool that provides access to a vast collection of news articles from various sources around the world. The dataset contains a curated selection of news articles covering a wide range of topics, including politics, business, technology, health, and more.

    Context:

    In today's fast-paced world, staying informed about global events is essential. This dataset aims to provide researchers, journalists, and analysts with a comprehensive source of news articles for analysis and insight generation. By leveraging the NewsAPI, we have gathered a diverse set of articles to facilitate research, trend analysis, sentiment analysis, and other data-driven tasks.

    Inspiration:

    The inspiration behind creating this dataset stems from the growing need for reliable and easily accessible news data for analytical purposes. With the proliferation of digital media and the abundance of news sources available online, there is a wealth of information waiting to be tapped into. This dataset serves as a valuable resource for anyone interested in studying trends, patterns, and developments in the global news landscape.

    Sources:

    The primary source of the data is the NewsAPI, which aggregates news articles from thousands of sources worldwide. The dataset includes articles from reputable news outlets, blogs, and online publications. Only the title, content, and headlines features have been extracted from the articles to provide concise yet informative data for analysis.

    Acquisition of Data through NewsAPI:

    • Accessing NewsAPI: The data acquisition process begins with accessing the NewsAPI platform, which provides developers with access to real-time and historical news articles.
    • Querying for Articles: Using predefined search criteria or custom queries, we retrieve a diverse set of news articles covering various topics and regions.
    • Extracting Features: From each article, we extract essential features such as the title, content, and headlines to capture the main points and key information.
    • Compilation of Dataset: The data is compiled into a structured dataset, ready for exploration and analysis by researchers and analysts.

    By leveraging the capabilities of NewsAPI, we have curated a valuable dataset that provides insights into global news trends, enabling informed decision-making and analysis in diverse fields.
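
    The "Extracting Features" step described above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's script, which is not shown. It assumes a NewsAPI-style JSON payload with an `articles` list whose items carry `title` and `content` fields; the function name `extract_features` and the sample payload are hypothetical, and how the dataset's separate "headlines" feature was derived is not stated, so it is left out.

```python
from typing import Any


def extract_features(response: dict[str, Any]) -> list[dict[str, str]]:
    """Keep only the title and content of each article in a NewsAPI-style payload."""
    rows = []
    for article in response.get("articles", []):
        title = (article.get("title") or "").strip()
        content = (article.get("content") or "").strip()
        if title and content:  # assumption: skip articles with a missing field
            rows.append({"title": title, "content": content})
    return rows


# Hypothetical payload, shaped like a NewsAPI JSON response (no network call).
sample = {
    "status": "ok",
    "articles": [
        {"title": "Markets rally ", "content": "Stocks rose on Monday."},
        {"title": "No body", "content": None},
    ],
}
rows = extract_features(sample)
print(rows)  # [{'title': 'Markets rally', 'content': 'Stocks rose on Monday.'}]
```

    Dropping incomplete articles is a design choice made here for simplicity; a real pipeline might keep them and mark the missing field instead.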

  9. Coresignal | Employee Data | From the Largest Professional Network | Global...

    • datarade.ai
    .json, .csv
    + more versions
    Cite
    Coresignal, Coresignal | Employee Data | From the Largest Professional Network | Global / 712M+ Records / 5 Years of Historical Data / Updated Daily [Dataset]. https://datarade.ai/data-products/public-resume-data-coresignal
    Available download formats: .json, .csv
    Dataset authored and provided by
    Coresignal
    Area covered
    Christmas Island, Palestine, Réunion, Russian Federation, French Guiana, Latvia, Eritrea, Bosnia and Herzegovina, Brunei Darussalam, Macao
    Description

    ➡️ You can choose from multiple data formats, delivery frequency options, and delivery methods;

    ➡️ You can select raw or clean and AI-enriched datasets;

    ➡️ Multiple APIs designed for effortless search and enrichment (accessible using a user-friendly self-service tool);

    ➡️ Fresh data: daily updates, easy change tracking with dedicated data fields, and a constant flow of new data;

    ➡️ You get all necessary resources for evaluating our data: a free consultation, a data sample, or free credits for testing our APIs.

    Coresignal's employee data enables you to create and improve innovative data-driven solutions and extract actionable business insights. These datasets are popular among companies from different industries, including HR and sales technology and investment.

    Employee Data use cases:

    ✅ Source best-fit talent for your recruitment needs

    Coresignal's Employee Data can help source the best-fit talent for your recruitment needs by providing the most up-to-date information on qualified candidates globally.

    ✅ Fuel your lead generation pipeline

    Enhance lead generation with 712M+ up-to-date employee records from the largest professional network. Our Employee Data can help you develop a qualified list of potential clients and enrich your own database.

    ✅ Analyze talent for investment opportunities

    Employee Data can help you generate actionable signals and identify new investment opportunities earlier than competitors or perform deeper analysis of companies you're interested in.

    ➡️ Why 400+ data-powered businesses choose Coresignal:

    1. Experienced data provider (in the market since 2016);
    2. Exceptional client service;
    3. Responsible and secure data collection.
  10. Synthetic Dataset Generation Market Research Report 2033

    • researchintelo.com
    csv, pdf, pptx
    Updated Oct 2, 2025
    + more versions
    Cite
    Research Intelo (2025). Synthetic Dataset Generation Market Research Report 2033 [Dataset]. https://researchintelo.com/report/synthetic-dataset-generation-market
    Available download formats: pdf, pptx, csv
    Dataset authored and provided by
    Research Intelo
    License

    https://researchintelo.com/privacy-and-policy

    Time period covered
    2024 - 2033
    Area covered
    Global
    Description

    Synthetic Dataset Generation Market Outlook



    According to our latest research, the Synthetic Dataset Generation market size was valued at $1.2 billion in 2024 and is projected to reach $8.7 billion by 2033, expanding at an impressive CAGR of 24.6% during 2024–2033. The primary driving force behind this global expansion is the escalating demand for high-quality, diverse, and bias-free datasets to fuel advanced artificial intelligence (AI) and machine learning (ML) models. As organizations across industries face increasing challenges in acquiring large-scale, annotated, and privacy-compliant real-world data, synthetic dataset generation has emerged as a transformative solution. This technology not only accelerates the development and deployment of AI systems but also addresses critical data privacy, security, and cost constraints, making it indispensable in today’s data-centric economy.
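
    As a quick consistency check on the figures quoted above, the compound annual growth rate implied by the start and end values can be recomputed. This is a sketch that assumes the standard CAGR formula and nine compounding periods (2024 to 2033); the function name `cagr` is illustrative and not taken from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.25 means 25% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the summary: $1.2 billion (2024) -> $8.7 billion (2033).
# Assumption: nine compounding periods (2033 - 2024).
implied = cagr(1.2, 8.7, 2033 - 2024)
print(f"Implied CAGR: {implied:.1%}")  # Implied CAGR: 24.6%
```

    Under these assumptions the implied rate matches the 24.6% stated in the summary.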



    Regional Outlook



    North America currently holds the largest share of the global synthetic dataset generation market, accounting for over 38% of the total market value in 2024. The region’s dominance is primarily attributed to its mature technology ecosystem, robust investment in AI research, and the early adoption of synthetic data solutions by leading enterprises and tech giants. The presence of major synthetic data vendors, a strong network of academic research institutions, and proactive regulatory guidance on data privacy have collectively accelerated market growth in North America. Furthermore, favorable government policies and funding initiatives aimed at advancing AI innovation continue to foster a thriving environment for synthetic dataset generation, particularly in sectors such as healthcare, finance, and autonomous vehicles.



    Asia Pacific is the fastest-growing region in the synthetic dataset generation market, projected to register a remarkable CAGR of 29.3% from 2024 to 2033. This exceptional growth is driven by increasing digital transformation initiatives, rapid adoption of AI-powered solutions, and significant investments by both public and private sectors. Countries like China, Japan, South Korea, and India are aggressively expanding their AI capabilities, leading to a surge in demand for synthetic data to support machine learning and computer vision applications. The region is witnessing heightened interest from global technology vendors, who are establishing partnerships and R&D centers to tap into the burgeoning opportunities. The proliferation of smart devices, e-commerce, and fintech innovations further amplifies the need for scalable and secure synthetic datasets.



    Emerging economies in Latin America, the Middle East, and Africa are gradually embracing synthetic dataset generation, though adoption remains at an early stage due to infrastructural and regulatory challenges. Localized demand is primarily concentrated in industries such as government, BFSI, and telecommunications, where data privacy and localization policies are stringent. While these regions hold significant potential for future growth, market expansion is currently restrained by limited technical expertise, slower digital infrastructure development, and the need for tailored synthetic data solutions that address unique regional requirements. Nonetheless, increasing awareness, pilot projects, and supportive policy reforms are expected to accelerate adoption in the coming years.



    Report Scope

    Attributes | Details
    Report Title | Synthetic Dataset Generation Market Research Report 2033
    By Component | Software, Services
    By Data Type | Text, Image, Video, Audio, Tabular, Others
    By Application | Machine Learning, Computer Vision, Natural Language Processing, Data Augmentation, Robotics, Autonomous Vehicles, Healthcare, Finance, Retail, Others
    By Deployment Mode | On-Premises, Cloud
    By End

  11. DropletVideo-10M

    • huggingface.co
    Updated Mar 11, 2025
    + more versions
    Cite
    IEIT-AGI (2025). DropletVideo-10M [Dataset]. https://huggingface.co/datasets/DropletX/DropletVideo-10M
    Authors
    IEIT-AGI
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    🔍 Dataset Note: DropletVideo-1M is the premium subset of DropletVideo-10M, filtered with aesthetic score > 4.51 and image quality score > 7.51.

      ✈️ Introduction
    

    The challenge of spatiotemporal consistency has long existed in the field of video generation. We have released the open-source dataset DropletVideo-10M —the world's largest video generation dataset with spatiotemporal consistency. It… See the full description on the dataset page: https://huggingface.co/datasets/DropletX/DropletVideo-10M.

  12. Largest Alzheimer EEG dataset

    • kaggle.com
    zip
    Updated Jun 12, 2025
    Cite
    ACE (2025). Largest Alzheimer EEG dataset [Dataset]. https://www.kaggle.com/datasets/codingyodha/largest-alzheimer-eeg-dataset
    Available download formats: zip (1474600102 bytes)
    Authors
    ACE
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    AD Datasets. We select 6 public AD datasets by reviewing EEG-based AD detection papers published between 2018 and 2024. They are AD-Auditory, ADFSU, ADFTD, ADSZ, APAVA, and BrainLat.

    Data Preprocessing

    Artifacts Removal. Some datasets have already undergone preprocessing steps during data collection, such as artifact removal and filtering. We perform a secondary preprocessing to align all datasets uniformly for training. All the fine-tuning datasets are guaranteed to be artifact-free.

    Channel Alignment. We align all datasets to a standard set of 19 channels, which include Fp1, Fp2, F7, F3, Fz, F4, F8, T3/T7, C3, Cz, C4, T4/T8, T5/P7, P3, Pz, P4, T6/P8, O1, and O2, based on the international 10-20 system. For datasets with fewer than 19 channels, we interpolate the missing channels using the MNE EEG processing package. For datasets with more than 19 channels, we select the 19 channels based on the channel name and discard the others. In cases where datasets use different channel montages, such as the Biosemi headcaps with 32, 64, 128 channels, we select the 19 closest channels by calculating the Euclidean distance between their 3D coordinates. The channel alignment allows us to pre-train the models on different datasets with any backbone encoder and perform unified fine-tuning on all AD datasets in one run.
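
    The nearest-channel selection used for differing montages might look roughly like the sketch below. It is written under stated assumptions rather than taken from the authors' code: the electrode coordinates are made up, the function name `nearest_channels` is hypothetical, and the description does not say what happens when two target channels share the same closest source channel (this sketch simply allows it).

```python
import math


def nearest_channels(targets, available):
    """For each target electrode, pick the available channel closest in 3D space."""
    mapping = {}
    for name, target_xyz in targets.items():
        # Euclidean distance between 3D coordinates, as in the description.
        mapping[name] = min(
            available, key=lambda ch: math.dist(available[ch], target_xyz)
        )
    return mapping


# Hypothetical coordinates, for illustration only (not a real montage).
targets = {"Fz": (0.0, 0.7, 0.7), "Cz": (0.0, 0.0, 1.0)}
available = {
    "A1": (0.0, 0.05, 0.99),
    "A2": (0.05, 0.68, 0.72),
    "A3": (0.9, 0.0, 0.4),
}
print(nearest_channels(targets, available))  # {'Fz': 'A2', 'Cz': 'A1'}
```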

    Frequency Alignment. In addition to channel alignment, we resample all datasets to a uniform sampling frequency of 128Hz, which is commonly used and preserves the key frequency bands (delta δ, theta θ, alpha α, beta β, gamma γ), while also reducing noise.

    Sample Segmentation. For deep learning training, we segment the EEG trials within each subject into 1-second samples, which results in 128 timestamps per sample, as the sampling frequency is aligned to 128Hz.

    Frequency Filtering. We then apply frequency filtering to each sample, ranging from 0.5Hz to 45Hz, to remove frequency bands that do not correspond to brain activities.

    Standard Normalization. After frequency filtering, we perform standard normalization on each sample, applied individually to each channel, to ensure that the data is centered and scaled consistently.
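
    The segmentation and normalization steps can be illustrated with a small standard-library sketch. It is not the authors' pipeline: resampling and the 0.5Hz to 45Hz filter are omitted, the input is synthetic, dropping an incomplete trailing segment and leaving flat channels at zero are assumptions, and the function names are hypothetical. A real implementation would more likely use NumPy or MNE.

```python
from statistics import fmean, pstdev

SAMPLING_RATE_HZ = 128  # stated in the description
SAMPLE_SECONDS = 1      # stated in the description


def segment(trial, rate=SAMPLING_RATE_HZ, seconds=SAMPLE_SECONDS):
    """Split a trial (list of channels, each a list of floats) into fixed-length samples."""
    width = rate * seconds
    n_samples = len(trial[0]) // width  # assumption: drop any incomplete tail
    return [
        [channel[i * width:(i + 1) * width] for channel in trial]
        for i in range(n_samples)
    ]


def normalize(sample):
    """Z-score each channel of one sample independently."""
    out = []
    for channel in sample:
        mean, std = fmean(channel), pstdev(channel)
        std = std or 1.0  # assumption: a flat channel is left at zero
        out.append([(x - mean) / std for x in channel])
    return out


# Synthetic trial: 2 channels, 300 timestamps (not real EEG).
trial = [
    [float(t % 7) for t in range(300)],
    [float(t % 5) * 2.0 for t in range(300)],
]
samples = [normalize(s) for s in segment(trial)]
print(len(samples), len(samples[0]), len(samples[0][0]))  # 2 2 128
```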

  13. Coresignal | Job Postings Data | Largest Professional Network + Indeed Jobs...

    • datarade.ai
    .json, .csv
    + more versions
    Cite
    Coresignal, Coresignal | Job Postings Data | Largest Professional Network + Indeed Jobs + 3 Other Sources | Global / 437M+ Records / Updated Monthly [Dataset]. https://datarade.ai/data-products/job-postings-data-coresignal
    Available download formats: .json, .csv
    Dataset authored and provided by
    Coresignal
    Area covered
    Germany, Cook Islands, Austria, Qatar, Iraq, Vietnam, Barbados, Malawi, Iceland, Angola
    Description

    ➡️ You can choose from multiple data formats, delivery frequency options, and delivery methods;

    ➡️ Extensive datasets with job postings data from 5 leading B2B data sources;

    ➡️ Jobs API designed for effortless search and enrichment (accessible using a user-friendly self-service tool);

    ➡️ Fresh data: daily updates, easy change tracking with dedicated data fields, and a constant flow of new data;

    ➡️ You get all necessary resources for evaluating our data: a free consultation, a data sample, or free credits for testing the API.

    ✅ For HR tech

    Job posting data can provide insights into the demand for different types of jobs and skills, as well as trends in job postings over time. With access to historical data, companies can develop predictive models.

    ✅ For Investors

    Explore expansion trends, analyze hiring practices, and predict company or industry growth rates, enabling the extraction of actionable strategic and operational insights. At a larger scale of analysis, Job Postings Data can be leveraged to forecast market trends and predict the growth of specific industries.

    ✅ For Lead generation

    Coresignal’s Job Postings Data is ideal for lead generation and determining purchasing intent. In B2B sales, job postings can help identify the best time to approach a prospective client.

    ➡️ Why 400+ data-powered businesses choose Coresignal:

    1. Experienced data provider (in the market since 2016);
    2. Exceptional client service;
    3. Responsible and secure data collection.
  14. Data generation volume worldwide 2010-2029

    • statista.com
    Updated Nov 19, 2025
    Cite
    Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.

  15. Synthetic Lab Data Generation Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Synthetic Lab Data Generation Market Research Report 2033 [Dataset]. https://dataintelo.com/report/synthetic-lab-data-generation-market
    Available download formats: pptx, csv, pdf
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Synthetic Lab Data Generation Market Outlook



    According to our latest research, the synthetic lab data generation market size reached USD 1.42 billion globally in 2024, reflecting a robust momentum in the adoption of synthetic data solutions across healthcare and life sciences. The market is anticipated to grow at a compelling CAGR of 26.7% from 2025 to 2033, with the global market expected to reach USD 13.11 billion by the end of the forecast period. This remarkable growth is primarily driven by increasing regulatory pressures on data privacy, the need for high-quality and diverse datasets for AI and machine learning applications, and the surging demand for advanced research and diagnostics in the healthcare sector. As per our latest research, the synthetic lab data generation market is rapidly transforming the landscape of healthcare research and development by providing scalable, privacy-compliant, and realistic datasets that accelerate innovation while minimizing risk.




    One of the most significant growth factors propelling the synthetic lab data generation market is the intensifying focus on data privacy and security, especially in the healthcare sector. With stringent regulations such as HIPAA, GDPR, and other data protection laws being enforced globally, organizations are facing mounting challenges in accessing and sharing real patient data for research, development, and training purposes. Synthetic lab data offers a viable solution by generating artificial, yet statistically accurate, datasets that mirror real-world data without exposing sensitive patient information. This capability not only ensures compliance with regulatory frameworks but also enables seamless data sharing across organizations, research institutions, and even geographical boundaries, thereby fostering collaborative innovation and expediting the pace of scientific discovery.




    Another key driver for the synthetic lab data generation market is the escalating demand for high-fidelity data to fuel artificial intelligence and machine learning models in healthcare. The accuracy and efficacy of AI-driven solutions, particularly in diagnostics, drug discovery, and personalized medicine, are heavily reliant on the availability of large, diverse, and well-annotated datasets. However, acquiring such datasets from real-world sources is often fraught with challenges related to data scarcity, imbalance, and privacy concerns. Synthetic lab data generation tools bridge this gap by creating vast volumes of tailored datasets that can be customized to represent rare diseases, specific demographics, or unique clinical scenarios. This not only enhances the robustness and generalizability of AI models but also accelerates the development and deployment of next-generation healthcare solutions.




    In addition to privacy and AI enablement, the synthetic lab data generation market is benefiting from the growing emphasis on cost efficiency and operational agility in healthcare research and diagnostics. Traditional data collection methods are time-consuming, expensive, and frequently limited by logistical and ethical constraints. Synthetic data generation, on the other hand, significantly reduces the time and cost associated with data acquisition, annotation, and preprocessing. This enables pharmaceutical companies, hospitals, and research institutes to conduct large-scale studies, simulate clinical trials, and train medical professionals without the need for extensive real-world data collection. The ability to rapidly generate high-quality synthetic datasets is emerging as a strategic advantage for organizations seeking to accelerate innovation, improve patient outcomes, and stay ahead in the competitive healthcare landscape.




    Regionally, North America continues to dominate the synthetic lab data generation market, accounting for the largest revenue share in 2024, followed by Europe and the Asia Pacific. The region’s leadership can be attributed to the presence of major technology vendors, advanced healthcare infrastructure, and a proactive regulatory environment that encourages the adoption of privacy-preserving technologies. Meanwhile, the Asia Pacific region is witnessing the fastest growth, driven by increasing investments in healthcare digitization, a burgeoning pharmaceutical sector, and rising awareness about data privacy. Europe remains a key market, supported by strong research funding and a robust regulatory framework. The Middle East & Africa and Latin America are also showing promising growth, albeit from a smaller base, as healthcare moderni

  16. GAFAM,FAANG and MATANA Stock Values - 💰Economics

    • kaggle.com
    zip
    Updated May 8, 2024
    Cite
    waticson (2024). GAFAM,FAANG and MATANA Stock Values - 💰Economics [Dataset]. https://www.kaggle.com/datasets/yutodennou/gafamfaang-and-matana-stock-values-economics
    Available download formats: zip (3245504 bytes)
    Authors
    waticson
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Stock data of huge IT companies.

    The dataset consists of three files (GAFAM, FAANG and MATANA), each dumped in binary format (pickle).

    How to use the dataset:

    1. Use numpy to load each dataset

    ```python
    import numpy as np

    # Path to any of the three files
    path = '/kaggle/input/gafamfaang-and-matana-stock-values-economics/gafam_stock.pkl'

    # Load the pickled dictionary; allow_pickle takes a boolean
    d_GAFAM = np.load(path, allow_pickle=True)
    d_GAFAM
    ```


    2. Get a DataFrame using a company name as the key

    # Keys are company names
    print(d_GAFAM.keys())
    Google = list(d_GAFAM.keys())[0]
    
    # Print the first key; the value stored under it holds the stock prices
    print(Google)
    


    d_GAFAM[Google]
    


    😊Enjoy analyzing time series data!!!
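    As a hedged aside: because the files are described as pickled dictionaries, the standard-library `pickle` module should also be able to read them. The sketch below shows the pattern on a small stand-in dictionary written to a temporary file. The stand-in keys and values, the file name `demo_stock.pkl`, and the helper `load_pickled_dict` are invented for illustration and do not reflect the real file's contents. If the real values are pandas DataFrames, as the "Get a DataFrame" step suggests, pandas would need to be importable when unpickling. Only unpickle files from sources you trust.

```python
import os
import pickle
import tempfile


def load_pickled_dict(path):
    """Load a pickled dictionary from `path` using the standard library."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if not isinstance(obj, dict):
        raise TypeError(f"expected a dict, got {type(obj).__name__}")
    return obj


# Stand-in data (hypothetical): the real file maps company names to price tables.
demo = {"CompanyA": [1.0, 2.0], "CompanyB": [3.0, 4.0]}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_stock.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump(demo, f)
    loaded = load_pickled_dict(demo_path)

# Same idea as list(d.keys())[0]: take the first key, then look up its value.
first_key = next(iter(loaded))
print(first_key, loaded[first_key])
```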

  17. Data from: The Role of Immigrant Generation and Mentors in Educational...

    • openicpsr.org
    Updated Jun 29, 2023
    Anita Caduff; Nabamallika Dehingia; Anita Raj (2023). The Role of Immigrant Generation and Mentors in Educational Attainment [Dataset]. http://doi.org/10.3886/E192406V1
    Explore at:
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    University of California San Diego
    Authors
    Anita Caduff; Nabamallika Dehingia; Anita Raj
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These files contain the analysis files used to create the tables and figures found in "The Role of Immigrant Generation and Mentors in Educational Attainment." The abstract for the paper is found below: Social capital, including engagement with mentors, facilitates educational attainment. However, engagement with mentors differs significantly across groups of adolescents with different backgrounds, including immigrant background. We investigate how immigrant generation predicts adolescents’ engagement with mentors and different types of mentors (i.e., school-based and non-school-based), the association of mentors with educational attainment, and the heterogeneity of these estimates by immigrant generation. We analyzed nationally representative Add Health data from N=11,242 adolescents using school-fixed effects linear probability models. Results show that adolescents from immigrant generations 1 and 2 were less likely than those from generation 3+ to have a mentor, but there were no significant differences in engaging with school-based mentors. Mentors predicted educational attainment; school-based mentor effects were larger than non-school-based mentor effects. The associations between mentors and college attendance and graduation were largest for 1st-generation immigrants. Our findings indicate the importance of structures supporting relationship-building and mentorship in schools and wider communities.
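
    To illustrate what a school-fixed-effects linear probability model estimates, here is a minimal sketch with a single binary predictor, using the within (demeaning) transformation. The toy records, the field order, and the function name `within_slope` are invented for illustration. This is not the authors' analysis code, and models of this kind normally include further covariates.

```python
from collections import defaultdict


def within_slope(records):
    """Fixed-effects slope for one regressor.

    records: iterable of (group, x, y). Both x and y are demeaned within
    each group, then y is regressed on x by ordinary least squares.
    """
    by_group = defaultdict(list)
    for group, x, y in records:
        by_group[group].append((x, y))

    num = 0.0
    den = 0.0
    for rows in by_group.values():
        x_mean = sum(x for x, _ in rows) / len(rows)
        y_mean = sum(y for _, y in rows) / len(rows)
        for x, y in rows:
            num += (x - x_mean) * (y - y_mean)
            den += (x - x_mean) ** 2
    if den == 0:
        raise ValueError("no within-group variation in x")
    return num / den


# Toy data (hypothetical): (school, has_mentor, attended_college)
toy = [
    ("s1", 1, 1), ("s1", 0, 0), ("s1", 1, 1), ("s1", 0, 1),
    ("s2", 1, 1), ("s2", 0, 0), ("s2", 0, 0), ("s2", 1, 0),
]
slope = within_slope(toy)
print(f"within-school slope: {slope:.2f}")
```

    With a binary outcome, the slope reads as a difference in probability between students with and without a mentor who attend the same school.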

  18. Data from: T1DiabetesGranada: a longitudinal multi-modal dataset of type 1...

    • produccioncientifica.ugr.es
    • data.niaid.nih.gov
    Updated 2023
    Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus [Dataset]. https://produccioncientifica.ugr.es/documentos/668fc429b9e7c03b01bd53b7
    Explore at:
    Dataset updated
    2023
    Authors
    Rodriguez-Leon, Ciro; Aviles Perez, Maria Dolores; Banos, Oresti; Quesada-Charneco, Miguel; Lopez-Ibarra, Pablo J; Villalonga, Claudia; Munoz-Torres, Manuel
    Description

    T1DiabetesGranada

    A longitudinal multi-modal dataset of type 1 diabetes mellitus

    Documented by:

    Rodriguez-Leon, C., Aviles-Perez, M. D., Banos, O., Quesada-Charneco, M., Lopez-Ibarra, P. J., Villalonga, C., & Munoz-Torres, M. (2023). T1DiabetesGranada: a longitudinal multi-modal dataset of type 1 diabetes mellitus. Scientific Data, 10(1), 916. https://doi.org/10.1038/s41597-023-02737-4

    Background

    Type 1 diabetes mellitus (T1D) patients face daily difficulties in keeping their blood glucose levels within appropriate ranges. Several techniques and devices, such as flash glucose meters, have been developed to help T1D patients improve their quality of life. Most recently, the data collected via these devices are being used to train advanced artificial intelligence models to characterize the evolution of the disease and support its management. The main problem in building these models is the scarcity of data, as most published works use private or artificially generated datasets. For this reason, this work presents T1DiabetesGranada, a longitudinal dataset, open under specific permission, that provides not only continuous glucose levels but also patient demographic and clinical information. The dataset includes 257780 days of measurements over four years from 736 T1D patients from the province of Granada, Spain. This dataset progresses significantly beyond the state of the art as one of the longest and largest open datasets of continuous glucose measurements, thus boosting the development of new artificial intelligence models for glucose level characterization and prediction.

    Data Records

    The data are stored in four comma-separated values (CSV) files which are available in T1DiabetesGranada.zip. These files are described in detail below.

    Patient_info.csv

    Patient_info.csv is the file containing information about the patients, such as demographic data, start and end dates of blood glucose level measurements and biochemical parameters, number of biochemical parameters or number of diagnostics. This file is composed of 736 records, one for each patient in the dataset, and includes the following variables:

    Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

    Sex – Sex of the patient. Values: F (for female), M (for male).

    Birth_year – Year of birth of the patient. Format: YYYY.

    Initial_measurement_date – Date of the first blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.

    Final_measurement_date – Date of the last blood glucose level measurement of the patient in the Glucose_measurements.csv file. Format: YYYY-MM-DD.

    Number_of_days_with_measures – Number of days with blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 8 to 1463.

    Number_of_measurements – Number of blood glucose level measurements of the patient, extracted from the Glucose_measurements.csv file. Values: ranging from 400 to 137292.

    Initial_biochemical_parameters_date – Date of the first biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.

    Final_biochemical_parameters_date – Date of the last biochemical test to measure some biochemical parameter of the patient, extracted from the Biochemical_parameters.csv file. Format: YYYY-MM-DD.

    Number_of_biochemical_parameters – Number of biochemical parameters measured on the patient, extracted from the Biochemical_parameters.csv file. Values: ranging from 4 to 846.

    Number_of_diagnostics – Number of diagnoses made for the patient, extracted from the Diagnostics.csv file. Values: ranging from 1 to 24.

    Glucose_measurements.csv

    Glucose_measurements.csv is the file containing the continuous blood glucose level measurements of the patients. The file is composed of more than 22.6 million records that constitute the time series of continuous blood glucose level measurements. It includes the following variables:

    Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

    Measurement_date – Date of the blood glucose level measurement. Format: YYYY-MM-DD.

    Measurement_time – Time of the blood glucose level measurement. Format: HH:MM:SS.

    Measurement – Value of the blood glucose level measurement in mg/dL. Values: ranging from 40 to 500.
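
    The column layout above can be read with the standard library alone. The following is a minimal sketch that combines Measurement_date and Measurement_time into a single timestamp. It assumes the file has a header row with exactly these column names; the sample rows and patient IDs are invented, and `read_glucose` is a hypothetical helper, not part of the dataset's own code.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows that follow the documented column layout.
SAMPLE = """Patient_ID,Measurement_date,Measurement_time,Measurement
LIB190001,2020-01-01,08:00:00,110
LIB190001,2020-01-01,08:15:00,126
LIB190002,2020-01-01,09:30:00,95
"""


def read_glucose(fileobj):
    """Yield (patient_id, timestamp, mg_dl) tuples from a CSV file object."""
    for row in csv.DictReader(fileobj):
        ts = datetime.strptime(
            f"{row['Measurement_date']} {row['Measurement_time']}",
            "%Y-%m-%d %H:%M:%S",
        )
        yield row["Patient_ID"], ts, float(row["Measurement"])


rows = list(read_glucose(io.StringIO(SAMPLE)))
for patient_id, ts, value in rows:
    print(patient_id, ts.isoformat(), value)
```

    For the real file, which holds more than 22.6 million records, iterating over the generator instead of building a list keeps memory use low.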

    Biochemical_parameters.csv

    Biochemical_parameters.csv is the file containing data of the biochemical tests performed on patients to measure their biochemical parameters. This file is composed of 87482 records and includes the following variables:

    Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

    Reception_date – Date of receipt in the laboratory of the sample to measure the biochemical parameter. Format: YYYY-MM-DD.

    Name – Name of the measured biochemical parameter. Values: 'Potassium', 'HDL cholesterol', 'Gammaglutamyl Transferase (GGT)', 'Creatinine', 'Glucose', 'Uric acid', 'Triglycerides', 'Alanine transaminase (GPT)', 'Chlorine', 'Thyrotropin (TSH)', 'Sodium', 'Glycated hemoglobin (Ac)', 'Total cholesterol', 'Albumin (urine)', 'Creatinine (urine)', 'Insulin', 'IA ANTIBODIES'.

    Value – Value of the biochemical parameter. Values: ranging from -4.0 to 6446.74.

    Diagnostics.csv

    Diagnostics.csv is the file containing diagnoses of diabetes mellitus complications or other diseases that patients have in addition to type 1 diabetes mellitus. This file is composed of 1757 records and includes the following variables:

    Patient_ID – Unique identifier of the patient. Format: LIB19XXXX.

    Code – ICD-9-CM diagnosis code. Values: subset of 594 of the ICD-9-CM codes (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).

    Description – ICD-9-CM long description. Values: subset of 594 of the ICD-9-CM long description (https://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/codes).

    Technical Validation

    Blood glucose level measurements are collected using FreeStyle Libre devices, which are widely used for healthcare in patients with T1D. The manufacturer, Abbott Diabetes Care, Inc. (Alameda, CA, USA), has conducted validation studies of these devices, concluding that the measurements made by their sensors are comparable to those of YSI analyzer devices (Xylem Inc.), the gold standard, with 99.9% of results falling within zones A and B of the consensus error grid. In addition, studies external to the company concluded that the accuracy of the measurements is adequate.

    It was also checked that, in most cases, the blood glucose level measurements per patient in the Glucose_measurements.csv file were continuous (i.e., a sample at least every 15 minutes), as they should be.
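
    A continuity check of this kind can be sketched as follows: sort one patient's timestamps and flag any pair of consecutive readings more than 15 minutes apart. The timestamps are invented and `find_gaps` is a hypothetical helper; the authors' own validation code may work differently.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=15)


def find_gaps(timestamps, max_gap=MAX_GAP):
    """Return (previous, current) pairs whose spacing exceeds max_gap."""
    ordered = sorted(timestamps)
    return [
        (prev, cur)
        for prev, cur in zip(ordered, ordered[1:])
        if cur - prev > max_gap
    ]


# Hypothetical timestamps for one patient.
readings = [
    datetime(2020, 1, 1, 8, 0),
    datetime(2020, 1, 1, 8, 15),
    datetime(2020, 1, 1, 8, 30),
    datetime(2020, 1, 1, 9, 30),  # one hour after the previous reading
]
gaps = find_gaps(readings)
print(len(gaps), "gap(s) longer than 15 minutes")
```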

    Usage Notes

    For data downloading, it is necessary to be authenticated on the Zenodo platform, accept the Data Usage Agreement and send a request specifying full name, email, and the justification of the data use. This request will be processed by the Secretary of the Department of Computer Engineering, Automatics, and Robotics of the University of Granada and access to the dataset will be granted.

    The files that compose the dataset are comma-delimited CSV files and are available in T1DiabetesGranada.zip. A Jupyter Notebook (Python v. 3.8) with code, graphics and statistics that may help in understanding the dataset is available in UsageNotes.zip.

    Graphs_and_stats.ipynb

    The Jupyter Notebook generates tables, graphs and statistics for a better understanding of the dataset. It has four main sections, one dedicated to each file in the dataset. In addition, it has useful functions such as calculating the patient age, deleting a patient list from a dataset file and leaving only a patient list in a dataset file.
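
    The helper functions mentioned above might look roughly like the sketch below. The function names, the record shape, and the sample patients are assumptions made for illustration; the notebook's actual implementation is not shown in this description. Since only the birth year is available, the computed age is approximate.

```python
def patient_age(birth_year, reference_year):
    """Approximate age in years; only the birth year is available."""
    if reference_year < birth_year:
        raise ValueError("reference_year precedes birth_year")
    return reference_year - birth_year


def keep_patients(rows, patient_ids):
    """Keep only rows whose Patient_ID is in patient_ids."""
    wanted = set(patient_ids)
    return [row for row in rows if row["Patient_ID"] in wanted]


def drop_patients(rows, patient_ids):
    """Remove rows whose Patient_ID is in patient_ids."""
    unwanted = set(patient_ids)
    return [row for row in rows if row["Patient_ID"] not in unwanted]


# Hypothetical records in the shape of Patient_info.csv rows.
patients = [
    {"Patient_ID": "LIB190001", "Birth_year": 1985},
    {"Patient_ID": "LIB190002", "Birth_year": 1999},
    {"Patient_ID": "LIB190003", "Birth_year": 1972},
]
kept = keep_patients(patients, ["LIB190002"])
rest = drop_patients(patients, ["LIB190002"])
ages = [patient_age(p["Birth_year"], 2022) for p in patients]
print(kept, len(rest), ages)
```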

    Code Availability

    The dataset was generated using some custom code located in CodeAvailability.zip. The code is provided as Jupyter Notebooks created with Python v. 3.8. The code was used to conduct tasks such as data curation and transformation, and variables extraction.

    Original_patient_info_curation.ipynb

    This Jupyter Notebook preprocesses the original file with patient data. Mainly, irrelevant rows and columns are removed, and the sex variable is recoded.

    Glucose_measurements_curation.ipynb

    This Jupyter Notebook preprocesses the original file with the continuous glucose level measurements of the patients. Principally, rows without information and duplicated rows are removed, and the timestamp variable is split into two new variables, measurement date and measurement time.

    Biochemical_parameters_curation.ipynb

    This Jupyter Notebook preprocesses the original file with the data of the biochemical tests performed on patients to measure their biochemical parameters. Mainly, irrelevant rows and columns are removed, and the variable with the name of the measured biochemical parameter is translated.

    Diagnostic_curation.ipynb

    This Jupyter Notebook preprocesses the original file with the diagnoses of diabetes mellitus complications or other diseases that patients have in addition to T1D.

    Get_patient_info_variables.ipynb

    This Jupyter Notebook implements the feature extraction process that uses the files Glucose_measurements.csv, Biochemical_parameters.csv and Diagnostics.csv to complete the file Patient_info.csv. It is divided into six sections: the first three extract the features from each of the mentioned files, and the next three add the extracted features to the resulting new file.

    Data Usage Agreement

    The conditions for use are as follows:

    You confirm that you will not attempt to re-identify research participants for any reason, including for re-identification theory research.

    You commit to keeping the T1DiabetesGranada dataset confidential and secure and will not redistribute data or Zenodo account credentials.

    You will require

  19. MTG: A Benchmark Suite for Multilingual Text Generation

    • service.tib.eu
    Updated Dec 17, 2024
    (2024). MTG: A Benchmark Suite for Multilingual Text Generation [Dataset]. https://service.tib.eu/ldmservice/dataset/mtg--a-benchmark-suite-for-multilingual-text-generation
    Explore at:
    Dataset updated
    Dec 17, 2024
    Description

    MTG is a multilingual multiway text generation benchmark suite. It is the first multilingual multiway text generation dataset to be proposed and has the largest amount of human-annotated data (400k). It includes four generation tasks (story generation, question generation, title generation and text summarization) across five languages (English, German, French, Spanish and Chinese).

  20. Data from: Generational, gender and ethnic inequalities among Chilean social...

    • scielo.figshare.com
    jpeg
    Updated Jun 1, 2023
    Felipe Saravia; Juan Saavedra (2023). Generational, gender and ethnic inequalities among Chilean social workers [Dataset]. http://doi.org/10.6084/m9.figshare.20006665.v1
    Explore at:
    jpeg (available download format)
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Felipe Saravia; Juan Saavedra
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Abstract: The inequalities in the labor market among Chilean social workers were examined, analyzing whether they differ from the trends observed in other professions. Two samples from the National Socioeconomic Characterization Survey (CASEN) database of the Ministry of Social Development of Chile (2015) were used. The contingency coefficient determined the strength of the association of economic income and contractual condition with the variables gender, generation and ethnicity. The results indicated that the proportion of social workers in the highest national income decile varies according to generation and ethnic group, and the proportion of those with permanent work varies according to gender and generation. In both cases, generation has the strongest association, with more pronounced inequalities observed among social workers than among other professionals. The reproduction of inequalities in social work, associated with neoliberalism, and the ethical-political challenges that this implies are discussed.
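
    For readers unfamiliar with the statistic, a contingency coefficient can be computed from a cross-tabulation as sketched below. The sketch assumes Pearson's form, C = sqrt(chi2 / (chi2 + n)); the abstract does not say which variant the authors used. The counts are invented and are not taken from CASEN.

```python
import math


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical counts: rows = generation group, columns = contract type.
toy_table = [
    [30, 10],
    [20, 40],
]
c = contingency_coefficient(toy_table)
print(f"C = {c:.3f}")
```

    C is 0 when the two variables are independent and grows with the strength of the association. The sketch assumes every row and column total is nonzero.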

Statista (2025). U.S. population by generation 2024 [Dataset]. https://www.statista.com/statistics/797321/us-population-by-generation/

U.S. population by generation 2024

Explore at:
97 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Nov 19, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description

Millennials were the largest generation group in the United States in 2024, with an estimated population of ***** million. Born between 1981 and 1996, Millennials recently surpassed Baby Boomers as the biggest group, and they will continue to be a major part of the population for many years.

The rise of Generation Alpha

Generation Alpha is the most recent to have been named, and many group members will not be able to remember a time before smartphones and social media. As of 2024, the oldest Generation Alpha members were still only aging into adolescents. However, the group already makes up around ***** percent of the U.S. population, and they are said to be the most racially and ethnically diverse of all the generation groups.

Boomers vs. Millennials

The number of Baby Boomers, whose generation was defined by the boom in births following the Second World War, has fallen by around ***** million since 2010. However, they remain the second-largest generation group, and aging Boomers are contributing to steady increases in the median age of the population. Meanwhile, the Millennial generation continues to grow, and one reason for this is the increasing number of young immigrants arriving in the United States.
