100+ datasets found
  1. Daily Social Media Active Users

    • kaggle.com
    zip
    Updated May 5, 2025
    Cite
    Shaik Barood Mohammed Umar Adnaan Faiz (2025). Daily Social Media Active Users [Dataset]. https://www.kaggle.com/datasets/umeradnaan/daily-social-media-active-users
    Available download formats
    zip (126814 bytes)
    Dataset updated
    May 5, 2025
    Authors
    Shaik Barood Mohammed Umar Adnaan Faiz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.

    Dataset Breakdown:

    • Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.

    • Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.

    • Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.

    • Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.

    • Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.

    • Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.

    • Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.
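
The fields above map naturally onto a flat table. The following is a minimal sketch, not the dataset author's code, of averaging daily time spent per platform. It assumes the CSV uses the field names listed above as column headers, which has not been checked against the actual Kaggle file. The three sample rows, including their "Primary Usage" values, are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The real file has 10,000 rows, and its exact
# header names are an assumption based on the field list above.
SAMPLE = """Platform,Owner,Primary Usage,Country,Daily Time Spent (min),Verified Account,Date Joined
Facebook,Meta,Social Networking,India,40,Yes,2019-03-01
Facebook,Meta,Social Networking,Brazil,60,No,2020-07-15
TikTok,ByteDance,Multimedia Sharing,France,90,No,2021-01-20
"""


def mean_minutes_by_platform(csv_text):
    """Return {platform: mean daily minutes} for the given CSV text."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        platform = row["Platform"]
        totals[platform] += float(row["Daily Time Spent (min)"])
        counts[platform] += 1
    return {p: totals[p] / counts[p] for p in totals}


result = mean_minutes_by_platform(SAMPLE)
```

To run the same function on the downloaded file, read the file's text and pass it in, adjusting the two column names if the real headers differ.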

    Context and Use Cases:

    • This synthetic dataset is designed to offer a privacy-friendly alternative for analytics, research, and machine learning purposes. Given the complexities and privacy concerns around using real user data, especially in the context of social media, this dataset offers a clean and secure way to develop, test, and fine-tune applications, models, and algorithms without the risks of handling sensitive or personal information.

    Researchers, data scientists, and developers can use this dataset to:

    • Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.

    • Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.

    • Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.

    • Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.

    • Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.

    • Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.

    The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.

    Future Considerations:

    As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.

    By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...

  2. EveryPolitician

    • kaggle.com
    zip
    Updated Aug 14, 2017
    Cite
    EveryPolitician (2017). EveryPolitician [Dataset]. https://www.kaggle.com/datasets/everypolitician/everypoliticiansample
    Available download formats
    zip (3742391 bytes)
    Dataset updated
    Aug 14, 2017
    Dataset authored and provided by
    EveryPolitician
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    EveryPolitician is a project with the goal of providing data about every politician in the world. They collect open data on as many politicians as they can find and these datasets are just a small sample of the data available at http://www.everypolitician.org.

    Content

    Each country has its own governmental structure, and EveryPolitician provides data for as many countries as possible. At the time of publishing, there was information on politicians from 233 countries. I chose to publish JSON files for these 10 countries:

    • Australia
    • Brazil
    • China
    • France
    • India
    • Nigeria
    • Russia
    • South_Africa
    • UK
    • US

    These JSON files follow the POPOLO format.
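
As a rough illustration of working with Popolo-style JSON, the sketch below groups people by organization using the top-level `persons`, `organizations` and `memberships` collections. The miniature document and its names are invented. The published files are much larger, and which organization a membership points at (legislature or party) has not been checked here, so treat the key usage as an assumption.

```python
import json

# Hypothetical miniature document. Real files carry many more fields
# per record and many more records.
SAMPLE = """
{
  "persons": [
    {"id": "p1", "name": "Example Person One"},
    {"id": "p2", "name": "Example Person Two"}
  ],
  "organizations": [
    {"id": "o1", "name": "Example Party", "classification": "party"}
  ],
  "memberships": [
    {"person_id": "p1", "organization_id": "o1"},
    {"person_id": "p2", "organization_id": "o1"}
  ]
}
"""


def members_by_organization(popolo):
    """Map each organization name to the names of its member persons."""
    persons = {p["id"]: p["name"] for p in popolo.get("persons", [])}
    orgs = {o["id"]: o["name"] for o in popolo.get("organizations", [])}
    grouped = {}
    for membership in popolo.get("memberships", []):
        org = orgs.get(membership.get("organization_id"))
        person = persons.get(membership.get("person_id"))
        if org is None or person is None:
            continue  # skip memberships that reference unknown records
        grouped.setdefault(org, []).append(person)
    return grouped


grouped = members_by_organization(json.loads(SAMPLE))
```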

    Acknowledgements

    These data were collected from http://everypolitician.org/. Their website has more data than I have published here - this is a small sample.

  3. International Cigarette Consumption Database v1.3

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Poirier, Mathieu JP; Guindon, G Emmanuel; Sritharan, Lathika; Hoffman, Steven J (2023). International Cigarette Consumption Database v1.3 [Dataset]. http://doi.org/10.5683/SP2/AOVUW7
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Poirier, Mathieu JP; Guindon, G Emmanuel; Sritharan, Lathika; Hoffman, Steven J
    Time period covered
    Jan 1, 1970 - Jan 1, 2015
    Description

    This database contains tobacco consumption data from 1970-2015 collected through a systematic search coupled with consultation with country and subject-matter experts. Data quality appraisal was conducted by at least two research team members in duplicate, with greater weight given to official government sources. All data was standardized into units of cigarettes consumed and a detailed accounting of data quality and sourcing was prepared. Data was found for 82 of 214 countries for which searches for national cigarette consumption data were conducted, representing over 95% of global cigarette consumption and 85% of the world’s population. Cigarette consumption fell in most countries over the past three decades but trends in country specific consumption were highly variable. For example, China consumed 2.5 million metric tonnes (MMT) of cigarettes in 2013, more than Russia (0.36 MMT), the United States (0.28 MMT), Indonesia (0.28 MMT), Japan (0.20 MMT), and the next 35 highest consuming countries combined. The US and Japan achieved reductions of more than 0.1 MMT from a decade earlier, whereas Russian consumption plateaued, and Chinese and Indonesian consumption increased by 0.75 MMT and 0.1 MMT, respectively. These data generally concord with modelled country level data from the Institute for Health Metrics and Evaluation and have the additional advantage of not smoothing year-over-year discontinuities that are necessary for robust quasi-experimental impact evaluations. Before this study, publicly available data on cigarette consumption have been limited—either inappropriate for quasi-experimental impact evaluations (modelled data), held privately by companies (proprietary data), or widely dispersed across many national statistical agencies and research organisations (disaggregated data). 
This new dataset confirms that cigarette consumption has decreased in most countries over the past three decades, but that secular country specific consumption trends are highly variable. The findings underscore the need for more robust processes in data reporting, ideally built into international legal instruments or other mandated processes. To monitor the impact of the WHO Framework Convention on Tobacco Control and other tobacco control interventions, data on national tobacco production, trade, and sales should be routinely collected and openly reported. The first use of this database for a quasi-experimental impact evaluation of the WHO Framework Convention on Tobacco Control is: Hoffman SJ, Poirier MJP, Katwyk SRV, Baral P, Sritharan L. Impact of the WHO Framework Convention on Tobacco Control on global cigarette consumption: quasi-experimental evaluations using interrupted time series analysis and in-sample forecast event modelling. BMJ. 2019 Jun 19;365:l2287. doi: https://doi.org/10.1136/bmj.l2287 Another use of this database was to systematically code and classify longitudinal cigarette consumption trajectories in European countries since 1970 in: Poirier MJ, Lin G, Watson LK, Hoffman SJ. Classifying European cigarette consumption trajectories from 1970 to 2015. Tobacco Control. 2022 Jan. DOI: 10.1136/tobaccocontrol-2021-056627. 
    Statement of Contributions:

    • Conceived the study: GEG, SJH
    • Identified multi-country datasets: GEG, MP
    • Extracted data from multi-country datasets: MP
    • Quality assessment of data: MP, GEG
    • Selection of data for final analysis: MP, GEG
    • Data cleaning and management: MP, GL
    • Internet searches: MP (English, French, Spanish, Portuguese), GEG (English, French), MYS (Chinese), SKA (Persian), SFK (Arabic); AG, EG, BL, MM, YM, NN, EN, HR, KV, CW, and JW (English), GL (English)
    • Identification of key informants: GEG, GP
    • Project Management: LS, JM, MP, SJH, GEG
    • Contacts with Statistical Agencies: MP, GEG, MYS, SKA, SFK, GP, BL, MM, YM, NN, HR, KV, JW, GL
    • Contacts with key informants: GEG, MP, GP, MYS, GP
    • Funding: GEG, SJH

    SJH: Hoffman, SJ; JM: Mammone J; SRVK: Rogers Van Katwyk, S; LS: Sritharan, L; MT: Tran, M; SAK: Al-Khateeb, S; AG: Grjibovski, A.; EG: Gunn, E; SKA: Kamali-Anaraki, S; BL: Li, B; MM: Mahendren, M; YM: Mansoor, Y; NN: Natt, N; EN: Nwokoro, E; HR: Randhawa, H; MYS: Yunju Song, M; KV: Vercammen, K; CW: Wang, C; JW: Woo, J; MJPP: Poirier, MJP; GEG: Guindon, EG; GP: Paraje, G; GL: Gigi Lin

    Key informants who provided data:

    • Corne van Walbeek (South Africa, Jamaica)
    • Frank Chaloupka (US)
    • Ayda Yurekli (Turkey)
    • Dardo Curti (Uruguay)
    • Bungon Ritthiphakdee (Thailand)
    • Jakub Lobaszewski (Poland)
    • Guillermo Paraje (Chile, Argentina)

    Key informants who provided useful insights:

    • Carlos Manuel Guerrero López (Mexico)
    • Muhammad Jami Husain (Bangladesh)
    • Nigar Nargis (Bangladesh)
    • Rijo M John (India)
    • Evan Blecher (Nigeria, Indonesia, Philippines, South Africa)
    • Yagya Karki (Nepal)
    • Anne CK Quah (Malaysia)
    • Nery Suarez Lugo (Cuba)

    Agencies providing assistance: Irani...

    Visit https://dataone.org/datasets/sha256%3Aaa1b4aae69c3399c96bfbf946da54abd8f7642332d12ccd150c42ad400e9699b for complete metadata about this dataset.

  4. Global Data Regulation Diagnostic Survey Dataset 2021 - Afghanistan, Angola,...

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1 more
    Updated Oct 26, 2023
    Cite
    World Bank (2023). Global Data Regulation Diagnostic Survey Dataset 2021 - Afghanistan, Angola, Argentina...and 77 more [Dataset]. https://microdata.worldbank.org/index.php/catalog/3866
    Dataset updated
    Oct 26, 2023
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Authors
    World Bank
    Time period covered
    2020
    Area covered
    Afghanistan, Angola, Argentina...and 77 more
    Description

    Abstract

    The Global Data Regulation Diagnostic provides a comprehensive assessment of the quality of the data governance environment. Diagnostic results show that countries have put in greater effort in adopting enabler regulatory practices than in safeguard regulatory practices. However, the regulatory development of both enablers and safeguards remains at an intermediate stage: 47 percent of enabler good practices and 41 percent of good safeguard practices are adopted across countries. Under the enabler and safeguard pillars, the diagnostic covers dimensions of e-commerce/e-transactions, enablers for public intent data, enablers for private intent data, safeguards for personal and nonpersonal data, cybersecurity and cybercrime, as well as cross-border data flows. Across all these dimensions, no income group demonstrates advanced regulatory frameworks across all dimensions, indicating significant room for further improvement on data governance environment.

    The Global Data Regulation Diagnostic is the first comprehensive assessment of laws and regulations on data governance. It covers enabler and safeguard regulatory practices in 80 countries providing indicators to assess and compare their performance. This Global Data Regulation Diagnostic develops objective and standardized indicators to measure the regulatory environment for the data economy across countries. The indicators aim to serve as a diagnostic tool so countries can assess and compare their performance vis-à-vis other countries. Understanding the gap with global regulatory good practices is a necessary first step for governments when identifying and prioritizing reforms.

    Geographic coverage

    80 countries

    Analysis unit

    Country

    Kind of data

    Observation data/ratings [obs]

    Sampling procedure

    The diagnostic is based on a detailed assessment of domestic laws, regulations, and administrative requirements in 80 countries selected to ensure a balanced coverage across income groups, regions, and different levels of digital technology development. Data are further verified through a detailed desk research of legal texts, reflecting the regulatory status of each country as of June 1, 2020.

    Mode of data collection

    Mail Questionnaire [mail]

    Research instrument

    The questionnaire comprises 37 questions designed to determine if a country has adopted good regulatory practice on data governance. The responses are then scored and assigned a normative interpretation. Related questions fall into seven clusters so that when the scores are averaged, each cluster provides an overall sense of how it performs in its corresponding regulatory and legal dimensions. These seven dimensions are: (1) E-commerce/e-transaction; (2) Enablers for public intent data; (3) Enablers for private intent data; (4) Safeguards for personal data; (5) Safeguards for nonpersonal data; (6) Cybersecurity and cybercrime; (7) Cross-border data transfers.
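
The cluster averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the World Bank's scoring code: the question ids, the scores and the 0 to 1 scale are hypothetical, and only three of the seven dimension names are used to keep the example short.

```python
from statistics import mean

# Hypothetical scored responses for one country:
# question id -> (cluster name, score). The real instrument has
# 37 questions; the [0, 1] score scale is an assumption.
RESPONSES = {
    "q1": ("E-commerce/e-transaction", 1.0),
    "q2": ("E-commerce/e-transaction", 0.5),
    "q3": ("Safeguards for personal data", 0.0),
    "q4": ("Safeguards for personal data", 1.0),
    "q5": ("Cross-border data transfers", 0.5),
}


def cluster_scores(responses):
    """Average the question scores within each cluster."""
    by_cluster = {}
    for cluster, score in responses.values():
        by_cluster.setdefault(cluster, []).append(score)
    return {cluster: mean(scores) for cluster, scores in by_cluster.items()}


scores = cluster_scores(RESPONSES)
```

A plain unweighted mean is used here because the text only says the scores are averaged; any weighting in the actual diagnostic is not described in this listing.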

    Response rate

    100%

  5. Median Household Income by Racial Categories in Hill Country Village, TX (,...

    • neilsberg.com
    csv, json
    Updated Mar 1, 2025
    Cite
    Neilsberg Research (2025). Median Household Income by Racial Categories in Hill Country Village, TX (, in 2023 inflation-adjusted dollars) [Dataset]. https://www.neilsberg.com/research/datasets/e0a7a46c-f665-11ef-a994-3860777c1fe6/
    Available download formats
    json, csv
    Dataset updated
    Mar 1, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Hill Country Village, Texas
    Variables measured
    Median Household Income for Asian Population, Median Household Income for Black Population, Median Household Income for White Population, Median Household Income for Some other race Population, Median Household Income for Two or more races Population, Median Household Income for American Indian and Alaska Native Population, Median Household Income for Native Hawaiian and Other Pacific Islander Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the median household income within each racial category identified by the US Census Bureau, we conducted an initial analysis and categorization of the data. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). It is important to note that the median household income estimates exclusively represent the identified racial categories and do not incorporate any ethnicity classifications. Households are categorized, and median incomes are reported based on the self-identified race of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the median household income across different racial categories in Hill Country Village. It portrays the median household income of the head of household across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to gain insights into economic disparities and trends and explore the variations in median household income for diverse racial categories.

    Key observations

    Based on our analysis of the distribution of Hill Country Village population by race & ethnicity, the population is predominantly White. This particular racial category constitutes the majority, accounting for 72.82% of the total residents in Hill Country Village. Notably, the median household income for White households is $244,375. Although the White population is the most populous, Some Other Race households report the highest median household income, at $250,001. This reveals that, while Whites may be the most numerous in Hill Country Village, Some Other Race households experience greater economic prosperity in terms of median household income.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race of the head of household: This column presents the self-identified race of the household head, encompassing all relevant racial categories (excluding ethnicity) applicable in Hill Country Village.
    • Median household income: Median household income, adjusted for inflation and presented in 2023 inflation-adjusted dollars.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Hill Country Village median household income by race.

  6. Data from: Land Cover Trends Dataset, 2000-2011

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Nov 26, 2025
    Cite
    U.S. Geological Survey (2025). Land Cover Trends Dataset, 2000-2011 [Dataset]. https://catalog.data.gov/dataset/land-cover-trends-dataset-2000-2011
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    U.S. Geological Survey scientists, funded by the Climate and Land Use Change Research and Development Program, developed a dataset of 2006 and 2011 land use and land cover (LULC) information for selected 100-km2 sample blocks within 29 EPA Level 3 ecoregions across the conterminous United States. The data was collected for validation of new and existing national scale LULC datasets developed from remotely sensed data sources. The data can also be used with the previously published Land Cover Trends Dataset: 1973-2000 (http://pubs.usgs.gov/ds/844/), to assess land-use/land-cover change in selected ecoregions over a 37-year study period. LULC data for 2006 and 2011 was manually delineated using the same sample block classification procedures as the previous Land Cover Trends project. The methodology is based on a statistical sampling approach, manual classification of land use and land cover, and post-classification comparisons of land cover across different dates. Landsat Thematic Mapper, and Enhanced Thematic Mapper Plus imagery was interpreted using a modified Anderson Level I classification scheme. Landsat data was acquired from the National Land Cover Database (NLCD) collection of images. For the 2006 and 2011 update, ecoregion specific alterations in the sampling density were made to expedite the completion of manual block interpretations. The data collection process started with the 2000 date from the previous assessment and any needed corrections were made before interpreting the next two dates of 2006 and 2011 imagery. The 2000 land cover was copied and any changes seen in the 2006 Landsat images were digitized into a new 2006 land cover image. Similarly, the 2011 land cover image was created after completing the 2006 delineation. Results from analysis of these data include ecoregion based statistical estimates of the amount of LULC change per time period, ranking of the most common types of conversions, rates of change, and percent composition.
Overall estimated amount of change per ecoregion from 2001 to 2011 ranged from a low of 370 km2 in the Northern Basin and Range Ecoregion to a high of 78,782 km2 in the Southeastern Plains Ecoregion. The Southeastern Plains Ecoregion continues to encompass the most intense forest harvesting and regrowth in the country. Forest harvesting and regrowth rates in the southeastern U.S. and Pacific Northwest continued at late 20th century levels. The land use and land cover data collected by this study is ideally suited for training, validation, and regional assessments of land use and land cover change in the U.S. because it is collected using manual interpretation techniques of Landsat data aided by high resolution photography. The 2001-2011 Land Cover Trends Dataset is provided in an Albers Conical Equal Area projection using the NAD 1983 datum. The sample blocks have a 30-meter resolution and file names follow a specific naming convention that includes the number of the ecoregion containing the block, the block number, and the Landsat image date. The data files are organized by ecoregion, and are available in the ERDAS Imagine (.img) format.

  7. Country Club, MO Age Cohorts Dataset: Children, Working Adults, and Seniors...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    Cite
    Neilsberg Research (2025). Country Club, MO Age Cohorts Dataset: Children, Working Adults, and Seniors in Country Club - Population and Percentage Analysis // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/4b78f836-f122-11ef-8c1b-3860777c1fe6/
    Available download formats
    csv, json
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Country Club Village
    Variables measured
    Population Over 65 Years, Population Under 18 Years, Population Between 18 and 64 Years, Percent of Total Population for Age Groups
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age cohorts. We divided the population into three age cohorts: children (under the age of 18 years), working population (between 18 and 64 years) and senior population (65 years and over). For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Country Club population by age cohorts (Children: Under 18 years; Working population: 18-64 years; Senior population: 65 years or more). It lists the population in each age cohort group along with its percentage relative to the total population of Country Club. The dataset can be utilized to understand the population distribution across children, working population and senior population for dependency ratio, housing requirements, ageing, migration patterns etc.

    Key observations

    The largest age group was 18 to 64 years with a population of 1,811 (57.11% of the total population). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age cohorts:

    • Under 18 years
    • 18 to 64 years
    • 65 years and over

    Variables / Data Columns

    • Age Group: This column displays the age cohort for the Country Club population analysis. Three groups are expected (Children, Working Population and Senior Population).
    • Population: This column shows the population for the age cohort in Country Club.
    • Percent of Total Population: This column shows the population as a percent of the total population of Country Club.
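
The relationship between the Population and Percent of Total Population columns, and the dependency ratio mentioned in the context section, can be sketched as below. The cohort counts are hypothetical round numbers, not figures from the dataset, and the dependency ratio formula (children plus seniors per 100 working-age residents) is a common convention assumed here rather than one stated by Neilsberg.

```python
# Hypothetical cohort counts, chosen for easy arithmetic.
COHORTS = {
    "Under 18 years": 250,
    "18 to 64 years": 600,
    "65 years and over": 150,
}


def percent_of_total(counts):
    """Return each cohort's share of the total population, in percent."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total population is zero")
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


def dependency_ratio(counts):
    """Children plus seniors per 100 working-age residents (assumed formula)."""
    working = counts["18 to 64 years"]
    dependents = counts["Under 18 years"] + counts["65 years and over"]
    return round(100 * dependents / working, 2)


shares = percent_of_total(COHORTS)
ratio = dependency_ratio(COHORTS)
```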

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Country Club Population by Age. You can refer to the same here

  8. Panama Number Dataset

    • listtodata.com
    .csv, .xls, .txt
    Updated Jul 17, 2025
    Cite
    List to Data (2025). Panama Number Dataset [Dataset]. https://listtodata.com/panama-dataset
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jul 17, 2025
    Authors
    List to Data
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2025 - Dec 31, 2025
    Area covered
    Panama
    Variables measured
    phone numbers, email address, full name, address, city, state, gender, age, income, IP address
    Description

    The Panama number dataset is a resource for telemarketing. Marketing is how people become aware of your services, and without it a business cannot reach its full potential; to maximize the reach of a brand or product, you need to promote it through every available medium. The Panama number dataset from List To Data is intended to make your marketing more targeted and improve its chances of success.

    Panama phone data is offered at an affordable price and supports promoting products to a large audience by telephone. A total of 5.34 million cellular mobile connections were active in the country. The data can be loaded into the CRM system of your choice, which makes it easier to analyze campaign results. The package also includes basic information about the people listed, so you can segment your audience.

    The Panama phone number list is a cost-friendly marketing resource that can be bought from List To Data. SMS and telemarketing cost less than other traditional channels, which saves money and supports a higher return on investment (ROI), and the list can also help with branding. List To Data describes the number data as the most up to date and 95% active, and guarantees a high accuracy rate for the list as well as a high delivery rate for your messages.

  9. Median Household Income Variation by Family Size in Country Club Hills, IL:...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Median Household Income Variation by Family Size in Country Club Hills, IL: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1acf7cd1-73fd-11ee-949f-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Country Club Hills, Illinois
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in Country Club Hills, IL, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1-person to 7-or-more-person households) reported by the Census Bureau, all were found in Country Club Hills. Across the different household sizes in Country Club Hills, the mean income is $88,306 and the standard deviation is $31,659. The coefficient of variation (CV) is 35.85%. This high CV indicates high relative variability, suggesting that incomes vary significantly across household sizes.
    • In the most recent year, 2021, the smallest household size for which the bureau reported a median household income was 1-person households, with an income of $46,212. It then increased to $100,776 for 7-person households, the largest household size for which the bureau reported a median household income.
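
    The coefficient of variation quoted in the key observations is the standard deviation divided by the mean, expressed as a percentage. A minimal sketch in Python, using only the two summary figures reported in this listing; the function name is illustrative and not part of the dataset:

```python
def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Return the coefficient of variation (std_dev / mean) as a percentage."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return std_dev / mean * 100.0


# Summary figures quoted in the key observations of this listing.
cv = coefficient_of_variation(mean=88_306, std_dev=31_659)
print(f"CV = {cv:.2f}%")  # prints "CV = 35.85%"
```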

    [Figure: Country Club Hills, IL median household income, by household size (in 2022 inflation-adjusted dollars). Image: https://i.neilsberg.com/ch/country-club-hills-il-median-household-income-by-household-size.jpeg]

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column lists 7 household sizes, ranging from 1-person households to 7-or-more-person households (as mentioned above).
    • Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research Team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Country Club Hills median household income. You can refer to the same here

  10. Aqueduct Global Flood Risk Country Rankings - Datasets - Data | World...

    • old-datasets.wri.org
    Updated Mar 4, 2015
    Cite
    wri.org (2015). Aqueduct Global Flood Risk Country Rankings - Datasets - Data | World Resources Institute [Dataset]. https://old-datasets.wri.org/dataset/aqueduct-global-flood-risk-country-rankings
    Dataset updated
    Mar 4, 2015
    Dataset provided by
    World Resources Institute (https://www.wri.org/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Approximately 21 million people worldwide could be affected by river floods on average each year, and the 15 countries with the most people exposed, including India, Bangladesh, China, Vietnam, Pakistan, Indonesia, Egypt, Myanmar, Afghanistan, Nigeria, Brazil, Thailand, Democratic Republic of Congo, Iraq, and Cambodia, account for nearly 80 percent of the total population affected in an average year.

    Summary

    The Aqueduct Global Flood Risk Country Ranking ranks 163 countries by their current annual average population affected by river floods, using the Aqueduct Global Flood Analyzer. A country-wide estimated average flood protection level was given to each country based on its income level.

    Cautions

    Assumption: We assigned a country-wide average flood protection level for each country based on its income level (World Bank):

    • Low-income countries: 10-year flood protection
    • Lower-middle income countries: 25-year flood protection
    • Upper-middle income countries: 50-year flood protection
    • High-income countries: 100-year flood protection
    • The Netherlands: 1000-year flood protection
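
    The income-based protection assumption in the cautions can be written as a simple lookup. The sketch below is only an illustration of that rule; the dictionary keys and the function name are assumptions made here, not identifiers from the Aqueduct tooling:

```python
# Assumed flood protection level (return period in years) by World Bank income group,
# as stated in the cautions of this listing. Key names are illustrative.
PROTECTION_YEARS_BY_INCOME = {
    "low": 10,
    "lower-middle": 25,
    "upper-middle": 50,
    "high": 100,
}

# The listing names one country-specific exception.
COUNTRY_OVERRIDES = {"Netherlands": 1000}


def assumed_protection_years(country: str, income_group: str) -> int:
    """Return the assumed flood protection level, in years, for a country."""
    if country in COUNTRY_OVERRIDES:
        return COUNTRY_OVERRIDES[country]
    return PROTECTION_YEARS_BY_INCOME[income_group]
```

    For example, `assumed_protection_years("Netherlands", "high")` returns 1000, while any other high-income country maps to 100.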

  11. Data from: Quality wines in Italy and France: a dataset of protected...

    • figshare.com
    txt
    Updated Mar 15, 2024
    Cite
    Sebastian Candiago; Simon Tscholl; Leonardo Bassani; Helder Fraga; Lukas Egarter Vigl (2024). Quality wines in Italy and France: a dataset of protected designation of origin specifications [Dataset]. http://doi.org/10.6084/m9.figshare.25393261.v2
    Available download formats: txt
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sebastian Candiago; Simon Tscholl; Leonardo Bassani; Helder Fraga; Lukas Egarter Vigl
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    France, Italy
    Description

    Italy and France are historically among the countries that produce the most prestigious wines worldwide. In Europe, these two countries together produce more than half of the wines classified under the Protected Designation of Origin (PDO) label, the strictest quality mark for food and wines in the European Union. Due to their long tradition in wine protection, Italy and France include highly detailed regulatory information in their wine PDO regulatory documents that is usually not available for other countries, such as specific information about the main cultivars that must be used to make each wine product or the related required planting density in the vineyards. However, this information is scattered throughout the documents of each wine production area and has never been extracted and homogenised in a single dataset. Here, we present the first dataset that characterizes the PDO wines produced in Italy and France in very high detail, based on the documents from the official EU geographical indication register. It includes, for each country, a standardized list of the PDO wine names, linked with their specific regulatory requirements, including the wine colour, type, cultivars used and maximum allowed yields. The unprecedented level of detail of this dataset allows, for the first time, the analysis of more than 5000 traditional wines and their legal and agronomic specifications. This gives insights into the interplay between the European Union quality regulation policy, the wine sector and agronomic practices, enabling researchers and practitioners to analyze wine production in the context of specific regulations or economic scenarios.

  12. Cline Center Coup d’État Project Dataset

    • databank.illinois.edu
    Updated May 11, 2025
    + more versions
    Cite
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
    Dataset updated
    May 11, 2025
    Authors
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.

    Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy.

    Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022.

    Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup.

    Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.

    Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:

    • Reconciling missing event data
    • Removing events with irreconcilable event dates
    • Removing events with insufficient sourcing (each event needs at least two sources)
    • Removing events that were inaccurately coded as coup events
    • Removing variables that fell below the threshold of inter-coder reliability required by the project
    • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
    • Extending the period covered from 1945-2005 to 1945-2019
    • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)

    Items in this Dataset

    1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024
    2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024
    3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024
    4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024

    Citation Guidelines

    1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
    2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7

  13. In-Car Speech Dataset: French (France)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). In-Car Speech Dataset: French (France) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/in-car-speech-dataset-french
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    French, France
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the French Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.

    Speech Data

    This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.

    Participant Diversity:

    - Speakers: 50+ native French speakers from the FutureBeeAI Community.

    - Regions: Ensures a balanced representation of accents, dialects, and demographics across France.

    - Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Recording Nature: Scripted wake word and command type of audio recordings.

    - Duration: Average duration of 5 to 20 seconds per audio recording.

    - Formats: WAV format with mono channels and a bit depth of 16 bits. The dataset contains recordings at sample rates of 16 kHz and 48 kHz.
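
    The stated format (mono, 16-bit WAV at 16 kHz or 48 kHz, typically 5 to 20 seconds long) can be checked with Python's standard `wave` module. This is a minimal sketch; the function name and the returned keys are illustrative and do not come from the dataset's own tooling:

```python
import wave

# Sample rates named in the listing.
EXPECTED_SAMPLE_RATES = {16_000, 48_000}


def inspect_wav(path: str) -> dict:
    """Read a WAV header and report whether it matches the stated format."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        bit_depth = wav.getsampwidth() * 8  # sample width is reported in bytes
        sample_rate = wav.getframerate()
        duration_s = wav.getnframes() / sample_rate
    return {
        "channels": channels,
        "bit_depth": bit_depth,
        "sample_rate": sample_rate,
        "duration_s": duration_s,
        "matches_spec": (
            channels == 1
            and bit_depth == 16
            and sample_rate in EXPECTED_SAMPLE_RATES
        ),
    }
```

    Calling `inspect_wav` on each file before training is a cheap way to separate the 16 kHz and 48 kHz subsets and to flag files that need resampling.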

    Dataset Diversity

    Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.

    Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.

    Different Cars: Data collection was carried out in different types and models of cars.

    Different Types of Voice Commands:

    - Navigational Voice Commands

    - Mobile Control Voice Commands

    - Car Control Voice Commands

    - Multimedia & Entertainment Commands

    - General, Question Answer, Search Commands

    Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.

    - Morning

    - Afternoon

    - Evening

    Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:

    - Noise Level: Silent, Low Noise, Moderate Noise, High Noise

    - Parking Location: Indoor, Outdoor

    - Car Windows: Open, Closed

    - Car AC: On, Off

    - Car Engine: On, Off

    - Car Movement: Stationary, Moving

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.

    Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of French voice assistant speech recognition models.

    License

    This French In-car audio dataset is created by FutureBeeAI and is available for commercial use.

  14. Mining Company's Global Supply Chain - Logistics Data for a Medium Size...

    • figshare.com
    zip
    Updated Feb 24, 2016
    + more versions
    Cite
    Marco Veluscek (2016). Mining Company's Global Supply Chain - Logistics Data for a Medium Size Excavator - Extended Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.2749120.v1
    Available download formats: zip
    Dataset updated
    Feb 24, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Marco Veluscek
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The company that provided the dataset is the world leader in the manufacturing of construction and mining equipment, diesel and natural gas engines, industrial gas turbines and diesel-electric locomotives. The current revenue of the company is estimated to be on the order of tens of billions, and it sells products and parts via a worldwide dealer network. The company sells more than 3 million products and 700,000 parts in more than 20 countries around the world every year. It operates with more than 3,000 suppliers and 3,000 dealerships, and its logistics operations alone are worth more than 60 million dollars per year.

    The dataset provided is one example of a supply chain problem for one product of the company, a medium size excavator. In the current dataset, the number of dealers, production facilities and shipping ports is the same as in the original problem; only the demand figures, the production capacities, the transportation times and costs and the sale prices have been randomly generated. The figures have been generated according to a normal distribution with the same mean and standard deviation as in the original dataset (e.g. the demand figures have the same mean and standard deviation as those found in the original problem).

    The dataset has been extended with 9 more years of demand distributions. The additional 9 years have been created with the same random problem generator. The purpose of the extension is to provide more instances of the problem. The dataset may be interpreted as containing 10 years of demand for one product or the demand figures of 10 similar products. For instance, we adopted this dataset in a machine learning context to have a larger and more comprehensive training set.

  15. CURVAS-PDACVI dataset

    • zenodo.org
    zip
    Updated May 15, 2025
    Cite
    Meritxell Riera-Marín; SIKHA O K; MARIA MONTSERRAT DUH; Anton Aubanell; de Figueiredo Cardoso Ruben; Egger-Hackenschmidt Saskia; Júlia Rodríguez-Comas; Miguel Ángel González Ballester; Javier Garcia López (2025). CURVAS-PDACVI dataset [Dataset]. http://doi.org/10.5281/zenodo.15401568
    Available download formats: zip
    Dataset updated
    May 15, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Meritxell Riera-Marín; SIKHA O K; MARIA MONTSERRAT DUH; Anton Aubanell; de Figueiredo Cardoso Ruben; Egger-Hackenschmidt Saskia; Júlia Rodríguez-Comas; Miguel Ángel González Ballester; Javier Garcia López
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This challenge will soon be hosted on Grand Challenge. The page is currently under construction.

    Clinical Problem

    In medical imaging, DL models are often tasked with delineating structures or abnormalities within complex anatomical structures, such as tumors, blood vessels, or organs. Uncertainty arises from the inherent complexity and variability of these structures, leading to challenges in precisely defining their boundaries. This uncertainty is further compounded by interrater variability, as different medical experts may have varying opinions on where the true boundaries lie. DL models must grapple with these discrepancies, leading to inconsistencies in segmentation results across different annotators and potentially impacting diagnosis and treatment decisions. Addressing interrater variability in DL for medical segmentation involves the development of robust algorithms capable of capturing and quantifying uncertainty, as well as standardizing annotation practices and promoting collaboration among medical experts to reduce variability and improve the reliability of DL-based medical image analysis. Interrater variability poses significant challenges in the field of DL for medical image segmentation.

    This challenge is designed to promote awareness of the impact uncertainty has on clinical applications of medical image analysis. In last year's edition, we proposed a competition based on modeling the uncertainty of segmenting three abdominal organs, namely kidney, liver and pancreas, focusing on organ volume as a clinical quantity of interest. This year, we go one step further and propose to segment pancreatic pathological structures, namely Pancreatic Ductal Adenocarcinoma (PDAC), with the clinical goal of understanding vascular involvement, a key measure of tumor resectability. In this context, uncertainty quantification is a much more challenging task, given the wildly varying contours that different PDAC instances show.

    This year, we will provide a richer dataset, in which we start from an already existing dataset of clinically verified contrast-enhanced abdominal CT scans with a single set of manual annotations (provided by the PANORAMA organization), and make an effort to construct four extra manual annotations per PDAC case. In this way, we will assemble a unique dataset that creates a notable opportunity to analyze the impact of multi-rater annotations in several dimensions, e.g. different annotation protocols or different annotator experiences, to name a few.

    CURVAS Challenge Goal

    This challenge aims to advance deep learning methods for medical image segmentation by focusing on the critical issue of interrater variability, particularly in the context of pancreatic cancer. Building on last year's focus on organ segmentation uncertainty, this edition shifts to the more complex task of segmenting Pancreatic Ductal Adenocarcinoma (PDAC) to assess vascular involvement—a key indicator of tumor resectability. By providing a unique, richly annotated dataset with multiple expert annotations per case, the challenge encourages participants to develop robust models that can quantify and manage uncertainty arising from differing expert opinions, ultimately improving the clinical reliability of AI-based image analysis.

    For more information about the challenge, visit our website to join CURVAS-PDACVI (Calibration and Uncertainty for multiRater Volume Assessment in multistructure Segmentation - Pancreatic Ductal AdenoCarcinoma Vascular Invasion). This challenge will be held at MICCAI 2025.

    Dataset Cohort

    The challenge cohort comprises 125 upper-abdominal, axial, portal-venous CECT scans selected from a subset of the PANORAMA challenge dataset. The selection process prioritizes CT scans with manually generated labels, excluding those with automatically derived annotations. Additionally, only cases with a conclusive diagnostic test (e.g., pathology, cytology, histopathology) are included, while patients with radiology-based diagnoses have been excluded.

    To ensure the subset is representative of common real-world scenarios, lesion sizes have been analyzed, and a diverse range of cases has been selected. Furthermore, patient demographics, including sex and age, have been considered to enhance the cohort's representativeness.

    Finally, a preliminary visual analysis has been conducted before sending the images to radiologists for segmentation. This confirms the tumor's location, size, and relevance, helping maintain the dataset's representativeness for the challenge.

    The previously indicated cohort of 125 CT scans is split in the following way:

    • Training Phase cohort:

    40 CT scans with their respective annotations are provided. Participants are encouraged to leverage publicly available external data annotated by multiple raters. Providing a small training set while allowing public datasets for training is meant to make the challenge more inclusive, since a method can be developed with data that is available to anyone. Furthermore, training on this data and evaluating on other data makes methods more robust to shifts and other sources of variability between datasets.

    • Validation Phase cohort:

    5 CT scans will be used for this phase.

    • Test Phase cohort:

    85 CT scans will be used for evaluation.

    Neither the validation nor the test CT scans will be published until the end of the challenge. Furthermore, the group to which each CT scan belongs will not be revealed until after the challenge.

    Each folder containing a study is named with a unique ID (CURVASPDAC_XXXX) so that it cannot be directly related to the PANORAMA ID, and has the following structure:

    • annotation_X.nii.gz: contains the Pancreatic Ductal Adenocarcinoma (PDAC) segmentations (X=1 being the PANORAMA segmentation, X=2,..,5 being the other experts' segmentations)
    • image.nii.gz: CT volume
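
    The folder layout above can be checked with a short script before any training code runs. The following is a minimal sketch, assuming five annotation files per study as described; the helper names study_files and missing_files are hypothetical and not part of the challenge code.

```python
from pathlib import Path


def study_files(study_dir, n_annotations=5):
    """Return the expected image path and annotation paths for one study folder."""
    study_dir = Path(study_dir)
    image = study_dir / "image.nii.gz"
    # annotation_1 is the PANORAMA segmentation; 2..5 are the other experts.
    annotations = [
        study_dir / f"annotation_{x}.nii.gz" for x in range(1, n_annotations + 1)
    ]
    return image, annotations


def missing_files(study_dir, n_annotations=5):
    """List the names of expected files that are absent from the study folder."""
    image, annotations = study_files(study_dir, n_annotations)
    return [p.name for p in [image, *annotations] if not p.exists()]
```

    Calling missing_files on a downloaded study folder returns an empty list when the image and all five annotations are present.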

    The four additional annotations were made by radiologists at Universitätsklinikum Erlangen, Hospital de Sant Pau, and Hospital de Mataró. Hence, four new annotations plus the PANORAMA annotation are provided. Another clinician modified the vascular-structure annotations of the PANORAMA dataset, separating veins and arteries into single-structure segmentations. These structures are the ones considered highly relevant for the study of Vascular Invasion (VI): Porta, Superior Mesenteric Vein (SMV), Superior Mesenteric Artery (SMA), Hepatic Artery and Celiac Trunk. The vascular annotations will be made public later in the challenge, so that participants can try out the evaluation code.

    The subsets have also been balanced to ensure representativeness. Factors such as devices, sex, and patient age have been considered to improve the cohort's representativeness. Efforts have been made to balance bias as evenly as possible across these variables. For age distribution, the target percentages are as follows: below 50 years (5%), 50–59 years (15%), 60–69 years (20%), 70–79 years (30%), and 80–89 years (30%) [1,2,3,4]. While these percentages are approximate and have been rounded for simplicity, the balance aims to be as close to these proportions as feasible. For sex, the targets are 40-50% female and 50-60% male [5]. For the location of the PDAC, the targets are 60-70% head, 15-25% body and 10-15% tail [6]. The size of the lesions has been analyzed and a subset will be selected; these values will be published in the future with the entire dataset.

    Data from PANORAMA Batch 1 (https://zenodo.org/records/13715870), Batch 2 (https://zenodo.org/records/13742336), and Batch 3 (https://zenodo.org/records/11034011) are not allowed for training the models. Batch 4 (https://zenodo.org/records/10999754) can be used.

    For more technical information about the dataset visit the platform: https://panorama.grand-challenge.org/datasets-imaging-labels/

    Ethical Approval and Data Usage Agreement

    No other information that is not already public about the patient will be released since the CT images and their corresponding information are already publicly available.

    References

    [1] Lee, K.S.; Sekhar, A.; Rofsky, N.M.; Pedrosa, I. Prevalence of Incidental Pancreatic Cysts in the Adult Population on MR Imaging. Am J Gastroenterol 2010, 105, 2079–2084, doi:10.1038/ajg.2010.122.

    [2] Canakis, A.; Lee, L.S. State-of-the-Art Update of Pancreatic Cysts. Dig Dis Sci 2021.

    [3] De Oliveira, P.B.; Puchnick, A.; Szejnfeld, J.; Goldman, S.M. Prevalence of Incidental Pancreatic Cysts on 3 Tesla Magnetic Resonance. PLoS One 2015, 10, doi:10.1371/JOURNAL.PONE.0121317.

    [4] Kimura, W.; Nagai, H.; Kuroda, A.; Muto, T.; Esaki, Y. Analysis of Small Cystic Lesions of the Pancreas. Int J Pancreatol 1995, 18, 197–206, doi:10.1007/BF02784942.

    [5] Natalie Moshayedi et al. Race, sex, age, and geographic disparities in pancreatic cancer incidence. JCO 40, 520-520(2022). DOI:10.1200/JCO.2022.40.4_suppl.520

    [6] Avo Artinyan, Perry A. Soriano, Christina Prendergast, Tracey Low, Joshua D.I. Ellenhorn, Joseph Kim, The anatomic location of pancreatic cancer is a prognostic

  16. In-Car Speech Dataset: Italian (Italy)

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). In-Car Speech Dataset: Italian (Italy) [Dataset]. https://www.futurebeeai.com/dataset/monologue-speech-dataset/in-car-speech-dataset-italian-italy
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Area covered
    Italy
    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    Welcome to the Italian Language In-car Speech Dataset, a comprehensive collection of audio recordings designed to facilitate the development of speech recognition models specifically tailored for in-car environments. This dataset aims to support research and innovation in automotive speech technology, enabling seamless and robust voice interactions within vehicles for drivers and co-passengers.

    Speech Data

    This dataset comprises over 5,000 high-quality audio recordings collected from various in-car environments. These recordings include scripted wake words and command-type prompts.

    Participant Diversity:

    - Speakers: 50+ native Italian speakers from the FutureBeeAI Community.

    - Regions: Ensures a balanced representation of Italy's accents, dialects, and demographics.

    - Participant Profile: Participants range from 18 to 70 years old, representing both males and females in a 60:40 ratio, respectively.

    Recording Nature: Scripted wake word and command type of audio recordings.

    - Duration: Average duration of 5 to 20 seconds per audio recording.

    - Formats: WAV format with mono channels, a bit depth of 16 bits. The dataset contains different data at 16kHz and 48kHz.
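
    The format constraints above (mono, 16-bit, 16 kHz or 48 kHz) can be verified per file with Python's standard wave module. This is a minimal sketch; the function name check_wav is hypothetical, and the vendor's own tooling is not described in the listing.

```python
import wave

ALLOWED_RATES = {16000, 48000}  # sample rates named in the description


def check_wav(path):
    """Return (problems, duration_seconds) for one WAV file.

    An empty problems list means the file is mono, 16-bit, and uses one of
    the allowed sample rates.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample; 2 bytes = 16 bits
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if channels != 1:
        problems.append(f"expected mono, got {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {sample_width * 8}-bit")
    if rate not in ALLOWED_RATES:
        problems.append(f"unexpected sample rate {rate} Hz")
    return problems, duration
```

    The returned duration can also be compared against the stated 5 to 20 second average when auditing a batch of recordings.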

    Dataset Diversity

    Apart from participant diversity, the dataset is diverse in terms of different wake words, voice commands, and recording environments.

    Different Automobile Related Wake Words: Hey Mercedes, Hey BMW, Hey Porsche, Hey Volvo, Hey Audi, Hi Genesis, Hey Mini, Hey Toyota, Ok Ford, Hey Hyundai, Ok Honda, Hello Kia, Hey Dodge.

    Different Cars: Data collection was carried out in different types and models of cars.

    Different Types of Voice Commands:

    - Navigational Voice Commands

    - Mobile Control Voice Commands

    - Car Control Voice Commands

    - Multimedia & Entertainment Commands

    - General, Question Answer, Search Commands

    Recording Time: Participants recorded the given prompts at various times to make the dataset more diverse.

    - Morning

    - Afternoon

    - Evening

    Recording Environment: Various recording environments were captured to acquire more realistic data and to make the dataset inclusive of various types of noises. Some of the environment variables are as follows:

    - Noise Level: Silent, Low Noise, Moderate Noise, High Noise

    - Parking Location: Indoor, Outdoor

    - Car Windows: Open, Closed

    - Car AC: On, Off

    - Car Engine: On, Off

    - Car Movement: Stationary, Moving

    Metadata

    The dataset provides comprehensive metadata for each audio recording and participant:

    Participant Metadata: Unique identifier, age, gender, country, state, district, accent, and dialect.

    Other Metadata: Recording transcript, recording environment, device details, sample rate, bit depth, file format, recording time.

    This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Italian voice assistant speech recognition models.

    License

    This Italian In-car audio dataset is created by FutureBeeAI and is available for commercial use.

  17. Shopping Malls Database by Country

    • datarade.ai
    .csv, .xls, .txt
    Updated Mar 9, 2022
    Cite
    Geodatindustry (2022). Shopping Malls Database by Country [Dataset]. https://datarade.ai/data-products/shopping-malls-database-by-country-geodataindustry
    Explore at:
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Mar 9, 2022
    Dataset authored and provided by
    Geodatindustry
    Area covered
    Canada, France, United Kingdom
    Description

    To this day, the Geodatindustry database is the world's most complete and accurate in the retail, commercial and industry area, with 25 years of experience and a qualified team.

    Geodatindustry Database is the perfect tool to lead your decision making, market analytics, strategy building, prospecting, advertising campaigns, etc.

    By purchasing this dataset, you gain access to more than 18,000 shopping malls all over the world, hosting millions of stores and welcoming millions of visitors each year.

    Included Points of Interest in this dataset: Shopping Malls and Centers, Outlets, Big Supermarkets and Hypermarkets.

    Information (if known) : shopping mall's name, physical address, number of shops, x,y coordinates, annual visitors counts (in millions), owner and managers, global area and GLA (in ranges), the website.

    Global area and GLA ranges:
    A = 0-2 500 m²
    B = 2 500-5 000 m²
    C = 5 000-10 000 m²
    D = 10 000-25 000 m²
    E = 25 000-50 000 m²
    F = 50 000-75 000 m²
    G = 75 000-100 000 m²
    H = 100 000-1M m²
    I = 1M-10M m²
    J = 10M m² and above
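
    When joining this database with measured floor areas, the range codes above can be reproduced with a small lookup. The sketch below is illustrative only: the listing does not say which range a boundary value such as exactly 2 500 m² falls into, so treating each lower bound as inclusive is an assumption, and gla_code is a hypothetical helper name.

```python
import bisect

# Upper bounds (in m²) of ranges A..I; anything at or above the last bound is J.
UPPER_BOUNDS = [
    2_500, 5_000, 10_000, 25_000, 50_000, 75_000, 100_000, 1_000_000, 10_000_000,
]
CODES = "ABCDEFGHIJ"


def gla_code(area_m2):
    """Map an area in square metres to its range letter.

    Assumption: a value equal to a boundary belongs to the higher range.
    """
    if area_m2 < 0:
        raise ValueError("area must be non-negative")
    return CODES[bisect.bisect_right(UPPER_BOUNDS, area_m2)]
```

    For example, an area of 60 000 m² falls in range F under this convention.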

    Prices depend on the number of shopping malls in each country. They range from 59€ to 3990€ per country.

  18. Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1 more
    Updated Apr 15, 2015
    Cite
    Development Research Group, Finance and Private Sector Development Unit (2015). Global Financial Inclusion (Global Findex) Database 2011 - Afghanistan [Dataset]. https://microdata.worldbank.org/index.php/catalog/1117
    Explore at:
    Dataset updated
    Apr 15, 2015
    Dataset authored and provided by
    Development Research Group, Finance and Private Sector Development Unit
    Time period covered
    2011
    Area covered
    Afghanistan
    Description

    Abstract

    Well-functioning financial systems serve a vital purpose, offering savings, credit, payment, and risk management products to people with a wide range of needs. Yet until now little had been known about the global reach of the financial sector - the extent of financial inclusion and the degree to which such groups as the poor, women, and youth are excluded from formal financial systems. Systematic indicators of the use of different financial services had been lacking for most economies.

    The Global Financial Inclusion (Global Findex) database provides such indicators. This database contains the first round of Global Findex indicators, measuring how adults in more than 140 economies save, borrow, make payments, and manage risk. The data set can be used to track the effects of financial inclusion policies globally and develop a deeper and more nuanced understanding of how people around the world manage their day-to-day finances. By making it possible to identify segments of the population excluded from the formal financial sector, the data can help policy makers prioritize reforms and design new policies.

    Geographic coverage

    National Coverage.

    Analysis unit

    Individual

    Universe

    The target population is the civilian, non-institutionalized population 15 years and above.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Global Findex indicators are drawn from survey data collected by Gallup, Inc. over the 2011 calendar year, covering more than 150,000 adults in 148 economies and representing about 97 percent of the world's population. Since 2005, Gallup has surveyed adults annually around the world, using a uniform methodology and randomly selected, nationally representative samples. The second round of Global Findex indicators was collected in 2014 and is forthcoming in 2015. The set of indicators will be collected again in 2017.

    Surveys were conducted face-to-face in economies where landline telephone penetration is less than 80 percent, or where face-to-face interviewing is customary. The first stage of sampling is the identification of primary sampling units, consisting of clusters of households. The primary sampling units are stratified by population size, geography, or both, and clustering is achieved through one or more stages of sampling. Where population information is available, sample selection is based on probabilities proportional to population size; otherwise, simple random sampling is used. Random route procedures are used to select sampled households. Unless an outright refusal occurs, interviewers make up to three attempts to survey the sampled household. If an interview cannot be obtained at the initial sampled household, a simple substitution method is used. Respondents are randomly selected within the selected households by means of the Kish grid.

    Surveys were conducted by telephone in economies where landline telephone penetration is over 80 percent. The telephone surveys were conducted using random digit dialing or a nationally representative list of phone numbers. In selected countries where cell phone penetration is high, a dual sampling frame is used. Random respondent selection is achieved by using either the latest birthday or Kish grid method. At least three attempts are made to reach a person in each household, spread over different days and times of year.

    The sample size in Afghanistan was 1,000 individuals. Gender-matched sampling was used during the final stage of selection.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire was designed by the World Bank, in conjunction with a Technical Advisory Board composed of leading academics, practitioners, and policy makers in the field of financial inclusion. The Bill and Melinda Gates Foundation and Gallup, Inc. also provided valuable input. The questionnaire was piloted in over 20 countries using focus groups, cognitive interviews, and field testing. The questionnaire is available in 142 languages upon request.

    Questions on insurance, mobile payments, and loan purposes were asked only in developing economies. The indicators on awareness and use of microfinance institutions (MFIs) are not included in the public dataset. However, adults who report saving at an MFI are considered to have an account; this is reflected in the composite account indicator.

    Sampling error estimates

    Estimates of standard errors (which account for sampling error) vary by country and indicator. For country- and indicator-specific standard errors, refer to the Annex and Country Table in Demirguc-Kunt, Asli and L. Klapper. 2012. "Measuring Financial Inclusion: The Global Findex." Policy Research Working Paper 6025, World Bank, Washington, D.C.

  19. Data from: OSDG Community Dataset (OSDG-CD)

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Jun 3, 2024
    Cite
    OSDG; UNDP IICPSD SDG AI Lab; PPMI (2024). OSDG Community Dataset (OSDG-CD) [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_5550237
    Explore at:
    Dataset updated
    Jun 3, 2024
    Authors
    OSDG; UNDP IICPSD SDG AI Lab; PPMI
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The OSDG Community Dataset (OSDG-CD) is a public dataset of thousands of text excerpts, which were validated by over 1,400 OSDG Community Platform (OSDG-CP) citizen scientists from over 140 countries, with respect to the Sustainable Development Goals (SDGs).

    Dataset Information

    In support of the global effort to achieve the Sustainable Development Goals (SDGs), OSDG is releasing a series of SDG-labelled text datasets. The OSDG Community Dataset (OSDG-CD) is the direct result of the work of more than 1,400 volunteers from over 130 countries who have contributed to our understanding of SDGs via the OSDG Community Platform (OSDG-CP). The dataset contains tens of thousands of text excerpts (henceforth: texts) which were validated by the Community volunteers with respect to SDGs. The data can be used to derive insights into the nature of SDGs using either ontology-based or machine learning approaches.

    📘 The file contains 43,0210 (+390) text excerpts and a total of 310,328 (+3,733) assigned labels.

    To learn more about the project, please visit the OSDG website and the official GitHub page. Explore a detailed overview of the OSDG methodology in our recent paper "OSDG 2.0: a multilingual tool for classifying text data by UN Sustainable Development Goals (SDGs)".

    Source Data

    The dataset consists of paragraph-length text excerpts derived from publicly available documents, including reports, policy documents and publication abstracts. A significant number of documents (more than 3,000) originate from UN-related sources such as SDG-Pathfinder and SDG Library. These sources often contain documents that already have SDG labels associated with them. Each text is comprised of 3 to 6 sentences and is about 90 words on average.

    Methodology

    All the texts are evaluated by volunteers on the OSDG-CP. The platform is an ambitious attempt to bring together researchers, subject-matter experts and SDG advocates from all around the world to create a large and accurate source of textual information on the SDGs. The Community volunteers use the platform to participate in labelling exercises where they validate each text's relevance to SDGs based on their background knowledge.

    In each exercise, the volunteer is shown a text together with an SDG label associated with it – this usually comes from the source – and asked to either accept or reject the suggested label.

    There are 3 types of exercises:

    • All volunteers start with the mandatory introductory exercise that consists of 10 pre-selected texts. Each volunteer must complete this exercise before they can access 2 other exercise types. Upon completion, the volunteer reviews the exercise by comparing their answers with the answers of the rest of the Community using aggregated statistics we provide, i.e., the share of those who accepted and rejected the suggested SDG label for each of the 10 texts. This helps the volunteer to get a feel for the platform.

    • SDG-specific exercises where the volunteer validates texts with respect to a single SDG, e.g., SDG 1 No Poverty.

    • All SDGs exercise where the volunteer validates a random sequence of texts where each text can have any SDG as its associated label.

    After finishing the introductory exercise, the volunteer is free to select either SDG-specific or All SDGs exercises. Each exercise, regardless of its type, consists of 100 texts. Once the exercise is finished, the volunteer can either label more texts or exit the platform. The volunteer can also finish the exercise early; all progress is still saved and recorded.

    To ensure quality, each text is validated by up to 9 different volunteers and all texts included in the public release of the data have been validated by at least 3 different volunteers.

    It is worth keeping in mind that all exercises present the volunteers with a binary decision problem, i.e., either accept or reject a suggested label. The volunteers are never asked to select one or more SDGs that a certain text might relate to. The rationale behind this set-up is that asking a volunteer to select from 17 SDGs is extremely inefficient. Currently, all texts are validated against only one associated SDG label.

    Column Description

    doi - Digital Object Identifier of the original document

    text_id - unique text identifier

    text - text excerpt from the document

    sdg - the SDG the text is validated against

    labels_negative - the number of volunteers who rejected the suggested SDG label

    labels_positive - the number of volunteers who accepted the suggested SDG label

    agreement - agreement score based on the formula \( agreement = \frac{|labels_{positive} - labels_{negative}|}{labels_{positive} + labels_{negative}} \)
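
    The agreement formula can be recomputed from the two vote-count columns, which is useful for sanity-checking the file or for filtering to high-agreement texts. A minimal sketch follows; the function name and the zero-vote handling are assumptions, not part of the OSDG tooling.

```python
def agreement(labels_positive, labels_negative):
    """Agreement score: |positive - negative| / (positive + negative).

    Ranges from 0.0 (votes evenly split) to 1.0 (unanimous).
    """
    total = labels_positive + labels_negative
    if total == 0:
        # Assumption: a text with no votes has no defined agreement score.
        raise ValueError("at least one vote is required")
    return abs(labels_positive - labels_negative) / total
```

    A text with 6 accepting and 3 rejecting volunteers scores 3/9, while a unanimous text scores 1.0 regardless of how many volunteers voted.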

    Further Information

    Do not hesitate to share with us your outputs, be it a research paper, a machine learning model, a blog post, or just an interesting observation. All queries can be directed to community@osdg.ai.

  20. GDP by Country Dataset

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Jun 29, 2011
    Cite
    TRADING ECONOMICS (2011). GDP by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/gdp
    Explore at:
    Available download formats: csv, json, xml, excel
    Dataset updated
    Jun 29, 2011
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    World
    Description

    This dataset provides values for GDP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

Cite
Shaik Barood Mohammed Umar Adnaan Faiz (2025). Daily Social Media Active Users [Dataset]. https://www.kaggle.com/datasets/umeradnaan/daily-social-media-active-users

Daily Social Media Active Users

A thorough dataset that displays user activity on major social media platforms

Explore at:
Available download formats: zip (126814 bytes)
Dataset updated
May 5, 2025
Authors
Shaik Barood Mohammed Umar Adnaan Faiz
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Description:

The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.

Dataset Breakdown:

  • Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.

  • Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.

  • Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.

  • Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.

  • Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.

  • Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.

  • Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.
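
As a first look at the fields listed above, the average daily time per platform can be computed with the standard library alone. This is a minimal sketch: the column headers "Platform" and "Daily Time Spent (min)" are assumed to match the field names in the description, and the actual CSV headers may differ.

```python
import csv
from collections import defaultdict


def mean_minutes_by_platform(
    rows, platform_col="Platform", minutes_col="Daily Time Spent (min)"
):
    """Average the daily minutes per platform over an iterable of row dicts."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        platform = row[platform_col]
        totals[platform] += float(row[minutes_col])
        counts[platform] += 1
    return {platform: totals[platform] / counts[platform] for platform in totals}


def summarize_csv(path):
    """Read a CSV file (path supplied by the caller) and summarize it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return mean_minutes_by_platform(csv.DictReader(handle))
```

Passing different column names through platform_col and minutes_col adapts the function if the file's headers are spelled differently.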

Context and Use Cases:

  • This synthetic dataset is designed to offer a privacy-friendly alternative for analytics, research, and machine learning purposes. Given the complexities and privacy concerns around using real user data, especially in the context of social media, this dataset offers a clean and secure way to develop, test, and fine-tune applications, models, and algorithms without the risks of handling sensitive or personal information.

Researchers, data scientists, and developers can use this dataset to:

  • Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.

  • Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.

  • Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.

  • Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.

  • Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.

  • Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.

The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.

Future Considerations:

As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.

By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...
