17 datasets found
  1. Impact of Digital Habits on Mental Health

    • kaggle.com
    Updated Jun 14, 2025
    Cite
    Shahzad Aslam (2025). Impact of Digital Habits on Mental Health [Dataset]. https://www.kaggle.com/datasets/zeesolver/mental-health
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Kaggle
    Authors
    Shahzad Aslam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset explores the relationship between digital behavior and mental well-being among 100,000 individuals. It records how much time people spend on screens, use of social media (including TikTok), and how these habits may influence their sleep, stress, and mood levels.

    It includes six numerical features, all clean and ready for analysis, making it ideal for machine learning tasks like regression or classification. The data enables researchers and analysts to investigate how modern digital lifestyles may impact mental health indicators in measurable ways.

    Dataset Applications

    • Quantify how screen‑time, TikTok use, or multi‑platform engagement statistically relate to stress, sleep loss, and mood.
    • Train regression or classification models that forecast stress level or mood score from real‑time digital‑usage metrics.
    • Feed user‑specific data into recommender systems that suggest screen‑time caps or bedtime routines to improve mental health.
    • Provide evidence for guidelines on youth screen‑time limits and platform moderation based on observed stress‑sleep trade‑offs.
    • Serve as a teaching dataset for EDA, feature engineering, and model evaluation in data‑science or psychology curricula.
    • Evaluate app interventions (e.g., screen‑time nudges) by comparing predicted versus actual post‑intervention stress or mood shifts.
    • Cluster individuals into digital‑behavior personas (e.g., “heavy late‑night scrollers”) to tailor mental‑health resources.
    • Generate synthetic time‑series scenarios (what‑if reductions in TikTok hours) to estimate downstream impacts on sleep and stress.
    • Use engineered features (ratio of TikTok hours to total screen‑time, etc.) in broader wellbeing models that include diet or exercise data.
    • Assess whether mental‑health prediction models remain accurate and unbiased across different screen‑time or platform‑use segments.

    Column Descriptions

    • screen_time_hours – Daily total screen usage in hours across all devices.
    • social_media_platforms_used – Number of different social media platforms used per day.
    • hours_on_TikTok – Time spent on TikTok daily, in hours.
    • sleep_hours – Average number of sleep hours per night.
    • stress_level – Stress intensity reported on a scale from 1 (low) to 10 (high).
    • mood_score – Self-rated mood on a scale from 2 (poor) to 10 (excellent).

    Inspiration

    This dataset was inspired by growing concerns about how screen time and social media affect mental health. It enables analysis of the links between digital habits, stress, sleep, and mood, encouraging data-driven solutions for healthier online behavior and emotional well-being.

    Ethically Mined Data

    This dataset has been ethically mined and synthetically generated without collecting any personally identifiable information. All values are artificial but statistically realistic, allowing safe use in academic, research, and public health projects while fully respecting user privacy and data ethics.
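
    One of the applications listed for this dataset is an engineered feature: the ratio of TikTok hours to total screen time. The following is a minimal sketch of that feature, assuming each record is a plain dictionary keyed by the column names above. The function name `tiktok_share` and the sample rows are hypothetical and are not taken from the dataset.

```python
def tiktok_share(row):
    """Return the fraction of daily screen time spent on TikTok.

    Returns None when total screen time is zero or negative, because the
    ratio is undefined in that case.
    """
    total = row["screen_time_hours"]
    if total <= 0:
        return None
    return row["hours_on_TikTok"] / total


# Hypothetical records using the documented column names.
rows = [
    {"screen_time_hours": 8.0, "hours_on_TikTok": 2.0},
    {"screen_time_hours": 0.0, "hours_on_TikTok": 0.0},
]

for row in rows:
    print(tiktok_share(row))
```

    Returning None for zero screen time keeps undefined ratios visible instead of silently mapping them to 0; a downstream model would need to decide how to handle those rows.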
  2. Amount of data created, consumed, and stored 2010-2023, with forecasts to...

    • statista.com
    Updated Jun 30, 2025
    Cite
    Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    May 2024
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

    Storage capacity also growing

    Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.

  3. Data from: Visualizing Plant Responses: Novel Insights Possible through...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +1more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data from: Visualizing Plant Responses: Novel Insights Possible through Affordable Imaging Techniques in the Greenhouse [Dataset]. https://catalog.data.gov/dataset/data-from-visualizing-plant-responses-novel-insights-possible-through-affordable-imaging-t
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Data on the image calculation averages, coefficients of variation, and experimental measurements presented in the manuscript "Visualizing Plant Responses: Novel Insights Possible through Affordable Imaging Techniques in the Greenhouse" are provided.

    Abstract: Global climatic pressures and increased human demands create a modern necessity for efficient and affordable plant phenotyping unencumbered by arduous technical requirements. The analysis and archival of imagery have become easier as modern camera technology and computers are leveraged. This facilitates the detection of vegetation status and changes over time. Using a custom lightbox, an inexpensive camera, and common software, turfgrass pots were photographed in a greenhouse environment over an 8-week experiment period. Subsequent imagery was analyzed for area of cover, color metrics, and sensitivity to image corrections. Findings were compared to active spectral reflectance data and previously reported measurements of visual quality, productivity, and water use. Results indicate that Red Green Blue-based (RGB) imagery with simple controls is sufficient to measure the effects of plant treatments.

    Notable correlations were observed for corrected imagery, including between a percent yellow color area classification segment (%Y) and human visual quality ratings (VQ) (R = -0.89), the dark green color index (DGCI) and clipping productivity in mg d-1 (mg) (R = 0.61), and an index combination term (COMB2) and water use in mm d-1 (mm) (R = -0.60). The calculation of green cover area (%G) correlated with the Normalized Difference Vegetation Index (NDVI) (R = 0.91) and its RED reflectance spectra (R = -0.87). A CIELAB b/a chromatic ratio (BA) correlated with the Normalized Difference Red-Edge index (NDRE) (R = 0.90) and its Red-Edge (RE) reflectance spectra (R = -0.74), while a new calculation termed HSVi correlated most strongly with the Near-Infrared (NIR) reflectance spectra (R = 0.90).

    Additionally, COMB2 significantly differentiated between the treatment effects of date, mowing height, deficit irrigation, and their interactions (p < 0.001). Sensitivity and statistical analysis of typical image file formats and corrections that included JPEG (JPG), TIFF (TIF), geometric lens correction (LC), and color correction (CC) were conducted. Results underscore the need for further research to support image corrections standardization and better connect image data to biological processes. This study demonstrates the potential of consumer-grade photography to capture plant phenotypic traits.
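
    The abstract reports its results as correlation coefficients (R) between image-derived metrics and reference measurements. As an illustration only, the sketch below computes a Pearson correlation coefficient in plain Python. It assumes the reported R values are Pearson coefficients, which the abstract does not state, and the two sample series are hypothetical values, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: percent yellow area vs. visual quality rating.
percent_yellow = [2.0, 5.0, 9.0, 14.0]
visual_quality = [8.5, 7.0, 6.0, 4.0]
r = pearson_r(percent_yellow, visual_quality)
print(round(r, 2))
```

    A negative result here mirrors the direction of the %Y versus VQ relationship described above: more yellow area goes with lower visual quality.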

  4. Rates of COVID-19 Cases or Deaths by Age Group and Vaccination Status

    • data.cdc.gov
    • data.virginia.gov
    • +1more
    application/rdfxml +5
    Updated Feb 22, 2023
    Cite
    CDC COVID-19 Response, Epidemiology Task Force (2023). Rates of COVID-19 Cases or Deaths by Age Group and Vaccination Status [Dataset]. https://data.cdc.gov/Public-Health-Surveillance/Rates-of-COVID-19-Cases-or-Deaths-by-Age-Group-and/3rge-nu2a
    Explore at:
    Available download formats: tsv, application/rssxml, csv, application/rdfxml, xml, json
    Dataset updated
    Feb 22, 2023
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Authors
    CDC COVID-19 Response, Epidemiology Task Force
    Description

    Data for CDC’s COVID Data Tracker site on Rates of COVID-19 Cases and Deaths by Vaccination Status. Click 'More' for important dataset description and footnotes

    Dataset and data visualization details: These data were posted on October 21, 2022, archived on November 18, 2022, and revised on February 22, 2023. These data reflect cases among persons with a positive specimen collection date through September 24, 2022, and deaths among persons with a positive specimen collection date through September 3, 2022.

    Vaccination status: A person vaccinated with a primary series had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after verifiably completing the primary series of an FDA-authorized or approved COVID-19 vaccine. An unvaccinated person had SARS-CoV-2 RNA or antigen detected on a respiratory specimen and has not been verified to have received COVID-19 vaccine. Excluded were partially vaccinated people who received at least one FDA-authorized vaccine dose but did not complete a primary series ≥14 days before collection of a specimen where SARS-CoV-2 RNA or antigen was detected.

    Additional or booster dose: A person vaccinated with a primary series and an additional or booster dose had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after receipt of an additional or booster dose of any COVID-19 vaccine on or after August 13, 2021. For people ages 18 years and older, data are graphed starting the week including September 24, 2021, when a COVID-19 booster dose was first recommended by CDC for adults 65+ years old and people in certain populations and high risk occupational and institutional settings. For people ages 12-17 years, data are graphed starting the week of December 26, 2021, 2 weeks after the first recommendation for a booster dose for adolescents ages 16-17 years. For people ages 5-11 years, data are included starting the week of June 5, 2022, 2 weeks after the first recommendation for a booster dose for children aged 5-11 years. For people ages 50 years and older, data on second booster doses are graphed starting the week including March 29, 2022, when the recommendation was made for second boosters. Vertical lines represent dates when changes occurred in U.S. policy for COVID-19 vaccination (details provided above). Reporting is by primary series vaccine type rather than additional or booster dose vaccine type. The booster dose vaccine type may be different than the primary series vaccine type. Because data on the immune status of cases and associated deaths are unavailable, an additional dose in an immunocompromised person cannot be distinguished from a booster dose. This is a relevant consideration because vaccines can be less effective in this group.

    Deaths: A COVID-19–associated death occurred in a person with a documented COVID-19 diagnosis who died; health department staff reviewed to make a determination using vital records, public health investigation, or other data sources. Rates of COVID-19 deaths by vaccination status are reported based on when the patient was tested for COVID-19, not the date they died. Deaths usually occur up to 30 days after COVID-19 diagnosis.

    Participating jurisdictions: Currently, these 31 health departments that regularly link their case surveillance to immunization information system data are included in these incidence rate estimates: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Idaho, Indiana, Kansas, Kentucky, Louisiana, Massachusetts, Michigan, Minnesota, Nebraska, New Jersey, New Mexico, New York, New York City (New York), North Carolina, Philadelphia (Pennsylvania), Rhode Island, South Dakota, Tennessee, Texas, Utah, Washington, and West Virginia; 30 jurisdictions also report deaths among vaccinated and unvaccinated people. These jurisdictions represent 72% of the total U.S. population and all ten of the Health and Human Services Regions. Data on cases among people who received additional or booster doses were reported from 31 jurisdictions; 30 jurisdictions also reported data on deaths among people who received one or more additional or booster dose; 28 jurisdictions reported cases among people who received two or more additional or booster doses; and 26 jurisdictions reported deaths among people who received two or more additional or booster doses. This list will be updated as more jurisdictions participate.

    Incidence rate estimates: Weekly age-specific incidence rates by vaccination status were calculated as the number of cases or deaths divided by the number of people vaccinated with a primary series, overall or with/without a booster dose (cumulative) or unvaccinated (obtained by subtracting the cumulative number of people vaccinated with a primary series and partially vaccinated people from the 2019 U.S. intercensal population estimates) and multiplied by 100,000. Overall incidence rates were age-standardized using the 2000 U.S. Census standard population. To estimate population counts for ages 6 months through 1 year, half of the single-year population counts for ages 0 through 1 year were used. All rates are plotted by positive specimen collection date to reflect when incident infections occurred. For the primary series analysis, age-standardized rates include ages 12 years and older from April 4, 2021 through December 4, 2021, ages 5 years and older from December 5, 2021 through July 30, 2022 and ages 6 months and older from July 31, 2022 onwards. For the booster dose analysis, age-standardized rates include ages 18 years and older from September 19, 2021 through December 25, 2021, ages 12 years and older from December 26, 2021, and ages 5 years and older from June 5, 2022 onwards. Small numbers could contribute to less precision when calculating death rates among some groups.

    Continuity correction: A continuity correction has been applied to the denominators by capping the percent population coverage at 95%. To do this, we assumed that at least 5% of each age group would always be unvaccinated in each jurisdiction. Adding this correction ensures that there is always a reasonable denominator for the unvaccinated population that would prevent incidence and death rates from growing unrealistically large due to potential overestimates of vaccination coverage.

    Incidence rate ratios (IRRs): IRRs for the past one month were calculated by dividing the average weekly incidence rates among unvaccinated people by that among people vaccinated with a primary series either overall or with a booster dose.

    Publications:

    • Scobie HM, Johnson AG, Suthar AB, et al. Monitoring Incidence of COVID-19 Cases, Hospitalizations, and Deaths, by Vaccination Status — 13 U.S. Jurisdictions, April 4–July 17, 2021. MMWR Morb Mortal Wkly Rep 2021;70:1284–1290.
    • Johnson AG, Amin AB, Ali AR, et al. COVID-19 Incidence and Death Rates Among Unvaccinated and Fully Vaccinated Adults with and Without Booster Doses During Periods of Delta and Omicron Variant Emergence — 25 U.S. Jurisdictions, April 4–December 25, 2021. MMWR Morb Mortal Wkly Rep 2022;71:132–138.
    • Johnson AG, Linde L, Ali AR, et al. COVID-19 Incidence and Mortality Among Unvaccinated and Vaccinated Persons Aged ≥12 Years by Receipt of Bivalent Booster Doses and Time Since Vaccination — 24 U.S. Jurisdictions, October 3, 2021–December 24, 2022. MMWR Morb Mortal Wkly Rep 2023;72:145–152.
    • Johnson AG, Linde L, Payne AB, et al. Notes from the Field: Comparison of COVID-19 Mortality Rates Among Adults Aged ≥65 Years Who Were Unvaccinated and Those Who Received a Bivalent Booster Dose Within the Preceding 6 Months — 20 U.S. Jurisdictions, September 18, 2022–April 1, 2023. MMWR Morb Mortal Wkly Rep 2023;72:667–669.
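
    The rate, continuity-correction, and IRR calculations described in this entry can be sketched roughly as follows. This is an illustration under stated assumptions, not CDC's implementation: the function names are hypothetical, age standardization is omitted, and the correction is modeled as a floor of 5% of the age-group population on the unvaccinated denominator, which is one reading of "capping the percent population coverage at 95%".

```python
def rate_per_100k(events, population):
    """Cases or deaths per 100,000 people in the given population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * 100_000


def unvaccinated_population(total, primary_series, partially_vaccinated):
    """Unvaccinated denominator with a 5% floor (assumed form of the correction).

    The description says coverage is capped at 95%, so at least 5% of each
    age group is always treated as unvaccinated.
    """
    floor = 0.05 * total
    return max(total - primary_series - partially_vaccinated, floor)


def incidence_rate_ratio(unvaccinated_weekly_rates, vaccinated_weekly_rates):
    """Average weekly rate among unvaccinated divided by that among vaccinated."""
    mean_unvax = sum(unvaccinated_weekly_rates) / len(unvaccinated_weekly_rates)
    mean_vax = sum(vaccinated_weekly_rates) / len(vaccinated_weekly_rates)
    return mean_unvax / mean_vax


# Hypothetical numbers, for illustration only.
denominator = unvaccinated_population(1_000, 980, 10)
print(denominator)
print(rate_per_100k(5, denominator))
```

    In the hypothetical example, raw subtraction would leave only 10 unvaccinated people; the floor raises the denominator to 50, which keeps the resulting rate from growing unrealistically large.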

  5. Data from: S1 Dataset -

    • figshare.com
    xlsx
    Updated Feb 20, 2025
    Cite
    Rabnawaz Khan; Wang Jie (2025). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0317148.s004
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Rabnawaz Khan; Wang Jie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer, the second-leading cause of mortality, kills 16% of people worldwide. Unhealthy lifestyles, smoking, alcohol abuse, obesity, and a lack of exercise have been linked to cancer incidence and mortality. However, it is hard. Cancer and lifestyle correlation analysis and cancer incidence and mortality prediction in the next several years are used to guide people’s healthy lives and target medical financial resources. Two key research areas of this paper are:

    Data preprocessing and sample expansion design: Using experimental analysis and comparison, this study chooses the best cubic spline interpolation technology on the original data from 32 entry points to 420 entry points and converts annual data into monthly data to solve the problem of insufficient correlation analysis and prediction. Factor analysis is possible because data sources indicate changing factors.

    TSA-LSTM two-stage attention design: A popular tool with advanced visualization functions, Tableau, simplifies this paper’s study. Tableau’s testing findings indicate it cannot analyze and predict this paper’s time series data. LSTM is utilized by the TSA-LSTM optimization model. By commencing with input feature attention, this model attention technique guarantees that the model encoder converges to a subset of input sequence features during the prediction of output sequence features. As a result, the model’s natural learning trend and prediction quality are enhanced. The second step, time performance attention, maintains We can choose network features and improve forecasts based on real-time performance.

    Validating the data source with factor correlation analysis and trend prediction using the TSA-LSTM model: Most cancers have overlapping risk factors, and excessive drinking, lack of exercise, and obesity can cause breast, colorectal, and colon cancer. A poor lifestyle directly promotes lung, laryngeal, and oral cancers, according to visual tests. Cancer incidence is expected to climb 18–21% between 2020 and 2025, according to 2021. Long-term projection accuracy is 98.96 percent, and smoking and obesity may be the main cancer causes.

  6. Visualization of the population development in Graz

    • data.europa.eu
    jpeg
    Updated Jun 10, 2024
    Cite
    (2024). Visualization of the population development in Graz [Dataset]. https://data.europa.eu/data/datasets/101c1f33-344e-4458-82c9-f8a00a83ae0a/embed
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 10, 2024
    Area covered
    Graz
    Description

    In the isodemographic map of the population development of Graz, 2006 - 2012, the individual districts are shown in proportion to their population in 2012. The size and shape of the district boundaries are adjusted so that a uniform population density prevails in the areas shown; however, the spatial relationship and distribution of the city districts relative to each other remain largely the same. Specific spatial trends and developments can be derived from this: the most populous districts are Jakomini (35,800), Lend (32,000), Gries (29,300) and Geidorf (28,000), so the focus of the population of Graz is clearly in the districts around the old town. The highest percentage growth rates in the period May 2006 to May 2012 are recorded in the districts of Lend (+9.2%), Strassgang (+8.5%) and Mariatrost (+8.1%). In terms of absolute growth, Lend (+2,700 p.e.), Jakomini (+1,300 p.e.) and Strassgang (+1,200 p.e.) are clearly ahead. This contrasts with the districts with the lowest growth rates over the same period: Ries (+0.3% / 19 p.e.), Innere Stadt (+0.5% / 21 p.e.), Waltendorf (+1.1% / 149 p.e.) and Gries (+1.1% / 332 p.e.).
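
    The percentage and absolute growth figures quoted for a district can be cross-checked against each other. The sketch below is a hypothetical helper, not part of the dataset: it assumes the percentage is measured relative to the 2006 population, reconstructed as the 2012 population minus the absolute growth. Because the published figures are rounded, the result is approximate.

```python
def growth_rate_percent(population_end, absolute_growth):
    """Percentage growth relative to the starting population.

    The starting population is reconstructed as end population minus growth.
    """
    population_start = population_end - absolute_growth
    return absolute_growth / population_start * 100


# Lend: about 32,000 residents in 2012 after growing by about 2,700.
print(round(growth_rate_percent(32_000, 2_700), 1))  # 9.2
```

    The computed value matches the +9.2% quoted for Lend, which suggests the percentages in the description use the 2006 population as their base.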

  7. County Health Visualization based on KIHBS 2005/6 - Dataset - openAFRICA

    • open.africa
    Cite
    County Health Visualization based on KIHBS 2005/6 - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/county-health-visualization-based-on-kihbs-2005-6
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Percentage Population with Malaria/Fever compared to Proportion of the population who Slept under a bed-net

  8. Coronavirus (Covid-19) Data in the United States

    • nytimes.com
    • openicpsr.org
    • +2more
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
    Dataset provided by
    New York Times
    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
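
    Because the files contain cumulative counts, daily new cases have to be derived by differencing consecutive values. The following is a minimal sketch of that step. It assumes the counts for one state or county have already been loaded into a date-ordered list; the function name and the sample values are hypothetical.

```python
def daily_new(cumulative):
    """Convert a date-ordered list of cumulative counts into per-day increments."""
    increments = []
    previous = 0
    for total in cumulative:
        increments.append(total - previous)
        previous = total
    return increments


# Hypothetical cumulative case counts for four consecutive days.
print(daily_new([1, 1, 3, 6]))  # [1, 0, 2, 3]
```

    If a cumulative series is ever revised downward, this differencing would yield a negative increment for that day, so such values are worth checking before plotting.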

  9. Community Explorer ACS Community Data

    • maps-semcog.opendata.arcgis.com
    Updated Apr 10, 2025
    Cite
    Southeast Michigan Council of Governments (2025). Community Explorer ACS Community Data [Dataset]. https://maps-semcog.opendata.arcgis.com/datasets/community-explorer-acs-community-data
    Dataset updated
    Apr 10, 2025
    Dataset authored and provided by
    Southeast Michigan Council of Governments
    Area covered
    Description

    SEMCOG's Community Explorer tool is great for dynamically visualizing demographic and economic data in Southeast Michigan. Use this dataset to extend Community Explorer and make your own visualization.

    This tool has over 40 indicators across 4 geography types (County, Community, School Districts, Census Tracts). Not only are the data columns available, but we also include the Margin of Error (MOE) to better understand the reliability of each column.

    Indicators

    • Total Population
    • Population Density (Persons/Acre)
    • Median Age
    • Percent Age 65+
    • Percent Age 65+ Living Alone
    • Percent Ages 5 to 17
    • Ratio Youth to Seniors
    • Percent Bachelor's Degree or Higher
    • Percent People in Poverty
    • Percent Asian
    • Percent Black
    • Percent Hispanic
    • Percent White
    • Total Households
    • Average Household Size
    • Percent Households with Seniors
    • Percent Households with Children
    • Percent Households with No Car
    • Percent Households with Internet Access
    • Total Households without Internet Access
    • Percent Households with Broadband Internet Access
    • Total Households without Broadband Internet Access
    • Percentage Households with Computing Devices
    • Total Households without a Desktop or Laptop
    • Percent Seniors with Broadband Internet Access
    • Percent Children without Broadband Internet Access
    • Percent Children without Computing Devices
    • Total Housing Units
    • Percent Vacant
    • Percent Owner Occupied
    • Percent Renter Occupied
    • Percent Single Family
    • Percent Multi-Family
    • Total Jobs
    • Job Density (Jobs/Acre)
    • Unemployment Rate
    • Labor Force Participation Rate
    • Median Household Income
    • Per Capita Income
    • Median Housing Value
    • Average Commute Time (Minutes)
    • Percent Drive Alone to Work
    • Percent Commute by Transit

  10. Census 2021 Data Visualization by Provincial Electoral Division (PED) – CERB...

    • beta.data.urbandatacentre.ca
    • data.urbandatacentre.ca
    Updated Jun 10, 2025
    Cite
    (2025). Census 2021 Data Visualization by Provincial Electoral Division (PED) – CERB - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://beta.data.urbandatacentre.ca/dataset/ab-gda-8fe6aa77-9e40-4f5e-ac09-5d46388b938c
    Dataset updated
    Jun 10, 2025
    Area covered
    Canada
    Description

    This downloadable map product includes the Provincial Electoral Divisions (PED) from the most recent provincial election. The data in this information product illustrates the boundaries of Alberta's 87 Provincial Electoral Divisions. Electoral boundaries are defined by the Alberta Election Act, Chapter E-1, 2018. Provincial Electoral Divisions (PEDs) are territorial units, each represented by an elected Member who serves in the Alberta Provincial Legislative Assembly. The Provincial Electoral Divisions used in this information product were enacted in December 2017 and came into effect for the 2019 provincial general election. The PED profile contains data created by Statistics Canada in the 2021 Census of Population. The map is a bivariate map in which the fill colour of each division shows two factors: the percentage of CERB recipients versus the percentage of working-age people with some post-secondary education. The outline colours show the dominant employment sector of each division, determined as the employment sector with the most employees in that division. The main map is at a scale of 1:3,750,000 and the inset maps are at a scale of 1:500,000.

  11. COVID-19 State Data

    • kaggle.com
    Updated Nov 3, 2020
    Cite
    Night Ranger (2020). COVID-19 State Data [Dataset]. https://www.kaggle.com/nightranger77/covid19-state-data/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 3, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Night Ranger
    Description

    This dataset is a per-state amalgamation of demographic, public health and other relevant predictors for COVID-19.

    Deaths, Infections and Tests by State

    The COVID Tracking Project: https://covidtracking.com/data/api

    The positive, death and totalTestResults fields from the API are used for, respectively, Infected, Deaths and Tested in this dataset. Please read the API documentation for more context on those columns.

    Predictor Data and Sources

    Population (2020)

    Density is people per meter squared https://worldpopulationreview.com/states/

    ICU Beds and Age 60+

    https://khn.org/news/as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds/

    GDP

    https://worldpopulationreview.com/states/gdp-by-state/

    Income per capita (2018)

    https://worldpopulationreview.com/states/per-capita-income-by-state/

    Gini

    https://en.wikipedia.org/wiki/List_of_U.S._states_by_Gini_coefficient

    Unemployment (2020)

    Rates from Feb 2020 and are percentage of labor force
    https://www.bls.gov/web/laus/laumstrk.htm

    Sex (2017)

    Ratio is Male / Female
    https://www.kff.org/other/state-indicator/distribution-by-gender/

    Smoking Percentage (2020)

    https://worldpopulationreview.com/states/smoking-rates-by-state/

    Influenza and Pneumonia Death Rate (2018)

    Death rate per 100,000 people
    https://www.cdc.gov/nchs/pressroom/sosmap/flu_pneumonia_mortality/flu_pneumonia.htm

    Chronic Lower Respiratory Disease Death Rate (2018)

    Death rate per 100,000 people
    https://www.cdc.gov/nchs/pressroom/sosmap/lung_disease_mortality/lung_disease.htm

    Active Physicians (2019)

    https://www.kff.org/other/state-indicator/total-active-physicians/

    Hospitals (2018)

    https://www.kff.org/other/state-indicator/total-hospitals

    Health spending per capita

    Includes spending for all health care services and products by state of residence. Hospital spending is included and reflects the total net revenue. Costs such as insurance, administration, research, and construction expenses are not included.
    https://www.kff.org/other/state-indicator/avg-annual-growth-per-capita/

    Pollution (2019)

    Pollution: Average exposure of the general public to particulate matter of 2.5 microns or less (PM2.5) measured in micrograms per cubic meter (3-year estimate)
    https://www.americashealthrankings.org/explore/annual/measure/air/state/ALL

    Medium and Large Airports

    For each state, number of medium and large airports https://en.wikipedia.org/wiki/List_of_the_busiest_airports_in_the_United_States

    Temperature (2019)

    Note that FL was incorrect in the table, but is corrected in the Hottest States paragraph
    https://worldpopulationreview.com/states/average-temperatures-by-state/
    District of Columbia temperature computed as the average of Maryland and Virginia

    Urbanization (2010)

    Urbanization as a percentage of the population https://www.icip.iastate.edu/tables/population/urban-pct-states

    Age Groups (2018)

    https://www.kff.org/other/state-indicator/distribution-by-age/

    School Closure Dates

    Schools that haven't closed are marked NaN https://www.edweek.org/ew/section/multimedia/map-coronavirus-and-school-closures.html

    Note that some of the datasets above did not contain data for the District of Columbia; the missing values were found via Google searches and entered manually.

  12. Priority Index Sri Lanka Floods May 2017

    • cloud.csiss.gmu.edu
    • data.amerigeoss.org
    • +1more
    csv +2
    Updated Jun 18, 2019
    Cite
    UN Humanitarian Data Exchange (2019). Priority Index Sri Lanka Floods May 2017 [Dataset]. https://cloud.csiss.gmu.edu/uddi/ar/dataset/priority-index-sri-lanka-floods-may-2017
    Explore at:
    Available download formats: zip(20780604), shp, kml, geojson, api(1095521), csv(74271), shp, kml, geojson, api(816300), csv(5150)
    Dataset updated
    Jun 18, 2019
    Dataset provided by
    UN Humanitarian Data Exchange
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sri Lanka
    Description

    Product

    This priority index was derived by combining a detailed flood extent mapping with detailed human settlement geo-data. Both sources were combined to produce the location and magnitude of population living in flooded areas. This was subsequently aggregated to admin-4 areas (GND) as well as admin-3 areas (DS divisional).

    The flood extent mapping was in turn derived by combining two sources. Flood extent maps can be produced fairly quickly from satellite imagery captured by either optical sensors or Synthetic Aperture Radar (SAR) sensors. Floods are usually caused by heavy rainfall, so cloud is present in most cases; this limits optical sensors, which cannot penetrate cloud. Radar sensors are not affected by cloud, which makes them more useful when cloud is present. In this analysis we analyzed a Sentinel-2 optical image from May 28th and a Sentinel-1 SAR image from May 30th, then combined the two results by adding up the flood extents.

    The main cloud-covered areas and permanent water bodies are removed from the flood extent map using the Sentinel-2 cloud mask. The resolution of the flood extent map is 30 m, whereas the permanent water body map has a resolution of 250 m. This introduces some discrepancy: part of the flood extent map could be permanent water body.

    Scope

    The analysis focused on 4 districts in South-West Sri Lanka, selected on the basis of news reports (https://www.dropbox.com/s/n0qdqe7qfgq6fyv/special_situation.pdf?dl=0). In the admin-3-level analysis, the highest percentages of population living in flooded areas were seen in Matara district. For that reason, the admin-4-level analysis concentrated only on Matara district.

    Caveats

    The dataset shows the percentage of each area that is flooded. The data has not yet been corrected for small populations. We believe the product currently points to the high-priority areas. Users who want to target the number of people affected can easily correct for small populations in the shp or csv files.

    Data used from partners

    The human settlement data was retrieved from http://ciesin.columbia.edu/data/hrsl/. Facebook Connectivity Lab and Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. High Resolution Settlement Layer (HRSL). Source imagery for HRSL © 2016 DigitalGlobe. Accessed 01-06-2017.

    The Radar imagery analysis was done by NASA JPL, whose input in this product has been crucial.

    Visualization

    An example map is available here: http://bit.ly/SriLankaFloodMap

    Linked data

    Admin boundaries 3 and 4 can be found here (link on OBJECT_ID): https://data.humdata.org/group/lka?q=&ext_page_size=25&sort=score+desc%2C+metadata_modified+desc&tags=administrative+boundaries#dataset-filter-start

    How to use

    The ratio column in the SHPs or CSVs can be multiplied by 100 to get the percentage of flooding in the area.
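    The conversion above, together with the small-population correction mentioned under Caveats, can be sketched in a few lines of Python. This is only an illustration: the `ratio` column is named in the description, but the `area` and `population` column names and the sample values are hypothetical and would need to be matched to the actual CSV header.

```python
def add_flood_metrics(rows, ratio_key="ratio", population_key="population"):
    """Return copies of the rows with a percentage and a head-count estimate.

    `ratio_key` follows the dataset description; `population_key` is a
    hypothetical column name used only for illustration.
    """
    enriched_rows = []
    for row in rows:
        ratio = float(row[ratio_key])
        enriched = dict(row)
        # Multiply the ratio by 100 to get the percentage flooded.
        enriched["pct_flooded"] = ratio * 100
        # Weight by population so that tiny areas do not dominate the ranking.
        if population_key in row:
            enriched["people_affected"] = ratio * float(row[population_key])
        enriched_rows.append(enriched)
    return enriched_rows


# Made-up sample rows, shaped like csv.DictReader output (all values are strings).
sample = [
    {"area": "A", "ratio": "0.25", "population": "4000"},
    {"area": "B", "ratio": "0.5", "population": "100"},
]

ranked = sorted(
    add_flood_metrics(sample),
    key=lambda r: r["people_affected"],
    reverse=True,
)
for r in ranked:
    print(r["area"], r["pct_flooded"], r["people_affected"])
```

    In this made-up sample, area B has the higher percentage flooded, but area A ranks first once the ratio is weighted by population, which is the kind of correction the caveat describes.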

  13. Population Pyramid Data and R Script for the US, States, and Counties 1970 -...

    • openicpsr.org
    delimited
    Updated Jan 23, 2020
    + more versions
    Cite
    Nathanael Rosenheim (2020). Population Pyramid Data and R Script for the US, States, and Counties 1970 - 2017 [Dataset]. http://doi.org/10.3886/E117081V2
    Explore at:
    Available download formats: delimited
    Dataset updated
    Jan 23, 2020
    Dataset provided by
    Texas A&M University
    Authors
    Nathanael Rosenheim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States, Counties, States
    Description

    Population pyramids provide a way to visualize the age and sex composition of a geographic region, such as a nation, state, or county. A standard population pyramid splits the population by sex into two bar charts or histograms, one for the male population and one for the female population. The two charts mirror each other and divide age into 5-year cohorts. The shape of a population pyramid provides insights into a region’s fertility, mortality, and migration patterns. When a region has high fertility and mortality but low migration, the visualization looks like a pyramid, with the youngest age cohort (0-4 years) representing the largest percent of the population and each older cohort representing a progressively smaller percent of the population.

    In many regions fertility and mortality have decreased significantly since 1970, as people live longer and women have fewer children. With lower fertility and mortality, population pyramids are shaped more like a pillar.

    While population pyramids can be made for any geographic region, when interpreting population pyramids for smaller areas (like counties) the most important force that shapes the pyramid is often in- and out-migration (Wang and vom Hofe, 2006, p. 65). For smaller regions, population pyramids can have unique shapes.

    This data archive provides the resources needed to generate population pyramids for the United States, individual states, and any county within the United States. Population pyramids usually require significant data cleaning and graph making skills to generate one pyramid. With this data archive the data cleaning has been completed and the R script provides reusable code to quickly generate graphs. The final output is an image file with six graphs on one page. The final layout makes it easy to compare changes in population age and sex composition for any state and any county in the US for 1970, 1980, 1990, 2000, 2010, and 2017.

  14. Comprehensive Credit Card Transactions Dataset

    • kaggle.com
    Updated Oct 20, 2023
    Cite
    RAJATSURANA979 (2023). Comprehensive Credit Card Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/rajatsurana979/comprehensive-credit-card-transactions-dataset/
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    RAJATSURANA979
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Data is the new oil, and this dataset is a wellspring of knowledge waiting to be tapped😷!

    Don't forget to upvote and share your insights with the community. Happy data exploration!🥰

    For more related datasets:
    • https://www.kaggle.com/datasets/rajatsurana979/fifafcmobile24
    • https://www.kaggle.com/datasets/rajatsurana979/most-streamed-spotify-songs-2023
    • https://www.kaggle.com/datasets/rajatsurana979/comprehensive-credit-card-transactions-dataset
    • https://www.kaggle.com/datasets/rajatsurana979/hotel-reservation-data-repository
    • https://www.kaggle.com/datasets/rajatsurana979/percent-change-in-consumer-spending
    • https://www.kaggle.com/datasets/rajatsurana979/fast-food-sales-report/data

    Description: Welcome to the world of credit card transactions! This dataset provides a treasure trove of insights into customers' spending habits, transactions, and more. Whether you're a data scientist, analyst, or just someone curious about how money moves, this dataset is for you.

    Features:
    • Customer ID: Unique identifiers for every customer.
    • Name: First name of the customer.
    • Surname: Last name of the customer.
    • Gender: The gender of the customer.
    • Birthdate: Date of birth for each customer.
    • Transaction Amount: The dollar amount for each transaction.
    • Date: Date when the transaction occurred.
    • Merchant Name: The name of the merchant where the transaction took place.
    • Category: Categorization of the transaction.

    Why this dataset matters: Understanding consumer spending patterns is crucial for businesses and financial institutions. This dataset is a goldmine for exploring trends, patterns, and anomalies in financial behavior. It can be used for fraud detection, marketing strategies, and much more.

    Acknowledgments: We'd like to express our gratitude to the contributors and data scientists who helped curate this dataset. It's a collaborative effort to promote data-driven decision-making.

    Let's Dive In: Explore, analyze, and visualize this data to uncover the hidden stories in the world of credit card transactions. We look forward to seeing your innovative analyses, visualizations, and applications using this dataset.

  15. Summary of predicted cancer incidence rate.

    • figshare.com
    xls
    Updated Feb 20, 2025
    Cite
    Rabnawaz Khan; Wang Jie (2025). Summary of predicted cancer incidence rate. [Dataset]. http://doi.org/10.1371/journal.pone.0317148.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Rabnawaz Khan; Wang Jie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cancer, the second-leading cause of mortality, kills 16% of people worldwide. Unhealthy lifestyles, smoking, alcohol abuse, obesity, and a lack of exercise have been linked to cancer incidence and mortality. However, it is hard. Cancer and lifestyle correlation analysis and cancer incidence and mortality prediction in the next several years are used to guide people’s healthy lives and target medical financial resources. Two key research areas of this paper are:

    Data preprocessing and sample expansion design: Using experimental analysis and comparison, this study chooses the best cubic spline interpolation technology on the original data from 32 entry points to 420 entry points and converts annual data into monthly data to solve the problem of insufficient correlation analysis and prediction. Factor analysis is possible because data sources indicate changing factors.

    TSA-LSTM two-stage attention design: A popular tool with advanced visualization functions, Tableau, simplifies this paper’s study. Tableau’s testing findings indicate it cannot analyze and predict this paper’s time series data. LSTM is utilized by the TSA-LSTM optimization model. By commencing with input feature attention, this model attention technique guarantees that the model encoder converges to a subset of input sequence features during the prediction of output sequence features. As a result, the model’s natural learning trend and prediction quality are enhanced. The second step, time performance attention, maintains We can choose network features and improve forecasts based on real-time performance.

    Validating the data source with factor correlation analysis and trend prediction using the TSA-LSTM model: Most cancers have overlapping risk factors, and excessive drinking, lack of exercise, and obesity can cause breast, colorectal, and colon cancer. A poor lifestyle directly promotes lung, laryngeal, and oral cancers, according to visual tests. Cancer incidence is expected to climb 18–21% between 2020 and 2025, according to 2021. Long-term projection accuracy is 98.96 percent, and smoking and obesity may be the main cancer causes.

  16. census-bureau-international

    • kaggle.com
    zip
    Updated May 6, 2020
    + more versions
    Cite
    Google BigQuery (2020). census-bureau-international [Dataset]. https://www.kaggle.com/bigquery/census-bureau-international
    Explore at:
    Available download formats: zip(0 bytes)
    Dataset updated
    May 6, 2020
    Dataset provided by
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    Description

    Context

    The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
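    As a rough sketch of what querying from a Kernel might look like, the snippet below wraps the client call in a small helper. It assumes the `google-cloud-bigquery` package is installed and that default credentials are available in the environment. The helper name `run_query` and the sample query are illustrative and not part of the dataset documentation; only the table name and its `country_name` and `country_area` columns are taken from the sample queries in this listing.

```python
# Illustrative query; the table and column names appear in the sample queries.
LARGEST_COUNTRIES_SQL = """
SELECT country_name, country_area
FROM `bigquery-public-data.census_bureau_international.country_names_area`
ORDER BY country_area DESC
LIMIT 5
"""


def run_query(sql, client=None):
    """Run a query and return the rows as a list of plain dicts.

    `client` is an optional, already-configured BigQuery client. When it is
    omitted, one is created with default credentials (an assumption about the
    environment; this needs the google-cloud-bigquery package).
    """
    if client is None:
        # Imported lazily so this module still loads without the package.
        from google.cloud import bigquery
        client = bigquery.Client()
    rows = client.query(sql).result()  # blocks until the query job finishes
    return [dict(row.items()) for row in rows]
```

    Calling `run_query(LARGEST_COUNTRIES_SQL)` in an environment with BigQuery access should return one dict per country. Passing a client explicitly keeps the helper easy to test without network access.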

    Sample Query 1

    What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km². Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!

    #standardSQL
    SELECT
      age.country_name,
      age.life_expectancy,
      size.country_area
    FROM (
      SELECT country_name, life_expectancy
      FROM `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
      WHERE year = 2016) age
    INNER JOIN (
      SELECT country_name, country_area
      FROM `bigquery-public-data.census_bureau_international.country_names_area`
      WHERE country_area > 25000) size
    ON age.country_name = size.country_name
    ORDER BY 2 DESC
    /* Limit removed for Data Studio Visualization */
    LIMIT 10

    Sample Query 2

    Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.

    #standardSQL
    SELECT
      age.country_name,
      SUM(age.population) AS under_25,
      pop.midyear_population AS total,
      ROUND((SUM(age.population) / pop.midyear_population) * 100, 2) AS pct_under_25
    FROM (
      SELECT country_name, population, country_code
      FROM `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
      WHERE year = 2017 AND age < 25) age
    INNER JOIN (
      SELECT midyear_population, country_code
      FROM `bigquery-public-data.census_bureau_international.midyear_population`
      WHERE year = 2017) pop
    ON age.country_code = pop.country_code
    GROUP BY 1, 3
    ORDER BY 4 DESC
    /* Remove limit for visualization */
    LIMIT 10

    Sample Query 3

    The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries larger than 500 km².

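    SELECT
      growth.country_name,
      growth.net_migration,
      CAST(area.country_area AS INT64) AS country_area
    FROM (
      SELECT country_name, net_migration, country_code
      FROM `bigquery-public-data.census_bureau_international.birth_death_growth_rates`
      WHERE year = 2017) growth
    INNER JOIN (
      SELECT country_area, country_code
      FROM `bigquery-public-data.census_bureau_international.country_names_area`
      -- The source listing is cut off after the table name above. The filter and
      -- join condition below are an assumed completion based on the description
      -- (countries larger than 500 km², joined to the growth table).
      WHERE country_area > 500) area
    ON growth.country_code = area.country_code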

    Update frequency

    Historic (none)

    Dataset source

    United States Census Bureau

    Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data

  17. World Population & Health Data 2014 - 2024

    • kaggle.com
    Updated Jan 21, 2025
    Cite
    Faizal Rosyid (2025). World Population & Health Data 2014 - 2024 [Dataset]. https://www.kaggle.com/datasets/faizalrosyid/world-population-and-health-data-2014-2024
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    Kaggle
    Authors
    Faizal Rosyid
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    World
    Description

    This dataset provides an extensive view of global population statistics and health metrics across various countries from 2014 to 2024. It combines population data with vital health-related indicators, making it a valuable resource for understanding trends in population growth and health outcomes worldwide. Researchers, data scientists, and policymakers can utilize this dataset to analyze correlations between population dynamics and health performance at a global scale.

    Key Features:
    • Country: Name of the country.
    • Year: Year of the data (2014–2024).
    • Population: Total population for the respective year and country.
    • Country Code: ISO 3-letter country codes for easy identification.
    • Health Expenditure (health_exp): Percentage of GDP spent on healthcare.
    • Life Expectancy (life_expect): Average life expectancy at birth in years.
    • Maternal Mortality (maternal_mortality): Maternal deaths per 100,000 live births.
    • Infant Mortality (infant_mortality): Deaths of infants under 1 year per 1,000 live births.
    • Neonatal Mortality (neonatal_mortality): Deaths of newborns (0–28 days) per 1,000 live births.
    • Under-5 Mortality (under_5_mortality): Deaths of children under 5 years per 1,000 live births.
    • HIV Prevalence (prev_hiv): Percentage of the population living with HIV.
    • Tuberculosis Incidence (inci_tuberc): Estimated new and relapse TB cases per 100,000 people.
    • Undernourishment Prevalence (prev_undernourishment): Percentage of the population that is undernourished.

    Use Cases:
    • Health Policy Analysis: Understand trends in healthcare expenditure and its relationship to health outcomes.
    • Global Health Research: Investigate global or regional disparities in health and nutrition.
    • Population Studies: Analyze population growth trends alongside health indicators.
    • Data Visualization: Build visual dashboards for storytelling and impactful data representation.


Cite
Shahzad Aslam (2025). Impact of Digital Habits on Mental Health [Dataset]. https://www.kaggle.com/datasets/zeesolver/mental-health

Impact of Digital Habits on Mental Health

Screen Time & Social Media: Effects on Stress, Sleep, and Mood

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 14, 2025
Dataset provided by
Kaggle
Authors
Shahzad Aslam
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

This dataset explores the relationship between digital behavior and mental well-being among 100,000 individuals. It records how much time people spend on screens, use of social media (including TikTok), and how these habits may influence their sleep, stress, and mood levels.

It includes six numerical features, all clean and ready for analysis, making it ideal for machine learning tasks like regression or classification. The data enables researchers and analysts to investigate how modern digital lifestyles may impact mental health indicators in measurable ways.

Dataset Applications

  • Quantify how screen‑time, TikTok use, or multi‑platform engagement statistically relate to stress, sleep loss, and mood.
  • Train regression or classification models that forecast stress level or mood score from real‑time digital‑usage metrics.
  • Feed user‑specific data into recommender systems that suggest screen‑time caps or bedtime routines to improve mental health.
  • Provide evidence for guidelines on youth screen‑time limits and platform moderation based on observed stress‑sleep trade‑offs.
  • Serve as a teaching dataset for EDA, feature engineering, and model evaluation in data‑science or psychology curricula.
  • Evaluate app interventions (e.g., screen‑time nudges) by comparing predicted versus actual post‑intervention stress or mood shifts.
  • Cluster individuals into digital‑behavior personas (e.g., “heavy late‑night scrollers”) to tailor mental‑health resources.
  • Generate synthetic time‑series scenarios (what‑if reductions in TikTok hours) to estimate downstream impacts on sleep and stress.
  • Use engineered features (ratio of TikTok hours to total screen‑time, etc.) in broader wellbeing models that include diet or exercise data.
  • Assess whether mental‑health prediction models remain accurate and unbiased across different screen‑time or platform‑use segments.

Column Descriptions

  • screen_time_hours – Daily total screen usage in hours across all devices.
  • social_media_platforms_used – Number of different social media platforms used per day.
  • hours_on_TikTok – Time spent on TikTok daily, in hours.
  • sleep_hours – Average number of sleep hours per night.
  • stress_level – Stress intensity reported on a scale from 1 (low) to 10 (high).
  • mood_score – Self-rated mood on a scale from 2 (poor) to 10 (excellent).

Inspiration

This dataset was inspired by growing concerns about how screen time and social media affect mental health. It enables analysis of the links between digital habits, stress, sleep, and mood, encouraging data-driven solutions for healthier online behavior and emotional well-being.

Ethically Mined Data

This dataset has been ethically mined and synthetically generated without collecting any personally identifiable information. All values are artificial but statistically realistic, allowing safe use in academic, research, and public health projects while fully respecting user privacy and data ethics.
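
The engineered feature mentioned under Dataset Applications (the ratio of TikTok hours to total screen time) can be sketched with the column names listed in the column descriptions. This is a minimal illustration using only the standard library. The inline CSV rows are made up, and the feature name `tiktok_share` and the zero-screen-time handling are assumptions, not part of the dataset.

```python
import csv
import io

# Made-up rows that reuse the column names from the column descriptions.
SAMPLE_CSV = """screen_time_hours,social_media_platforms_used,hours_on_TikTok,sleep_hours,stress_level,mood_score
8.0,3,2.0,6.5,7,5
4.0,1,0.0,8.0,3,9
"""


def tiktok_share(row):
    """Fraction of daily screen time spent on TikTok (0.0 if no screen time)."""
    screen = float(row["screen_time_hours"])
    tiktok = float(row["hours_on_TikTok"])
    # Guard against division by zero for rows with no recorded screen time.
    return tiktok / screen if screen > 0 else 0.0


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = [tiktok_share(row) for row in rows]
print(shares)
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle; the file name is not given in this listing.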