https://creativecommons.org/publicdomain/zero/1.0/
This dataset explores the relationship between digital behavior and mental well-being among 100,000 individuals. It records how much time people spend on screens, how they use social media (including TikTok), and how these habits may influence their sleep, stress, and mood levels.
It includes six numerical features, all clean and ready for analysis, making it ideal for machine learning tasks like regression or classification. The data enables researchers and analysts to investigate how modern digital lifestyles may impact mental health indicators in measurable ways.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
This dataset provides the image calculation averages, coefficients of variation, and experimental measurements presented in the manuscript "Visualizing Plant Responses: Novel Insights Possible through Affordable Imaging Techniques in the Greenhouse."
Abstract: Global climatic pressures and increased human demands create a modern necessity for efficient and affordable plant phenotyping unencumbered by arduous technical requirements. The analysis and archival of imagery have become easier as modern camera technology and computers are leveraged. This facilitates the detection of vegetation status and changes over time. Using a custom lightbox, an inexpensive camera, and common software, turfgrass pots were photographed in a greenhouse environment over an 8-week experiment period. Subsequent imagery was analyzed for area of cover, color metrics, and sensitivity to image corrections. Findings were compared to active spectral reflectance data and previously reported measurements of visual quality, productivity, and water use. Results indicate that Red Green Blue-based (RGB) imagery with simple controls is sufficient to measure the effects of plant treatments. Notable correlations were observed for corrected imagery, including between a percent yellow color area classification segment (%Y) with human visual quality ratings (VQ) (R = -0.89), the dark green color index (DGCI) with clipping productivity in mg d-1 (mg) (R = 0.61), and an index combination term (COMB2) with water use in mm d-1 (mm) (R = -0.60). The calculation of green cover area (%G) correlated with Normalized Difference Vegetation Index (NDVI) (R = 0.91) and its RED reflectance spectra (R = -0.87). A CIELAB b/a chromatic ratio (BA) correlated with Normalized Difference Red-Edge index (NDRE) (R = 0.90), and its Red-Edge (RE) (R = -0.74) reflectance spectra, while a new calculation termed HSVi correlated strongest to the Near-Infrared (NIR) (R = 0.90) reflectance spectra.
Additionally, COMB2 significantly differentiated between the treatment effects of date, mowing height, deficit irrigation, and their interactions (p < 0.001). Sensitivity and statistical analysis of typical image file formats and corrections that included JPEG (JPG), TIFF (TIF), geometric lens correction (LC), and color correction (CC) were conducted. Results underscore the need for further research to support image corrections standardization and better connect image data to biological processes. This study demonstrates the potential of consumer-grade photography to capture plant phenotypic traits.
Data for CDC’s COVID Data Tracker site on Rates of COVID-19 Cases and Deaths by Vaccination Status.
Dataset and data visualization details: These data were posted on October 21, 2022, archived on November 18, 2022, and revised on February 22, 2023. These data reflect cases among persons with a positive specimen collection date through September 24, 2022, and deaths among persons with a positive specimen collection date through September 3, 2022.
Vaccination status: A person vaccinated with a primary series had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after verifiably completing the primary series of an FDA-authorized or approved COVID-19 vaccine. An unvaccinated person had SARS-CoV-2 RNA or antigen detected on a respiratory specimen and has not been verified to have received COVID-19 vaccine. Excluded were partially vaccinated people who received at least one FDA-authorized vaccine dose but did not complete a primary series ≥14 days before collection of a specimen where SARS-CoV-2 RNA or antigen was detected. Additional or booster dose: A person vaccinated with a primary series and an additional or booster dose had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after receipt of an additional or booster dose of any COVID-19 vaccine on or after August 13, 2021. For people ages 18 years and older, data are graphed starting the week including September 24, 2021, when a COVID-19 booster dose was first recommended by CDC for adults 65+ years old and people in certain populations and high risk occupational and institutional settings. For people ages 12-17 years, data are graphed starting the week of December 26, 2021, 2 weeks after the first recommendation for a booster dose for adolescents ages 16-17 years. For people ages 5-11 years, data are included starting the week of June 5, 2022, 2 weeks after the first recommendation for a booster dose for children aged 5-11 years. For people ages 50 years and older, data on second booster doses are graphed starting the week including March 29, 2022, when the recommendation was made for second boosters. Vertical lines represent dates when changes occurred in U.S. policy for COVID-19 vaccination (details provided above). Reporting is by primary series vaccine type rather than additional or booster dose vaccine type. The booster dose vaccine type may be different than the primary series vaccine type. 
** Because data on the immune status of cases and associated deaths are unavailable, an additional dose in an immunocompromised person cannot be distinguished from a booster dose. This is a relevant consideration because vaccines can be less effective in this group. Deaths: A COVID-19–associated death occurred in a person with a documented COVID-19 diagnosis who died; health department staff reviewed to make a determination using vital records, public health investigation, or other data sources. Rates of COVID-19 deaths by vaccination status are reported based on when the patient was tested for COVID-19, not the date they died. Deaths usually occur up to 30 days after COVID-19 diagnosis. Participating jurisdictions: Currently, these 31 health departments that regularly link their case surveillance to immunization information system data are included in these incidence rate estimates: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Idaho, Indiana, Kansas, Kentucky, Louisiana, Massachusetts, Michigan, Minnesota, Nebraska, New Jersey, New Mexico, New York, New York City (New York), North Carolina, Philadelphia (Pennsylvania), Rhode Island, South Dakota, Tennessee, Texas, Utah, Washington, and West Virginia; 30 jurisdictions also report deaths among vaccinated and unvaccinated people. These jurisdictions represent 72% of the total U.S. population and all ten of the Health and Human Services Regions. Data on cases among people who received additional or booster doses were reported from 31 jurisdictions; 30 jurisdictions also reported data on deaths among people who received one or more additional or booster dose; 28 jurisdictions reported cases among people who received two or more additional or booster doses; and 26 jurisdictions reported deaths among people who received two or more additional or booster doses. This list will be updated as more jurisdictions participate. 
Incidence rate estimates: Weekly age-specific incidence rates by vaccination status were calculated as the number of cases or deaths divided by the number of people vaccinated with a primary series, overall or with/without a booster dose (cumulative) or unvaccinated (obtained by subtracting the cumulative number of people vaccinated with a primary series and partially vaccinated people from the 2019 U.S. intercensal population estimates) and multiplied by 100,000. Overall incidence rates were age-standardized using the 2000 U.S. Census standard population. To estimate population counts for ages 6 months through 1 year, half of the single-year population counts for ages 0 through 1 year were used. All rates are plotted by positive specimen collection date to reflect when incident infections occurred. For the primary series analysis, age-standardized rates include ages 12 years and older from April 4, 2021 through December 4, 2021, ages 5 years and older from December 5, 2021 through July 30, 2022 and ages 6 months and older from July 31, 2022 onwards. For the booster dose analysis, age-standardized rates include ages 18 years and older from September 19, 2021 through December 25, 2021, ages 12 years and older from December 26, 2021, and ages 5 years and older from June 5, 2022 onwards. Small numbers could contribute to less precision when calculating death rates among some groups. Continuity correction: A continuity correction has been applied to the denominators by capping the percent population coverage at 95%. To do this, we assumed that at least 5% of each age group would always be unvaccinated in each jurisdiction. Adding this correction ensures that there is always a reasonable denominator for the unvaccinated population that would prevent incidence and death rates from growing unrealistically large due to potential overestimates of vaccination coverage. 
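The rate calculation and the 95% coverage cap described above can be sketched in a few lines of Python. This is only an illustration, not CDC's code: the function names and all counts are made up, and the way the cap is applied here (capping the vaccinated denominator at 95% of the population and flooring the unvaccinated denominator at 5%) is one plausible reading of the description.

```python
def denominators(population, vaccinated, partially_vaccinated, max_coverage=0.95):
    """Return (vaccinated, unvaccinated) denominators after the coverage cap.

    Assumption: the cap limits the vaccinated count to max_coverage of the
    population and keeps at least (1 - max_coverage) of it unvaccinated.
    """
    vaccinated_capped = min(vaccinated, max_coverage * population)
    unvaccinated = population - vaccinated - partially_vaccinated
    unvaccinated_floored = max(unvaccinated, (1 - max_coverage) * population)
    return vaccinated_capped, unvaccinated_floored


def incidence_per_100k(events, denominator):
    """Weekly incidence rate: events divided by people at risk, times 100,000."""
    return events / denominator * 100_000


# Hypothetical age group in one jurisdiction (illustrative numbers only).
population = 1_000_000
vaccinated = 970_000          # reported coverage of 97%, above the cap
partially_vaccinated = 20_000

vax_den, unvax_den = denominators(population, vaccinated, partially_vaccinated)
rate_vaccinated = incidence_per_100k(500, vax_den)
rate_unvaccinated = incidence_per_100k(100, unvax_den)
```

Without the cap, the unvaccinated denominator in this made-up example would be only 10,000 people, and the unvaccinated rate would be five times larger than with the floor of 50,000.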
Incidence rate ratios (IRRs): IRRs for the past one month were calculated by dividing the average weekly incidence rates among unvaccinated people by that among people vaccinated with a primary series either overall or with a booster dose. Publications: Scobie HM, Johnson AG, Suthar AB, et al. Monitoring Incidence of COVID-19 Cases, Hospitalizations, and Deaths, by Vaccination Status — 13 U.S. Jurisdictions, April 4–July 17, 2021. MMWR Morb Mortal Wkly Rep 2021;70:1284–1290. Johnson AG, Amin AB, Ali AR, et al. COVID-19 Incidence and Death Rates Among Unvaccinated and Fully Vaccinated Adults with and Without Booster Doses During Periods of Delta and Omicron Variant Emergence — 25 U.S. Jurisdictions, April 4–December 25, 2021. MMWR Morb Mortal Wkly Rep 2022;71:132–138. Johnson AG, Linde L, Ali AR, et al. COVID-19 Incidence and Mortality Among Unvaccinated and Vaccinated Persons Aged ≥12 Years by Receipt of Bivalent Booster Doses and Time Since Vaccination — 24 U.S. Jurisdictions, October 3, 2021–December 24, 2022. MMWR Morb Mortal Wkly Rep 2023;72:145–152. Johnson AG, Linde L, Payne AB, et al. Notes from the Field: Comparison of COVID-19 Mortality Rates Among Adults Aged ≥65 Years Who Were Unvaccinated and Those Who Received a Bivalent Booster Dose Within the Preceding 6 Months — 20 U.S. Jurisdictions, September 18, 2022–April 1, 2023. MMWR Morb Mortal Wkly Rep 2023;72:667–669.
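As a rough illustration of the incidence rate ratio defined at the start of this passage (before the publication list), the sketch below divides the average weekly rate among unvaccinated people by the average among vaccinated people. The function name and the four weekly rates are hypothetical; they are not values from the dataset.

```python
from statistics import mean


def incidence_rate_ratio(unvaccinated_weekly_rates, vaccinated_weekly_rates):
    """Average weekly rate among unvaccinated divided by that among vaccinated."""
    return mean(unvaccinated_weekly_rates) / mean(vaccinated_weekly_rates)


# Made-up weekly rates per 100,000 for roughly one month (four weeks).
unvaccinated = [200.0, 180.0, 220.0, 200.0]
vaccinated = [50.0, 40.0, 60.0, 50.0]

irr = incidence_rate_ratio(unvaccinated, vaccinated)
```

An IRR above 1 means the rate was higher among unvaccinated people over the period.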
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cancer, the second-leading cause of mortality, kills 16% of people worldwide. Unhealthy lifestyles, smoking, alcohol abuse, obesity, and a lack of exercise have been linked to cancer incidence and mortality. However, it is hard. Cancer and lifestyle correlation analysis and cancer incidence and mortality prediction in the next several years are used to guide people’s healthy lives and target medical financial resources. Two key research areas of this paper are:
Data preprocessing and sample expansion design: Using experimental analysis and comparison, this study chooses the best cubic spline interpolation technology on the original data from 32 entry points to 420 entry points and converts annual data into monthly data to solve the problem of insufficient correlation analysis and prediction. Factor analysis is possible because data sources indicate changing factors.
TSA-LSTM two-stage attention design: A popular tool with advanced visualization functions, Tableau, simplifies this paper’s study. Tableau’s testing findings indicate it cannot analyze and predict this paper’s time series data. LSTM is utilized by the TSA-LSTM optimization model. By commencing with input feature attention, this model attention technique guarantees that the model encoder converges to a subset of input sequence features during the prediction of output sequence features. As a result, the model’s natural learning trend and prediction quality are enhanced. The second step, time performance attention, maintains We can choose network features and improve forecasts based on real-time performance.
Validating the data source with factor correlation analysis and trend prediction using the TSA-LSTM model: Most cancers have overlapping risk factors, and excessive drinking, lack of exercise, and obesity can cause breast, colorectal, and colon cancer. A poor lifestyle directly promotes lung, laryngeal, and oral cancers, according to visual tests.
Cancer incidence is expected to climb 18–21% between 2020 and 2025, according to 2021. Long-term projection accuracy is 98.96 percent, and smoking and obesity may be the main cancer causes.
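The abstract mentions using cubic spline interpolation to convert annual data into monthly data. A minimal sketch of that idea is shown below, assuming a natural cubic spline (zero second derivative at both ends); the paper does not say which boundary condition or library it used, and the four annual values are invented. In practice a library routine such as `scipy.interpolate.CubicSpline` would be the usual choice; the pure-Python version here only keeps the example self-contained.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Build a natural cubic spline through (xs, ys); xs must be increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the second-derivative terms.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the polynomial coefficients of each interval.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Pick the interval containing x, clamped to the first/last interval.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Invented annual values (for example, an incidence rate per year).
years = [2000, 2001, 2002, 2003]
annual = [10.0, 12.0, 15.0, 14.0]

spline = natural_cubic_spline(years, annual)
# One point per month from the first to the last annual observation.
monthly_x = [years[0] + m / 12 for m in range((len(years) - 1) * 12 + 1)]
monthly = [spline(x) for x in monthly_x]
```

The interpolated series passes through every original annual value, so the upsampling adds points between observations without altering them.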
In the isodemographic map of population development in Graz, 2006 to 2012, the individual districts are shown in proportion to their population in 2012. The size and shape of the district boundaries are adjusted so that a uniform population density prevails across the areas shown; however, the spatial relationship and distribution of the city districts relative to each other remain largely the same. Specific spatial trends and developments can be derived from this: the most populous districts are Jakomini (35,800), Lend (32,000), Gries (29,300) and Geidorf (28,000), so the population of Graz is clearly concentrated in the districts around the old town. In percentage terms, the highest growth rates in the period May 2006 to May 2012 are recorded in the districts of Lend (+9.2%), Strassgang (+8.5%) and Mariatrost (+8.1%). In absolute growth figures, Lend (+2,700 p.e.), Jakomini (+1,300 p.e.) and Strassgang (+1,200 p.e.) are clearly ahead. This contrasts with the districts with the lowest growth rates over the same period: Ries (+0.3% / 19 p.e.), Innere Stadt (+0.5% / 21 p.e.), Waltendorf (+1.1% / 149 p.e.) and Gries (+1.1% / 332 p.e.).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Percentage Population with Malaria/Fever compared to Proportion of the population who Slept under a bed-net
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
SEMCOG's Community Explorer tool is great for dynamically visualizing demographic and economic data in Southeast Michigan. Use this dataset to extend Community Explorer and make your own visualization. This tool has over 40 indicators across 4 geography types (County, Community, School Districts, Census Tracts). Not only are the data columns available, but we also include the Margin of Error (MOE) to better understand the reliability of each column.
Indicators
- Total Population
- Population Density (Persons/Acre)
- Median Age
- Percent Age 65+
- Percent Age 65+ Living Alone
- Percent Ages 5 to 17
- Ratio Youth to Seniors
- Percent Bachelor's Degree or Higher
- Percent People in Poverty
- Percent Asian
- Percent Black
- Percent Hispanic
- Percent White
- Total Households
- Average Household Size
- Percent Households with Seniors
- Percent Households with Children
- Percent Households with No Car
- Percent Households with Internet Access
- Total Households without Internet Access
- Percent Households with Broadband Internet Access
- Total Households without Broadband Internet Access
- Percentage Households with Computing Devices
- Total Households without a Desktop or Laptop
- Percent Seniors with Broadband Internet Access
- Percent Children without Broadband Internet Access
- Percent Children without Computing Devices
- Total Housing Units
- Percent Vacant
- Percent Owner Occupied
- Percent Renter Occupied
- Percent Single Family
- Percent Multi-Family
- Total Jobs
- Job Density (Jobs/Acre)
- Unemployment Rate
- Labor Force Participation Rate
- Median Household Income
- Per Capita Income
- Median Housing Value
- Average Commute Time (Minutes)
- Percent Drive Alone to Work
- Percent Commute by Transit
This downloadable map product includes the Provincial Electoral Divisions (PED) from the most recent provincial election. The data in this information product illustrates the boundaries of Alberta's 87 Provincial Electoral Divisions. Electoral Boundaries are defined by the Alberta Election Act, Chapter E-1, 2018. Provincial Electoral Divisions (PEDs) are territorial units represented by an elected Member to serve in the Alberta Provincial Legislative Assembly. The Provincial Electoral Divisions used in this information product were enacted in December 2017 and came into effect for the 2019 provincial general election. The PED profile contains data created by Statistics Canada in the 2021 Census of Population. The map is a bivariate map in which the fill colour of each division shows two factors: the percentage of CERB recipients versus the percentage of working-age people with some post-secondary education. The outline colours show the dominant employment sector of each division, determined as the employment sector with the most employees in that division. The main map is at a scale of 1:3,750,000 and the inset maps are at a scale of 1:500,000.
This dataset is a per-state amalgamation of demographic, public health and other relevant predictors for COVID-19.
Used positive, death and totalTestResults from the API for, respectively, Infected, Deaths and Tested in this dataset.
Please read the documentation of the API for more context on those columns
Density is people per meter squared https://worldpopulationreview.com/states/
https://worldpopulationreview.com/states/gdp-by-state/
https://worldpopulationreview.com/states/per-capita-income-by-state/
https://en.wikipedia.org/wiki/List_of_U.S._states_by_Gini_coefficient
Rates from Feb 2020 and are percentage of labor force
https://www.bls.gov/web/laus/laumstrk.htm
Ratio is Male / Female
https://www.kff.org/other/state-indicator/distribution-by-gender/
https://worldpopulationreview.com/states/smoking-rates-by-state/
Death rate per 100,000 people
https://www.cdc.gov/nchs/pressroom/sosmap/flu_pneumonia_mortality/flu_pneumonia.htm
Death rate per 100,000 people
https://www.cdc.gov/nchs/pressroom/sosmap/lung_disease_mortality/lung_disease.htm
https://www.kff.org/other/state-indicator/total-active-physicians/
https://www.kff.org/other/state-indicator/total-hospitals
Includes spending for all health care services and products by state of residence. Hospital spending is included and reflects the total net revenue. Costs such as insurance, administration, research, and construction expenses are not included.
https://www.kff.org/other/state-indicator/avg-annual-growth-per-capita/
Pollution: Average exposure of the general public to particulate matter of 2.5 microns or less (PM2.5) measured in micrograms per cubic meter (3-year estimate)
https://www.americashealthrankings.org/explore/annual/measure/air/state/ALL
For each state, number of medium and large airports https://en.wikipedia.org/wiki/List_of_the_busiest_airports_in_the_United_States
Note that FL was incorrect in the table, but is corrected in the Hottest States paragraph
https://worldpopulationreview.com/states/average-temperatures-by-state/
District of Columbia temperature computed as the average of Maryland and Virginia
Urbanization as a percentage of the population https://www.icip.iastate.edu/tables/population/urban-pct-states
https://www.kff.org/other/state-indicator/distribution-by-age/
Schools that haven't closed are marked NaN https://www.edweek.org/ew/section/multimedia/map-coronavirus-and-school-closures.html
Note that some of the datasets above did not contain data for District of Columbia; this missing data was found via Google searches and entered manually.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This priority index was derived by combining a detailed flood extent mapping with detailed human settlement geo-data. Both sources were combined to produce the location and magnitude of population living in flooded areas. This was subsequently aggregated to admin-4 areas (GND) as well as admin-3 areas (DS divisional).
The flood extent mapping was derived in turn by combining two sources. Flood extent maps can be produced relatively quickly using satellite imagery captured by either optical sensors or Synthetic Aperture Radar (SAR) sensors. In most places flooding is caused by heavy rainfall, which means that cloud is usually present. This is a limitation for optical sensors, as they cannot penetrate clouds. Radar sensors are not affected by cloud, which makes them more useful when cloud is present. In this analysis we analyzed a Sentinel-2 optical image from May 28th and a Sentinel-1 SAR image from May 30th. We then combined the two results by adding up the flood extents.
The main cloud-covered areas and permanent water bodies are removed from the flood extent map using the Sentinel-2 cloud mask. The resolution of the flood extent map is 30 m, whereas the permanent water body map has a resolution of 250 m. This introduces some discrepancy: part of the flood extent map could be permanent water body.
Analysis focused on 4 districts in South-West Sri Lanka based on news reports (https://www.dropbox.com/s/n0qdqe7qfgq6fyv/special_situation.pdf?dl=0). Based on the admin-3-level analysis, highest percentages of population living in flooded areas were seen in Matara district. Admin-4 level analysis concentrated only on Matara district for that reason.
The dataset shows the percentage flooded. The data has not yet been corrected for small populations. We believe the product currently points to the high-priority areas. Using the shp or csv files, the user of this data could easily correct for small populations if the aim is to target the number of people affected.
The human settlement data was retrieved from http://ciesin.columbia.edu/data/hrsl/. Facebook Connectivity Lab and Center for International Earth Science Information Network - CIESIN - Columbia University. 2016. High Resolution Settlement Layer (HRSL). Source imagery for HRSL © 2016 DigitalGlobe. Accessed 01-06-2017.
The Radar imagery analysis was done by NASA JPL, whose input in this product has been crucial.
An example map is available here: http://bit.ly/SriLankaFloodMap
Admin boundaries 3 and 4 can be found here (link on OBJECT_ID): https://data.humdata.org/group/lka?q=&ext_page_size=25&sort=score+desc%2C+metadata_modified+desc&tags=administrative+boundaries#dataset-filter-start
The ratio column in the SHPs or CSVs can be multiplied by 100 to get the percentage of flooding in the area.
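As a small illustration of the two notes above (multiplying the ratio by 100, and correcting for small populations), the sketch below reads a CSV and ranks areas both by percentage flooded and by estimated people affected. Only the `ratio` column is named in the description; the `admin_name` and `population` columns and all values are hypothetical stand-ins for whatever the real files contain.

```python
import csv
import io

# Hypothetical extract; the real CSV's column names may differ.
sample_csv = """admin_name,population,ratio
Area A,12000,0.25
Area B,300,0.80
Area C,45000,0.10
"""

rows = []
for record in csv.DictReader(io.StringIO(sample_csv)):
    ratio = float(record["ratio"])
    population = int(record["population"])
    rows.append({
        "admin_name": record["admin_name"],
        "pct_flooded": ratio * 100,             # ratio -> percentage
        "people_affected": ratio * population,  # weights the ratio by population
    })

# Ranking by percentage favours small, heavily flooded areas ...
by_percentage = sorted(rows, key=lambda r: r["pct_flooded"], reverse=True)
# ... while ranking by people affected favours large populations.
by_people = sorted(rows, key=lambda r: r["people_affected"], reverse=True)
```

In this made-up extract the two rankings disagree: the area with the highest percentage flooded has the fewest people affected, which is the small-population effect the description warns about.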
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population pyramids provide
a way to visualize the age and sex composition of a geographic region, such as
a nation, state, or county. A standard population pyramid divides sex into two
bar charts or histograms, one for the male population and one for
the female population. The two charts mirror each other and divide age
into 5-year cohorts. The shape of a population pyramid provides insights
into a region’s fertility, mortality, and migration patterns. When a region has
high fertility and mortality, but low migration, the visualization will look
like a pyramid, with the youngest age cohort (0-4 years) representing the largest
percent of the population and each older cohort representing a progressively
smaller percent of the population.
In many regions fertility and mortality have
decreased significantly since 1970, as people live longer and women have fewer
children. With lower fertility and mortality, population pyramids are shaped
more like a pillar.
While population pyramids can be made for any
geographic region, when interpreting population pyramids for smaller areas
(like counties) the most important force that shapes the pyramid is often in-
and out-migration (Wang and vom Hofe, 2006, p. 65). For smaller regions,
population pyramids can have unique shapes.
This data archive provides the resources needed
to generate population pyramids for the United States, individual states, and
any county within the United States. Population pyramids usually require
significant data cleaning and graph making skills to generate one pyramid. With
this data archive the data cleaning has been completed and the R script
provides reusable code to quickly generate graphs. The final output is an image
file with six graphs on one page. The final layout makes it easy to compare
changes in population age and sex composition for any state and any county in
the US for 1970, 1980, 1990, 2000, 2010, and 2017.
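The data preparation behind a population pyramid, as described above, can be sketched briefly in Python. This is not the archive's R script: the function names and the tiny input are invented, and expressing each bar as a percentage of the total population (both sexes combined) is an assumption. Ages are grouped into 5-year cohorts, and male values are negated so the two bar charts mirror each other.

```python
def cohort_label(age):
    """Map a single year of age to its 5-year cohort label, e.g. 7 -> '5-9'."""
    start = (age // 5) * 5
    return f"{start}-{start + 4}"


def pyramid_shares(counts):
    """counts: iterable of (age, sex, population) with sex 'male' or 'female'.

    Returns {cohort: {'male': pct, 'female': pct}} as percent of total population.
    """
    counts = list(counts)
    total = sum(population for _, _, population in counts)
    shares = {}
    for age, sex, population in counts:
        cohort = shares.setdefault(cohort_label(age), {"male": 0.0, "female": 0.0})
        cohort[sex] += population * 100 / total
    return shares


def pyramid_bars(shares):
    """Return (cohort, male_bar, female_bar) rows, youngest first, males negated."""
    ordered = sorted(shares, key=lambda label: int(label.split("-")[0]))
    return [(label, -shares[label]["male"], shares[label]["female"]) for label in ordered]


# Invented counts for illustration only.
counts = [(0, "male", 30), (3, "female", 20), (7, "male", 25), (9, "female", 25)]
shares = pyramid_shares(counts)
bars = pyramid_bars(shares)
```

Each row of `bars` can be passed to any horizontal bar chart routine; the negative male values extend to the left and the female values to the right.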
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Data is the new oil, and this dataset is a wellspring of knowledge waiting to be tapped😷!
Description: Welcome to the world of credit card transactions! This dataset provides a treasure trove of insights into customers' spending habits, transactions, and more. Whether you're a data scientist, analyst, or just someone curious about how money moves, this dataset is for you.
Features:
- Customer ID: Unique identifiers for every customer.
- Name: First name of the customer.
- Surname: Last name of the customer.
- Gender: The gender of the customer.
- Birthdate: Date of birth for each customer.
- Transaction Amount: The dollar amount for each transaction.
- Date: Date when the transaction occurred.
- Merchant Name: The name of the merchant where the transaction took place.
- Category: Categorization of the transaction.
Why this dataset matters: Understanding consumer spending patterns is crucial for businesses and financial institutions. This dataset is a goldmine for exploring trends, patterns, and anomalies in financial behavior. It can be used for fraud detection, marketing strategies, and much more.
Acknowledgments: We'd like to express our gratitude to the contributors and data scientists who helped curate this dataset. It's a collaborative effort to promote data-driven decision-making.
Let's Dive In: Explore, analyze, and visualize this data to uncover the hidden stories in the world of credit card transactions. We look forward to seeing your innovative analyses, visualizations, and applications using this dataset.
The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
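As a minimal sketch of querying these tables from Python, the snippet below builds a fully qualified table reference and runs a query with the `google-cloud-bigquery` client. It assumes that package is installed and that credentials and a default project are already configured. The helper names `qualified_table`, `top_life_expectancy_sql`, and `run_query` are hypothetical. The table and column names are the ones used in the sample queries in this description.

```python
import re

DATASET = "bigquery-public-data.census_bureau_international"


def qualified_table(table_name):
    """Return a backtick-quoted, fully qualified reference to a table in the dataset."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"`{DATASET}.{table_name}`"


def top_life_expectancy_sql(year, limit=5):
    """Build a small query against the mortality_life_expectancy table."""
    return (
        "SELECT country_name, life_expectancy\n"
        f"FROM {qualified_table('mortality_life_expectancy')}\n"
        f"WHERE year = {int(year)}\n"
        "ORDER BY life_expectancy DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql):
    """Run a query and return the rows as a list of dicts.

    Assumption: google-cloud-bigquery is installed and authentication is set up.
    The import is local so the helpers above work without the package.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]
```

Calling `run_query(top_life_expectancy_sql(2016))` would then return up to five rows. The exact authentication setup depends on the environment.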
What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!
SELECT
  age.country_name,
  age.life_expectancy,
  size.country_area
FROM (
  SELECT
    country_name,
    life_expectancy
  FROM
    `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
  WHERE
    year = 2016) age
INNER JOIN (
  SELECT
    country_name,
    country_area
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE
    country_area > 25000) size
ON
  age.country_name = size.country_name
ORDER BY
  2 DESC
/* Limit removed for Data Studio Visualization */
LIMIT
  10
Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.
SELECT
  age.country_name,
  SUM(age.population) AS under_25,
  pop.midyear_population AS total,
  ROUND((SUM(age.population) / pop.midyear_population) * 100, 2) AS pct_under_25
FROM (
  SELECT
    country_name,
    population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
  WHERE
    year = 2017
    AND age < 25) age
INNER JOIN (
  SELECT
    midyear_population,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.midyear_population`
  WHERE
    year = 2017) pop
ON
  age.country_code = pop.country_code
GROUP BY
  1,
  3
ORDER BY
  4 DESC /* Remove limit for visualization */
LIMIT
  10
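The aggregation in the query above (sum the age-specific populations under 25, divide by the midyear total, round to two decimals) can be sketched in plain Python. This is an illustration only: the function name `pct_under_age` and the input shapes are assumptions, and rows would normally come from the two tables rather than from hand-written tuples.

```python
from collections import defaultdict


def pct_under_age(age_rows, totals, age_limit=25):
    """Compute the share of each country's population below age_limit.

    age_rows: iterable of (country_code, age, population) tuples (hypothetical shape)
    totals:   dict mapping country_code to midyear population
    Returns (country_code, under_limit, total, pct) tuples, largest pct first.
    """
    under = defaultdict(int)
    for code, age, population in age_rows:
        if age < age_limit:
            under[code] += population

    result = []
    for code, young in under.items():
        total = totals.get(code)
        if not total:
            # Mirrors the INNER JOIN: skip countries without a midyear total.
            continue
        result.append((code, young, total, round(young / total * 100, 2)))

    result.sort(key=lambda row: row[3], reverse=True)
    return result
```

As in the SQL, a country appears in the result only if it has at least one age-specific row under the limit and a matching midyear total.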
The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.
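SELECT
  growth.country_name,
  growth.net_migration,
  CAST(area.country_area AS INT64) AS country_area
FROM (
  SELECT
    country_name,
    net_migration,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.birth_death_growth_rates`
  WHERE
    year = 2017) growth
INNER JOIN (
  SELECT
    country_area,
    country_code
  FROM
    `bigquery-public-data.census_bureau_international.country_names_area`
  /* The source text ends at the FROM clause above; the filter and join
     below are an assumption reconstructed from the description. */
  WHERE
    country_area > 500) area
ON
  growth.country_code = area.country_code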
Historic (none)
United States Census Bureau
Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides an extensive view of global population statistics and health metrics across various countries from 2014 to 2024. It combines population data with vital health-related indicators, making it a valuable resource for understanding trends in population growth and health outcomes worldwide. Researchers, data scientists, and policymakers can utilize this dataset to analyze correlations between population dynamics and health performance at a global scale.
Key Features:
- Country: Name of the country.
- Year: Year of the data (2014–2024).
- Population: Total population for the respective year and country.
- Country Code: ISO 3-letter country codes for easy identification.
- Health Expenditure (health_exp): Percentage of GDP spent on healthcare.
- Life Expectancy (life_expect): Average life expectancy at birth in years.
- Maternal Mortality (maternal_mortality): Maternal deaths per 100,000 live births.
- Infant Mortality (infant_mortality): Deaths of infants under 1 year per 1,000 live births.
- Neonatal Mortality (neonatal_mortality): Deaths of newborns (0–28 days) per 1,000 live births.
- Under-5 Mortality (under_5_mortality): Deaths of children under 5 years per 1,000 live births.
- HIV Prevalence (prev_hiv): Percentage of the population living with HIV.
- Tuberculosis Incidence (inci_tuberc): Estimated new and relapse TB cases per 100,000 people.
- Undernourishment Prevalence (prev_undernourishment): Percentage of the population that is undernourished.
Use Cases:
- Health Policy Analysis: Understand trends in healthcare expenditure and its relationship to health outcomes.
- Global Health Research: Investigate global or regional disparities in health and nutrition.
- Population Studies: Analyze population growth trends alongside health indicators.
- Data Visualization: Build visual dashboards for storytelling and impactful data representation.
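As one way to approach the health policy use case, the sketch below computes a Pearson correlation between two of the listed columns, `health_exp` and `life_expect`, skipping records with missing values. It assumes the records have already been loaded as dictionaries keyed by those column names. The helper names `pearson` and `column_correlation` are hypothetical, and the sample rows are made-up placeholders, not values from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def column_correlation(rows, col_a, col_b):
    """Correlate two columns, ignoring rows where either value is missing."""
    pairs = [
        (row[col_a], row[col_b])
        for row in rows
        if row.get(col_a) is not None and row.get(col_b) is not None
    ]
    return pearson([a for a, _ in pairs], [b for _, b in pairs])


if __name__ == "__main__":
    # Made-up placeholder records, for illustration only.
    sample_rows = [
        {"health_exp": 3.0, "life_expect": 62.0},
        {"health_exp": 5.0, "life_expect": 70.0},
        {"health_exp": 7.0, "life_expect": 74.0},
        {"health_exp": None, "life_expect": 80.0},
    ]
    print(column_correlation(sample_rows, "health_exp", "life_expect"))
```

A correlation computed this way pools all countries and years together. Grouping by Country or Year first may be more appropriate, depending on the question.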