This child item describes Python code used to query census data from the TigerWeb Representational State Transfer (REST) services and the U.S. Census Bureau Application Programming Interface (API). These data were needed as input feature variables for a machine learning model to predict public supply water use for the conterminous United States. Census data were retrieved for public-supply water service areas, but the census data collector could be used to retrieve data for other areas of interest. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000-2020. Data retrieved by the census data collector code were used as input features in the public supply delivery and water use machine learning models. This page includes the following file: census_data_collector.zip - a zip file containing the census data collector Python code used to retrieve data from the U.S. Census Bureau and a README file.
Selected variables from the most recent 5-year American Community Survey (ACS, released 2023), aggregated by Community Area. Additional years will be added as they become available. The underlying algorithm used to create the dataset calculates the percentage of a census tract that falls within the boundaries of a given community area. Because census tracts and community area boundaries are not aligned, these figures should be considered estimates. Total population in this dataset: 2,647,621. Total Chicago population per ACS 2023: 2,664,452. % Difference: -0.632%. There are different approaches in common use for displaying Hispanic or Latino population counts. In this dataset, following the approach taken by the Census Bureau, a person who identifies as Hispanic or Latino will also be counted in the race category with which they identify. However, again following the Census Bureau data, there is also a column for White Not Hispanic or Latino. Code can be found here: https://github.com/Chicago/5-Year-ACS-Survey-Data Community Area Shapefile: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6 Census Area Python Package Documentation: https://census-area.readthedocs.io/en/latest/index.html
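The tract-to-community-area apportionment described above can be sketched in plain Python. This is a hypothetical illustration of the areal-weighting idea, not the City of Chicago's actual code (which is linked above); the tract IDs, populations, and overlap fractions are invented.

```python
# Hypothetical areal-weighting sketch: apportion each census tract's
# population to community areas by the fraction of the tract's area
# that falls inside each community area. All numbers are illustrative.
tract_population = {"17031010100": 5000, "17031010201": 3200}

# Fraction of each tract's area inside each community area (assumed values)
overlap = {
    ("17031010100", "ROGERS PARK"): 0.75,
    ("17031010100", "WEST RIDGE"): 0.25,
    ("17031010201", "ROGERS PARK"): 1.00,
}

community_population = {}
for (tract, area), frac in overlap.items():
    community_population[area] = (
        community_population.get(area, 0.0) + tract_population[tract] * frac
    )

print(community_population)
# Rogers Park gets 5000*0.75 + 3200*1.0; West Ridge gets 5000*0.25
```

In practice the overlap fractions would come from intersecting tract and community-area geometries (for example with the census-area package documented above), but the apportionment arithmetic is the same.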
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This workflow provides the prototype components of open dataset tools in the KNIME Python-based Geospatial Extension. Users can acquire data by simply defining the variable and geographic level. It contains 4 nodes: US2020 TIGER: for US basemaps (census block, block group, tract, and county); US2020 Census: for Decennial Census P.L. 94-171 redistricting data; US ACS-5: for American Community Survey (ACS) 5-year data; GeoView: for geodata visualization. Requirements: US Census API key: https://api.census.gov/data/key_signup.html; KNIME extension: KNIME Python Integration; Python packages: geopandas, requests, matplotlib.
The US Census Bureau conducts the American Community Survey (ACS) 1-year and 5-year surveys, which record various demographics and provide public access through APIs. I called the APIs from a Python environment using the requests library, then cleaned and organized the data into a usable format.
ACS Subject data [2011-2019] was accessed using Python by following the below API Link:
https://api.census.gov/data/2011/acs/acs1?get=group(B08301)&for=county:*
The data was obtained in JSON format by calling the above API, then imported as a Python Pandas Dataframe. The 84 variables returned comprise 21 estimate values for various metrics, the 21 respective margins of error, and annotation values for each estimate and margin of error. This data then underwent various cleaning steps in Python: excess variables were removed and the columns were renamed. Web scraping was carried out to extract the variables' names and replace the codes in the column names of the raw data.
The above step was carried out for multiple ACS/ACS-1 datasets spanning 2011-2019 and then merged into a single Python Pandas Dataframe. The columns were rearranged, and the "NAME" column was split into two columns, namely 'StateName' and 'CountyName.' The counties for which no data was available were also removed from the Dataframe. Once the Dataframe was ready, it was separated into two new dataframes for separating State and County Data and exported into '.csv' format
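The retrieval-and-cleaning flow described above can be sketched as follows. The Census API returns JSON as a list of rows whose first row holds the column names; the payload below is a tiny fabricated sample in that shape (a real call would fetch it with requests.get on the URL above), and the renamed column labels are invented for illustration.

```python
import pandas as pd

# In practice:
#   resp = requests.get(
#       "https://api.census.gov/data/2011/acs/acs1?get=group(B08301)&for=county:*")
#   payload = resp.json()
# Fabricated sample payload in the same shape (header row + data rows):
payload = [
    ["B08301_001E", "B08301_001M", "NAME", "state", "county"],
    ["41424", "352", "Autauga County, Alabama", "01", "001"],
    ["74102", "519", "Baldwin County, Alabama", "01", "003"],
]

# First row becomes the column index, remaining rows become the data
df = pd.DataFrame(payload[1:], columns=payload[0])

# Replace variable codes with readable names (hypothetical labels)
df = df.rename(columns={"B08301_001E": "Total_Workers_Est",
                        "B08301_001M": "Total_Workers_MOE"})

# Split NAME into CountyName and StateName, as described above
df[["CountyName", "StateName"]] = df["NAME"].str.split(", ", expand=True)
df = df.drop(columns=["NAME"])
print(df.head())
```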
More information about the source of Data can be found at the URL below:
US Census Bureau. (n.d.). About: Census Bureau API. Retrieved from Census.gov
https://www.census.gov/data/developers/about.html
I hope this data helps you to create something beautiful and awesome. I will be posting a lot more databases shortly if I get time between assignments, submissions, and Semester Projects 🧙🏼‍♂️. Good Luck.
Selected variables from the most recent 5-year ACS Community Survey (released 2023), aggregated by Ward. Additional years will be added as they become available. The underlying algorithm used to create the dataset calculates the percentage of a census tract that falls within the boundaries of a given ward. Because census tracts and ward boundaries are not aligned, these figures should be considered estimates. Total population in this dataset: 2,649,803. Total population of Chicago reported by ACS 2023: 2,664,452. % Difference: -0.55%. There are different approaches in common use for displaying Hispanic or Latino population counts. In this dataset, following the approach taken by the Census Bureau, a person who identifies as Hispanic or Latino will also be counted in the race category with which they identify. However, again following the Census Bureau data, there is also a column for White Not Hispanic or Latino. The City of Chicago is actively soliciting community input on how best to represent race, ethnicity, and related concepts in its data and policy. Every dataset, including this one, has a "Contact dataset owner" link in the Actions menu. You can use it to offer any input you wish to share or to indicate if you would be interested in participating in live discussions the City may host. Code can be found here: https://github.com/Chicago/5-Year-ACS-Survey-Data Ward Shapefile: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards-2023-Map/cdf7-bgn3 Census Area Python Package Documentation: https://census-area.readthedocs.io/en/latest/index.html
The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.
What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!
SELECT
age.country_name,
age.life_expectancy,
size.country_area
FROM (
SELECT
country_name,
life_expectancy
FROM
bigquery-public-data.census_bureau_international.mortality_life_expectancy
WHERE
year = 2016) age
INNER JOIN (
SELECT
country_name,
country_area
FROM
bigquery-public-data.census_bureau_international.country_names_area
WHERE
country_area > 25000) size
ON
age.country_name = size.country_name
ORDER BY
2 DESC
/* Limit removed for Data Studio Visualization */
LIMIT
10
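As noted above, these tables can be queried with the BigQuery Python client library. A minimal sketch is shown below, assuming the google-cloud-bigquery package is installed and GCP credentials are configured; the client call is not executed here because it requires authentication.

```python
# The life-expectancy query from above, as a Python string
QUERY = """
SELECT age.country_name, age.life_expectancy, size.country_area
FROM (
  SELECT country_name, life_expectancy
  FROM `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
  WHERE year = 2016) age
INNER JOIN (
  SELECT country_name, country_area
  FROM `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE country_area > 25000) size
ON age.country_name = size.country_name
ORDER BY 2 DESC
LIMIT 10
"""

def run_query(sql):
    # Requires: pip install google-cloud-bigquery, plus GCP credentials
    from google.cloud import bigquery
    return bigquery.Client().query(sql).to_dataframe()

# df = run_query(QUERY)  # not executed here: needs an authenticated client
```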
Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.
SELECT
age.country_name,
SUM(age.population) AS under_25,
pop.midyear_population AS total,
ROUND((SUM(age.population) / pop.midyear_population) * 100,2) AS pct_under_25
FROM (
SELECT
country_name,
population,
country_code
FROM
bigquery-public-data.census_bureau_international.midyear_population_agespecific
WHERE
year = 2017
AND age < 25) age
INNER JOIN (
SELECT
midyear_population,
country_code
FROM
bigquery-public-data.census_bureau_international.midyear_population
WHERE
year = 2017) pop
ON
age.country_code = pop.country_code
GROUP BY
1,
3
ORDER BY
4 DESC /* Remove limit for visualization */
LIMIT
10
The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.
SELECT
growth.country_name,
growth.net_migration,
CAST(area.country_area AS INT64) AS country_area
FROM (
SELECT
country_name,
net_migration,
country_code
FROM
bigquery-public-data.census_bureau_international.birth_death_growth_rates
WHERE
year = 2017) growth
INNER JOIN (
SELECT
country_area,
country_code
FROM
bigquery-public-data.census_bureau_international.country_names_area
WHERE
country_area > 500) area
ON
growth.country_code = area.country_code
ORDER BY
2 DESC
LIMIT
10
United States Census Bureau
Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
https://creativecommons.org/publicdomain/zero/1.0/
Executive Summary Farmers' markets are an important part of building community, ethically sourcing food, and creating a culture around sustainable habits. In this project, I worked to source data for farmers' markets in North Carolina. Due to their impact on the community, I also joined this data with census data to obtain a better understanding of how they are distributed and what insights they can provide us socially and economically. This dataset can also be used with other census data as it has digestible location data and further research in social science fields.
Data The data include farmers' market data web-scraped from the North Carolina Department of Agriculture and Consumer Services, joined with census data from 2019, the most recent year I could find. The web scraping gathered each farmers' market's name, address, and contact info, while the census data provide total population, median income, and the number of people aged 18-30 by zip code. This data is unique in the field due to its recency: similar data exist through the Department of Agriculture, but they are often outdated and can contain mistakes at a more granular level. The script I've constructed pulls the most recent data for North Carolina.
Power Analysis I conducted a power analysis with the intention of determining whether the populations of zip codes with farmers' markets differ significantly from the average zip code population of North Carolina, using a significance level of .05 and a power of .8, resulting in a required sample of 127.52.
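The mechanics of a sample-size calculation like the one above can be sketched with the standard library, using the normal-approximation formula for a two-sided test. The effect size is not stated in the original write-up, so the 0.25 used here is an assumed value for illustration; the reported 127.52 would correspond to a slightly smaller effect size.

```python
from statistics import NormalDist
from math import ceil

# Normal-approximation sample size: n = ((z_alpha + z_beta) / d)^2
# alpha and power match the write-up; the effect size d = 0.25 is ASSUMED.
alpha, power, effect_size = 0.05, 0.80, 0.25

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, ~1.96
z_beta = NormalDist().inv_cdf(power)            # power quantile, ~0.8416
n = ((z_alpha + z_beta) / effect_size) ** 2

print(round(n, 2), "->", ceil(n))  # required sample, rounded up
```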
Exploratory Data Analysis You can find exploratory data analysis in the eda.py file to better acclimate yourself with the data. There were 247 farmers' markets collected, and three census variables were attached. Other distribution metrics are included with visualizations as well as general information on the data.
Link to Github https://github.com/tejasj02/Farmers-Market-Data-Curation
Ethics statement This dataset was curated from publicly available sources with the intention of furthering research in this social science field. All scraping and data gathering was done ethically, without breaching any rules. Farmers' market data were obtained from the North Carolina Department of Agriculture and Consumer Services, while the census data were imported from the censusdata Python library. Data are public and up to date as of 11/25/2024; the collection code can be re-run with minor adjustments to refresh them. The dataset is open source and should adhere to normal ethical boundaries.
Dataset Card for Census Income (Adult)
This dataset is a precise version of the UCI Adult (Census Income) dataset. The UCI repository happens to host it under two links, but we checked and confirmed that they are identical. We used the following Python script to create this Hugging Face dataset:
import pandas as pd
from datasets import Dataset, DatasetDict, Features, Value, ClassLabel
url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data" url2 =… See the full description on the dataset page: https://huggingface.co/datasets/cestwc/census-income.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains many files. It provides harmonized geographic boundary shapefiles and crosswalks sourced from the U.S. Census Bureau and accessed via the pygris Python library. It includes:
ZCTA (ZIP Code Tabulation Area) shapefiles
County shapefiles
ZCTA-to-county crosswalk files
Unique lists of ZCTAs and counties by year
Shapefiles: the column names and column types are harmonized for consistency across years, and the cartographic boundaries are selected across years, enabling longitudinal spatial analysis and integration with external datasets such as demographic or health data. Crosswalks are fetched directly from U.S. Census sources and processed to ensure compatibility and ease of use. All files are structured to support reproducible, year-over-year spatial analyses.
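Column harmonization across shapefile vintages, as described above, might look like the following pandas sketch. The rename maps reflect the real pattern of Census vintage-suffixed column names (e.g. ZCTA5CE10 vs ZCTA5CE20), but the specific maps and sample rows here are invented; the actual dataset's schema may differ.

```python
import pandas as pd

# Hypothetical per-year rename maps: Census shapefiles change column
# names across vintages (e.g. ZCTA5CE10 in 2010-based files, ZCTA5CE20
# in 2020-based files).
RENAMES = {
    2015: {"ZCTA5CE10": "zcta", "GEOID10": "geoid"},
    2022: {"ZCTA5CE20": "zcta", "GEOID20": "geoid"},
}

def harmonize(df, year):
    """Map a year's raw columns onto one shared schema and tag the year."""
    out = df.rename(columns=RENAMES[year])
    out["year"] = year
    return out[["zcta", "geoid", "year"]].astype({"zcta": str, "geoid": str})

df15 = pd.DataFrame({"ZCTA5CE10": ["60601"], "GEOID10": ["60601"]})
df22 = pd.DataFrame({"ZCTA5CE20": ["60601"], "GEOID20": ["60601"]})
combined = pd.concat([harmonize(df15, 2015), harmonize(df22, 2022)],
                     ignore_index=True)
print(combined)
```

Keeping identifiers as strings (rather than integers) also preserves any leading zeros, which matters for longitudinal joins on geographic codes.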
This project combines data extraction, predictive modeling, and geospatial mapping to analyze housing trends in Mercer County, New Jersey. It consists of three core components: Census Data Extraction: Gathers U.S. Census data (2012–2022) on median house value, household income, and racial demographics for all census tracts in the county. It accounts for changes in census tract boundaries between 2010 and 2020 by approximating values for newly defined tracts. House Value Prediction: Uses an LSTM model with k-fold cross-validation to forecast median house values through 2025. Multiple feature combinations and sequence lengths are tested to optimize prediction accuracy, with the final model selected based on MSE and MAE scores. Data Mapping: Visualizes historical and predicted housing data using GeoJSON files from the TIGERWeb API. It generates interactive maps showing raw values, changes over time, and percent differences, with customization options to handle outliers and improve interpretability. This modular workflow can be adapted to other regions by changing the input FIPS codes and feature selections.
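Sequence preparation for an LSTM like the one described might be sketched as below. This is a generic sliding-window helper in plain Python; the window length and house values are illustrative, not the project's actual configuration.

```python
def make_sequences(series, seq_len):
    """Build (input_window, next_value) pairs for sequence models."""
    pairs = []
    for i in range(len(series) - seq_len):
        pairs.append((series[i:i + seq_len], series[i + seq_len]))
    return pairs

# Illustrative yearly median house values for one tract (made-up numbers)
values = [210_000, 215_000, 223_000, 230_000, 241_000, 250_000]
windows = make_sequences(values, seq_len=3)
print(windows[0])  # ([210000, 215000, 223000], 230000)
```

Testing multiple sequence lengths, as the project does, would simply mean regenerating these pairs for several values of seq_len and comparing validation MSE/MAE for each.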
This data was collected and created for a project in a data science course I took in college in the Spring of 2020. I have updated the data to include more dates into the summer and decided to share it and the code so others can explore it.
Available here: https://hifld-geoplatform.opendata.arcgis.com/datasets/hospitals
Information on hospitals in the United States.
Available here: https://github.com/nytimes/covid-19-data
Daily COVID-19 case and death data for US counties.
Available here: https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/
Data sheet available here: https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/co-est2019-alldata.pdf
2019 county level census estimates.
Available here: https://covidtracking.com/api/v1/states/daily.csv
Daily state-level COVID-19 testing data.
Uploaded with Git LFS
Interim data views created by me to hold cleaned data, used to create the final dataset.
Final combined dataset: a days × 3142 (number of US counties + DC) time series with variables stored as a proportion of population.
Uploaded with Git LFS
The python scripts have comments to explain which datasets they're responsible for generating.
Feel free to use and edit them to tailor the datasets generated to your liking.
There is also a helper function library in the main directory.
Scripts can be run by calling python followed by the script name.
The provided Python code extracts data from the Federal Reserve Economic Data (FRED) on Bachelor's-or-higher educational attainment in the United States, at the state and county levels. The code retrieves data as of the current date; data are available through 2021.
This code is useful for research purposes, particularly for conducting comparative analyses involving educational and economic indicators. There are two distinct CSV files associated with this code. One file contains information on the percentage of Bachelor's or Higher degree holders among residents of all USA states, while the other file provides data on states, counties, and municipalities throughout the entire USA.
The extraction process involves applying different criteria, including content filtering (such as title, frequency, seasonal adjustment, and unit) and collaborative filtering based on item similarity. For the first CSV file, the algorithm extracts data for each state in the USA and assigns corresponding state names to the respective FRED codes using a loop. Similarly, for the second CSV file, data is extracted based on a given query, encompassing USA states, counties, and municipalities.
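A request to the FRED API for one of these series might be constructed as below. The series ID and API-key placeholder are hypothetical examples, and no network call is made here; in practice the fredapi package or plain requests would perform the retrieval against this URL.

```python
from urllib.parse import urlencode

# FRED observations endpoint (real); the series ID below is a
# hypothetical example of a state-level educational-attainment series.
BASE = "https://api.stlouisfed.org/fred/series/observations"

def fred_url(series_id, api_key, start="2010-01-01"):
    """Build a FRED observations request URL for one series."""
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start,
    }
    return f"{BASE}?{urlencode(params)}"

url = fred_url("GCT1502AL", "YOUR_API_KEY")
print(url)
# A real call would then be: requests.get(url).json()["observations"]
```

Looping such calls over per-state series IDs, then mapping FRED codes back to state names, mirrors the loop-based extraction described above.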
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The first three non-intercept terms represent indicator variables for the different seasons, with fall being the baseline. The Vaccine Available term represents a binary variable for whether the COVID-19 initial vaccination series was publicly available or not. Weekend is a binary variable for whether the data was collected on Saturday or Sunday. The four Income Bracket terms are indicator variables for the median income level of the census tract where the data was collected. The income brackets are defined in our methods. Lastly, the More than 55.5% White term is an indicator variable for whether the census tract in question had a populace that is more than 55.5% White. Full documentation for the Python package used to make this output is available from the developers [38].
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To conduct this study, I sourced demographic data from 2010 to 2023 from the California Elections Data Archive (CEDA) for city council members and school board members. The CEDA data provide a full list of candidate names and the number of votes a given candidate received for every city council and school board election. I assigned the gender to each candidate based on the lists of popular male and female names provided by the Social Security Administration. Since the average age of city council members is 46 years old according to the Bureau of Labor Statistics, I compiled a list of popular male and female given names for babies born in the 1960s, 1970s, and 1980s. Then, I automated the gender classification as follows: for example, as “Lisa” is identified as a popular female given name by the Social Security Administration, every candidate whose first name is “Lisa” was assigned “female” in our dataset. For a gender-neutral name that appeared on the lists for both male and female given names, which included “Alex” and “Casey,” I used the following keywords “[first name] [last name] [office type (either “city council” or “school board”)] [name of the city or the school district]” to search for more information about the official’s gender online. My search returned either a picture to help clearly identify the official’s gender and/or an article that refers to the official with gendered pronouns. To identify the ethnicity of each elected official, I used the 2010 Census data and the 23AndMe Surname Discovery Tool. The 2010 Census lists surnames occurring at least 100 times, and it includes self-reported ethnicity data for individuals with a given surname. Similarly, the 23AndMe Surname Discovery Tool gives the percentage of individuals with the given surname who identify as each of four different ethnicity groups: Hispanic, White, Asian/Pacific Islander, and Black based on the 2010 US Census data. 
For surnames that did not appear on either the 2010 Census data or the 23AndMe Surname Discovery Tool, I used Python’s Ethnicolr library, which bases its prediction of ethnicity using either both first and last name or just the last name on the US census data (2000 and 2010), the Florida voting registration data, and the Wikipedia data.
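The name-based gender assignment step described above might look like the following sketch. The name sets here are tiny invented stand-ins for the Social Security Administration lists, and the "review" outcome mirrors the manual web-search fallback for gender-neutral names.

```python
# Toy stand-ins for SSA popular-name lists (1960s-1980s birth cohorts);
# the real lists contain thousands of names per decade.
FEMALE_NAMES = {"lisa", "jennifer", "casey", "alex"}
MALE_NAMES = {"michael", "david", "casey", "alex"}

def classify_gender(first_name):
    """Assign gender from name lists; flag ambiguous names for review."""
    name = first_name.lower()
    female, male = name in FEMALE_NAMES, name in MALE_NAMES
    if female and male:
        return "review"   # gender-neutral: fall back to a manual web search
    if female:
        return "female"
    if male:
        return "male"
    return "unknown"

print([classify_gender(n) for n in ["Lisa", "Casey", "Pat"]])
# ['female', 'review', 'unknown']
```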
The U.S. Geological Survey is developing national water-use models to support water resources management in the United States. Model benefits include a nationally consistent estimation approach, greater temporal and spatial resolution of estimates, efficient and automated updates of results, and capabilities to forecast water use into the future and assess model uncertainty. The term “reanalysis” refers to the process of reevaluating and recalculating water-use data using updated or refined methods, data sources, models, or assumptions. In this data release, water use refers to water that is withdrawn by public and private water suppliers and includes water provided for domestic, commercial, industrial, thermoelectric power, and public water uses, as well as water that is consumed or lost within the public supply system. Consumptive use refers to water withdrawn by the public supply system that is evaporated, transpired, incorporated into products or crops, or consumed by humans or livestock. This data release contains data used in a machine learning model (child item 2) to estimate monthly water use for communities that are supplied by public-supply water systems in the conterminous United States for 2000-2020. This data release also contains associated scripts used to produce input features (child items 4 - 8) as well as model water use estimates by 12-digit hydrologic unit code (HUC12) and public supply water service area (WSA). HUC12 boundaries are in child item 3. Public supply delivery and consumptive use estimates are in child items 1 and 9, respectively.
First posted: November 1, 2023 Revised: August 8, 2024 This version replaces the previous version of the data release: Luukkonen, C.L., Alzraiee, A.H., Larsen, J.D., Martin, D.J., Herbert, D.M., Buchwald, C.A., Houston, N.A., Valseth, K.J., Paulinski, S., Miller, L.D., Niswonger, R.G., Stewart, J.S., and Dieter, C.A., 2023, Public supply water use reanalysis for the 2000-2020 period by HUC12, month, and year for the conterminous United States: U.S. Geological Survey data release, https://doi.org/10.5066/P9FUL880 Version 2.0 This data release has been updated as of 8/8/2024. The previous version has been replaced because some fractions used for downscaling WSA estimates to HUC12 did not sum to one for some WSAs in Virginia. Updated model water use estimates by HUC12 are included in this version. A change was made in two scripts to check for this condition. Output files have also been updated to preserve the leading zero in the HUC12 codes. Additional files are also included to provide information about mapping the WSAs and groundwater and surface water fractions to HUC12 and to provide public supply water-use estimates by WSA. The 'Machine learning model that estimates total monthly and annual per capita public supply water use' child item has been updated with these corrections and additional files. A new child item 'R code used to estimate public supply consumptive water use' has been added to provide estimates of public supply consumptive use.
This page includes the following files:
PS_HUC12_Tot_2000_2020.csv - a csv file with estimated monthly public supply total water use from 2000-2020 by HUC12, in million gallons per day
PS_HUC12_GW_2000_2020.csv - a csv file with estimated monthly public supply groundwater use for 2000-2020 by HUC12, in million gallons per day
PS_HUC12_SW_2000_2020.csv - a csv file with estimated monthly public supply surface water use for 2000-2020 by HUC12, in million gallons per day
PS_WSA_Tot_2000_2020.csv - a csv file with estimated monthly public supply total water use from 2000-2020 by WSA, in million gallons per day
PS_WSA_GW_2000_2020.csv - a csv file with estimated monthly public supply groundwater use for 2000-2020 by WSA, in million gallons per day
PS_WSA_SW_2000_2020.csv - a csv file with estimated monthly public supply surface water use for 2000-2020 by WSA, in million gallons per day
Note: 1) Groundwater and surface water fractions were determined using source counts as described in the 'R code that determines groundwater and surface water source fractions for public-supply water service areas, counties, and 12-digit hydrologic units' child item. 2) Some HUC12s have estimated water use of zero because no public-supply water service areas were modeled within the HUC.
change_files_format.py - a Python script used to change the water use estimates by WSA and HUC12 files from wide format to the thin and long format
version_history.txt - a txt file describing changes in this version
The data release is organized into these items:
1. Machine learning model that estimates public supply deliveries for domestic and other use types - The public supply delivery model estimates total delivery of domestic, commercial, industrial, institutional, and irrigation (CII) water use for public supply water service areas within the conterminous United States. This item contains model input datasets, code used to build the delivery machine learning model, and output predictions.
2. Machine learning model that estimates total monthly and annual per capita public supply water use - The public supply water use model estimates total monthly water use for 12-digit hydrologic units within the conterminous United States. This item contains model input datasets, code used to build the water use machine learning model, and output predictions.
3. National watershed boundary (HUC12) dataset for the conterminous United States, retrieved 10/26/2020 - Spatial data consisting of a shapefile with 12-digit hydrologic units for the conterminous United States retrieved 10/26/2020.
4. Python code used to determine average yearly and monthly tourism per 1000 residents for public-supply water service areas - This code was used to create a feature for the public supply model that provides information for areas affected by population increases due to tourism.
5. Python code used to download gridMET climate data for public-supply water service areas - The climate data collector is a tool used to query climate data which are used as input features in the public supply models.
6. Python code used to download U.S. Census Bureau data for public-supply water service areas - The census data collector is a geographic based tool to query census data which are used as input features in the public supply models.
7. R code that determines buying and selling of water by public-supply water service areas - This code was used to create a feature for the public supply model that indicates whether public-supply systems buy water, sell water, or neither buy nor sell water.
8. R code that determines groundwater and surface water source fractions for public-supply water service areas, counties, and 12-digit hydrologic units - This code was used to determine source water fractions (groundwater and/or surface water) for public supply systems and HUC12s.
9. R code used to estimate public supply consumptive water use - This code was used to estimate public supply consumptive water use using an assumed fraction of deliveries for outdoor irrigation and estimates of evaporative demand. This item contains estimated monthly public supply consumptive use datasets by HUC12 and WSA.
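The wide-to-long conversion performed by change_files_format.py might resemble this pandas sketch. The column layout is a guess at the file structure (one row per HUC12, one column per month) and the values are fabricated; note that keeping HUC12 as a string preserves the leading zero mentioned in the version history.

```python
import pandas as pd

# Fabricated wide-format sample: one row per HUC12, one column per month
wide = pd.DataFrame({
    "HUC12": ["010100020101", "010100020102"],  # strings keep leading zeros
    "2000-01": [1.2, 0.8],
    "2000-02": [1.3, 0.7],
})

# melt turns the month columns into (month, value) rows: thin-and-long format
long = wide.melt(id_vars="HUC12", var_name="month",
                 value_name="withdrawal_mgd")
print(long)
```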
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the underlying data sets needed to perform the peak analysis and create the land-use regression (LUR) models described in the paper. There are four datasets:
"RAMP_Location.csv": Locations and IDs of the low-cost sensors used in this work. [NOTE: latitudes and longitudes for the sensor deployments have been intentionally rounded to protect the location of volunteer sensor hosts.]
"RAMP_data.zip": This contains the csv files of calibrated PM2.5, NO, NO2, CO and O3 measurements for the entire study period across all low-cost sensor sites.
"Vancouver_Population_Density_2016.zip": Shapefile of the population within each Dissemination Area from the 2016 Canadian Census. This information was originally extracted from the Canadian Census Analyser supported by the University of Toronto.
"smell_van_data.csv": Contains the locations, date, and description of odor reports during the monitoring period from the Smell Vancouver website (https://smell-vancouver.ca)
There is also sample code in Python to perform the peak analysis and create the LURs. [NOTE: we have intentionally excluded uploading the exact data sets imported by this code; our original data contains exact locations of sensor host volunteers and thus cannot be shared.]
"Peak_analysis_geohealth.py": A Python script to perform the peak analysis from the paper.
"LUR_strathcona_geohealth.py": A Python script to create the LURs and maps of LUR results from the paper.
Links to other data used in the code from publicly available sources:
Railway locations - https://opendata.vancouver.ca/explore/dataset/railways/information/
Public streets - https://opendata.vancouver.ca/explore/dataset/public-streets/information/
Bus stops - https://abacus.library.ubc.ca/dataset.xhtml?persistentId=hdl:11272.1/AB2/QQLSCJ
Block outlines - https://opendata.vancouver.ca/explore/dataset/block-outlines/information/
Shapefiles for mapping and understanding overlaps
sf package in R. geopandas in Python.
https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
By Noah Rippner [source]
This dataset offers a unique opportunity to examine patterns and trends in cancer rates in the United States at the individual county level. Using data from cancer.gov and the US Census American Community Survey, it provides insight into how the age-adjusted death rate, average deaths per year, and recent trends vary between counties, along with other key metrics such as average annual count, whether the objective of 45.5 (1) was met, and the recent trend (2) in death rates. Linear regression models built on these data can reveal correlations between variables that help explain cancer prevalence across counties over time, making it easier to target health initiatives and resources where needed.
This Kaggle dataset provides county-level data from the US Census American Community Survey and cancer.gov for exploring correlations between county-level cancer rates, trends, and mortality statistics. It contains records for all U.S. counties covering the age-adjusted death rate, average deaths per year, recent trend (2) in death rates, average annual count of cases detected within 5 years, and whether or not the objective of 45.5 (1) was met in the county associated with each row.
To use this dataset to its fullest potential, you should be comfortable with basic descriptive analytics: calculating summary statistics such as the mean and median, summarizing categorical variables with frequency tables, and creating visualizations such as charts and histograms. Familiarity with linear regression and other machine learning techniques (support vector machines, random forests, neural networks), the distinction between supervised and unsupervised learning, model diagnostics, and interpreting and communicating your findings will let you apply these methods accurately and effectively.
Once these concepts are understood, start by importing the data into your tool of choice: Tableau, QlikView, SAS, or Python notebooks, loading packages such as scikit-learn as needed. A description of the table's column structure is provided above. With basic SQL you can compute summary statistics, select subsets of columns under specific conditions, and sort by attributes of interest; in Python you can group and aggregate categories, join tables where necessary, and build predictive models. From there, explore the features by computing correlation and covariance matrices and examining scatter plots to reveal distributions, relationships, and trends in the metrics of interest.
- Building a predictive cancer incidence model based on county-level demographic data to identify high-risk areas and target public health interventions.
- Analyzing correlations between age-adjusted death rate, average annual count, and recent trends in order to develop more effective policy initiatives for cancer prevention and healthcare access.
- Utilizing the dataset to construct a machine learning algorithm that can predict county-level mortality rates based on socio-economic factors such as poverty levels and educational attainment rates.
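The regression workflow described above can be sketched as follows. The column and predictor names here are assumptions for illustration (check the actual CSV header before adapting this), and the data are synthetic so the example is self-contained:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the county-level table; real column names may differ
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "poverty_rate": rng.uniform(5, 30, 200),
    "median_income": rng.uniform(30_000, 90_000, 200),
})
# Fabricated relationship, purely so the model has something to recover
df["age_adjusted_death_rate"] = (
    150 + 2.0 * df["poverty_rate"] - 0.0005 * df["median_income"]
    + rng.normal(0, 5, 200)
)

X = df[["poverty_rate", "median_income"]]
y = df["age_adjusted_death_rate"]
model = LinearRegression().fit(X, y)
print(model.coef_, model.score(X, y))
```

On the real data you would also inspect residual plots and hold out a test set before trusting any coefficient as a genuine county-level effect.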
If you use this dataset i...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This database consists of a high-resolution village-level drought dataset for major Indian states for the past 43 years (1981-2022) for each month. It was created by utilising the CHIRPS precipitation and GLEAM evapotranspiration datasets. The GLEAM dataset is based on the well-recognised Priestley-Taylor equation, which estimates potential evapotranspiration (PET) from observations of surface net radiation and near-surface air temperature. The SPEI was calculated on 5x5 km spatial grids at the 3-month time scale, which is suitable for agricultural drought monitoring. This high-resolution SPEI dataset was integrated with Indian village boundaries and the associated census attribute dataset, allowing researchers to perform multi-disciplinary investigations, e.g., climate migration modelling, drought hazards, and exposure assessment. The dataset was developed with potential users in mind: it can be integrated into a GIS system for visualization (using .mid/.mif format) and into Python programming for modelling and analysis (using .csv). For advanced analysis, it is also provided in netCDF format, which can be read in Python using xarray or the netcdf4 library. More details are in the README.pdf file. Date Submitted: 2023-11-07 Issued: 2023-11-07
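A minimal sketch of working with such a netCDF file in xarray. The variable name "spei", the dimension names, and the file name are assumptions; inspect the real file's metadata (ds.data_vars, ds.dims) after opening it with xr.open_dataset("spei_3month.nc"). A small in-memory dataset stands in for the file here:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for the SPEI netCDF: 12 monthly steps on a 4x4 grid
time = pd.date_range("1981-01-01", periods=12, freq="MS")
rng = np.random.default_rng(1)
ds = xr.Dataset(
    {"spei": (("time", "lat", "lon"),
              rng.normal(0, 1, (12, 4, 4)))},
    coords={"time": time,
            "lat": np.linspace(20, 21, 4),
            "lon": np.linspace(75, 76, 4)},
)

# Count months in moderate-or-worse drought (SPEI < -1) at each grid cell
drought_months = (ds["spei"] < -1).sum("time")
print(drought_months.shape)
```

The same pattern (boolean mask, reduce over "time") extends to trend analysis or masking by village boundaries once the real coordinates are known.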
By Homeland Infrastructure Foundation [source]
Each urban area is uniquely identified by a 5-character numeric census code that may contain leading zeroes as necessary. The dataset comprises several key attributes such as the name of the urban area (represented by multiple columns), legal/statistical area description, MAF/TIGER feature class code for classification purposes (MTFCC10), urban area type code (UATYP10), functional status indicating its operational characteristics (FUNCSTAT10), and geographic coordinates specifying the latitude and longitude of the interior point of each urban area.
Additional information includes the land area in square meters (ALAND10), which denotes the extent of developed territory within an urban zone, and the water area associated with each urban area, also measured in square meters (AWATER10). Shape length describes the total length of an urban area's outline, while shape area signifies its overall spatial extent.
Here is a step-by-step guide on how to effectively use this dataset:
Import the Data: Load the dataset into your preferred tool or programming language for data analysis. Popular options include Python with libraries like pandas or R with packages like tidyr.
Explore the Columns: Familiarize yourself with the available columns in the dataset. Here are some important ones:
- NAME10: The name of each urban area.
- NAMELSAD10: The name and legal/statistical area description of each urban area.
- UACE10: A 5-character numeric census code that uniquely identifies each urban area.
- ALAND10: The land area of each urban area in square meters.
- AWATER10: The water area of each urban area in square meters.
- FUNCSTAT10: The functional status of each urban area.
- INTPTLAT10 and INTPTLON10: The latitude and longitude coordinates of the interior point of each urban area.
Understand Urban Area Types: The dataset distinguishes between two types of urban areas:
a) Urbanized Areas (UAs): These areas contain 50,000 or more people.
b) Urban Clusters (UCs): These areas contain at least 2,500 people but fewer than 50,000 people. (Except in the U.S. Virgin Islands and Guam, which may have urban clusters with populations greater than 50,000).
The column UATYP10 provides the urban area type code for each entry.
Analyze Functional Status: Explore the FUNCSTAT10 column to understand the functional status of each urban area. This information indicates whether an area is deemed functional for residential, commercial, or other non-residential purposes.
Visualize Geographic Data: Util...
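The exploration steps above can be sketched in geopandas. The GeoDataFrame below is a toy stand-in for the real shapefile (which you would open with gpd.read_file), and the UATYP10 codes "U" (Urbanized Area) and "C" (Urban Cluster) follow the usual TIGER convention; verify them against the file itself before relying on them:

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the urban areas layer, with a subset of its columns
ua = gpd.GeoDataFrame(
    {
        "NAME10": ["Springfield", "Smallville"],
        "UATYP10": ["U", "C"],          # "U" = Urbanized Area, "C" = Urban Cluster
        "ALAND10": [120_000_000, 8_000_000],  # land area, square meters
    },
    geometry=[box(0, 0, 1, 1), box(2, 2, 2.5, 2.5)],
    crs="EPSG:4326",
)

# Step 3: separate Urbanized Areas (50,000+ people) from Urban Clusters
urbanized = ua[ua["UATYP10"] == "U"]
print(urbanized["NAME10"].tolist())
```

From here, .plot() on either subset gives a quick map, and ALAND10/AWATER10 support the density and land-use comparisons described below.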
- Urban Planning Analysis: This dataset can be used to analyze and compare different urban areas based on their land area, water area, population density, and functional status. It can provide valuable insights for urban planners in terms of designing infrastructure, allocating resources, and making informed decisions to ensure sustainable development.
- Demographic Research: Researchers studying population trends and demographics can utilize this dataset to understand the growth, distribution, and characteristics of urban areas over time. By analyzing the population size and density of different urban areas, they can identify patterns of urbanization and assess the impact of policies or events on urban populations.
- Environmental Impact Assessment: The land area and water area information in this dataset can be used to assess the environmental impact of urban areas. Researchers or environmentalists can analyze the proportion of green spaces versus built-up areas within each urban area to evaluate levels of air pollution, biodiversity loss, or potential for implementing sustainable practices like rooftop gardens or rainwater harvesting systems.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate i...