16 datasets found

Python code used to download U.S. Census Bureau data for public-supply water...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Python code used to download U.S. Census Bureau data for public-supply water service areas [Dataset]. https://catalog.data.gov/dataset/python-code-used-to-download-u-s-census-bureau-data-for-public-supply-water-service-areas
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Description
This child item describes Python code used to query census data from the TIGERweb Representational State Transfer (REST) services and the U.S. Census Bureau Application Programming Interface (API). These data were needed as input feature variables for a machine learning model to predict public supply water use for the conterminous United States. Census data were retrieved for public-supply water service areas, but the census data collector could be used to retrieve data for other areas of interest. This dataset is part of a larger data release using machine learning to predict public supply water use for 12-digit hydrologic units from 2000 to 2020. Data retrieved by the census data collector code were used as input features in the public supply delivery and water use machine learning models.

This page includes the following file:
census_data_collector.zip - a zip file containing the census data collector Python code used to retrieve data from the U.S. Census Bureau and a README file.
ACS 5 Year Data by Community Area
data.cityofchicago.org
catalog.data.gov
application/rdfxml +5
Updated Jan 23, 2025
Cite
City of Chicago (2025). ACS 5 Year Data by Community Area [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/ACS-5-Year-Data-by-Community-Area/t68z-cikk
Available download formats: tsv, json, csv, xml, application/rdfxml, application/rssxml
Dataset updated
Jan 23, 2025
Dataset authored and provided by
City of Chicago
Description
Selected variables from the most recent American Community Survey (ACS) 5-year estimates (released 2023), aggregated by Community Area. Additional years will be added as they become available.

The underlying algorithm to create the dataset calculates the % of a census tract that falls within the boundaries of a given community area. Given that census tracts and community area boundaries are not aligned, these figures should be considered an estimate.

Total population in this dataset: 2,647,621
Total Chicago population per ACS 2023: 2,664,452
% Difference: -0.632%
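
The tract-to-area weighting and the percent-difference check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the City's code: the tract IDs, area IDs, and overlap fractions are made up, and `apportion` and `percent_difference` are hypothetical helper names. Only the two population totals come from the description.

```python
def apportion(tract_population, overlaps):
    """Sum tract populations into areas, weighted by overlap fraction.

    tract_population: {tract_id: population}
    overlaps: iterable of (tract_id, area_id, fraction_of_tract_in_area)
    """
    totals = {}
    for tract_id, area_id, fraction in overlaps:
        share = tract_population[tract_id] * fraction
        totals[area_id] = totals.get(area_id, 0.0) + share
    return totals


def percent_difference(estimate, reference):
    """Signed difference of estimate from reference, in percent."""
    return (estimate - reference) / reference * 100


# Hypothetical tracts and overlap fractions, for illustration only.
tracts = {"T1": 1000, "T2": 2000}
overlaps = [("T1", "A", 0.25), ("T1", "B", 0.75), ("T2", "A", 1.0)]
area_totals = apportion(tracts, overlaps)

# Totals quoted in the dataset description.
diff = percent_difference(2_647_621, 2_664_452)
print(area_totals, round(diff, 3))
```

Because a tract that straddles a boundary is split by area rather than by where people actually live, the apportioned totals are estimates, which is consistent with the small gap between the two citywide figures.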

There are different approaches in common use for displaying Hispanic or Latino population counts. In this dataset, following the approach taken by the Census Bureau, a person who identifies as Hispanic or Latino will also be counted in the race category with which they identify. However, again following the Census Bureau data, there is also a column for White Not Hispanic or Latino.

Code can be found here: https://github.com/Chicago/5-Year-ACS-Survey-Data

Community Area Shapefile:

https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6

Census Area Python Package Documentation:

https://census-area.readthedocs.io/en/latest/index.html
ACS 5 Year Data by Ward
gimi9.com
data.cityofchicago.org
+1more
Updated Feb 7, 2025
Cite
(2025). ACS 5 Year Data by Ward [Dataset]. https://gimi9.com/dataset/data-gov_acs-5-year-data-by-ward
Dataset updated
Feb 7, 2025
Description
Selected variables from the most recent 5-year American Community Survey (ACS) (released 2023), aggregated by Ward. Additional years will be added as they become available.

The underlying algorithm to create the dataset calculates the percent of a census tract that falls within the boundaries of a given ward. Given that census tracts and ward boundaries are not aligned, these figures should be considered an estimate.

Total population in this dataset: 2,649,803
Total population of Chicago reported by ACS 2023: 2,664,452
% Difference: -0.55%

There are different approaches in common use for displaying Hispanic or Latino population counts. In this dataset, following the approach taken by the Census Bureau, a person who identifies as Hispanic or Latino will also be counted in the race category with which they identify. However, again following the Census Bureau data, there is also a column for White Not Hispanic or Latino.

The City of Chicago is actively soliciting community input on how best to represent race, ethnicity, and related concepts in its data and policy. Every dataset, including this one, has a "Contact dataset owner" link in the Actions menu. You can use it to offer any input you wish to share or to indicate if you would be interested in participating in live discussions the City may host.

Code can be found here: https://github.com/Chicago/5-Year-ACS-Survey-Data

Ward Shapefile:

https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Wards-2023-Map/cdf7-bgn3

Census Area Python Package Documentation:

https://census-area.readthedocs.io/en/latest/index.html
census-income
huggingface.co
Updated Feb 12, 2025
Cite
WC (2025). census-income [Dataset]. https://huggingface.co/datasets/cestwc/census-income
Dataset updated
Feb 12, 2025
Authors
WC
Description
Dataset Card for Census Income (Adult)

This dataset is a precise version of Adult or Census Income. The UCI dataset is published under two links; we checked and confirmed that the two copies are identical. We used the following Python script to create this Hugging Face dataset.

import pandas as pd
from datasets import Dataset, DatasetDict, Features, Value, ClassLabel

URLs

url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
url2 =…

See the full description on the dataset page: https://huggingface.co/datasets/cestwc/census-income.
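
Since the script above is cut off in this listing, here is a separate, minimal sketch of how a line in the adult.data layout could be parsed with the standard library alone. It is not the dataset author's script. The column names follow the UCI documentation for the Adult data as best recalled here and should be treated as an assumption, and `parse_adult` is a hypothetical helper.

```python
import csv
import io

# Column names assumed from the UCI Adult documentation (the file has no header row).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain", "capital-loss", "hours-per-week"}


def parse_adult(text):
    """Parse comma-plus-space separated records into a list of dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        for name in NUMERIC:
            row[name] = int(row[name])
        rows.append(row)
    return rows


# One sample line in the file's layout, followed by a blank line.
sample = (
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n\n"
)
rows = parse_adult(sample)
print(rows[0]["age"], rows[0]["income"])
```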
Comprehensive dataset and Python toolkit for housing market analysis in...
dataverse.harvard.edu
Updated Mar 22, 2025
Cite
Kingston Li (2025). Comprehensive dataset and Python toolkit for housing market analysis in Mercer County, NJ [Dataset]. http://doi.org/10.7910/DVN/LYRDHG
Unique identifier
https://doi.org/10.7910/DVN/LYRDHG
Dataset updated
Mar 22, 2025
Dataset provided by
Harvard Dataverse
Authors
Kingston Li
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Mercer County, New Jersey
Description
This project combines data extraction, predictive modeling, and geospatial mapping to analyze housing trends in Mercer County, New Jersey. It consists of three core components:

Census Data Extraction: Gathers U.S. Census data (2012–2022) on median house value, household income, and racial demographics for all census tracts in the county. It accounts for changes in census tract boundaries between 2010 and 2020 by approximating values for newly defined tracts.

House Value Prediction: Uses an LSTM model with k-fold cross-validation to forecast median house values through 2025. Multiple feature combinations and sequence lengths are tested to optimize prediction accuracy, with the final model selected based on MSE and MAE scores.

Data Mapping: Visualizes historical and predicted housing data using GeoJSON files from the TIGERWeb API. It generates interactive maps showing raw values, changes over time, and percent differences, with customization options to handle outliers and improve interpretability.

This modular workflow can be adapted to other regions by changing the input FIPS codes and feature selections.
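
To make the role of the FIPS codes concrete, the sketch below builds a tract-level request URL for the Census Bureau's ACS 5-year API. It is an illustration only, not code from this project: `build_acs5_tract_url` is a hypothetical helper, the variable IDs (B25077_001E for median home value, B19013_001E for median household income) and the year are assumptions chosen for the example, and 34 / 021 are the state and county FIPS codes for Mercer County, New Jersey. No request is sent.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.census.gov/data"


def build_acs5_tract_url(year, variables, state_fips, county_fips, api_key=None):
    """Return an ACS 5-year API URL for all tracts in one county."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "tract:*",
        "in": f"state:{state_fips} county:{county_fips}",
    }
    if api_key:
        params["key"] = api_key  # optional API key
    # Keep ':', '*' and ',' readable; encode the space as %20.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return f"{BASE_URL}/{year}/acs/acs5?{query}"


url = build_acs5_tract_url(2022, ["B25077_001E", "B19013_001E"], "34", "021")
print(url)
```

Pointing the same function at another county only requires different FIPS arguments, which is the kind of adaptation the description refers to.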
USA 2020 Census Population Characteristics - Place Geographies
hub.arcgis.com
data-isdh.opendata.arcgis.com
+1more
Updated Jun 1, 2023
Cite
Esri (2023). USA 2020 Census Population Characteristics - Place Geographies [Dataset]. https://hub.arcgis.com/maps/9c84c24c55a04c3b8317f37e536e6a8a
Dataset updated
Jun 1, 2023
Dataset authored and provided by
Esri (http://esri.com/)
Description
This layer shows total population counts by sex, age, and race groups from the 2020 Census Demographic and Housing Characteristics. This is shown by Nation, Consolidated City, Census Designated Place, and Incorporated Place boundaries. Each geography layer contains a common set of Census counts based on available attributes from the U.S. Census Bureau. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis.

To see the full list of attributes available in this service, go to the "Data" tab above, and then choose "Fields" at the top right. Each attribute contains definitions, additional details, and the formula for calculated fields in the field description.

Vintage of boundaries and attributes: 2020 Demographic and Housing Characteristics
Table(s): P1, H1, H3, P2, P3, P5, P12, P13, P17, PCT12 (Not all lines of these DHC tables are available in this feature layer.)
Data downloaded from: U.S. Census Bureau’s data.census.gov site
Date the Data was Downloaded: May 25, 2023
Geography Levels included: Nation, Consolidated City, Census Designated Place, Incorporated Place
National Figures: included in Nation layer

The United States Census Bureau Demographic and Housing Characteristics:
2020 Census Results
2020 Census Data Quality
Geography & 2020 Census
Technical Documentation
Data Table Guide: includes the final list of tables, lowest level of geography by table and table shells for the Demographic Profile and Demographic and Housing Characteristics.
News & Updates

This layer is ready to be used in ArcGIS Pro, ArcGIS Online and its configurable apps, Story Maps, dashboards, Notebooks, Python, custom apps, and mobile apps. Data can also be exported for offline workflows. Please cite the U.S. Census Bureau when using this data.

Data Processing Notes:
These 2020 Census boundaries come from the US Census TIGER geodatabases.
These are Census boundaries with water and/or coastlines erased for cartographic and mapping purposes. For Census tracts and block groups, the water cutouts are derived from a subset of the 2020 Areal Hydrography boundaries offered by TIGER. Water bodies and rivers which are 50 million square meters or larger (mid to large sized water bodies) are erased from the tract and block group boundaries, as well as additional important features. For state and county boundaries, the water and coastlines are derived from the coastlines of the 2020 500k TIGER Cartographic Boundary Shapefiles. These are erased to more accurately portray the coastlines and Great Lakes. The original AWATER and ALAND fields are unchanged and available as attributes within the data table (units are square meters).
The layer contains all US states, Washington D.C., and Puerto Rico. Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99). Block groups that fall within the same criteria (Block Group denoted as 0 with no area land) have also been removed.
Percentages and derived counts are calculated values (that can be identified by the "_calc_" stub in the field name).
Field alias names were created based on the Table Shells file available from the Data Table Guide for the Demographic Profile and Demographic and Housing Characteristics.
Not all lines of all tables listed above are included in this layer. Duplicative counts were dropped. For example, P0030001 was dropped, as it is duplicative of P0010001.
To protect the privacy and confidentiality of respondents, their data has been protected using differential privacy techniques by the U.S. Census Bureau.
2023 Major Demographics by US Census Block Group
dataverse.harvard.edu
Updated Mar 7, 2025
Cite
Michael Bryan (2025). 2023 Major Demographics by US Census Block Group [Dataset]. http://doi.org/10.7910/DVN/9AEYAS
Unique identifier
https://doi.org/10.7910/DVN/9AEYAS
Dataset updated
Mar 7, 2025
Dataset provided by
Harvard Dataverse
Authors
Michael Bryan
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
blockgroupdemographics

A selection of variables from the US Census Bureau's American Community Survey 5YR and TIGER/Line publications.

Overview

The U.S. Census Bureau published its American Community Survey 5 Year with more than 37,000 variables. Most advanced ACS users will have their personal list of favorites, but this conventional wisdom is not available to occasional analysts. This publication re-shares 174 select demographic variables from the U.S. Census Bureau to provide a supplement to Open Environments Block Group publications. These results do not reflect any proprietary or predictive model. Rather, they are extracted from Census Bureau results. For additional support or more detail, please see the Census Bureau citations below.

The first 170 demographic variables are taken from popular variables in the American Community Survey (ACS), including age, race, income, education and family structure. A full list of ACS variable names and definitions can be found in the ACS 'Table Shells' here: https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.html.

The dataset includes 4 additional columns from the Census' TIGER/Line publication. See Open Environment's 2023blockgroupcartographics publication for the shapes of each block group. For each block group, the dataset includes land area (ALAND), water area (AWATER), interpolated latitude (INTPTLAT) and longitude (INTPTLON). These are valuable for calculating population density variables, which combine ACS populations and TIGER land area.

Files

The resulting dataset is available with other block group based datasets on Harvard's Dataverse https://dataverse.harvard.edu/ in Open Environment's Block Group Dataverse https://dataverse.harvard.edu/dataverse/blockgroupdatasets/. This data simply requires CSV reader software or Python's pandas package. Supporting the data file is acsvars.csv, a list of the Census variable names and their corresponding descriptions.

Citations

“American Community Survey 5-Year Data (2019-2023).” Census.gov, US Census Bureau, https://www.census.gov/data/developers/data-sets/acs-5year.html. 2023
"American Community Survey, Table Shells and Table List” Census.gov, US Census Bureau, https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.html
Python Package Index - PyPI. Python Software Foundation. "A simple wrapper for the United States Census Bureau’s API.". Retrieved from https://pypi.org/project/census/
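
The population-density calculation mentioned in the description can be sketched as follows. This is a minimal illustration, not part of the publication: TIGER/Line reports ALAND in square meters, the `population` key and the block group IDs are made up for the example, and `density_per_km2` is a hypothetical helper.

```python
def density_per_km2(population, aland_m2):
    """People per square kilometer; None when a block group has no land area."""
    if aland_m2 <= 0:
        return None
    return population / (aland_m2 / 1_000_000)


# Hypothetical block groups; ALAND is land area in square meters.
block_groups = [
    {"GEOID": "bg-1", "population": 1500, "ALAND": 2_500_000},
    {"GEOID": "bg-2", "population": 0, "ALAND": 0},
]
densities = {
    bg["GEOID"]: density_per_km2(bg["population"], bg["ALAND"])
    for bg in block_groups
}
print(densities)
```

Guarding against zero land area matters because water-only block groups would otherwise raise a division error.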
census-bureau-international
kaggle.com
zip
Updated May 6, 2020
Cite
Google BigQuery (2020). census-bureau-international [Dataset]. https://www.kaggle.com/bigquery/census-bureau-international
Available download formats: zip (0 bytes)
Dataset updated
May 6, 2020
Dataset provided by
Google (http://google.com/)
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
Description
Context

The United States Census Bureau’s international dataset provides estimates of country populations since 1950 and projections through 2050. Specifically, the dataset includes midyear population figures broken down by age and gender assignment at birth. Additionally, time-series data is provided for attributes including fertility rates, birth rates, death rates, and migration rates.

Querying BigQuery tables

You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.census_bureau_international.

Sample Query 1

What countries have the longest life expectancy? In this query, 2016 census information is retrieved by joining the mortality_life_expectancy and country_names_area tables for countries larger than 25,000 km2. Without the size constraint, Monaco is the top result with an average life expectancy of over 89 years!

#standardSQL

SELECT
  age.country_name,
  age.life_expectancy,
  size.country_area
FROM (
  SELECT country_name, life_expectancy
  FROM `bigquery-public-data.census_bureau_international.mortality_life_expectancy`
  WHERE year = 2016) age
INNER JOIN (
  SELECT country_name, country_area
  FROM `bigquery-public-data.census_bureau_international.country_names_area`
  WHERE country_area > 25000) size
ON age.country_name = size.country_name
ORDER BY 2 DESC
/* Limit removed for Data Studio Visualization */
LIMIT 10

Sample Query 2

Which countries have the largest proportion of their population under 25? Over 40% of the world’s population is under 25 and greater than 50% of the world’s population is under 30! This query retrieves the countries with the largest proportion of young people by joining the age-specific population table with the midyear (total) population table.

#standardSQL

SELECT
  age.country_name,
  SUM(age.population) AS under_25,
  pop.midyear_population AS total,
  ROUND((SUM(age.population) / pop.midyear_population) * 100, 2) AS pct_under_25
FROM (
  SELECT country_name, population, country_code
  FROM `bigquery-public-data.census_bureau_international.midyear_population_agespecific`
  WHERE year = 2017 AND age < 25) age
INNER JOIN (
  SELECT midyear_population, country_code
  FROM `bigquery-public-data.census_bureau_international.midyear_population`
  WHERE year = 2017) pop
ON age.country_code = pop.country_code
GROUP BY 1, 3
ORDER BY 4 DESC
/* Remove limit for visualization */
LIMIT 10
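
The join-and-aggregate logic of Sample Query 2 can be tried locally without BigQuery. The sketch below uses Python's built-in sqlite3 module with two tiny in-memory tables. Everything about the data is an assumption: the country names, codes, and counts are invented, only the columns the query needs are created, and the SQL is adapted to SQLite (short table aliases, named GROUP BY and ORDER BY columns, and `* 100.0` to force floating-point division).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE midyear_population_agespecific (
    country_code TEXT, country_name TEXT, year INTEGER, age INTEGER, population INTEGER);
CREATE TABLE midyear_population (
    country_code TEXT, year INTEGER, midyear_population INTEGER);
""")

# Made-up rows for two fictional countries.
conn.executemany(
    "INSERT INTO midyear_population_agespecific VALUES (?, ?, ?, ?, ?)",
    [
        ("XA", "Exampleland", 2017, 10, 300),
        ("XA", "Exampleland", 2017, 20, 200),
        ("XA", "Exampleland", 2017, 40, 500),
        ("XB", "Testonia", 2017, 5, 100),
        ("XB", "Testonia", 2017, 30, 900),
    ],
)
conn.executemany(
    "INSERT INTO midyear_population VALUES (?, ?, ?)",
    [("XA", 2017, 1000), ("XB", 2017, 1000)],
)

QUERY = """
SELECT a.country_name,
       SUM(a.population) AS under_25,
       p.midyear_population AS total,
       ROUND(SUM(a.population) * 100.0 / p.midyear_population, 2) AS pct_under_25
FROM (SELECT country_name, population, country_code
      FROM midyear_population_agespecific
      WHERE year = 2017 AND age < 25) a
INNER JOIN (SELECT midyear_population, country_code
            FROM midyear_population
            WHERE year = 2017) p
  ON a.country_code = p.country_code
GROUP BY a.country_name, p.midyear_population
ORDER BY pct_under_25 DESC
LIMIT 10
"""
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```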

Sample Query 3

The International Census dataset contains growth information in the form of birth rates, death rates, and migration rates. Net migration is the net number of migrants per 1,000 population, an important component of total population and one that often drives the work of the United Nations Refugee Agency. This query joins the growth rate table with the area table to retrieve 2017 data for countries greater than 500 km2.

SELECT
  growth.country_name,
  growth.net_migration,
  CAST(area.country_area AS INT64) AS country_area
FROM (
  SELECT country_name, net_migration, country_code
  FROM `bigquery-public-data.census_bureau_international.birth_death_growth_rates`
  WHERE year = 2017) growth
INNER JOIN (
  SELECT country_area, country_code
  FROM `bigquery-public-data.census_bureau_international.country_names_area`

Update frequency

Historic (none)

Dataset source

United States Census Bureau

Terms of use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

See the GCP Marketplace listing for more details and sample queries: https://console.cloud.google.com/marketplace/details/united-states-census-bureau/international-census-data
Protocol data (Python version)
figshare.com
txt
Updated Oct 1, 2020
Cite
Jesse Gillis (2020). Protocol data (Python version) [Dataset]. http://doi.org/10.6084/m9.figshare.13034171.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.13034171.v1
Dataset updated
Oct 1, 2020
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jesse Gillis
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
We published 3 protocols illustrating how MetaNeighbor can be used to quantify cell type replicability across single cell transcriptomic datasets. The data files included here are needed to run the Python version of the protocols available on Github (https://github.com/gillislab/MetaNeighbor-Protocol) in Jupyter (.ipynb) notebook format. To run the protocols, download the protocols on Github, download the data on Figshare, place the data and protocol files in the same directory, then run the notebooks in Jupyter.

Briefly:
- biccn_hvg.h5ad contains a single cell transcriptomic dataset published by the Brain Initiative Cell Census Network (in AnnData format). It combines data from 7 datasets obtained in the mouse primary motor cortex (https://www.biorxiv.org/content/10.1101/2020.02.29.970558v2). Note that this dataset only contains highly variable genes.
- biccn_gaba.h5ad: same dataset as biccn_hvg.h5ad, but restricted to GABAergic neurons. The dataset contains all genes common to the 7 BICCN datasets (not just highly variable genes).
- hemberg.h5ad contains a merged version of 4 human pancreas single cell transcriptomic datasets made available in a standardized form by the Hemberg lab.
- tasic.h5ad contains a single cell transcriptomic dataset of neurons from the mouse primary visual cortex, as published in Tasic et al. 2016.
- go_mouse.mtx, go_mouse_col_labels.txt, go_mouse_row_labels.txt: mouse gene ontology annotations, stored as a one-hot encoded matrix.
Datasets for Computational Methods and GIS Applications in Social Science
search.dataone.org
Updated Sep 25, 2024
Cite
Fahui Wang; Lingbo Liu (2024). Datasets for Computational Methods and GIS Applications in Social Science [Dataset]. http://doi.org/10.7910/DVN/4CM7V4
Unique identifier
https://doi.org/10.7910/DVN/4CM7V4
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
Fahui Wang; Lingbo Liu
Description
Dataset for the textbook Computational Methods and GIS Applications in Social Science (3rd Edition), 2023
Fahui Wang, Lingbo Liu

Main Book Citation: Wang, F., & Liu, L. (2023). Computational Methods and GIS Applications in Social Science (3rd ed.). CRC Press. https://doi.org/10.1201/9781003292302
KNIME Lab Manual Citation: Liu, L., & Wang, F. (2023). Computational Methods and GIS Applications in Social Science - Lab Manual. CRC Press. https://doi.org/10.1201/9781003304357
KNIME Hub Dataset and Workflow for Computational Methods and GIS Applications in Social Science-Lab Manual

Update Log
If a Python package is not found in Package Management, use ArcGIS Pro's Python Command Prompt to install it, e.g., conda install -c conda-forge python-igraph leidenalg
NetworkCommDetPro in CMGIS-V3-Tools was updated on July 10, 2024.
A spatial adjacency table was added to Florida on June 29, 2024.
The dataset and tool for ABM Crime Simulation were updated on August 3, 2023.
The toolkits in CMGIS-V3-Tools were updated on August 3, 2023.

Report issues on GitHub: https://github.com/UrbanGISer/Computational-Methods-and-GIS-Applications-in-Social-Science
Follow the website of Fahui Wang: http://faculty.lsu.edu/fahui

Contents
Chapter 1. Getting Started with ArcGIS: Data Management and Basic Spatial Analysis Tools
Case Study 1: Mapping and Analyzing Population Density Pattern in Baton Rouge, Louisiana
Chapter 2. Measuring Distance and Travel Time and Analyzing Distance Decay Behavior
Case Study 2A: Estimating Drive Time and Transit Time in Baton Rouge, Louisiana
Case Study 2B: Analyzing Distance Decay Behavior for Hospitalization in Florida
Chapter 3. Spatial Smoothing and Spatial Interpolation
Case Study 3A: Mapping Place Names in Guangxi, China
Case Study 3B: Area-Based Interpolations of Population in Baton Rouge, Louisiana
Case Study 3C: Detecting Spatiotemporal Crime Hotspots in Baton Rouge, Louisiana
Chapter 4. Delineating Functional Regions and Applications in Health Geography
Case Study 4A: Defining Service Areas of Acute Hospitals in Baton Rouge, Louisiana
Case Study 4B: Automated Delineation of Hospital Service Areas in Florida
Chapter 5. GIS-Based Measures of Spatial Accessibility and Application in Examining Healthcare Disparity
Case Study 5: Measuring Accessibility of Primary Care Physicians in Baton Rouge
Chapter 6. Function Fittings by Regressions and Application in Analyzing Urban Density Patterns
Case Study 6: Analyzing Population Density Patterns in Chicago Urban Area
Chapter 7. Principal Components, Factor and Cluster Analyses and Application in Social Area Analysis
Case Study 7: Social Area Analysis in Beijing
Chapter 8. Spatial Statistics and Applications in Cultural and Crime Geography
Case Study 8A: Spatial Distribution and Clusters of Place Names in Yunnan, China
Case Study 8B: Detecting Colocation Between Crime Incidents and Facilities
Case Study 8C: Spatial Cluster and Regression Analyses of Homicide Patterns in Chicago
Chapter 9. Regionalization Methods and Application in Analysis of Cancer Data
Case Study 9: Constructing Geographical Areas for Mapping Cancer Rates in Louisiana
Chapter 10. System of Linear Equations and Application of Garin-Lowry in Simulating Urban Population and Employment Patterns
Case Study 10: Simulating Population and Service Employment Distributions in a Hypothetical City
Chapter 11. Linear and Quadratic Programming and Applications in Examining Wasteful Commuting and Allocating Healthcare Providers
Case Study 11A: Measuring Wasteful Commuting in Columbus, Ohio
Case Study 11B: Location-Allocation Analysis of Hospitals in Rural China
Chapter 12. Monte Carlo Method and Applications in Urban Population and Traffic Simulations
Case Study 12A: Examining Zonal Effect on Urban Population Density Functions in Chicago by Monte Carlo Simulation
Case Study 12B: Monte Carlo-Based Traffic Simulation in Baton Rouge, Louisiana
Chapter 13. Agent-Based Model and Application in Crime Simulation
Case Study 13: Agent-Based Crime Simulation in Baton Rouge, Louisiana
Chapter 14. Spatiotemporal Big Data Analytics and Application in Urban Studies
Case Study 14A: Exploring Taxi Trajectory in ArcGIS
Case Study 14B: Identifying High Traffic Corridors and Destinations in Shanghai

Dataset File Structure
1 BatonRouge: Census.gdb, BR.gdb
2A BatonRouge: BR_Road.gdb, Hosp_Address.csv, TransitNetworkTemplate.xml, BR_GTFS, Google API Pro.tbx
2B Florida: FL_HSA.gdb, R_ArcGIS_Tools.tbx (RegressionR)
3A China_GX: GX.gdb
3B BatonRouge: BR.gdb
3C BatonRouge: BRcrime, R_ArcGIS_Tools.tbx (STKDE)
4A BatonRouge: BRRoad.gdb
4B Florida: FL_HSA.gdb, HSA Delineation Pro.tbx, Huff Model Pro.tbx, FLplgnAdjAppend.csv
5 BRMSA: BRMSA.gdb, Accessibility Pro.tbx
6 Chicago: ChiUrArea.gdb, R_ArcGIS_Tools.tbx (RegressionR)
7 Beijing: BJSA.gdb, bjattr.csv, R_ArcGIS_Tools.tbx (PCAandFA, BasicClustering)
8A Yunnan: YN.gdb, R_ArcGIS_Tools.tbx (SaTScanR)
8B Jiangsu: JS.gdb
8C Chicago: ChiCity.gdb, cityattr.csv ...
USA State Shapefiles
kaggle.com
Updated May 10, 2025
Cite
Nick Switzer (2025). USA State Shapefiles [Dataset]. https://www.kaggle.com/datasets/nswitzer/usa-state-shapeflies/suggestions?status=pending&yourSuggestions=true
Dataset updated
May 10, 2025
Dataset provided by
Kaggle
Authors
Nick Switzer
Area covered
United States
Description
Shapefiles for mapping and understanding overlaps

They can be used with the sf package in R or geopandas in Python.

https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html

Big Data Certification KR

kaggle.com

zip

Updated Nov 29, 2021

Cite

KIM TAE HEON (2021). Big Data Certification KR [Dataset]. https://www.kaggle.com/agileteam/bigdatacertificationkr

Available download formats: zip (15840 bytes)

Dataset updated

Nov 29, 2021

Authors

KIM TAE HEON

License

Attribution-NoDerivs 4.0 (CC BY-ND 4.0) (https://creativecommons.org/licenses/by-nd/4.0/)
License information was derived automatically

Description

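A playground for preparing for the Big Data Analysis Engineer practical exam

Shall we play together? Red light, green light 😜 This is a dataset for preparing for the Big Data Analysis Engineer practical exam. If you write better code, please share it widely 🎉 (Both Python and R are welcome.)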

Exam 4 question types

Task Type 2 competition: https://www.kaggle.com/competitions/big-data-analytics-certification-kr-2022
Baseline: (in preparation)

Exam 3 question types and advanced study materials

Big Data Analysis Engineer competition 🍭
https://www.kaggle.com/competitions/big-data-analytics-certification

🆕 New problems, updated June 2022

Task Type 2
Regression: https://www.kaggle.com/code/agileteam/t2-2-2-baseline-r2
Classification (advanced variant of an Exam 3 question): https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline
Task Type 1 (Exam 3 question type)
Task Type 1 mock problem 2 (advanced): https://www.kaggle.com/code/agileteam/mock-exam2-type1-1-2

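🎁 Introductory course for the Big Data Analysis Engineer practical exam now open 🎁

https://class101.page.link/tp9k
A course for beginners is now open 👍
It walks through Python, pandas, machine learning, mock problems (Task Types 1 and 2), and tips, covering only what you need to prepare for the practical exam 🎉
If you have already done some machine learning, you probably do not need to take it. It is recommended for true beginners who need explanations before they can tackle the mock problems.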

📌 Task Type 1 expected questions (P: Python, R: R)

See the Tasks tab for problems and code

[Exam 2 question type] Task Type 1 P: https://www.kaggle.com/agileteam/tutorial-t1-2-python R: https://www.kaggle.com/limmyoungjin/tutorial-t1-2-r-2
Official sample problem (Task Type 1) P: https://www.kaggle.com/agileteam/tutorial-t1-python R: https://www.kaggle.com/limmyoungjin/tutorial-t1-r
T1-1. Outlier (IQR) / #outliers #IQR P: https://www.kaggle.com/agileteam/py-t1-1-iqr-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-1-iqr-expected-questions-2
T1-2. Outlier (age) / #outliers #fractional-age P: https://www.kaggle.com/agileteam/py-t1-2-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-2-expected-questions-2
T1-3. Missing data / #missing-values #drop #median #mean P: https://www.kaggle.com/agileteam/py-t1-3-map-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-3-expected-questions-2
T1-4. Skewness and Kurtosis (Log Scale) / #skewness #kurtosis #log-scale P: https://www.kaggle.com/agileteam/py-t1-4-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-4-expected-questions-2
T1-5. Standard deviation / #standard-deviation P: https://www.kaggle.com/agileteam/py-t1-5-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-5-expected-questions-2
T1-6. Groupby Sum / #missing-values #condition P: https://www.kaggle.com/agileteam/py-t1-6-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-6-expected-questions-2
T1-7. Replace / #value-replacement #condition #maximum P: https://www.kaggle.com/agileteam/py-t1-7-2-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-7-2-expected-questions-2
T1-8. Cumulative Sum / #cumulative-sum #missing-values #interpolation P: https://www.kaggle.com/agileteam/py-t1-8-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-8-expected-questions-2
T1-9. Standardization / #standardization #median P: https://www.kaggle.com/agileteam/py-t1-9-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-9-expected-questions-2
T1-10. Yeo-Johnson and Box–Cox / #yeo-johnson #box-cox #missing-values #mode P: https://www.kaggle.com/agileteam/py-t1-10-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-10-expected-questions-2
T1-11. min-max scaling / #scaling #top-and-bottom-values P: https://www.kaggle.com/agileteam/py-t1-11-min-max-5-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-11-min-max-5-expected-questions-2
T1-12. top10-bottom10 / #grouping #sorting #top-and-bottom-values P: https://www.kaggle.com/agileteam/py-t1-12-10-10-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-12-10-expected-questions-2
T1-13. Correlation / #correlation P: https://www.kaggle.com/agileteam/py-t1-13-expected-questions R: https://www.kaggle.com/limmyoungjin/r-t1-13-expected-questions-2
T1-14. Multi Index & Groupby / #multi-index #sorting #index-reset #top-values P: https://www.kaggle.com/agileteam/py-t1-14-2-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-14-2-expected-question-2
T1-15. Slicing & Condition / #slicing #missing-values #median #condition P: https://www.kaggle.com/agileteam/py-t1-15-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-15-expected-question-2
T1-16. Variance / #variance #difference-before-and-after-missing-value-handling P: https://www.kaggle.com/agileteam/py-t1-16-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-16-expected-question-2
T1-17. Time-Series1 / #time-series #datetime P: https://www.kaggle.com/agileteam/py-t1-17-1-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-17-1-expected-question-2
T1-18. Time-Series2 / #weekend #weekday #comparison #time-series P: https://www.kaggle.com/agileteam/py-t1-18-2-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-18-2-expected-question-2
T1-19. Time-Series3 (monthly total) / #monthly #total #comparison #value-change
P: https://www.kaggle.com/agileteam/py-t1-19-3-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-19-3-expected-question-2
T1-20. Combining Data / data #merge #join / matching customers with compatible types
P: https://www.kaggle.com/agileteam/py-t1-20-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-20-expected-question-2
T1-21. Binning Data / #binning #interval-splitting P: https://www.kaggle.com/agileteam/py-t1-21-expected-question R: https://www.kaggle.com/limmyoungjin/r-t1-21-expected-question-2
T1-22. Time-Series4 (Weekly data) / #weekly #sum P: https://www.kaggle.com/agileteam/t1-22-time-series4-weekly-data R: https://www.kaggle.com/limmyoungjin/r-t1-22-time-series4-weekly-data-2
T1-23. Drop Duplicates / #deduplication #missing-values #fill-with-10th-value P: https://www.kaggle.com/agileteam/t1-23-drop-duplicates R: https://www.kaggle.com/limmyoungjin/r-t1-23-drop-duplicates-2
T1-24. Time-Series5 (Lagged Feature) / #lagged-data #condition P: https://www.kaggle.com/agileteam/t1-24-time-series5-lagged-feature R: https://www.kaggle.com/limmyoungjin/r-t1-24-time-series5-2
[MOCK EXAM1] TYPE1 / Task Type 1 mock exam P: https://www.kaggle.com/agileteam/mock-exam1-type1-1-tutorial R: https://www.kaggle.com/limmyoungjin/mock-exam1-type1-1
[MOCK EXAM2] TYPE1 / Task Type 1 mock exam 2 P: https://www.kaggle.com/code/agileteam/mock-exam2-type1-1-2

📌 Task Type 2 expected questions

Check the problems and code in the Tasks tab - [3rd exam past-question type, Task Type 2]: travel insurance package product (data modified to be slightly harder) P: https://www.kaggle.com/code/agileteam/3rd-type2-3-2-baseline

[2nd exam past-question type, Task Type 2]: E-Commerce Shipping Data P: https://www.kaggle.com/agileteam/tutorial-t2-2-python R: https://www.kaggle.com/limmyoungjin/tutorial-t2-2-r
T2. Exercise / Example problem: one year of department store customer data (official dataq example) P: https://www.kaggle.com/agileteam/t2-exercise-tutorial-baseline
T2-1. Titanic (Classification) P: https://www.kaggle.com/agileteam/t2-1-titanic-simple-baseline R: https://www.kaggle.com/limmyoungjin/r-t2-1-titanic
T2-2. Pima Indians Diabetes (Classification) / diabetes P: https://www.kaggle.com/agileteam/t2-2-pima-indians-diabetes R: https://www.kaggle.com/limmyoungjin/r-t2-2-pima-indians-diabetes
T2-3. Adult Census Income (Classification) / adult census income prediction P: https://www.kaggle.com/agileteam/t2-3-adult-census-income-tutorial R: https://www.kaggle.com/limmyoungjin/r-t2-3-adult-census-income
T2-4. House Prices (Regression) / house price prediction / RMSE P: https://www.kaggle.com/code/blighpark/t2-4-house-prices-regression R: https://www.kaggle.com/limmyoungjin/r-t2-4-house-prices
T2-5. Insurance Forecast (Regression) / P: https://www.kaggle.com/agileteam/insurance-starter-tutorial R: https://www.kaggle.com/limmyoungjin/r-t2-5-insurance-prediction
T2-6. Bike-sharing-demand (Regression) / bike demand prediction / RMSLE P: R: https://www.kaggle.com/limmyoungjin/r-t2-6-bike-sharing-demand
[MOCK EXAM1] TYPE2. HR-DATA / Task Type 2 mock exam P: https://www.kaggle.com/agileteam/mock-exam-t2-exam-template (template only) https://www.kaggle.com/agileteam/mock-exam-t2-starter-tutorial (for absolute beginners) https://www.kaggle.com/agileteam/mock-exam-t2-baseline-tutorial (baseline)

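📌 6-week completion course (see the table below)

Week	Type (editor)	Numbers
6 weeks before	Task Type 1 (notebook)	T1-1~5
5 weeks before	Task Type 1 (notebook)	T1-6~9, T1 EQ (past exam questions)
4 weeks before	Task Type 1 (script), Task Type 2 (notebook)	T1-10~13, T1.Ex, T2EQ, T2-1
3 weeks before	Task Type 1 (script), Task Type 2 (notebook)	T1-14~19, T2-2~3
2 weeks before	Task Type 1 (script), Task Type 2 (script)	T1-20~21, T2-4~6, review
1 week before	Task Type 1, Task Type 2 (script), short-answer	T1-22~24, mock exams, review, exam environment trial, short-answer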

📌 Machine learning tutorial for beginners (selected from the notebooks shared by the community 👍)

- https://www.kaggle.com/ohseokkim/t2-2-pima-indians-diabetes Author: @ohseokkim 😆

Calculated Leached Nitrogen from Septic Systems in Wisconsin, 1850-2010
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
U.S. Geological Survey (2024). Calculated Leached Nitrogen from Septic Systems in Wisconsin, 1850-2010 [Dataset]. https://catalog.data.gov/dataset/calculated-leached-nitrogen-from-septic-systems-in-wisconsin-1850-2010
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Area covered
Wisconsin
Description
This data release contains a netCDF file with decadal estimates of nitrate leached from septic systems (kilograms per hectare per year, or kg/ha/yr) in the state of Wisconsin from 1850 to 2010, as well as the Python code and supporting files used to create the netCDF file. The netCDF file is used as an input to a Nitrate Decision Support Tool for the State of Wisconsin (GW-NDST; Juckem and others, 2024). The dataset was constructed starting with 1990 census records, which included responses about households using septic systems for waste disposal. The fraction of the population using septic systems in 1990 was aggregated at the county scale and applied backward in time for each decade from 1850 to 1980. For decades from 1990 to 2010, the fraction of the population using septic systems was computed at the finer-resolution census block-group scale. Each decadal estimate of the fraction of the population using septic systems was then multiplied by 4.13 kilograms per person per year of leached nitrate to estimate the per-area load of nitrate below the root zone. The data release includes a Python notebook used to process the input datasets included in the data release, shapefiles created (or modified) using the Python notebook, and the final netCDF file.
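The load calculation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code shipped in the data release: the function name, its parameters, and the input values are hypothetical, and the sketch assumes the per-area load is obtained by scaling the septic-using population by the 4.13 kg per person per year figure and dividing by the area of the spatial unit.

```python
# Per-capita leaching rate quoted in the description (kg nitrate per person per year).
LEACH_KG_PER_PERSON_YR = 4.13


def septic_nitrate_load(population, septic_fraction, area_ha):
    """Estimate the leached nitrate load (kg/ha/yr) for one spatial unit.

    Hypothetical helper: the population and area inputs are assumptions of
    this sketch; only the 4.13 kg/person/yr rate comes from the description.
    """
    if area_ha <= 0:
        raise ValueError("area_ha must be positive")
    if not 0.0 <= septic_fraction <= 1.0:
        raise ValueError("septic_fraction must be between 0 and 1")
    people_on_septic = population * septic_fraction
    return people_on_septic * LEACH_KG_PER_PERSON_YR / area_ha


# Made-up inputs for illustration: 1,000 residents, 25% on septic, 500 ha.
load = septic_nitrate_load(population=1000, septic_fraction=0.25, area_ha=500.0)
print(round(load, 3))
```

With these made-up inputs, 250 people on septic systems leach 1,032.5 kg per year, which spread over 500 ha gives about 2.065 kg/ha/yr.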
South Africa Education Data and Visualisations
ufs.figshare.com
png
Updated Aug 15, 2023
Herkulaas Combrink; Elizabeth Carr; Katinka de wet; Vukosi Marivate; Benjamin Rosman (2023). South Africa Education Data and Visualisations [Dataset]. http://doi.org/10.38140/ufs.22081058.v4
Available download formats: png
Unique identifier
https://doi.org/10.38140/ufs.22081058.v4
Dataset updated
Aug 15, 2023
Dataset provided by
University of the Free State
Authors
Herkulaas Combrink; Elizabeth Carr; Katinka de wet; Vukosi Marivate; Benjamin Rosman
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
South Africa
Description
This tabular and visual dataset focuses on South African basic education and provides insights into the distribution of schools and basic population statistics across the country. The tabular and visual data are stratified across different quintiles for each provincial and district boundary. The quintile system is used by the South African government to classify schools based on their level of socio-economic disadvantage, with quintile 1 being the most disadvantaged and quintile 5 the least disadvantaged. The data were joined by combining information extracted from the Department of Basic Education with StatsSA population census data. Thereafter, all tabular and geolocated data were transformed into maps using GIS software and the Python integrated development environment. The dataset includes information on the number of schools and students in each quintile, as well as the population density in each area. The data are displayed through a combination of charts, maps and tables, allowing for easy analysis and interpretation of the information.
India DroughtSet: A village-level drought dataset for the past 43 years
datacatalogue.cessda.eu
ssh.datastations.nl
Updated Feb 14, 2024
T Pareek (2024). India DroughtSet: A village-level drought dataset for the past 43 years [Dataset]. http://doi.org/10.17026/dans-xft-eprj
Unique identifier
https://doi.org/10.17026/dans-xft-eprj
Dataset updated
Feb 14, 2024
Dataset provided by
Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente
Authors
T Pareek
Area covered
India
Description
This database consists of a high-resolution, village-level monthly drought dataset for major Indian states for the past 43 years (1981 – 2022). It was created using the CHIRPS precipitation and GLEAM evapotranspiration datasets. The GLEAM dataset uses the well-recognised Priestley-Taylor equation to estimate potential evapotranspiration (PET) from observations of surface net radiation and near-surface air temperature. The Standardised Precipitation-Evapotranspiration Index (SPEI) was calculated on 5x5 km spatial grids at the 3-month time scale, which is suitable for agricultural drought monitoring.
This high-resolution SPEI dataset was integrated with Indian village boundaries and the associated census attribute dataset. This allows researchers to perform multi-disciplinary investigations, e.g., climate migration modelling, drought hazards, and exposure assessment. The dataset was developed with potential users in mind. It can therefore be integrated into a GIS system for visualization (using the .mid/.mif format) and into Python programs for modelling and analysis (using .csv). For advanced analysis, I have also provided it in netCDF format, which can be read in Python using xarray or the netCDF4 library. More details are in the README.pdf file.
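As an illustration of the Python route mentioned above, the sketch below counts, per village, the months whose SPEI falls at or below a threshold. It is a minimal example under stated assumptions: the column names (`village_id`, `month`, `spei`), the sample rows, and the -1.0 threshold are all hypothetical, and the real file layout is the one documented in README.pdf.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed layout; the real columns may differ.
SAMPLE_CSV = """village_id,month,spei
V001,1981-01,-1.30
V001,1981-02,0.20
V002,1981-01,-0.40
V002,1981-02,-1.10
V002,1981-03,-1.75
"""


def drought_months(csv_file, threshold=-1.0):
    """Count months per village with SPEI at or below `threshold`.

    The threshold is an illustrative choice, not a value taken from the dataset.
    """
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        if float(row["spei"]) <= threshold:
            counts[row["village_id"]] += 1
    return dict(counts)


# An in-memory file stands in for the downloaded .csv.
result = drought_months(io.StringIO(SAMPLE_CSV))
print(result)
```

To run this against a downloaded file, the `io.StringIO` object could be swapped for `open(path, newline="")`. For the netCDF version, `xarray.open_dataset` is the usual entry point, though the variable names inside the file would need to be checked against the README.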

Date Submitted: 2023-11-07
Issued: 2023-11-07
Modelizations and analyzes of the urban fabric (2D and 3D) of Charleville...
zenodo.org
data.niaid.nih.gov
Updated Nov 9, 2022
Rassat Sylvain; Rassat Sylvain (2022). Modelizations and analyzes of the urban fabric (2D and 3D) of Charleville from 1724 to 1876. [Dataset]. http://doi.org/10.34847/nkl.abcb8377
Unique identifier
https://doi.org/10.34847/nkl.abcb8377
Dataset updated
Nov 9, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rassat Sylvain; Rassat Sylvain
Description
The main characteristic of the city of Charleville (Ardennes), as far as the history of its population is concerned, is that the municipal authorities carried out a nominative, spatial and annual census of the inhabitants from the end of the 17th century until the beginning of the 20th century.
This makes it possible to exploit the Carolopolitan data within a geographic information system (GIS). Thanks to this GIS, and to the quality and volume of its information, it is possible to reconstruct the city of 1834 in 3D and to make full use of the methodological environment known as "BIM" (Building Information Model). BIM will make it possible to exploit the architectural and demographic composition of each plot, block and dwelling in the city, from its genesis to the present day, like a true 3D GIS.
The spatial analyses are built from cartographic and topographic data that were analyzed, vectorized (2D and 3D), georeferenced and linked to the occupation data of the city of Charleville.

Data collected, digitized and structured from 01/15/2016 to 01/04/2021 in the context of:
- Axis 1 of the Roland Mousnier Center (UMR 8596), Sorbonne University / CNRS
- the C2EP2 Project funded by the Sorbonne University "Emergencies" call for projects (2019-21) for the web and BIM parts (CSTB partnership)
- scientific and technical collaboration (partnership agreement) between the National Archives and the Roland Mousnier Center.

Creative Commons License
The dataset "Analyzes of the urban fabric (2D and 3D) of Charleville from 1724 to 1876." by Sylvain Rassat, Roland Mousnier Center, CNRS, Departmental Archives, National Archives is made available under the terms of the Creative Commons Attribution-ShareAlike 4.0 International license.
Based on a Source Link work.
Permissions beyond the scope of this license can be obtained at https://www.researchgate.net/profile/Sylvain_Rassat.
(https://cesium.cstb.fr/Apps/cnrs/MN_Charleville3D.html)

