20 datasets found

USA State code
kaggle.com
zip
Updated Mar 29, 2020
Cite
corochann (2020). USA State code [Dataset]. https://www.kaggle.com/datasets/corochann/usa-state-code
Explore at:
zip (1393 bytes)
Dataset updated
Mar 29, 2020
Authors
corochann
Area covered
United States
Description
Dataset

This dataset was created by corochann

Contents
US state county name & codes
kaggle.com
zip
Updated Jun 6, 2017
Cite
VivekMangipudi (2017). US state county name & codes [Dataset]. https://www.kaggle.com/stansilas/us-state-county-name-codes
Explore at:
zip (25096 bytes)
Dataset updated
Jun 6, 2017
Authors
VivekMangipudi
Area covered
United States
Description
Context

There is no story behind this data.

These are supplementary datasets that I plan to use for plotting county-wise data on maps, in particular with my kernel https://www.kaggle.com/stansilas/maps-are-beautiful-unemployment-is-not/, because that dataset didn't have the information I needed to plot an interactive map with highcharter.

Content

I noticed that most demographic datasets here on Kaggle have either the state code, the state name, or the county name plus state name, but not all of them together, i.e. county name, FIPS code, state name, and state code.

Using these two datasets one can get any combination of state county codes etc.

States.csv has State name + code
US counties.csv has county wise data.
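
As a rough illustration of combining the two files, the sketch below joins a state lookup onto county rows. The column names (`State`, `Code`, `County`, `FIPS`) and the sample rows are assumptions made for the example; the real headers in States.csv and US counties.csv may differ.

```python
import csv
import io

# Hypothetical miniature stand-ins for States.csv and US counties.csv;
# the real column names may differ.
STATES_CSV = """State,Code
Michigan,MI
Ohio,OH
"""
COUNTIES_CSV = """County,State,FIPS
Wayne,Michigan,26163
Franklin,Ohio,39049
"""


def join_state_codes(states_text, counties_text):
    """Attach the two-letter state code to every county row."""
    code_by_state = {
        row["State"]: row["Code"]
        for row in csv.DictReader(io.StringIO(states_text))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(counties_text)):
        # .get() leaves the code as None when a state name has no match.
        joined.append({**row, "Code": code_by_state.get(row["State"])})
    return joined


rows = join_state_codes(STATES_CSV, COUNTIES_CSV)
```

Keeping FIPS values as strings, as `csv.DictReader` does by default, avoids losing leading zeros.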

Acknowledgements

Picture : https://unsplash.com/search/usa-states?photo=-RO2DFPl7wE
Counties : https://www.census.gov/geo/reference/codes/cou.html
State :

Inspiration

Not Applicable.
2019 Census US Population Data By State
kaggle.com
zip
Updated Jan 21, 2021
Cite
Peretz Cohen (2021). 2019 Census US Population Data By State [Dataset]. https://www.kaggle.com/peretzcohen/2019-census-us-population-data-by-state
Explore at:
zip (1464 bytes)
Dataset updated
Jan 21, 2021
Authors
Peretz Cohen
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Area covered
United States
Description
Context

https://www.kaggle.com/peretzcohen/us-vaccine-status-by-state

Content

This population data is pulled from the 2019 US Census and is provided here along with latitude and longitude data for each state's capital city.

Acknowledgements

Population Data - https://www.census.gov/data/datasets/time-series/demo/popest/2010s-state-total.html
Location Data - https://github.com/jasperdebie/VisInfo/blob/master/us-state-capitals.csv
US Geographic Codes Dataset
kaggle.com
zip
Updated Jun 13, 2018
Cite
Theodore Nowak (2018). US Geographic Codes Dataset [Dataset]. https://www.kaggle.com/tsnowak/us-geographic-codes
Explore at:
zip (222855 bytes)
Dataset updated
Jun 13, 2018
Authors
Theodore Nowak
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
US Geographic Codes Dataset

This code is used to generate a combined data set of US ZIP, FIPS, and County data for most ZIP Codes in the U.S. (41,867 to be exact).

Code to generate the data set from the government files listed below can be found here.

The Data

The dataset is organized as follows:

Zip Code: USPS ZIP code from here

State Name: Full state name (E.g. Michigan)

State Abrv: USPS abbreviated state name (E.g: MI)

State Code: FIPS State Code from here

County Name: County in which ZIP is located

County Code: FIPS County Code

FIPS Code: FIPS State Code + FIPS County Code from here

ANSI Code: American National Standards Institute Code

Centroid Lat: Latitude value of the county center

Centroid Long: Longitude value of the county center
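
The "FIPS State Code + FIPS County Code" composition can be sketched as below. The helper names are hypothetical, and the sketch assumes the usual widths of two digits for the state code and three for the county code, which matters when the codes have been read as integers and lost their leading zeros.

```python
def make_fips(state_code, county_code):
    """Combine a FIPS state code and county code into a 5-digit string."""
    # Assumed widths: 2 digits for the state, 3 for the county. Zero-padding
    # restores leading zeros dropped when codes are parsed as integers.
    return f"{int(state_code):02d}{int(county_code):03d}"


def split_fips(fips):
    """Split a combined FIPS code back into (state, county) strings."""
    fips = str(fips).zfill(5)
    return fips[:2], fips[2:]


combined = make_fips(26, 163)
padded = make_fips("1", "1")
parts = split_fips(1001)
```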

Sources

The data used to create this data set was taken from several recent government data sets.

These are:

2017 US Gazetteer FIPS Dataset

2017 US Census County Polygons Dataset

2015 Department of Labor GPCI Dataset

General Census State Code Reference Document

Disclaimers

The final csv is in 'latin1' encoding to preserve the Spanish county names in Puerto Rico.
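
A minimal sketch of reading a 'latin1' file with the standard library follows. The one-row sample and its column headers are made up for the example; only the encoding name comes from the description above.

```python
import csv
import io

# A made-up one-row sample, encoded the way the description says the file is.
raw = "County Name,State Abrv\nAñasco Municipio,PR\n".encode("latin1")


def read_latin1_csv(data):
    """Decode latin1 bytes and parse them as CSV rows."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding="latin1", newline="")
    return list(csv.DictReader(text))


rows = read_latin1_csv(raw)
```

For a file on disk, the equivalent would be `open(path, encoding="latin1", newline="")`.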

This data is from, and shall remain in the public domain, and the onus of responsibility lies with the user of this data.
U.S. Education Datasets: Unification Project
kaggle.com
zip
Updated Apr 13, 2020
Cite
Roy Garrard (2020). U.S. Education Datasets: Unification Project [Dataset]. https://www.kaggle.com/noriuk/us-education-datasets-unification-project
Explore at:
zip (155201337 bytes)
Dataset updated
Apr 13, 2020
Authors
Roy Garrard
Area covered
United States
Description
Author's Note 2019/04/20: Revisiting this project, I recently discovered the incredibly comprehensive API produced by the Urban Institute. It achieves all of the goals laid out for this dataset in wonderful detail. I recommend that users interested pay a visit to their site.

Context

This dataset is designed to bring together multiple facets of U.S. education data into one convenient CSV (states_all.csv).

Contents

states_all.csv: The primary data file. Contains aggregates from all state-level sources in one CSV.

output_files/states_all_extended.csv: The contents of states_all.csv with additional data related to race and gender.

Column Breakdown

Identification

PRIMARY_KEY: A combination of the year and state name.

YEAR

STATE

Enrollment

A breakdown of students enrolled in schools by school year.

GRADES_PK: Number of students in Pre-Kindergarten education.

GRADES_4: Number of students in fourth grade.

GRADES_8: Number of students in eighth grade.

GRADES_12: Number of students in twelfth grade.

GRADES_1_8: Number of students in the first through eighth grades.

GRADES_9_12: Number of students in the ninth through twelfth grades.

GRADES_ALL: The count of all students in the state. Comparable to ENROLL in the financial data (which is the U.S. Census Bureau's estimate for students in the state).

The extended version of states_all contains additional columns that break down enrollment by race and gender. For example:

G06_A_A: Total number of sixth grade students.

G06_AS_M: Number of sixth grade male students whose ethnicity was classified as "Asian".

G08_AS_A_READING: Average reading score of eighth grade students whose ethnicity was classified as "Asian".

The represented races include AM (American Indian or Alaska Native), AS (Asian), HI (Hispanic/Latino), BL (Black or African American), WH (White), HP (Hawaiian Native/Pacific Islander), and TR (Two or More Races). The represented genders include M (Male) and F (Female).
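
The naming pattern in these examples can be decoded mechanically. The sketch below assumes the pattern is always `G<grade>_<race>_<gender>[_<measure>]` and that `A` means "all" in both the race and gender positions, which is inferred from G06_A_A rather than stated; `parse_column` is a hypothetical helper.

```python
RACES = {
    "A": "All",  # inferred from G06_A_A ("total number of students")
    "AM": "American Indian or Alaska Native",
    "AS": "Asian",
    "HI": "Hispanic/Latino",
    "BL": "Black or African American",
    "WH": "White",
    "HP": "Hawaiian Native/Pacific Islander",
    "TR": "Two or More Races",
}
GENDERS = {"A": "All", "M": "Male", "F": "Female"}


def parse_column(name):
    """Decode an extended column name such as 'G06_AS_M'."""
    parts = name.split("_")
    grade, race, gender = parts[0], parts[1], parts[2]
    # Columns without a fourth part are treated as enrollment counts.
    measure = parts[3] if len(parts) > 3 else "ENROLLMENT"
    return {
        "grade": int(grade[1:]),
        "race": RACES[race],
        "gender": GENDERS[gender],
        "measure": measure,
    }


enrollment = parse_column("G06_AS_M")
score = parse_column("G08_AS_A_READING")
```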

Financials

A breakdown of states by revenue and expenditure.

ENROLL: The U.S. Census Bureau's count for students in the state. Should be comparable to GRADES_ALL (which is the NCES's estimate for students in the state).

TOTAL_REVENUE: The total amount of revenue for the state.

FEDERAL_REVENUE

STATE_REVENUE

LOCAL_REVENUE

TOTAL_EXPENDITURE: The total expenditure for the state.

INSTRUCTION_EXPENDITURE

SUPPORT_SERVICES_EXPENDITURE

CAPITAL_OUTLAY_EXPENDITURE

OTHER_EXPENDITURE

Academic Achievement

A breakdown of student performance as assessed by the corresponding exams (math and reading, grades 4 and 8).

AVG_MATH_4_SCORE: The state's average score for fourth graders taking the NAEP math exam.

AVG_MATH_8_SCORE: The state's average score for eighth graders taking the NAEP math exam.

AVG_READING_4_SCORE: The state's average score for fourth graders taking the NAEP reading exam.

AVG_READING_8_SCORE: The state's average score for eighth graders taking the NAEP reading exam.

Data Processing

The original sources can be found here:

Enrollment: https://nces.ed.gov/ccd/stnfis.asp
Financials: https://www.census.gov/programs-surveys/school-finances/data/tables.html
Academic Achievement: https://www.nationsreportcard.gov/ndecore/xplore/NDE

Data was aggregated using a Python program I wrote. The code (as well as additional project information) can be found [here][1].

Methodology Notes

Spreadsheets for NCES enrollment data for 2014, 2011, 2010, and 2009 were modified to place key data on the same sheet, making scripting easier.

The column 'ENROLL' represents the U.S. Census Bureau data value (financial data), while the column 'GRADES_ALL' represents the NCES data value (demographic data). Though the two organizations correspond on this matter, these values (which are ostensibly the same) do vary. Their documentation chalks this up to differences in membership (i.e. what is and is not a fourth grade student).

Enrollment data from NCES has seen a number of changes across survey years. One of the more notable is that data on student gender does not appear to have been collected until 2009. The information in states_all_extended.csv reflects this.

NAEP test score data is only available for certain years.

The current version of this data is concerned with state-level patterns. It is the author's hope that future versions will allow for school district-level granularity.

Acknowledgements

Data is sourced from the U.S. Census Bureau and the National Center for Education Statistics (NCES).

Licensing Notes

The licensing of these datasets state that it must not be us...
COVID-19 Tracking Germany
kaggle.com
zip
Updated Feb 7, 2023
Cite
Heads or Tails (2023). COVID-19 Tracking Germany [Dataset]. https://www.kaggle.com/datasets/headsortails/covid19-tracking-germany
Explore at:
zip (14492010 bytes)
Dataset updated
Feb 7, 2023
Authors
Heads or Tails
Area covered
Germany
Description
Read the associated blogpost for a detailed description of how this dataset was prepared; plus extra code for producing animated maps.

Context

The 2019 Novel Coronavirus (COVID-19) continues to spread in countries around the world. This dataset provides daily updated number of reported cases & deaths in Germany on the federal state (Bundesland) and county (Landkreis/Stadtkreis) level. In April 2021 I added a dataset on vaccination progress. In addition, I provide geospatial shape files and general state-level population demographics to aid the analysis.

Content

The dataset consists of three main csv files: covid_de.csv, demographics_de.csv, and covid_de_vaccines.csv. The geospatial shapes are included in the de_state.* files. See the column descriptions below for more detailed information.

covid_de.csv: COVID-19 cases and deaths, updated daily. The original data are collected by Germany's Robert Koch Institute and can be downloaded through the National Platform for Geographic Data (the latter site also hosts an interactive dashboard). I reshaped and translated the data (using R tidyverse tools) to make it more accessible. This blogpost explains how I prepared the data and describes how to produce animated maps.

demographics_de.csv: General Demographic Data about Germany on the federal state level. Those have been downloaded from Germany's Federal Office for Statistics (Statistisches Bundesamt) through their Open Data platform GENESIS. The data reflect the (most recent available) estimates on 2018-12-31. You can find the corresponding table here.

covid_de_vaccines.csv: In April 2021 I added this file that contains the Covid-19 vaccination progress for Germany as a whole. It details daily doses, broken down cumulatively by manufacturer, as well as the cumulative number of people having received their first and full vaccination. The earliest data are from 2020-12-27.

de_state.*: Geospatial shape files for Germany's 16 federal states. Downloaded via Germany's Federal Agency for Cartography and Geodesy. Specifically, the shape file was obtained from this link.

Column Description

COVID-19 dataset covid_de.csv:

state: Name of the German federal state. Germany has 16 federal states. I removed or converted special characters from the original data.

county: The name of the German Landkreis (LK) or Stadtkreis (SK), which correspond roughly to US counties.

age_group: The COVID-19 data is reported for 6 age groups: 0-4, 5-14, 15-34, 35-59, 60-79, and above 80 years old. As a shorthand for the last category I'm using "80-99", but there might well be persons above 99 years old in this dataset. This column has a few NA entries.

gender: Reported as male (M) or female (F). This column has a few NA entries.

date: The calendar date on which a case or death was reported. There might be delays that will be corrected by retroactively assigning cases to earlier dates.

cases: COVID-19 cases that have been confirmed through laboratory work. This and the following 2 columns are counts per day, not cumulative counts.

deaths: COVID-19 related deaths.

recovered: Recovered cases.
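
Because cases, deaths, and recovered are per-day counts, cumulative totals have to be derived. The sketch below shows one way to do that per state. The sample rows are made up, and ISO-formatted dates (as in "2020-12-27") are assumed so that sorting strings sorts chronologically.

```python
from collections import defaultdict

# Made-up rows in the shape described for covid_de.csv (daily counts).
rows = [
    {"state": "Bayern", "date": "2020-03-01", "cases": 3},
    {"state": "Berlin", "date": "2020-03-01", "cases": 1},
    {"state": "Bayern", "date": "2020-03-02", "cases": 5},
    {"state": "Berlin", "date": "2020-03-02", "cases": 2},
]


def cumulative_cases(rows):
    """Return {state: [(date, running_total), ...]} from per-day counts."""
    daily = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Several rows can share a state and date (one per county,
        # age group, and gender), so add them up first.
        daily[row["state"]][row["date"]] += row["cases"]
    result = {}
    for state, by_date in daily.items():
        total = 0
        series = []
        for date in sorted(by_date):  # ISO dates sort chronologically
            total += by_date[date]
            series.append((date, total))
        result[state] = series
    return result


totals = cumulative_cases(rows)
```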

Demographic dataset demographics_de.csv:

state, gender, age_group: same as above. The demographic data is available in higher age resolution, but I have binned it here to match the corresponding age groups in the covid_de.csv file.

population: Population counts for the respective categories. These numbers reflect the (most recent available) estimates on 2018-12-31.

Vaccination progress dataset covid_de_vaccines.csv:

date: calendar date of vaccination

doses, doses_first, doses_second: Daily count of administered doses: total, 1st shot, 2nd shot.

pfizer_cumul, moderna_cumul, astrazeneca_cumul: Daily cumulative number of administered vaccinations by manufacturer.

persons_first_cumul, persons_full_cumul: Daily cumulative number of people having received their 1st shot and full vaccination, respectively.

Acknowledgements

All the data have been extracted from open data sources which are being gratefully acknowledged:

The [Robert ...

Target Corporation

kaggle.com

zip

Updated Mar 25, 2024

Cite

Ujjwal Mishra (2024). Target Corporation [Dataset]. https://www.kaggle.com/datasets/ujjwalinsights/target-case-study-using-sql/data

Explore at:

zip (50219115 bytes)

Dataset updated

Mar 25, 2024

Authors

Ujjwal Mishra

License

https://cdla.io/permissive-1-0/

Description

Context:

Target is a globally renowned brand and a prominent retailer in the United States. Target makes itself a preferred shopping destination by offering outstanding value, inspiration, innovation and an exceptional guest experience that no other retailer can deliver.

This particular business case focuses on the operations of Target in Brazil and provides insightful information about 100,000 orders placed between 2016 and 2018. The dataset offers a comprehensive view of various dimensions including the order status, price, payment and freight performance, customer location, product attributes, and customer reviews.

By analyzing this extensive dataset, it becomes possible to gain valuable insights into Target's operations in Brazil. The information can shed light on various aspects of the business, such as order processing, pricing strategies, payment and shipping efficiency, customer demographics, product characteristics, and customer satisfaction levels.

Dataset: https://drive.google.com/drive/folders/1TGEc66YKbD443nslRi1bWgVd238gJCnb

The data is available in 8 csv files:

customers.csv
sellers.csv
order_items.csv
geolocation.csv
payments.csv
reviews.csv
orders.csv
products.csv

The column descriptions for these csv files are given below.

customers.csv:

Feature	Description
customer_id	ID of the consumer who made the purchase
customer_unique_id	Unique ID of the consumer
customer_zip_code_prefix	Zip Code of consumer’s location
customer_city	Name of the City from where order is made
customer_state	State Code from where order is made (Eg. São Paulo - SP)

sellers.csv:

Feature	Description
seller_id	Unique ID of the seller registered
seller_zip_code_prefix	Zip Code of the seller’s location
seller_city	Name of the City of the seller
seller_state	State Code (Eg. São Paulo - SP)

order_items.csv:

Feature	Description
order_id	A Unique ID of order made by the consumers
order_item_id	A Unique ID given to each item ordered in the order
product_id	A Unique ID given to each product available on the site
seller_id	Unique ID of the seller registered in Target
shipping_limit_date	The date before which the ordered product must be shipped
price	Actual price of the products ordered
freight_value	Price rate at which a product is delivered from one point to another

geolocation.csv:

Feature	Description
geolocation_zip_code_prefix	First 5 digits of Zip Code
geolocation_lat	Latitude
geolocation_lng	Longitude
geolocation_city	City
geolocation_state	State

payments.csv:

Feature	Description
order_id	A Unique ID of order made by the consumers
payment_sequential	Sequences of the payments made in case of EMI
payment_type	Mode of payment used (Eg. Credit Card)
payment_installments	Number of installments in case of EMI purchase
payment_value	Total amount paid for the purchase order

**orders.csv:...

Geolocation Data [Longitude Latitude]
kaggle.com
zip
Updated Mar 12, 2022
Cite
You Sheng (2022). Geolocation Data [Longitude Latitude] [Dataset]. https://www.kaggle.com/liewyousheng/geolocation
Explore at:
zip (3563856 bytes)
Dataset updated
Mar 12, 2022
Authors
You Sheng
License
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Context

Full Database of city state country available in CSV format. All Countries, States & Cities are Covered & Populated with Different Combinations & Versions.

Each CSV has the longitude and latitude of each location, alongside other miscellaneous country data such as currency, state code, and phone country code.

Content

Total Countries: 250
Total States/Regions/Municipalities: 4,963
Total Cities/Towns/Districts: 148,061

Last Updated On : 29th January 2022

Source

https://github.com/dr5hn/countries-states-cities-database
Looker Ecommerce BigQuery Dataset
kaggle.com
Updated Jan 18, 2024
Cite
Mustafa Keser (2024). Looker Ecommerce BigQuery Dataset [Dataset]. https://www.kaggle.com/datasets/mustafakeser4/looker-ecommerce-bigquery-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mustafa Keser
Description
Looker Ecommerce Dataset Description

CSV version of Looker Ecommerce Dataset.

Overview

Dataset in BigQuery

TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

1. distribution_centers.csv

Columns:

id: Unique identifier for each distribution center.

name: Name of the distribution center.

latitude: Latitude coordinate of the distribution center.

longitude: Longitude coordinate of the distribution center.

2. events.csv

Columns:

id: Unique identifier for each event.

user_id: Identifier for the user associated with the event.

sequence_number: Sequence number of the event.

session_id: Identifier for the session during which the event occurred.

created_at: Timestamp indicating when the event took place.

ip_address: IP address from which the event originated.

city: City where the event occurred.

state: State where the event occurred.

postal_code: Postal code of the event location.

browser: Web browser used during the event.

traffic_source: Source of the traffic leading to the event.

uri: Uniform Resource Identifier associated with the event.

event_type: Type of event recorded.

3. inventory_items.csv

Columns:

id: Unique identifier for each inventory item.

product_id: Identifier for the associated product.

created_at: Timestamp indicating when the inventory item was created.

sold_at: Timestamp indicating when the item was sold.

cost: Cost of the inventory item.

product_category: Category of the associated product.

product_name: Name of the associated product.

product_brand: Brand of the associated product.

product_retail_price: Retail price of the associated product.

product_department: Department to which the product belongs.

product_sku: Stock Keeping Unit (SKU) of the product.

product_distribution_center_id: Identifier for the distribution center associated with the product.

4. order_items.csv

Columns:

id: Unique identifier for each order item.

order_id: Identifier for the associated order.

user_id: Identifier for the user who placed the order.

product_id: Identifier for the associated product.

inventory_item_id: Identifier for the associated inventory item.

status: Status of the order item.

created_at: Timestamp indicating when the order item was created.

shipped_at: Timestamp indicating when the order item was shipped.

delivered_at: Timestamp indicating when the order item was delivered.

returned_at: Timestamp indicating when the order item was returned.

5. orders.csv

Columns:

order_id: Unique identifier for each order.

user_id: Identifier for the user who placed the order.

status: Status of the order.

gender: Gender information of the user.

created_at: Timestamp indicating when the order was created.

returned_at: Timestamp indicating when the order was returned.

shipped_at: Timestamp indicating when the order was shipped.

delivered_at: Timestamp indicating when the order was delivered.

num_of_item: Number of items in the order.

6. products.csv

Columns:

id: Unique identifier for each product.

cost: Cost of the product.

category: Category to which the product belongs.

name: Name of the product.

brand: Brand of the product.

retail_price: Retail price of the product.

department: Department to which the product belongs.

sku: Stock Keeping Unit (SKU) of the product.

distribution_center_id: Identifier for the distribution center associated with the product.

7. users.csv

Columns:

id: Unique identifier for each user.

first_name: First name of the user.

last_name: Last name of the user.

email: Email address of the user.

age: Age of the user.

gender: Gender of the user.

state: State where t...

Global EV Charging Stations Dataset

kaggle.com

zip

Updated Nov 2, 2025

Cite

Rishee Panchal (2025). Global EV Charging Stations Dataset [Dataset]. https://www.kaggle.com/datasets/risheepanchal/global-ev-charging-stations-dataset

Explore at:

zip (471052 bytes)

Dataset updated

Nov 2, 2025

Authors

Rishee Panchal

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Overview

This dataset provides a comprehensive snapshot of over 50,000 electric vehicle (EV) charging stations worldwide, scraped from the OpenChargeMap public API in November 2025. It focuses on key details for each station, including location, operator, status, and connector types. The data is ideal for geospatial analysis, infrastructure planning, EV routing apps, predictive modeling (e.g., availability forecasting), or sustainability studies.

Total Records: ~50,000
Geographic Coverage: Global, with heavy representation in Europe (e.g., France, Spain), North America (e.g., US, Canada), and emerging markets (e.g., Azerbaijan, Tunisia).
Time Period: Stations added or updated as of November 2025; the date_added column indicates when each record was entered into the source database.
Data Source: OpenChargeMap.org (official public API). This is community-contributed and crowdsourced data, so it may include user-submitted updates.
License: CC0 (Public Domain) – Free to use, share, and modify without restrictions. Always credit OpenChargeMap for ethical use.
File Format: CSV (comma-separated values)

Dataset Columns

Column	Description
`id`	Unique station ID
`title`	Station name (e.g., "Electra - Wambrechies")
`address`	Street address
`town`	City/town
`state`	State or province (may be blank)
`postcode`	ZIP/postal code
`country`	ISO 3166-1 alpha-2 code (e.g., FR = France)
`lat`, `lon`	GPS coordinates (WGS84)
`operator`	Charging network (e.g., Tesla, Electra)
`status`	"Operational", "Not Operational", etc.
`num_connectors`	Number of charging plugs
`connector_types`	Plug types (e.g., "CCS (Type 2)|Type 2")
`date_added`	When station was added (UTC)
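
Since the connector_types example shows several plug types in one cell, a small helper can split the cell before counting. This is a sketch that assumes "|" is the only separator, as in the example above; `split_connectors` is a hypothetical name and the sample cells are made up.

```python
from collections import Counter


def split_connectors(value):
    """Split a connector_types cell into a list of plug types."""
    if not value:
        return []
    # Assumed separator: "|", as in the example 'CCS (Type 2)|Type 2'.
    return [part.strip() for part in value.split("|") if part.strip()]


cells = ["CCS (Type 2)|Type 2", "CHAdeMO", ""]
counts = Counter(t for cell in cells for t in split_connectors(cell))
```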

Country Codes

The country column uses standard 2-letter ISO codes. Based on the dataset, here are the most common ones with full names (not exhaustive; query the data for all unique values):

Code	Country Name
FR	France
ES	Spain
IT	Italy
US	United States
FI	Finland
TR	Turkey
BR	Brazil
BE	Belgium
AT	Austria
HU	Hungary
AM	Armenia
AZ	Azerbaijan
TN	Tunisia
RS	Serbia
NL	Netherlands
PL	Poland

Connector Types

Type	Description
CCS (Type 1)	CCS for North America (J1772 combo)
CCS (Type 2)	Combined Charging System (Europe/Asia standard)
Type 2 (Socket Only)	AC charging socket (Mennekes)
Type 2 (Tethered Connector)	Cable-attached Type 2
CHAdeMO	Japanese DC fast charging standard
Tesla	Proprietary Tesla connector
CEE 7/4 - Schuko - Type F	Household socket (Europe).
Unknown	Unspecified or rare type

Keywords: EV charging stations, electric vehicle infrastructure, OpenChargeMap, global EV chargers, CCS, Type 2, CHAdeMO, Tesla Supercharger, geospatial data, sustainability, green mobility, urban planning, climate action, 2025 EV dataset, public charging, fast charging, location intelligence, energy transition, zero emissions, clean transport

US Births by County and State
kaggle.com
zip
Updated Jan 22, 2023
Cite
The Devastator (2023). US Births by County and State [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-births-by-county-and-state
Explore at:
zip (3159011 bytes)
Dataset updated
Jan 22, 2023
Authors
The Devastator
Area covered
United States
Description
US Births by County and State

1985-2015 Aggregated Data

By data.world's Admin [source]

About this dataset

This dataset contains an aggregation of birth data from the United States between 1985 and 2015. It consists of information on mothers' locations by state (including District of Columbia) and county, as well as information such as the month they gave birth, and aggregates giving the sum of births during that month. This data has been provided by both the National Bureau for Economic Research and National Center for Health Statistics, whose shared mission is to understand how life works in order to aid individuals in making decisions about their health and wellbeing. This dataset provides valuable insight into population trends across time and location - for example, which states have higher or lower birthrates than others? Which counties experience dramatic fluctuations over time? Given its scope, this dataset could be used in a number of contexts, from epidemiology research to population forecasting. Be sure to check out our other datasets related to births while you're here!

How to use the dataset

This dataset could be used to examine local trends in birth rates over time or analyze births at different geographical locations. In order to maximize your use of this dataset, it is important that you understand what information the various columns contain.

The main columns are: State (including District of Columbia), County (coded using the FIPS county code number), Month (numbered from 1 for January through 12 for December), Year (4-digit year), countyBirths (calculated sum of births that occurred to mothers living in a county for a given month), and stateBirths (calculated sum of births that occurred to mothers living in a state for a given month). These fields should provide enough information for you to analyze trends across geographic locations at both monthly and yearly levels. You could also group by combinations such as Year with State or Year with Month, depending on your analysis goal.
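
One point to watch when aggregating: if each row is a county-month, the stateBirths value is presumably repeated on every county row of the same state and month, so summing it directly would overcount. The sketch below, on made-up rows, keeps one stateBirths value per (State, Year, Month) before adding up the year. That repetition is an assumption drawn from the column descriptions, not something verified against the file.

```python
# Made-up rows in the shape described for allBirthData.csv.
rows = [
    {"State": 1, "County": 1, "Month": 1, "Year": 1990, "countyBirths": 10, "stateBirths": 30},
    {"State": 1, "County": 3, "Month": 1, "Year": 1990, "countyBirths": 20, "stateBirths": 30},
    {"State": 1, "County": 1, "Month": 2, "Year": 1990, "countyBirths": 12, "stateBirths": 25},
]


def yearly_state_births(rows):
    """Sum stateBirths per (State, Year), counting each month only once."""
    monthly = {}
    for row in rows:
        # Assumed: the same stateBirths value appears on every county row
        # for a given state and month, so keeping one copy is enough.
        monthly[(row["State"], row["Year"], row["Month"])] = row["stateBirths"]
    yearly = {}
    for (state, year, _month), births in monthly.items():
        yearly[(state, year)] = yearly.get((state, year), 0) + births
    return yearly


yearly = yearly_state_births(rows)
```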

In addition, while all data were downloaded on April 5th 2017, it is worth noting that all sources used followed privacy guidelines as laid out by NCHS, so individual births occurring after 2005 are not included due to geolocation concerns.
We hope you find this dataset useful and can benefit from its content! With proper understanding of what each field contains, we are confident you will gain valuable insights on birth rates across counties within the United States during this period.

Research Ideas

Establishing county-level trends in birth rates for the US over time.

Analyzing the relationship between month of birth and health outcomes for US babies after they are born (e.g., infant mortality, neurological development, etc.).

Comparing state/county-level differences in average numbers of twins born each year

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

See the dataset description for more information.

Columns

File: allBirthData.csv

| Column name | Description |
|:-------------|:------------|
| State | The numerical order of the state where the mother lives. (Integer) |
| Month | The month in which the birth took place. (Integer) |
| Year | The year of the birth. (Integer) |
| countyBirths | The calculated sum of births that occurred to mothers living in that county for that particular month. (Integer) |
| stateBirths | The aggregate number at the level of entire states for any given month-year combination. (Integer) |
| County | The county where the mother lives, coded using FIPS County Code. (Integer) |

Acknowledgements

If you use this dataset in your research, please credit the original authors and data.world's Admin.
Income and Urban VS Rural For Each County in USA
kaggle.com
zip
Updated Jan 12, 2025
Cite
Ahmed Mohamed (2025). Income and Urban VS Rural For Each County in USA [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/income-urban-vs-rural-for-each-county
Explore at:
zip (56670 bytes)
Dataset updated
Jan 12, 2025
Authors
Ahmed Mohamed
Area covered
United States
Description
Income and Urban vs. Rural Population Dataset

Overview

This dataset provides insights into the population distribution and income levels across counties in the United States, with a classification of counties as either "Urban" or "Rural." The data was sourced from the U.S. Census Bureau's 2023 American Community Survey (ACS).

Methodology

Data Source:

API Endpoint: U.S. Census Bureau ACS 2023 API

Variables:

B01003_001E: Total population.

B19013_001E: Median household income.

Processing:

Counties were classified as "Urban" if their population was above the median population; otherwise, they were classified as "Rural."

FIPS codes were generated by concatenating State and County FIPS codes.
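The two processing steps above can be sketched as follows. This is a minimal illustration, not the author's script: the row keys (`population`, `state_fips`, `county_fips`), the sample values, and the zero-padding to 2 and 3 digits are assumptions made for the example.

```python
from statistics import median


def build_fips(state_fips: str, county_fips: str) -> str:
    """Concatenate state and county codes (assumed padding: 2 + 3 digits)."""
    return state_fips.zfill(2) + county_fips.zfill(3)


def classify_counties(rows):
    """Label a county Urban if its population is above the median, else Rural."""
    cutoff = median(r["population"] for r in rows)
    return [
        {
            **r,
            "FIPS": build_fips(r["state_fips"], r["county_fips"]),
            "Urban-Rural": "Urban" if r["population"] > cutoff else "Rural",
        }
        for r in rows
    ]


# Hypothetical sample rows (made-up populations, not values from the dataset).
sample = [
    {"county": "A", "state_fips": "1", "county_fips": "1", "population": 500},
    {"county": "B", "state_fips": "1", "county_fips": "3", "population": 20000},
    {"county": "C", "state_fips": "6", "county_fips": "37", "population": 900000},
]
labelled = classify_counties(sample)
```

Note that a county sitting exactly at the median is labelled Rural under this rule, since the description says "above the median".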

Columns:

County: County name.

State: State name.

FIPS: Combined state and county FIPS code.

State FIPS Code: State's Federal Information Processing Standard code.

County FIPS Code: County's FIPS code.

Total Population: Total population of the county.

Median Household Income: Median household income for the county.

Urban-Rural: Classification based on population (Urban or Rural).

Usage

This dataset can be used for:

- Urban vs. rural demographic and economic analysis.
- Income distribution studies.
- Data visualization and mapping using FIPS codes.

License

This dataset is released into the public domain. Proper attribution to the U.S. Census Bureau is appreciated.
1975-2021 Railroad Accidents
kaggle.com
zip
Updated Feb 28, 2022
Cite
koenry (2022). 1975-2021 Railroad Accidents [Dataset]. https://www.kaggle.com/datasets/koenry/1975-2021-railroad-accidents/discussion
Explore at:
zip(53484003 bytes)Available download formats
Dataset updated
Feb 28, 2022
Authors
koenry
Description
A dataset that I collected from: https://safetydata.fra.dot.gov/officeofsafety/publicsite/on_the_fly_download.aspx I did not find a way to download a range of years, so I downloaded the data manually, imported it into a database, and exported the full 1975-2021 .csv file.

I have also uploaded a .csv with changed states because the original one uses state codes. I wrote a Python script that changes each state code to its corresponding state name: https://github.com/koenry/dataScience_Project_Railroad_Accidents/blob/main/code/changeStates.py This data is provided by https://safetydata.fra.dot.gov/ and I do not own it. All I did was make my data analysis project a little bit easier, so I thought I would share this with everyone!
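The code-to-name replacement described above might look roughly like the sketch below. It is not the linked changeStates.py script: the column name `STATE`, the assumption that the codes are two-digit FIPS state codes, and the three-entry lookup table are all illustrative.

```python
import csv
import io

# Partial, illustrative lookup (assumed to be FIPS state codes).
STATE_NAMES = {"06": "California", "36": "New York", "48": "Texas"}


def replace_state_codes(rows, column="STATE"):
    """Yield rows with the state code swapped for its name when it is known."""
    for row in rows:
        code = str(row[column]).strip().zfill(2)
        # Unknown codes are left untouched rather than guessed.
        yield {**row, column: STATE_NAMES.get(code, row[column])}


# Hypothetical two-column extract, not real accident records.
raw = io.StringIO("YEAR,STATE\n1975,6\n2021,48\n1999,99\n")
converted = list(replace_state_codes(csv.DictReader(raw)))
```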
US Unemployment Rates per State: 2017-2021
kaggle.com
zip
Updated Dec 28, 2022
Cite
Pascal Eissler (2022). US Unemployment Rates per State: 2017-2021 [Dataset]. https://www.kaggle.com/datasets/pasicebear/us-unemployment-rates-per-state-20172021
Explore at:
zip(13752 bytes)Available download formats
Dataset updated
Dec 28, 2022
Authors
Pascal Eissler
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
This is monthly US unemployment rate data from January 2017 to November 2022. The datasets were curated from the Federal Reserve Economic Data that can be found here.

Definition of Unemployment Rate :

"The unemployment rate represents the number of unemployed as a percentage of the labour force. Labour force data are restricted to people 16 years of age and older, who currently reside in 1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces." (Source: FRED website)

Files

There are two datasets. The first dataset contains the total US unemployment rate and the second dataset contains unemployment rates per US state.

unemployment_rate_us.csv

unemployment_us - This is the total seasonally adjusted US unemployment rate in percent. You can find the data source here.

first_day_of_month - The date of the first day of the month.

unemployment_rates.csv

first_day_of_month - The date of the first day of the month.

state - The name of the state.

unemployment_rate - This is the seasonally adjusted unemployment rate per US state in percent. You can find the data source here.
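As a rough sketch of how the per-state file could be read, the snippet below averages `unemployment_rate` by `state`. The column names come from the description above; the sample rows, the date format, and the helper name `mean_rate_by_state` are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the shape of unemployment_rates.csv (values are made up).
SAMPLE = """first_day_of_month,state,unemployment_rate
2017-01-01,Alabama,5.0
2017-02-01,Alabama,4.0
2017-01-01,Alaska,6.5
"""


def mean_rate_by_state(handle):
    """Return the average unemployment rate (in percent) for each state."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of rates, row count]
    for row in csv.DictReader(handle):
        entry = totals[row["state"]]
        entry[0] += float(row["unemployment_rate"])
        entry[1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


averages = mean_rate_by_state(io.StringIO(SAMPLE))
```

To run it on the real file, pass `open("unemployment_rates.csv", newline="")` instead of the in-memory sample.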

Feel free to let me know if you have any open questions with regard to the dataset.

Happy data science! ;)
Malaysia COVID-19 Data - Apr 2021
kaggle.com
zip
Updated May 5, 2021
Cite
Ansonnn (2021). Malaysia COVID-19 Data - Apr 2021 [Dataset]. https://www.kaggle.com/ansonnn/malaysia-covid19-data-apr-2021
Explore at:
zip(1141528 bytes)Available download formats
Dataset updated
May 5, 2021
Authors
Ansonnn
License
http://www.gnu.org/licenses/lgpl-3.0.html
Area covered
Malaysia
Description
Context

The dataset consists of COVID-19 cases in Malaysia from 27 March 2020 to 15 April 2021. This dataset was collected for the purpose of creating better visualizations of the COVID-19 cases in Malaysia. All of the data is web scraped from https://kpkesihatan.com/ using the BeautifulSoup library.

The data is also available in GitHub, along with the scripts made to scrape the data. There is also a Web Application made to show the visualizations.

Originally I planned to update the data daily, but doing this alone is too tedious without some sort of automated scripts or schedulers. If you know how to do this efficiently with automation or schedulers, please reach out to me by email or by message on LinkedIn; the links can be found on my GitHub. Thank you very much.

Content

There are three CSV files and one GeoJSON file:

- all_2020-03-27_2021-04-15.csv: all daily cases excluding state data
- state_all.csv: all daily cases for each state
- state_cumu.csv: all daily cumulative cases for each state
- malaysia_state_province_boundary.geojson: Malaysia's GeoJSON map file

The columns consist of:

1. Date
2. Recovered
3. Cumulative Recovered
4. Imported Case (many NaN values till the end of 2020)
5. Local Case (many NaN values)
6. Active Case (many NaN values but can be inferred)
7. New Case
8. Cumulative Case
9. ICU - Number of patients admitted into Intensive Care Unit
10. Ventilator - Number of patients who need ventilator in ICU
11. Death
12. Cumulative Death
13. URL - link to the original webpage
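The description says missing Active Case values "can be inferred" but does not say how. One plausible reading, shown here purely as an assumption, is that active cases equal cumulative cases minus cumulative recoveries and cumulative deaths. The function name and the sample numbers are hypothetical.

```python
import math


def infer_active(cumulative_case, cumulative_recovered, cumulative_death, active=None):
    """Return the recorded Active Case value, or infer it when it is missing.

    Assumed identity (not stated in the dataset description):
    active = cumulative cases - cumulative recovered - cumulative deaths.
    """
    if active is not None and not math.isnan(active):
        return active
    return cumulative_case - cumulative_recovered - cumulative_death


# Made-up numbers: the first call fills a NaN, the second keeps the recorded value.
filled = infer_active(1000, 700, 20, float("nan"))
kept = infer_active(1000, 700, 20, 300)
```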

Acknowledgements

Thanks to Info GIS MAP.com that provides Malaysia's GeoJSON file to create Choropleth maps.

Inspiration

Hopefully, there will be people utilizing the scripts or the data to create better visualizations.
Trips by Distance (US)
kaggle.com
Updated Aug 3, 2023
Cite
Adel Anseur (2023). Trips by Distance (US) [Dataset]. https://www.kaggle.com/datasets/adelanseur/trips-by-distance
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 3, 2023
Dataset provided by
Kaggle
Authors
Adel Anseur
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Overview

The Daily Travel data and number of people staying home and not staying home are estimated for the Bureau of Transportation Statistics by the Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland.

The daily travel estimates are from a mobile device data panel from merged multiple data sources that address the geographic and temporal sample variation issues often observed in a single data source. The merged data panel only includes mobile devices whose anonymized location data meet a set of data quality standards, which further ensures the overall data quality and consistency. The data quality standards consider both temporal frequency and spatial accuracy of anonymized location point observations, temporal coverage and representativeness at the device level, spatial representativeness at the sample and county level, etc. A multi-level weighting method that employs both device and trip-level weights expands the sample to the underlying population at the county and state levels, before travel statistics are computed.

Notes

These data are experimental and may not meet all of our quality standards. Experimental data products are created using new data sources or methodologies that benefit data users in the absence of other relevant products. We are seeking feedback from data users and stakeholders on the quality and usefulness of these new products. Experimental data products that meet our quality standards and demonstrate sufficient user demand may enter regular production if resources permit.

Data in the charts and graphs above is updated weekly on Mondays. The data lags one week behind the current date.

Data analysis is conducted at the aggregate national, state, and county levels. To assure confidentiality and support data quality, no data are reported for a county if it has fewer than 50 devices in the sample on any given day.

Trips are defined as movements that include a stay of longer than 10 minutes at an anonymized location away from home. A movement with multiple stays of longer than 10 minutes before returning home is counted as multiple trips.

Key Features

1.Level : Indicates National, State, or County level metrics.

2.Date : The date when the data was recorded.

3.State FIPS : A two-digit code representing the FIPS state code.

4.State Postal Code : State postal code.

5.County FIPS : Five-digit FIPS county code.

6.County Name : County name.

7.Population Staying at Home : Number of residents staying at home, i.e., persons who make no trips with a trip end more than one mile away from home.

8.Population Not Staying at Home : Number of residents not staying at home.

9.Number of Trips : Number of trips made by residents, i.e., movements that include a stay of longer than 10 minutes at an anonymized location away from home.

10.Number of Trips <1 : Number of trips by residents shorter than one mile.

11.Number of Trips 1-3 : Number of trips by residents of at least one mile and shorter than 3 miles (1 ≤ trip distance < 3 miles).

12.Number of Trips 3-5 : Number of trips by residents of at least 3 miles and shorter than 5 miles (3 ≤ trip distance < 5 miles).

13.Number of Trips 5-10 : Number of trips by residents of at least 5 miles and shorter than 10 miles (5 ≤ trip distance < 10 miles).

14.Number of Trips 10-25 : Number of trips by residents of at least 10 miles and shorter than 25 miles (10 ≤ trip distance < 25 miles).

15.Number of Trips 25-50 : Number of trips by residents of at least 25 miles and shorter than 50 miles (25 ≤ trip distance < 50 miles).

16.Number of Trips 50-100 : Number of trips by residents of at least 50 miles and shorter than 100 miles (50 ≤ trip distance < 100 miles).

17.Number of Trips 100-250 : Number of trips by residents of at least 100 miles and shorter than 250 miles (100 ≤ trip distance < 250 miles).

18.Number of Trips 250-500 : Number of trips by residents of at least 250 miles and shorter than 500 miles (250 ≤ trip distance < 500 miles).

19.Number of Trips >=500 : Number of trips by residents of 500 miles or more (trip distance ≥ 500 miles).

20.Row ID : Unique row identifier.

21.Week : The week number corresponding to the recorded date.

22.Month : The month number corresponding to the recorded date.
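The distance columns above form half-open bins: the lower bound is included and the upper bound is excluded. A minimal sketch of that binning rule follows; the function name `trip_bucket` is an assumption, and the code only illustrates how the column boundaries work, not how the published estimates are computed.

```python
from bisect import bisect_right

# Bin edges in miles and the matching column suffixes from the feature list.
EDGES = [1, 3, 5, 10, 25, 50, 100, 250, 500]
LABELS = ["<1", "1-3", "3-5", "5-10", "10-25", "25-50",
          "50-100", "100-250", "250-500", ">=500"]


def trip_bucket(miles: float) -> str:
    """Return the column name whose bin (lower <= distance < upper) holds the trip."""
    if miles < 0:
        raise ValueError("trip distance cannot be negative")
    # bisect_right sends a distance equal to an edge into the bin that starts there.
    return "Number of Trips " + LABELS[bisect_right(EDGES, miles)]
```

For example, a trip of exactly 3 miles lands in "Number of Trips 3-5", and a trip of exactly 500 miles lands in "Number of Trips >=500".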

If this was helpful, a vote is appreciated 😄!
E-commerce overview Dashboard
kaggle.com
zip
Updated Dec 10, 2022
Cite
Anand Singh (2022). E-commerce overview Dashboard [Dataset]. https://www.kaggle.com/datasets/anandku79/ecommerce-overview-dashboard/code
Explore at:
zip(348406 bytes)Available download formats
Dataset updated
Dec 10, 2022
Authors
Anand Singh
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains raw and cleaned data; we added state codes and used some lookup functions to clean it. It includes 4 files: 2 CSV files, 1 PNG, and a dashboard.

Dataset link - https://www.kaggle.com/datasets/benroshan/ecommerce-data
CBSE Schools Data
kaggle.com
zip
Updated Mar 26, 2018
Cite
DebarghyaDas (2018). CBSE Schools Data [Dataset]. https://www.kaggle.com/deedydas/cbse-schools-data
Explore at:
zip(3461286 bytes)Available download formats
Dataset updated
Mar 26, 2018
Authors
DebarghyaDas
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
CBSE Schools Data

CBSE is one of the two national-level boards of education in India (along with CISCE). While CISCE is a private board, CBSE is a public board run by the central government. Every year, over 1 million students take the CBSE Class XII (12) board examination as a high school leaving examination in India (and several schools abroad).

The full repository on Github is at cbse_schools_data.

The sister repository to this one contains the cisce_schools_data.

As of 2018, there are 20,367 schools affiliated with the CBSE (out of which only 220 are outside India). The details of each of these schools can be fetched from the CBSE School Directory. Here is an example URL endpoint of the school DPS RK Puram (aff_no = 2730017). You can replace the affno parameter with any Affiliation number to see the original raw data.

Instructions

The main contribution of this project is to scrape, parse, clean, document, dump and open the data for all of these schools. The scraping, parsing and cleaning code is not in this repository.

README_DATA_BASIC contains protocol-buffer-like documentation for the basic data (in the basic/ folder). It lists each of the fields, including which ones are required and optional, the degree to which the optional ones are present, as well as the type and enum definitions of each field.

README_DATA_DETAILED contains protocol-buffer-like documentation for the detailed data (in the detailed/ folder). It lists each of the fields, including which ones are required and optional, the degree to which the optional ones are present, as well as the type and enum definitions of each field.

README_DISTRICTS contains details of the districts (along with state enums).

basic/ The basic data containing the primary 25 fields.

analyze_csv.py reads the csv file in Python and prepares it for analysis.

schools.csv the csv file - 6.1MB.

analyze_pickle.py reads the pickle file in Python and prepares it for analysis.

schools.p the pickle file - 9.8MB.

detailed/ The detailed data containing the primary 25 fields and the 119 detailed fields for a total of 144 fields.

analyze_csv.py reads the csv file in Python and prepares it for analysis.

schools_detailed.csv the detailed csv file - 12MB.

analyze_pickle.py reads the pickle file in Python and prepares it for analysis.

schools_detailed.p the detailed pickle file - 26.7MB.

Short Documentation (Basic)

There are 25 total fields per school, a total of ~510k data points. For full documentation, see README_DATA_BASIC.

required string name School name in upper case

required int32 aff_no Affiliation number, unique

required State state Indian State/Union Territory or "Foreign Schools"

optional District district Indian District (or Country if state == FOREIGN SCHOOLS)

required CbseRegion region One of the 10 CBSE regions this school is in the jurisdiction of.

required string address Postal Address

optional int32 pincode Indian pincode

optional string ph_no Phone number (with STD Code). ';' Separated phone-numbers.

optional string off_ph_no Office phone number. ';' Separated phone-numbers.

optional string res_ph_no Residential phone number. ';' Separated phone-numbers.

optional string fax_no Fax number. ';' Separated numbers.

optional string email Email address

optional string website Website

optional int32 year_found Year that the school was founded (between 1800 and 2018)

optional Date date_opened Date that the school was opened (in form "Sep 9, 2010")

optional string princi_name Name of the principal, upper case

optional Sex sex Gender/sex of the school/principal (unclear?).

optional int32 princi_qual Qualifications of the principal

optional int32 princi_exp_adm Number of years of administrative experience of the principal

optional int32 princi_exp_teach Number of years of teaching experience of the principal

required Status status Status of the school - e.g. Middle Class, Secondary or Senior Secondary

optional AffiliationType aff_type Affiliation Type e.g. Provisional, Permanent

optional Date aff_start Affiliation start date (in form "Sep 9, 2010")

optional Date aff_end Affiliation end date (in form "Sep 19, 2011")

optional string soc_name Name of Trust, Society or Managing Committee, upper case
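Two of the conventions in the field list above lend themselves to a short parsing sketch: dates in the form "Sep 9, 2010" and ';'-separated phone or fax numbers. The helper names below are hypothetical and are not taken from the repository's analyze_csv.py; the phone numbers in the usage line are made up.

```python
from datetime import date, datetime
from typing import List, Optional


def parse_cbse_date(text: Optional[str]) -> Optional[date]:
    """Parse dates such as "Sep 9, 2010"; empty optional fields become None."""
    if not text:
        return None
    return datetime.strptime(text.strip(), "%b %d, %Y").date()


def split_numbers(text: Optional[str]) -> List[str]:
    """Split a ';'-separated phone or fax field into a clean list."""
    if not text:
        return []
    return [part.strip() for part in text.split(";") if part.strip()]


opened = parse_cbse_date("Sep 9, 2010")
phones = split_numbers("011-5550100; 011-5550101")  # made-up numbers
```

Returning `None` or an empty list for missing values mirrors the fact that most of these fields are marked optional.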

Short Documentation (Detailed)

There are 144 tot...
Neighbor_District_India
kaggle.com
zip
Updated Oct 18, 2021
Cite
Sasikr (2021). Neighbor_District_India [Dataset]. https://www.kaggle.com/sasikr/neighbor-district-india
Explore at:
zip(18776 bytes)Available download formats
Dataset updated
Oct 18, 2021
Authors
Sasikr
Area covered
India
Description
The dataset consists of the districts of India and their neighboring districts. The districts follow the Covid19 data of India. A neighbor of a larger district is a combination of all the neighbors of its components. State codes and district codes are taken from the vaccination data (Vaccination Data) and used as ids. District names and their ids are updated to reflect the latest changes in India.
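The rule for larger districts can be sketched as a set union. The district names below are placeholders, and dropping the component districts themselves from the result is an assumption that the description does not spell out.

```python
def merged_neighbors(components, neighbors):
    """Combine the neighbors of every component of a larger district.

    `neighbors` maps a district id to the set of its neighboring district ids.
    The components themselves are removed from the result (an assumption).
    """
    parts = set(components)
    combined = set()
    for district in parts:
        combined |= neighbors.get(district, set())
    return combined - parts


# Placeholder ids: A and B are merged into one larger district.
adjacency = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
result = merged_neighbors(["A", "B"], adjacency)
```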
Samsclub Stores MX
kaggle.com
zip
Updated Jun 16, 2024
Cite
Gerardo Jaime Escareño (2024). Samsclub Stores MX [Dataset]. https://www.kaggle.com/datasets/gerardojaimeescareo/samsclub-stores-mx
Explore at:
zip(10702 bytes)Available download formats
Dataset updated
Jun 16, 2024
Authors
Gerardo Jaime Escareño
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This CSV file contains detailed information about Sam's Club stores located in Mexico. Each row represents a single store, and the columns provide various attributes and details about the stores. Below is a description of the columns included in the CSV file.

Content

storeId - A unique identifier assigned to each store.

name - The official name of the store.

address1 - The street address of the store, including street number and name.

city - The city where the store is located.

state - The state within Mexico where the store is located.

postalCode - The postal code of the store's location.

phoneNumber - The contact phone number for the store.

hours - The store's hours of operation, including opening and closing times.

latitude - The geographical latitude of the store's location, useful for mapping.

longitude - The geographical longitude of the store's location, useful for mapping.
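As one example of what the latitude and longitude columns allow, the sketch below computes the great-circle distance between two coordinate pairs with the haversine formula. The dataset description does not prescribe this; the function name and the mean Earth radius of 6371 km are assumptions.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))
```

Passing the `latitude` and `longitude` of two stores, converted to floats, gives their approximate straight-line separation.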
USA State code
corochann (2020). USA State code [Dataset]. https://www.kaggle.com/datasets/corochann/usa-state-code
10 scholarly articles cite this dataset (View in Google Scholar)