This dataset was created by corochann
There is no story behind this data.
These are supplementary datasets that I plan to use for plotting county-wise data on maps, in particular with my kernel: https://www.kaggle.com/stansilas/maps-are-beautiful-unemployment-is-not/
The data set used there did not have the information I needed to plot an interactive map with highcharter.
Most demographic datasets here on Kaggle have either a state code, a state name, or a county name plus state name, but not all of them together, i.e. county name, FIPS code, state name, and state code.
Using these two datasets, one can get any combination of state and county names and codes.
States.csv has State name + code
US counties.csv has county wise data.
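As a rough sketch of how the two files might be combined, the snippet below joins a county table to a state table on a shared state code, using only the standard library. The column names (`state_name`, `state_code`, `county_name`, `fips`) and the inline sample rows are assumptions for illustration; check the actual CSV headers before adapting it.

```python
import csv
import io

# Hypothetical stand-ins for States.csv and US counties.csv;
# the real files may use different column names.
STATES_CSV = """state_name,state_code
Alabama,AL
Alaska,AK
"""

COUNTIES_CSV = """county_name,state_code,fips
Autauga County,AL,01001
Baldwin County,AL,01003
Anchorage Municipality,AK,02020
"""


def join_counties_with_states(states_text, counties_text):
    """Attach the full state name to every county row via the state code."""
    states = {
        row["state_code"]: row["state_name"]
        for row in csv.DictReader(io.StringIO(states_text))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(counties_text)):
        # Unknown state codes get an empty name instead of raising.
        joined.append({**row, "state_name": states.get(row["state_code"], "")})
    return joined


rows = join_counties_with_states(STATES_CSV, COUNTIES_CSV)
for row in rows:
    print(row["fips"], row["county_name"], row["state_code"], row["state_name"])
```

Reading the FIPS code as a string keeps its leading zero, which matters when the code is later used as a map key.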
Picture : https://unsplash.com/search/usa-states?photo=-RO2DFPl7wE
Counties : https://www.census.gov/geo/reference/codes/cou.html
State :
Not Applicable.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
https://www.kaggle.com/peretzcohen/us-vaccine-status-by-state
This population data is pulled from the US Census Bureau's 2019 state population estimates and is provided here along with latitude and longitude data for each state's capital city.
Population Data - https://www.census.gov/data/datasets/time-series/demo/popest/2010s-state-total.html
Location Data - https://github.com/jasperdebie/VisInfo/blob/master/us-state-capitals.csv
https://creativecommons.org/publicdomain/zero/1.0/
This code is used to generate a combined data set of US ZIP, FIPS, and County data for most ZIP Codes in the U.S. (41,867 to be exact).
Code to generate the data set from the government files listed below can be found here.
The dataset is organized as follows:
The data used to create this data set was taken from several recent government data sets.
These are:
The final csv is in 'latin1' encoding to preserve the Spanish county names in Puerto Rico.
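Because the file is encoded as latin1, it needs to be decoded with that encoding when read. The sketch below shows one way to do this with the standard library; the two-column sample and its headers (`zip`, `county`) are invented for illustration and are not the real file layout.

```python
import csv
import io

# A made-up sample encoded as latin1; the real file's name and
# columns are not given here, so these headers are assumptions.
raw_bytes = "zip,county\n00680,Mayagüez Municipio\n".encode("latin1")


def read_latin1_csv(binary_stream):
    """Decode a latin1-encoded CSV stream and return its rows as dicts."""
    text_stream = io.TextIOWrapper(binary_stream, encoding="latin1", newline="")
    return list(csv.DictReader(text_stream))


records = read_latin1_csv(io.BytesIO(raw_bytes))
print(records[0]["county"])
```

With a file on disk, the equivalent would be `open(path, encoding="latin1", newline="")`, where `path` is wherever the CSV was saved.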
This data is from, and shall remain in, the public domain; the onus of responsibility lies with the user of this data.
Author's Note 2019/04/20: Revisiting this project, I recently discovered the incredibly comprehensive API produced by the Urban Institute. It achieves all of the goals laid out for this dataset in wonderful detail. I recommend that interested users pay a visit to their site.
This dataset is designed to bring together multiple facets of U.S. education data into one convenient CSV (states_all.csv).
states_all.csv:
The primary data file. Contains aggregates from all state-level sources in one CSV.
output_files/states_all_extended.csv:
The contents of states_all.csv with additional data related to race and gender.
PRIMARY_KEY: A combination of the year and state name.
YEAR
STATE
A breakdown of students enrolled in schools by school year.
GRADES_PK: Number of students in Pre-Kindergarten education.
GRADES_4: Number of students in fourth grade.
GRADES_8: Number of students in eighth grade.
GRADES_12: Number of students in twelfth grade.
GRADES_1_8: Number of students in the first through eighth grades.
GRADES_9_12: Number of students in the ninth through twelfth grades.
GRADES_ALL: The count of all students in the state. Comparable to ENROLL in the financial data (which is the U.S. Census Bureau's estimate for students in the state).
The extended version of states_all contains additional columns that break down enrollment by race and gender. For example:
G06_A_A: Total number of sixth grade students.
G06_AS_M: Number of sixth grade male students whose ethnicity was classified as "Asian".
G08_AS_A_READING: Average reading score of eighth grade students whose ethnicity was classified as "Asian".
The represented races include AM (American Indian or Alaska Native), AS (Asian), HI (Hispanic/Latino), BL (Black or African American), WH (White), HP (Hawaiian Native/Pacific Islander), and TR (Two or More Races). The represented genders include M (Male) and F (Female).
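The naming scheme in these examples can be decoded mechanically. The parser below is a minimal sketch that assumes the pattern `G<two-digit grade>_<race>_<gender>` with an optional subject suffix, and assumes that `A` stands for "all", as the `G06_A_A` example suggests. Columns that do not fit this assumed pattern return `None`.

```python
import re

# Codes taken from the description above; "A" is assumed to mean "all"
# based on the G06_A_A example.
RACES = {
    "A": "All",
    "AM": "American Indian or Alaska Native",
    "AS": "Asian",
    "HI": "Hispanic/Latino",
    "BL": "Black or African American",
    "WH": "White",
    "HP": "Hawaiian Native/Pacific Islander",
    "TR": "Two or More Races",
}
GENDERS = {"A": "All", "M": "Male", "F": "Female"}

# Assumed pattern: G<2-digit grade>_<race>_<gender>[_<subject>]
COLUMN_PATTERN = re.compile(r"^G(\d{2})_([A-Z]{1,2})_([AMF])(?:_([A-Z]+))?$")


def parse_extended_column(name):
    """Split a name such as 'G06_AS_M' into its parts, or return None."""
    match = COLUMN_PATTERN.match(name)
    if match is None or match.group(2) not in RACES:
        return None
    grade, race, gender, subject = match.groups()
    return {
        "grade": int(grade),
        "race": RACES[race],
        "gender": GENDERS[gender],
        "subject": subject,  # None for plain enrollment counts
    }


print(parse_extended_column("G06_AS_M"))
print(parse_extended_column("G08_AS_A_READING"))
```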
A breakdown of states by revenue and expenditure.
ENROLL: The U.S. Census Bureau's count for students in the state. Should be comparable to GRADES_ALL (which is the NCES's estimate for students in the state).
TOTAL_REVENUE: The total amount of revenue for the state.
FEDERAL_REVENUE
STATE_REVENUE
LOCAL_REVENUE
TOTAL_EXPENDITURE: The total expenditure for the state.
INSTRUCTION_EXPENDITURE
SUPPORT_SERVICES_EXPENDITURE
CAPITAL_OUTLAY_EXPENDITURE
OTHER_EXPENDITURE
A breakdown of student performance as assessed by the corresponding exams (math and reading, grades 4 and 8).
AVG_MATH_4_SCORE: The state's average score for fourth graders taking the NAEP math exam.
AVG_MATH_8_SCORE: The state's average score for eighth graders taking the NAEP math exam.
AVG_READING_4_SCORE: The state's average score for fourth graders taking the NAEP reading exam.
AVG_READING_8_SCORE: The state's average score for eighth graders taking the NAEP reading exam.
The original sources can be found here:
Enrollment: https://nces.ed.gov/ccd/stnfis.asp
Financials: https://www.census.gov/programs-surveys/school-finances/data/tables.html
Academic Achievement: https://www.nationsreportcard.gov/ndecore/xplore/NDE
Data was aggregated using a Python program I wrote. The code (as well as additional project information) can be found [here][1].
Spreadsheets for NCES enrollment data for 2014, 2011, 2010, and 2009 were modified to place key data on the same sheet, making scripting easier.
The column 'ENROLL' represents the U.S. Census Bureau data value (financial data), while the column 'GRADES_ALL' represents the NCES data value (demographic data). Though the two organizations correspond on this matter, these values (which are ostensibly the same) do vary. Their documentation chalks this up to differences in membership (i.e. what is and is not a fourth grade student).
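To see how far the two counts diverge for a given row, one option is to compute the relative gap between them. The helper below is only a sketch: the sample rows and their key labels are made up, and measuring the gap against `GRADES_ALL` is an arbitrary choice.

```python
def enrollment_gap(enroll, grades_all):
    """Return the relative gap between the Census and NCES counts.

    The gap is measured against GRADES_ALL; None is returned when either
    value is missing or GRADES_ALL is zero.
    """
    if enroll is None or grades_all is None or grades_all == 0:
        return None
    return (enroll - grades_all) / grades_all


# Made-up values for illustration only, not taken from states_all.csv.
sample_rows = [
    {"PRIMARY_KEY": "EXAMPLE_ROW_1", "ENROLL": 101000, "GRADES_ALL": 100000},
    {"PRIMARY_KEY": "EXAMPLE_ROW_2", "ENROLL": None, "GRADES_ALL": 50000},
]
for row in sample_rows:
    print(row["PRIMARY_KEY"], enrollment_gap(row["ENROLL"], row["GRADES_ALL"]))
```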
Enrollment data from NCES has seen a number of changes across survey years. One of the more notable is that data on student gender does not appear to have been collected until 2009. The information in states_all_extended.csv reflects this.
NAEP test score data is only available for certain years.
The current version of this data is concerned with state-level patterns. It is the author's hope that future versions will allow for school district-level granularity.
Data is sourced from the U.S. Census Bureau and the National Center for Education Statistics (NCES).
The licensing of these datasets state that it must not be us...
Read the associated blogpost for a detailed description of how this dataset was prepared; plus extra code for producing animated maps.
The 2019 Novel Coronavirus (COVID-19) continues to spread in countries around the world. This dataset provides daily updated numbers of reported cases & deaths in Germany on the federal state (Bundesland) and county (Landkreis/Stadtkreis) level. In April 2021 I added a dataset on vaccination progress. In addition, I provide geospatial shape files and general state-level population demographics to aid the analysis.
The dataset consists of three main csv files: covid_de.csv, demographics_de.csv, and covid_de_vaccines.csv. The geospatial shapes are included in the de_state.* files. See the column descriptions below for more detailed information.
covid_de.csv: COVID-19 cases and deaths which will be updated daily. The original data are being collected by Germany's Robert Koch Institute and can be downloaded through the National Platform for Geographic Data (the latter site also hosts an interactive dashboard). I reshaped and translated the data (using R tidyverse tools) to make it more accessible. This blogpost explains how I prepared the data, and describes how to produce animated maps.
demographics_de.csv: General Demographic Data about Germany on the federal state level. Those have been downloaded from Germany's Federal Office for Statistics (Statistisches Bundesamt) through their Open Data platform GENESIS. The data reflect the (most recent available) estimates on 2018-12-31. You can find the corresponding table here.
covid_de_vaccines.csv: In April 2021 I added this file that contains the Covid-19 vaccination progress for Germany as a whole. It details daily doses, broken down cumulatively by manufacturer, as well as the cumulative number of people having received their first and full vaccination. The earliest data are from 2020-12-27.
de_state.*: Geospatial shape files for Germany's 16 federal states. Downloaded via Germany's Federal Agency for Cartography and Geodesy. Specifically, the shape file was obtained from this link.
COVID-19 dataset covid_de.csv:
state: Name of the German federal state. Germany has 16 federal states. I converted special characters from the original data.
county: The name of the German Landkreis (LK) or Stadtkreis (SK), which correspond roughly to US counties.
age_group: The COVID-19 data is being reported for 6 age groups: 0-4, 5-14, 15-34, 35-59, 60-79, and above 80 years old. As a shorthand for the last category I'm using "80-99", but there might well be persons above 99 years old in this dataset. This column has a few NA entries.
gender: Reported as male (M) or female (F). This column has a few NA entries.
date: The calendar date on which a case or death was reported. There might be delays that will be corrected by retroactively assigning cases to earlier dates.
cases: COVID-19 cases that have been confirmed through laboratory work. This and the following 2 columns are counts per day, not cumulative counts.
deaths: COVID-19 related deaths.
recovered: Recovered cases.
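Since cases, deaths, and recovered are daily counts, cumulative totals have to be computed by the user. The sketch below sums the daily counts per state and date (rows are split by county, age group, and gender) and then accumulates them over time. The sample rows are made up and only carry the columns used here.

```python
from collections import defaultdict
from itertools import accumulate

# Made-up rows in the shape of covid_de.csv (only the columns used here).
daily_rows = [
    {"state": "Bayern", "date": "2020-03-01", "cases": 3},
    {"state": "Bayern", "date": "2020-03-02", "cases": 5},
    {"state": "Berlin", "date": "2020-03-01", "cases": 1},
    {"state": "Bayern", "date": "2020-03-02", "cases": 2},  # another age group
]


def cumulative_cases_by_state(rows):
    """Sum daily counts per (state, date), then accumulate over time."""
    per_day = defaultdict(lambda: defaultdict(int))
    for row in rows:
        per_day[row["state"]][row["date"]] += row["cases"]
    result = {}
    for state, by_date in per_day.items():
        dates = sorted(by_date)  # ISO dates sort chronologically as strings
        totals = accumulate(by_date[d] for d in dates)
        result[state] = list(zip(dates, totals))
    return result


cumulative = cumulative_cases_by_state(daily_rows)
print(cumulative["Bayern"])
```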
Demographic dataset demographics_de.csv:
state, gender, age_group: same as above. The demographic data is available in higher age resolution, but I have binned it here to match the corresponding age groups in the covid_de.csv file.
population: Population counts for the respective categories. These numbers reflect the (most recent available) estimates on 2018-12-31.
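Because the demographic file shares the state, gender, and age_group columns with covid_de.csv, the two can be combined into per-capita figures. The following is a minimal sketch that aggregates both tables by state and computes cases per 100,000 inhabitants; all numbers in it are invented.

```python
from collections import Counter

# Made-up numbers shaped like covid_de.csv and demographics_de.csv.
case_rows = [
    {"state": "Bremen", "age_group": "15-34", "gender": "F", "cases": 40},
    {"state": "Bremen", "age_group": "35-59", "gender": "M", "cases": 60},
]
population_rows = [
    {"state": "Bremen", "age_group": "15-34", "gender": "F", "population": 100000},
    {"state": "Bremen", "age_group": "35-59", "gender": "M", "population": 150000},
]


def cases_per_100k(cases, population):
    """Total cases per 100,000 inhabitants for each state."""
    case_totals = Counter()
    pop_totals = Counter()
    for row in cases:
        case_totals[row["state"]] += row["cases"]
    for row in population:
        pop_totals[row["state"]] += row["population"]
    return {
        state: 100000 * case_totals[state] / pop_totals[state]
        for state in case_totals
        if pop_totals[state] > 0  # skip states without population data
    }


rates = cases_per_100k(case_rows, population_rows)
print(rates)
```

The same grouping could be done on (state, age_group, gender) instead of state alone, since the bins were matched between the two files.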
Vaccination progress dataset covid_de_vaccines.csv:
date: calendar date of vaccination
doses, doses_first, doses_second: Daily count of administered doses: total, 1st shot, 2nd shot.
pfizer_cumul, moderna_cumul, astrazeneca_cumul: Daily cumulative number of administered vaccinations by manufacturer.
persons_first_cumul, persons_full_cumul: Daily cumulative number of people having received their 1st shot and full vaccination, respectively.
All the data have been extracted from open data sources which are being gratefully acknowledged:
https://cdla.io/permissive-1-0/
Context:
Target is a globally renowned brand and a prominent retailer in the United States. Target makes itself a preferred shopping destination by offering outstanding value, inspiration, innovation and an exceptional guest experience that no other retailer can deliver.
This particular business case focuses on the operations of Target in Brazil and provides insightful information about 100,000 orders placed between 2016 and 2018. The dataset offers a comprehensive view of various dimensions including the order status, price, payment and freight performance, customer location, product attributes, and customer reviews.
By analyzing this extensive dataset, it becomes possible to gain valuable insights into Target's operations in Brazil. The information can shed light on various aspects of the business, such as order processing, pricing strategies, payment and shipping efficiency, customer demographics, product characteristics, and customer satisfaction levels.
Dataset: https://drive.google.com/drive/folders/1TGEc66YKbD443nslRi1bWgVd238gJCnb
The data is available in 8 csv files:
The column descriptions for these csv files are given below.
customers.csv:
| Feature | Description |
|---|---|
| customer_id | ID of the consumer who made the purchase |
| customer_unique_id | Unique ID of the consumer |
| customer_zip_code_prefix | Zip Code of consumer’s location |
| customer_city | Name of the City from where order is made |
| customer_state | State Code from where order is made (Eg. São Paulo - SP) |
sellers.csv:
| Feature | Description |
|---|---|
| seller_id | Unique ID of the seller registered |
| seller_zip_code_prefix | Zip Code of the seller’s location |
| seller_city | Name of the City of the seller |
| seller_state | State Code (Eg. São Paulo - SP) |
order_items.csv:
| Feature | Description |
|---|---|
| order_id | A Unique ID of order made by the consumers |
| order_item_id | A Unique ID given to each item ordered in the order |
| product_id | A Unique ID given to each product available on the site |
| seller_id | Unique ID of the seller registered in Target |
| shipping_limit_date | The date before which the ordered product must be shipped |
| price | Actual price of the products ordered |
| freight_value | Price rate at which a product is delivered from one point to another |
geolocations.csv:
| Feature | Description |
|---|---|
| geolocation_zip_code_prefix | First 5 digits of Zip Code |
| geolocation_lat | Latitude |
| geolocation_lng | Longitude |
| geolocation_city | City |
| geolocation_state | State |
payments.csv:
| Feature | Description |
|---|---|
| order_id | A Unique ID of order made by the consumers |
| payment_sequential | Sequences of the payments made in case of EMI |
| payment_type | Mode of payment used (Eg. Credit Card) |
| payment_installments | Number of installments in case of EMI purchase |
| payment_value | Total amount paid for the purchase order |
**orders.csv:...
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Full database of cities, states, and countries, available in CSV format. All countries, states, and cities are covered and populated with different combinations and versions.
Each CSV has the longitude and latitude of each location, alongside other miscellaneous country data such as currency, state code, and phone country code.
Total Countries : 250
Total States/Regions/Municipalities : 4,963
Total Cities/Towns/Districts : 148,061
Last Updated On : 29th January 2022
CSV version of Looker Ecommerce Dataset.
Overview
Dataset in BigQuery
TheLook is a fictitious eCommerce clothing site developed by the Looker team. The dataset contains information about customers, products, orders, logistics, web events and digital marketing campaigns. The contents of this dataset are synthetic, and are provided to industry practitioners for the purpose of product discovery, testing, and evaluation. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
distribution_centers.csv:
id: Unique identifier for each distribution center.
name: Name of the distribution center.
latitude: Latitude coordinate of the distribution center.
longitude: Longitude coordinate of the distribution center.
events.csv:
id: Unique identifier for each event.
user_id: Identifier for the user associated with the event.
sequence_number: Sequence number of the event.
session_id: Identifier for the session during which the event occurred.
created_at: Timestamp indicating when the event took place.
ip_address: IP address from which the event originated.
city: City where the event occurred.
state: State where the event occurred.
postal_code: Postal code of the event location.
browser: Web browser used during the event.
traffic_source: Source of the traffic leading to the event.
uri: Uniform Resource Identifier associated with the event.
event_type: Type of event recorded.
inventory_items.csv:
id: Unique identifier for each inventory item.
product_id: Identifier for the associated product.
created_at: Timestamp indicating when the inventory item was created.
sold_at: Timestamp indicating when the item was sold.
cost: Cost of the inventory item.
product_category: Category of the associated product.
product_name: Name of the associated product.
product_brand: Brand of the associated product.
product_retail_price: Retail price of the associated product.
product_department: Department to which the product belongs.
product_sku: Stock Keeping Unit (SKU) of the product.
product_distribution_center_id: Identifier for the distribution center associated with the product.
order_items.csv:
id: Unique identifier for each order item.
order_id: Identifier for the associated order.
user_id: Identifier for the user who placed the order.
product_id: Identifier for the associated product.
inventory_item_id: Identifier for the associated inventory item.
status: Status of the order item.
created_at: Timestamp indicating when the order item was created.
shipped_at: Timestamp indicating when the order item was shipped.
delivered_at: Timestamp indicating when the order item was delivered.
returned_at: Timestamp indicating when the order item was returned.
orders.csv:
order_id: Unique identifier for each order.
user_id: Identifier for the user who placed the order.
status: Status of the order.
gender: Gender information of the user.
created_at: Timestamp indicating when the order was created.
returned_at: Timestamp indicating when the order was returned.
shipped_at: Timestamp indicating when the order was shipped.
delivered_at: Timestamp indicating when the order was delivered.
num_of_item: Number of items in the order.
products.csv:
id: Unique identifier for each product.
cost: Cost of the product.
category: Category to which the product belongs.
name: Name of the product.
brand: Brand of the product.
retail_price: Retail price of the product.
department: Department to which the product belongs.
sku: Stock Keeping Unit (SKU) of the product.
distribution_center_id: Identifier for the distribution center associated with the product.
users.csv:
id: Unique identifier for each user.
first_name: First name of the user.
last_name: Last name of the user.
email: Email address of the user.
age: Age of the user.
gender: Gender of the user.
state: State where t...
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive snapshot of over 50,000 electric vehicle (EV) charging stations worldwide, scraped from the OpenChargeMap public API in November 2025. It focuses on key details for each station, including location, operator, status, and connector types. The data is ideal for geospatial analysis, infrastructure planning, EV routing apps, predictive modeling (e.g., availability forecasting), or sustainability studies.
| Column | Description |
|---|---|
| id | Unique station ID |
| title | Station name (e.g., "Electra - Wambrechies") |
| address | Street address |
| town | City/town |
| state | State or province (may be blank) |
| postcode | ZIP/postal code |
| country | ISO 3166-1 alpha-2 code (e.g., FR = France) |
| lat, lon | GPS coordinates (WGS84) |
| operator | Charging network (e.g., Tesla, Electra) |
| status | "Operational", "Not Operational", etc. |
| num_connectors | Number of charging plugs |
| connector_types | Plug types (e.g., "CCS (Type 2)\|Type 2") |
| date_added | When station was added (UTC) |
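The connector_types example suggests that multiple plug types are packed into one field separated by a pipe character. Assuming that holds for the whole column, the sketch below splits the field and counts how often each connector type appears. The station rows are made up.

```python
from collections import Counter

# Made-up station rows; only the connector_types column matters here.
stations = [
    {"id": 1, "connector_types": "CCS (Type 2)|Type 2 (Socket Only)"},
    {"id": 2, "connector_types": "CHAdeMO|CCS (Type 2)"},
    {"id": 3, "connector_types": ""},
]


def split_connectors(value):
    """Split a pipe-delimited connector string into a clean list."""
    return [part.strip() for part in value.split("|") if part.strip()]


connector_counts = Counter(
    connector
    for station in stations
    for connector in split_connectors(station["connector_types"])
)
print(connector_counts.most_common())
```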
The country column uses standard 2-letter ISO codes. Based on the dataset, here are the most common ones with full names (not exhaustive; query the data for all unique values):
| Code | Country Name |
|---|---|
| FR | France |
| ES | Spain |
| IT | Italy |
| US | United States |
| FI | Finland |
| TR | Turkey |
| BR | Brazil |
| BE | Belgium |
| AT | Austria |
| HU | Hungary |
| AM | Armenia |
| AZ | Azerbaijan |
| TN | Tunisia |
| RS | Serbia |
| NL | Netherlands |
| PL | Poland |
| Type | Description |
|---|---|
| CCS (Type 1) | CCS for North America (J1772 combo) |
| CCS (Type 2) | Combined Charging System (Europe/Asia standard) |
| Type 2 (Socket Only) | AC charging socket (Mennekes) |
| Type 2 (Tethered Connector) | Cable-attached Type 2 |
| CHAdeMO | Japanese DC fast charging standard |
| Tesla | Proprietary Tesla connector |
| CEE 7/4 - Schuko - Type F | Household socket (Europe). |
| Unknown | Unspecified or rare type |
Keywords: EV charging stations, electric vehicle infrastructure, OpenChargeMap, global EV chargers, CCS, Type 2, CHAdeMO, Tesla Supercharger, geospatial data, sustainability, green mobility, urban planning, climate action, 2025 EV dataset, public charging, fast charging, location intelligence, energy transition, zero emissions, clean transport
By data.world's Admin [source]
This dataset contains an aggregation of birth data from the United States between 1985 and 2015. It consists of information on mothers' locations by state (including District of Columbia) and county, as well as information such as the month they gave birth, and aggregates giving the sum of births during that month. This data has been provided by both the National Bureau of Economic Research and National Center for Health Statistics, whose shared mission is to understand how life works in order to aid individuals in making decisions about their health and wellbeing. This dataset provides valuable insight into population trends across time and location. For example, which states have higher or lower birthrates than others? Which counties experience dramatic fluctuations over time? Given its scope, this dataset could be used in a number of contexts, from epidemiology research to population forecasting. Be sure to check out our other datasets related to births while you're here!
This dataset could be used to examine local trends in birth rates over time or analyze births at different geographical locations. In order to maximize your use of this dataset, it is important that you understand what information the various columns contain.
The main columns are: State (including District of Columbia), County (coded using the FIPS county code number), Month (numbering from 1 for January through 12 for December), Year (4-digit year), countyBirths (calculated sum of births that occurred to mothers living in a county for a given month) and stateBirths (calculated sum of births that occurred to mothers living in a state for a given month). These fields should provide enough information for you to analyze trends across geographic locations at both monthly and yearly levels. You could also consider combining variables such as Year with State or Year with Month, or any other grouping combination, depending on your analysis goal.
In addition, while all data were downloaded on April 5th 2017, it is worth noting that all sources used followed privacy guidelines as laid out by NCHS, so individual births occurring after 2005 are not included due to geolocation concerns.
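As one example of the groupings mentioned above, monthly county counts can be rolled up to yearly totals. This is a minimal sketch that uses the column names listed for allBirthData.csv with made-up values.

```python
from collections import defaultdict

# Made-up rows using the column names from allBirthData.csv.
monthly_rows = [
    {"State": 1, "County": 1001, "Year": 1990, "Month": 1, "countyBirths": 50},
    {"State": 1, "County": 1001, "Year": 1990, "Month": 2, "countyBirths": 45},
    {"State": 1, "County": 1003, "Year": 1990, "Month": 1, "countyBirths": 120},
]


def yearly_county_births(rows):
    """Sum countyBirths over months for each (County, Year) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["County"], row["Year"])] += row["countyBirths"]
    return dict(totals)


yearly = yearly_county_births(monthly_rows)
print(yearly)
```

Swapping the key to `(row["State"], row["Year"])` and the value to `stateBirths` would need extra care, because the state total is presumably repeated on every county row for that month.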
We hope you find this dataset useful and can benefit from its content! With proper understanding of what each field contains, we are confident you will gain valuable insights on birth rates across counties within the United States during this period.
- Establishing county-level trends in birth rates for the US over time.
- Analyzing the relationship between month of birth and health outcomes for US babies after they are born (e.g., infant mortality, neurological development, etc.).
- Comparing state/county-level differences in average numbers of twins born each year
If you use this dataset in your research, please credit the original authors.
File: allBirthData.csv
| Column name | Description |
|:---|:---|
| State | The numerical order of the state where the mother lives. (Integer) |
| Month | The month in which the birth took place. (Integer) |
| Year | The year of the birth. (Integer) |
| countyBirths | The calculated sum of births that occurred to mothers living in that county for that particular month. (Integer) |
| stateBirths | The aggregate number at the level of entire states for any given month-year combination. (Integer) |
| County | The county where the mother lives, coded using FIPS County Code. (Integer) |
If you use this dataset in your research, please credit data.world's Admin.
This dataset provides insights into the population distribution and income levels across counties in the United States, with a classification of counties as either "Urban" or "Rural." The data was sourced from the U.S. Census Bureau's 2023 American Community Survey (ACS).
Data Source:
B01003_001E: Total population.
B19013_001E: Median household income.
Processing:
Columns:
County: County name.
State: State name.
FIPS: Combined state and county FIPS code.
State FIPS Code: State's Federal Information Processing Standard code.
County FIPS Code: County's FIPS code.
Total Population: Total population of the county.
Median Household Income: Median household income for the county.
Urban-Rural: Classification based on population (Urban or Rural).
This dataset can be used for:
- Urban vs. rural demographic and economic analysis.
- Income distribution studies.
- Data visualization and mapping using FIPS codes.
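A combined county FIPS code is the two-digit state code followed by the three-digit county code. When the parts are read from a CSV as numbers, the leading zeros are often lost, so a small helper like the sketch below can rebuild the five-digit string. The function name is hypothetical and not part of the dataset.

```python
def combined_fips(state_fips, county_fips):
    """Build the 5-digit FIPS code from state (2-digit) and county (3-digit) parts.

    Accepts ints or strings, because CSV readers often drop leading zeros.
    """
    state = str(state_fips).strip().zfill(2)
    county = str(county_fips).strip().zfill(3)
    if len(state) != 2 or len(county) != 3:
        raise ValueError(f"unexpected FIPS parts: {state_fips!r}, {county_fips!r}")
    return state + county


print(combined_fips(1, 1))
print(combined_fips("06", "037"))
```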
This dataset is provided under the public domain. Proper attribution to the U.S. Census Bureau is appreciated.
A dataset which I collected from: https://safetydata.fra.dot.gov/officeofsafety/publicsite/on_the_fly_download.aspx
I did not find an option to download a range of years at once, so I downloaded the data manually, imported it into a database, and exported the full 1975-2021 .csv file.
I have also uploaded a .csv with converted states, because the original one uses state codes. I wrote a Python script which changes each state code to the corresponding state name: https://github.com/koenry/dataScience_Project_Railroad_Accidents/blob/main/code/changeStates.py
This data is provided by https://safetydata.fra.dot.gov/ and I do not own it. All I did was make my data analysis project a little bit easier, so I thought I would share this with everyone!
https://creativecommons.org/publicdomain/zero/1.0/
This is monthly US unemployment rate data from January 2017 to November 2022. The datasets were curated from the Federal Reserve Economic Data that can be found here.
"The unemployment rate represents the number of unemployed as a percentage of the labour force. Labour force data are restricted to people 16 years of age and older, who currently reside in 1 of the 50 states or the District of Columbia, who do not reside in institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces." (Source: FRED website)
There are two datasets. The first dataset contains the total US unemployment rate and the second dataset contains unemployment rates per US state.
unemployment_rate_us.csv
unemployment_us - This is the total seasonally adjusted US unemployment rate in percent. You can find the data source here.
first_day_of_month - The date of the first day of the month.
unemployment_rates.csv
first_day_of_month - The date of the first day of the month.
state - The name of the state.
unemployment_rate - This is the seasonally adjusted unemployment rate per US state in percent. You can find the data source here.
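Both files share the first_day_of_month column, so state rates can be compared against the national rate for the same month. The sketch below does this with plain dictionaries; the values and the state name are invented and do not come from FRED.

```python
# Made-up values shaped like the two files described above.
national_rows = [
    {"first_day_of_month": "2020-01-01", "unemployment_us": 5.0},
    {"first_day_of_month": "2020-02-01", "unemployment_us": 4.0},
]
state_rows = [
    {"first_day_of_month": "2020-01-01", "state": "Example State", "unemployment_rate": 6.5},
    {"first_day_of_month": "2020-02-01", "state": "Example State", "unemployment_rate": 3.5},
]


def gap_to_national(states, national):
    """Return (state, month, state rate minus national rate) tuples."""
    national_by_month = {
        row["first_day_of_month"]: row["unemployment_us"] for row in national
    }
    gaps = []
    for row in states:
        month = row["first_day_of_month"]
        if month in national_by_month:  # skip months missing nationally
            gap = row["unemployment_rate"] - national_by_month[month]
            gaps.append((row["state"], month, gap))
    return gaps


gaps = gap_to_national(state_rows, national_rows)
print(gaps)
```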
Feel free to let me know if you have any open questions with regard to the dataset.
Happy data science! ;)
http://www.gnu.org/licenses/lgpl-3.0.html
The dataset consists of COVID-19 cases in Malaysia from 27 March 2020 to 15 April 2021. It was collected to support better visualizations of the COVID-19 cases in Malaysia. All of the data was scraped from https://kpkesihatan.com/ using the BeautifulSoup library.
The data is also available on GitHub, along with the scripts used to scrape it. There is also a web application that shows the visualizations.
Originally I planned to update the data daily, but doing this alone is too tedious without automated scripts or schedulers. If you know how to do this efficiently, please reach out to me by email or a LinkedIn message; the links can be found on my GitHub. Thank you very much.
There are three CSV files and one GeoJSON file:
- all_2020-03-27_2021-04-15.csv: all daily cases excluding state data
- state_all.csv: all daily cases for each state
- state_cumu.csv: all daily cumulative cases for each state
- malaysia_state_province_boundary.geojson: Malaysia's GeoJSON map file
The columns consist of:
1. Date
2. Recovered
3. Cumulative Recovered
4. Imported Case (many NaN values till the end of 2020)
5. Local Case (many NaN values)
6. Active Case (many NaN values but can be inferred)
7. New Case
8. Cumulative Case
9. ICU - Number of patients admitted into Intensive Care Unit
10. Ventilator - Number of patients who need ventilator in ICU
11. Death
12. Cumulative Death
13. URL - link to the original webpage
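The description says that missing Active Case values can be inferred but does not say how. One plausible approach, assumed here and not confirmed by the source, is to subtract cumulative recoveries and cumulative deaths from cumulative cases. The sketch below fills missing values that way, using made-up rows.

```python
def infer_active(cumulative_case, cumulative_recovered, cumulative_death):
    """Assumed identity: active = confirmed - recovered - deaths (all cumulative)."""
    return cumulative_case - cumulative_recovered - cumulative_death


def fill_active_cases(rows):
    """Return copies of the rows with missing 'Active Case' values filled in."""
    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row.get("Active Case") is None:
            new_row["Active Case"] = infer_active(
                new_row["Cumulative Case"],
                new_row["Cumulative Recovered"],
                new_row["Cumulative Death"],
            )
        filled.append(new_row)
    return filled


# Made-up numbers, not taken from the dataset.
sample = [
    {"Date": "2020-04-01", "Cumulative Case": 100, "Cumulative Recovered": 30,
     "Cumulative Death": 2, "Active Case": None},
    {"Date": "2020-04-02", "Cumulative Case": 120, "Cumulative Recovered": 40,
     "Cumulative Death": 3, "Active Case": 77},
]
filled_rows = fill_active_cases(sample)
print([row["Active Case"] for row in filled_rows])
```

It is worth checking the inferred values against rows where Active Case is present before relying on this identity.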
Thanks to Info GIS MAP.com that provides Malaysia's GeoJSON file to create Choropleth maps.
Hopefully, there will be people utilizing the scripts or the data to create better visualizations.
https://creativecommons.org/publicdomain/zero/1.0/
The Daily Travel data and number of people staying home and not staying home are estimated for the Bureau of Transportation Statistics by the Maryland Transportation Institute and Center for Advanced Transportation Technology Laboratory at the University of Maryland.
The daily travel estimates are from a mobile device data panel from merged multiple data sources that address the geographic and temporal sample variation issues often observed in a single data source. The merged data panel only includes mobile devices whose anonymized location data meet a set of data quality standards, which further ensures the overall data quality and consistency. The data quality standards consider both temporal frequency and spatial accuracy of anonymized location point observations, temporal coverage and representativeness at the device level, spatial representativeness at the sample and county level, etc. A multi-level weighting method that employs both device and trip-level weights expands the sample to the underlying population at the county and state levels, before travel statistics are computed.
These data are experimental and may not meet all of our quality standards. Experimental data products are created using new data sources or methodologies that benefit data users in the absence of other relevant products. We are seeking feedback from data users and stakeholders on the quality and usefulness of these new products. Experimental data products that meet our quality standards and demonstrate sufficient user demand may enter regular production if resources permit.
Data in the charts and graphs above is updated weekly on Mondays. The data lags one week behind the current date.
Data analysis is conducted at the aggregate national, state, and county levels. To assure confidentiality and support data quality, no data are reported for a county if it has fewer than 50 devices in the sample on any given day.
Trips are defined as movements that include a stay of longer than 10 minutes at an anonymized location away from home. A movement with multiple stays of longer than 10 minutes before returning home is counted as multiple trips.
1.Level : Indicates National, State, or County level metrics.
2.Date : The date when the data was recorded.
3.State FIPS : A two-digit code representing the FIPS state code.
4.State Postal Code : State postal code.
5.County FIPS : Five-digit FIPS county code.
6.County Name : County name.
7.Population Staying at Home : Number of residents staying at home, i.e., persons who make no trips with a trip end more than one mile away from home.
8.Population Not Staying at Home : Number of residents not staying at home.
9.Number of Trips : Number of trips made by residents, i.e., movements that include a stay of longer than 10 minutes at an anonymized location away from home.
10.Number of Trips <1 : Number of trips by residents shorter than one mile.
11.Number of Trips 1-3 : Number of trips by residents of at least one mile and shorter than 3 miles (1 ≤ trip distance < 3 miles).
12.Number of Trips 3-5 : Number of trips by residents of at least 3 miles and shorter than 5 miles (3 ≤ trip distance < 5 miles).
13.Number of Trips 5-10 : Number of trips by residents of at least 5 miles and shorter than 10 miles (5 ≤ trip distance < 10 miles).
14.Number of Trips 10-25 : Number of trips by residents of at least 10 miles and shorter than 25 miles (10 ≤ trip distance < 25 miles).
15.Number of Trips 25-50 : Number of trips by residents of at least 25 miles and shorter than 50 miles (25 ≤ trip distance < 50 miles).
16.Number of Trips 50-100 : Number of trips by residents of at least 50 miles and shorter than 100 miles (50 ≤ trip distance < 100 miles).
17.Number of Trips 100-250 : Number of trips by residents of at least 100 miles and shorter than 250 miles (100 ≤ trip distance < 250 miles).
18.Number of Trips 250-500 : Number of trips by residents of at least 250 miles and shorter than 500 miles (250 ≤ trip distance < 500 miles).
19.Number of Trips >=500 : Number of trips by residents of 500 miles or more (trip distance ≥ 500 miles).
20.Row ID : Unique row identifier.
21.Week : The week number corresponding to the recorded date.
22.Month : The month number corresponding to the recorded date.
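As a rough illustration of how the distance columns partition trips, the sketch below maps a trip distance in miles to the matching column name, using the half-open intervals given in parentheses in the column list. The helper name `trip_column` is hypothetical, and the data publisher's own binning code is not shown in the source.

```python
from bisect import bisect_right

# Lower bounds (in miles) of every band after the first, from the column list.
BOUNDS = [1, 3, 5, 10, 25, 50, 100, 250, 500]

# Column names in increasing order of distance, as listed above.
COLUMNS = [
    "Number of Trips <1",
    "Number of Trips 1-3",
    "Number of Trips 3-5",
    "Number of Trips 5-10",
    "Number of Trips 10-25",
    "Number of Trips 25-50",
    "Number of Trips 50-100",
    "Number of Trips 100-250",
    "Number of Trips 250-500",
    "Number of Trips >=500",
]


def trip_column(distance_miles: float) -> str:
    """Return the column whose interval contains `distance_miles`."""
    if distance_miles < 0:
        raise ValueError("distance must be non-negative")
    # bisect_right counts the bounds that are <= the distance, which is
    # exactly the index of the band for half-open intervals [low, high).
    return COLUMNS[bisect_right(BOUNDS, distance_miles)]


print(trip_column(2.5))
```

Because the intervals are half-open, a trip of exactly 3 miles falls in "Number of Trips 3-5", not "Number of Trips 1-3".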
If this was helpful, a vote is appreciated 😄!
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains raw and cleaned data. For the cleaned version, we added a state code and used lookup functions to clean the data. It includes 4 files: 2 CSV files, 1 PNG image, and a dashboard.
Dataset link - https://www.kaggle.com/datasets/benroshan/ecommerce-data
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
CBSE is one of the two national-level boards of education in India (along with CISCE). While CISCE is a private board, CBSE is a public board run by the central government. Every year, over 1 million students take the CBSE Class XII (12) board examination as a high school leaving examination in India (and at several schools abroad).
The full repository on GitHub is at cbse_schools_data.
The sister repository to this one contains the cisce_schools_data.
As of 2018, there are 20,367 schools affiliated with the CBSE (out of which only 220 are outside India).
The details of each of these schools can be fetched from the CBSE School Directory.
Here is an example URL endpoint of the school DPS RK Puram (aff_no = 2730017).
You can replace the affno parameter with any Affiliation number to see the original raw data.
The main contribution of this project is to scrape, parse, clean, document, dump and open the data for all of these schools. The scraping, parsing and cleaning code is not in this repository.
README_DATA_BASIC contains protocol-buffer-like documentation for the basic data (in the basic/ folder). It lists each of the fields, including which ones are required and optional, the degree to which the optional ones are present, as well as the type and enum definitions of each field.
README_DATA_DETAILED contains protocol-buffer-like documentation for the detailed data (in the detailed/ folder). It lists each of the fields, including which ones are required and optional, the degree to which the optional ones are present, as well as the type and enum definitions of each field.
README_DISTRICTS contains details of the districts (along with state enums).
basic/ The basic data containing the primary 25 fields.
  analyze_csv.py reads the csv file in Python and prepares it for analysis.
  schools.csv the csv file - 6.1MB.
  analyze_pickle.py reads the pickle file in Python and prepares it for analysis.
  schools.p the pickle file - 9.8MB.
detailed/ The detailed data containing the primary 25 fields and the 119 detailed fields for a total of 144 fields.
  analyze_csv.py reads the csv file in Python and prepares it for analysis.
  schools_detailed.csv the detailed csv file - 12MB.
  analyze_pickle.py reads the pickle file in Python and prepares it for analysis.
  schools_detailed.p the detailed pickle file - 26.7MB.
There are 25 total fields per school, a total of ~510k data points. For full documentation, see README_DATA_BASIC.
required string name: School name in upper case
required int32 aff_no: Affiliation number, unique
required State state: Indian State/Union Territory or "Foreign Schools"
optional District district: Indian District (or Country if state == FOREIGN SCHOOLS)
required CbseRegion region: One of the 10 CBSE regions this school is in the jurisdiction of.
required string address: Postal Address
optional int32 pincode: Indian pincode
optional string ph_no: Phone number (with STD Code). ';' Separated phone-numbers.
optional string off_ph_no: Office phone number. ';' Separated phone-numbers.
optional string res_ph_no: Residential phone number. ';' Separated phone-numbers.
optional string fax_no: Fax number. ';' Separated numbers.
optional string email: Email address
optional string website: Website
optional int32 year_found: Year that the school was founded (between 1800 and 2018)
optional Date date_opened: Date that the school was opened (in form "Sep 9, 2010")
optional string princi_name: Name of the principal, upper case
optional Sex sex: Gender/sex of the school/principal (unclear?).
optional int32 princi_qual: Qualifications of the principal
optional int32 princi_exp_adm: Number of years of administrative experience of the principal
optional int32 princi_exp_teach: Number of years of teaching experience of the principal
required Status status: Status of the school - e.g. Middle Class, Secondary or Senior Secondary
optional AffiliationType aff_type: Affiliation Type e.g. Provisional, Permanent
optional Date aff_start: Affiliation start date (in form "Sep 9, 2010")
optional Date aff_end: Affiliation end date (in form "Sep 19, 2011")
optional string soc_name: Name of Trust, Society or Managing Committee, upper case
There are 144 tot...
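Two of the field conventions above (';'-separated phone numbers and dates in the form "Sep 9, 2010") need a small parsing step before analysis. The following is a minimal sketch, not the code from the repository's analyze_csv.py; the helper names `split_numbers` and `parse_cbse_date` and the sample phone numbers are made up for illustration, and the month abbreviation parsing assumes an English locale.

```python
from datetime import date, datetime
from typing import List


def split_numbers(raw: str) -> List[str]:
    """Split a ';'-separated field (ph_no, off_ph_no, res_ph_no, fax_no)."""
    return [part.strip() for part in raw.split(";") if part.strip()]


def parse_cbse_date(raw: str) -> date:
    """Parse a Date field written like "Sep 9, 2010"."""
    # %b is the abbreviated month name, %d accepts one- or two-digit days.
    return datetime.strptime(raw.strip(), "%b %d, %Y").date()


# Hypothetical sample values in the documented formats.
print(split_numbers("011-1234567; 011-7654321"))
print(parse_cbse_date("Sep 9, 2010").isoformat())
```

Since most of these fields are optional, callers would typically check for an empty string before calling `parse_cbse_date`.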
The dataset consists of the districts of India and their neighboring districts. The districts follow the Covid19 data of India. The neighbors of a larger district are the combination of all the neighbors of its components. State codes and district codes are taken from the Vaccination Data and used as their IDs. District names and IDs have been updated to reflect the latest district changes in India.
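The rule for larger districts can be sketched as a set union. This is only an illustration under stated assumptions: the adjacency mapping, the district names, and the function `merged_neighbors` are hypothetical, and dropping the component districts from the result is an assumption the source does not spell out.

```python
from typing import Dict, Iterable, Set


def merged_neighbors(
    components: Iterable[str], neighbors: Dict[str, Set[str]]
) -> Set[str]:
    """Neighbors of a larger district built from `components`.

    `neighbors` maps each district to the set of districts it borders.
    The result is the union of the components' neighbors; the components
    themselves are removed (an assumption, since a district is not
    normally listed as its own neighbor).
    """
    parts = set(components)
    combined: Set[str] = set()
    for district in parts:
        combined |= neighbors.get(district, set())
    return combined - parts


# Made-up adjacency: A borders B and C, B borders A and D.
adjacency = {"A": {"B", "C"}, "B": {"A", "D"}}
print(sorted(merged_neighbors(["A", "B"], adjacency)))
```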
https://creativecommons.org/publicdomain/zero/1.0/
This CSV file contains detailed information about Sam's Club stores located in Mexico. Each row represents a single store, and the columns provide various attributes and details about the stores. Below is a description of the columns included in the CSV file:
storeId - A unique identifier assigned to each store.
name - The official name of the store.
address1 - The street address of the store, including street number and name.
city - The city where the store is located.
state - The state within Mexico where the store is located.
postalCode - The postal code of the store's location.
phoneNumber - The contact phone number for the store.
hours - The store's hours of operation, including opening and closing times.
latitude - The geographical latitude of the store's location, useful for mapping.
longitude - The geographical longitude of the store's location, useful for mapping.
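One possible use of the latitude and longitude columns is finding the store closest to a given point. The sketch below uses the haversine great-circle formula; the function names, the sample rows, and the assumption that rows arrive as dictionaries keyed by the column names above (for example from `csv.DictReader`) are illustrative only.

```python
import math
from typing import Dict, Iterable

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_store(lat: float, lon: float, stores: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Return the row whose latitude/longitude is closest to (lat, lon)."""
    return min(
        stores,
        key=lambda row: haversine_km(
            lat, lon, float(row["latitude"]), float(row["longitude"])
        ),
    )


# Made-up rows using the documented column names.
rows = [
    {"storeId": "a", "latitude": "19.43", "longitude": "-99.13"},
    {"storeId": "b", "latitude": "25.67", "longitude": "-100.31"},
]
print(nearest_store(19.0, -99.0, rows)["storeId"])
```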
This dataset was created by corochann