81 datasets found

USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datagov/usa-names
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions, and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
data.amerigeoss.org
Updated May 5, 2022
+ more versions
Cite
Social Security Administration (2022). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Dataset updated
May 5, 2022
Dataset provided by
Social Security Administration (http://ssa.gov/)
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
USA Names
console.cloud.google.com
Updated Jul 15, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Social%20Security%20Administration&hl=de&inv=1&invt=Ab2mjA (2023). USA Names [Dataset]. https://console.cloud.google.com/marketplace/product/social-security-administration/us-names?hl=de
Dataset updated
Jul 15, 2023
Dataset provided by
Google (http://google.com/)
Area covered
United States
Description
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data. All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences. This public dataset is hosted in Google BigQuery and is included in BigQuery's free tier of 1 TB of processing per month: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
US state county name & codes
kaggle.com
Updated Jun 6, 2017
Cite
VivekMangipudi (2017). US state county name & codes [Dataset]. https://www.kaggle.com/stansilas/us-state-county-name-codes/code
Dataset updated
Jun 6, 2017
Dataset provided by
Kaggle
Authors
VivekMangipudi
Area covered
United States
Description
Context

There is no story behind this data.

These are just supplementary datasets that I plan to use for plotting county-wise data on maps (in particular with my kernel: https://www.kaggle.com/stansilas/maps-are-beautiful-unemployment-is-not/), as that dataset didn't have the info I needed for plotting an interactive map using highcharter.

Content

I noticed that most demographic datasets here on Kaggle have either state code, state name, or county name + state name, but not all of it, i.e. county name, FIPS code, state name + state code.

Using these two datasets one can get any combination of state county codes etc.

States.csv has State name + code
US counties.csv has county wise data.
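
As a rough illustration of how state and county codes combine, the sketch below builds the five-digit county FIPS code (two-digit state code followed by three-digit county code) and attaches the state name and abbreviation. The row layouts and the helper name `full_county_fips` are assumptions made for this example; the actual column names in States.csv and US counties.csv are not described here.

```python
def full_county_fips(state_fips: int, county_fips: int) -> str:
    """Build a 5-digit county FIPS code from numeric state and county parts."""
    if not (0 < state_fips < 100 and 0 < county_fips < 1000):
        raise ValueError("state code must fit in 2 digits, county code in 3")
    # Zero-pad each part: 2 digits for the state, 3 digits for the county.
    return f"{state_fips:02d}{county_fips:03d}"


# Hypothetical rows: (state name, state abbreviation, state FIPS)
states = [("Alabama", "AL", 1), ("New Jersey", "NJ", 34)]
# Hypothetical rows: (county name, state FIPS, county FIPS)
counties = [("Autauga County", 1, 1), ("Camden County", 34, 7)]

# Index states by FIPS code, then join every county to its state.
state_by_fips = {fips: (name, abbr) for name, abbr, fips in states}
joined = [
    (county, *state_by_fips[s], full_county_fips(s, c))
    for county, s, c in counties
]

for row in joined:
    print(row)
```

Keeping the codes as zero-padded strings rather than integers avoids losing leading zeros when the result is written back to CSV.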

Acknowledgements

Picture : https://unsplash.com/search/usa-states?photo=-RO2DFPl7wE
Counties : https://www.census.gov/geo/reference/codes/cou.html
State :

Inspiration

Not Applicable.
Historic US Census - 1900
redivis.com
application/jsonl +7
Updated Jan 10, 2020
+ more versions
Cite
Stanford Center for Population Health Sciences (2020). Historic US Census - 1900 [Dataset]. http://doi.org/10.57761/mez6-j880
Available download formats: arrow, spss, avro, sas, application/jsonl, csv, parquet, stata
Unique identifier
https://doi.org/10.57761/mez6-j880
Dataset updated
Jan 10, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Time period covered
Feb 1, 1900 - Dec 31, 1900
Area covered
United States
Description
Documentation

The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household-level data.

Historic data are scarce and often exist only in aggregate tables. The key advantage of the IPUMS data is the availability of individual- and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.

In sum: the IPUMS data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

The IPUMS 1900 census data were collected in June 1900. Enumerators collected data by traveling to households and counting the residents who regularly slept there. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
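
The mother's-age-at-birth example mentioned in the documentation can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`person_id`, `age`, `mother_id`) is hypothetical and does not reflect the actual IPUMS variable names, and the result is approximate because ages are recorded in whole years.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    person_id: int            # hypothetical identifier within the household
    age: int                  # age in whole years at enumeration
    mother_id: Optional[int]  # person_id of the mother, if she is in the household


def mothers_age_at_birth(household: List[Person]) -> Dict[int, int]:
    """Map each child's person_id to the mother's approximate age at that birth."""
    by_id = {p.person_id: p for p in household}
    result = {}
    for child in household:
        if child.mother_id is None:
            continue
        mother = by_id.get(child.mother_id)
        if mother is not None:
            # Both ages are measured on the same census day, so the
            # difference approximates the mother's age when the child was born.
            result[child.person_id] = mother.age - child.age
    return result


# Hypothetical household: a 34-year-old mother with children aged 10 and 4.
household = [Person(1, 34, None), Person(2, 10, 1), Person(3, 4, 1)]
ages = mothers_age_at_birth(household)
print(ages)
```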

Section 2

This dataset was created on 2020-01-10 22:51:40.810 by merging multiple datasets together. The source datasets for this version were:

IPUMS 1900 households: This dataset includes all households from the 1900 US census.

IPUMS 1900 persons: This dataset includes all individuals from the 1900 US census.

IPUMS 1900 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1900 datasets.

Census Data
catalog.data.gov
datadiscoverystudio.org
+3more
Updated Mar 1, 2024
Cite
U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
Dataset updated
Mar 1, 2024
Dataset provided by
United States Census Bureau (http://census.gov/)
Description
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed database file named TABLES.ZIP. The uncompressed file is in FoxPro database file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats.

While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed the tables are ready for use with FoxPro and they can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level.

The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table are total population and total housing units. POP1 and POP2 contain selected population variables and selected housing items are in the HU file. The MA05 table data are only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES, and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
US Census Demographic Data
kaggle.com
zip
Updated Mar 3, 2019
Cite
MuonNeutrino (2019). US Census Demographic Data [Dataset]. https://www.kaggle.com/muonneutrino/us-census-demographic-data
Available download formats: zip (11110116 bytes)
Dataset updated
Mar 3, 2019
Authors
MuonNeutrino
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset expands on my earlier New York City Census Data dataset. It includes data from the entire country instead of just New York City. The expanded data will allow for much more interesting analyses and will also be much more useful at supporting other data sets.

Content

The data here are taken from the DP03 and DP05 tables of the 2015 American Community Survey 5-year estimates. The full datasets and much more can be found at the American FactFinder website. Currently, I include two data files:

acs2015_census_tract_data.csv: Data for each census tract in the US, including DC and Puerto Rico.

acs2015_county_data.csv: Data for each county or county equivalent in the US, including DC and Puerto Rico.

The two files have the same structure, with just a small difference in the name of the id column. Counties are political subdivisions, and the boundaries of some have been set for centuries. Census tracts, however, are defined by the Census Bureau and will have a much more consistent size. A typical census tract has around 5,000 residents.

The Census Bureau updates the estimates approximately every year. At least some of the 2016 data is already available, so I will likely update this in the near future.

Acknowledgements

The data here were collected by the US Census Bureau. As a product of the US federal government, this is not subject to copyright within the US.

Inspiration

There are many questions that we could try to answer with the data here. Can we predict things such as the state (classification) or household income (regression)? What kinds of clusters can we find in the data? What other datasets can be improved by the addition of census data?
Consumers who saw or heard of the movie “Call Me by Your Name” in the U.S....
statista.com
Updated Jan 5, 2023
Cite
Statista (2023). Consumers who saw or heard of the movie “Call Me by Your Name” in the U.S. 2018 [Dataset]. https://www.statista.com/statistics/805600/public-awareness-call-me-by-your-name/
Dataset updated
Jan 5, 2023
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Feb 25, 2018 - Feb 27, 2018
Area covered
United States
Description
The statistic presents data on the share of consumers who have seen, want to see, or at least heard of the movie “Call Me by Your Name” in the United States as of February 2018. During a survey, nine percent of respondents stated they wanted to see the movie “Call Me by Your Name”.
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+1more
csv, excel, geojson +1
Updated Mar 10, 2024
+ more versions
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).
Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.
Enrichment: add country name.
Dataset of books called The American polity : the people and their...
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called The American polity : the people and their government [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+American+polity+%3A+the+people+and+their+government
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is The American polity : the people and their government. It features 7 columns including author, publication date, language, and book publisher.
Popular White Last Names in the US
johnsnowlabs.com
csv
Updated Jan 20, 2021
Cite
John Snow Labs (2021). Popular White Last Names in the US [Dataset]. https://www.johnsnowlabs.com/marketplace/popular-white-last-names-in-the-us/
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
United States
Description
This dataset lists popular last names among the White population in the United States.

🏥🏥US healthcare providers by cities 💊💊

kaggle.com

Updated Nov 1, 2023

Cite

Shiv_D24Coder (2023). 🏥🏥US healthcare providers by cities 💊💊 [Dataset]. https://www.kaggle.com/datasets/shivd24coder/us-healthcare-providers-by-cities

Dataset updated

Nov 1, 2023

Dataset provided by

Kaggle

Authors

Shiv_D24Coder

License

https://www.usa.gov/government-works/

Area covered

United States

Description

Key Features

Column Name	Description
city_name	The name of the city where healthcare providers are located.
result_count	The count of healthcare providers in the city.
results	Details of healthcare providers in the city.
created_epoch	The epoch timestamp when the provider's information was created.
enumeration_type	The type of enumeration for the provider (e.g., NPI-1, NPI-2).
last_updated_epoch	The epoch timestamp when the provider's information was last updated.
number	The unique identifier for the healthcare provider.
addresses	Information about the provider's addresses, including mailing and location addresses.
country_code	The country code for the provider's address (e.g., US for the United States).
country_name	The country name for the provider's address.
address_purpose	The purpose of the address (e.g., MAILING, LOCATION).
address_type	The type of address (e.g., DOM - Domestic).
address_1	The first line of the provider's address.
address_2	The second line of the provider's address.
city	The city where the provider is located.
state	The state where the provider is located.
postal_code	The postal code or ZIP code for the provider's location.
telephone_number	The telephone number for the provider's contact.
practiceLocations	Details about the provider's practice locations.
basic	Basic information about the provider, including their name, credentials, and gender.
first_name	The first name of the healthcare provider.
last_name	The last name of the healthcare provider.
middle_name	The middle name of the healthcare provider.
credential	The credential of the healthcare provider (e.g., PT, DPT).
sole_proprietor	Indicates whether the provider is a sole proprietor (e.g., YES, NO).
gender	The gender of the healthcare provider (e.g., M, F).
enumeration_date	The date when the provider's enumeration was recorded.
last_updated	The date when the provider's information was last updated.
taxonomies	Information about the provider's taxonomies, including code, description, state, license, and primary designation.
identifiers	Additional identifiers for the healthcare provider.
endpoints	Information about communication endpoints for the provider.
other_names	Any other names associated with the healthcare provider.

How to use this Dataset

1. Healthcare Provider Analysis: This dataset can be used to perform in-depth analyses of healthcare providers across various cities. You can extract insights into the distribution of different types of healthcare professionals, their practice locations, and their specialties. This information is valuable for healthcare workforce planning and resource allocation.

2. Geospatial Mapping: Utilize the city names and addresses in the dataset to create geospatial visualizations. You can map the locations of healthcare providers in each city, helping stakeholders identify areas with potential shortages or surpluses of healthcare services.

3. Provider Directory Development: The dataset provides detailed information about healthcare providers, including their names, contact details, and credentials. You can use this data to build a comprehensive healthcare provider directory or search tool, helping patients and healthcare organizations find and connect with the right providers in their area.

If you find this dataset useful, give it an upvote – it's a small gesture that goes a long way! Thanks for your support. 😄

Distribution of first name and last name frequencies by country
figshare.com
xlsx
Updated Feb 2, 2023
Cite
Mike Thelwall (2023). Distribution of first name and last name frequencies by country [Dataset]. http://doi.org/10.6084/m9.figshare.21956795.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.21956795.v2
Dataset updated
Feb 2, 2023
Dataset provided by
figshare
Authors
Mike Thelwall
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Distribution of first and last name frequencies of academic authors by country.

Spreadsheet 1 contains 50 countries, with names based on affiliations in Scopus journal articles 2001-2021.

Spreadsheet 2 contains 200 countries, with names based on affiliations in Scopus journal articles 2001-2021, using a marginally updated last name extraction algorithm that is almost the same except for Dutch/Flemish names.

From the paper: Can national researcher mobility be tracked by first or last name uniqueness?

For example the distribution for the UK shows a single peak for international names, with no national names, Belgium has a national peak and an international peak, and China has mainly a national peak. The 50 countries are:

No	Code	Country
1	SB	Serbia
2	IE	Ireland
3	HU	Hungary
4	CL	Chile
5	CO	Colombia
6	NG	Nigeria
7	HK	Hong Kong
8	AR	Argentina
9	SG	Singapore
10	NZ	New Zealand
11	PK	Pakistan
12	TH	Thailand
13	UA	Ukraine
14	SA	Saudi Arabia
15	RO	Romania
16	ID	Indonesia
17	IL	Israel
18	MY	Malaysia
19	DK	Denmark
20	CZ	Czech Republic
21	ZA	South Africa
22	AT	Austria
23	FI	Finland
24	PT	Portugal
25	GR	Greece
26	NO	Norway
27	EG	Egypt
28	MX	Mexico
29	BE	Belgium
30	CH	Switzerland
31	SW	Sweden
32	PL	Poland
33	TW	Taiwan
34	NL	Netherlands
35	TK	Turkey
36	IR	Iran
37	RU	Russia
38	AU	Australia
39	BR	Brazil
40	KR	South Korea
41	ES	Spain
42	CA	Canada
43	IT	Italy
44	FR	France
45	IN	India
46	DE	Germany
47	US	USA
48	UK	UK
49	JP	Japan
50	CN	China
US Household Income Statistics
kaggle.com
Updated Apr 16, 2018
Cite
Golden Oak Research Group (2018). US Household Income Statistics [Dataset]. https://www.kaggle.com/goldenoakresearch/us-household-income-stats-geo-locations/data
Dataset updated
Apr 16, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Golden Oak Research Group
Area covered
United States
Description
New Upload:

Added 32,000 more locations. For information on data calculations, please refer to the methodology PDF document. Information on how to calculate the data yourself is also provided, as well as how to buy data for $1.29.

What you get:

The database contains 32,000 records on US Household Income Statistics & Geo Locations. The field description of the database is documented in the attached PDF file. To access all 348,893 records on a scale roughly equivalent to a neighborhood (census tract), see the link below and make sure to upvote. Upvote right now, please. Enjoy!

Household & Geographic Statistics:

Mean Household Income (double)

Median Household Income (double)

Standard Deviation of Household Income (double)

Number of Households (double)

Square area of land at location (double)

Square area of water at location (double)

Geographic Location:

Longitude (double)

Latitude (double)

State Name (character)

State abbreviated (character)

State_Code (character)

County Name (character)

City Name (character)

Name of city, town, village or CDP (character)

Primary: defines whether the location is a tract and block group.

Zip Code (character)

Area Code (character)

Abstract

The dataset was originally developed for real estate and business investment research. Income is a vital element when determining both quality and socioeconomic features of a given geographic location. The following data was derived from over 36,000 files and covers 348,893 location records.

License

Only proper citation is required; please see the documentation for details. Have fun!

Golden Oak Research Group, LLC. “U.S. Income Database Kaggle”. Publication: 5, August 2017. Accessed, day, month year.

Sources, don't have 2 dollars? Get the full information yourself!

2011-2015 ACS 5-Year Documentation was provided by the U.S. Census Reports. Retrieved August 2, 2017, from https://www2.census.gov/programs-surveys/acs/summary_file/2015/data/5_year_by_state/

Found Errors?

Please tell us so we may provide you the most accurate data possible. You may reach us at: research_development@goldenoakresearch.com

For any questions, you can reach me at 585-626-2965.

Please note: it is my personal number, and email is preferred.

Check our data's accuracy: Census Fact Checker

Access all 348,893 location records and more:

Don't settle. Go big and win big. Optimize your potential. Overcome limitation and outperform expectation. Access all household income records on a scale roughly equivalent to a neighborhood, see link below:

Website: Golden Oak Research Kaggle Deals all databases $1.29 Limited time only

A small startup with big dreams, giving the everyday, up-and-coming data scientist professional-grade data at affordable prices. It's what we do.
Popular Black Last Names in the US
johnsnowlabs.com
csv
Updated Jan 20, 2021
Cite
John Snow Labs (2021). Popular Black Last Names in the US [Dataset]. https://www.johnsnowlabs.com/marketplace/popular-black-last-names-in-the-us/
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
United States
Description
This dataset lists popular last names among the Black population in the United States.

🏥 US Work-related injury

kaggle.com

Updated Aug 14, 2023

Cite

mexwell (2023). 🏥 US Work-related injury [Dataset]. https://www.kaggle.com/datasets/mexwell/us-work-related-injury

Dataset updated

Aug 14, 2023

Dataset provided by

Kaggle

Authors

mexwell

License

http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

Area covered

United States

Description

The Occupational Safety and Health Administration (OSHA) collected work-related injury and illness data from employers within specific industry and employment size specifications from 2002 through 2011. This data collection is called the OSHA Data Initiative or ODI. The data provided is used by OSHA to calculate establishment-specific injury and illness incidence rates. This searchable database contains a table with the name, address, industry, and associated Total Case Rate (TCR), Days Away, Restricted, and Transfer (DART) case rate, and the Days Away From Work (DAFWII) case rate for the establishments that provided OSHA with valid data for calendar years 2002 through 2011. This data has been sampled down from its original size to 4%. In addition, the original dataset only has data from a small portion of all private sector establishments in the United States (80,000 out of 7.5 million total establishments). Therefore, these data are not representative of all businesses and general conclusions pertaining to all US businesses should not be overdrawn.

Data quality: While OSHA takes multiple steps to ensure the data collected is accurate, problems and errors invariably exist for a small percentage of establishments. OSHA does not believe the data for the establishments with the highest rates on this file are accurate in absolute terms. Efforts were made during the collection cycle to correct submission errors; however, some remain unresolved. It would be a mistake to say establishments with the highest rates on this file are the ‘most dangerous’ or ‘worst’ establishments in the Nation.

Rate Calculation: An incidence rate of injuries and illnesses is computed from the following formula: (Number of injuries and illnesses × 200,000) / Employee hours worked = Incidence rate. The Total Case Rate includes all cases recorded on the OSHA Form 300 (Column G + Column H + Column I + Column J). The Days Away/Restricted/Transfer rate includes cases recorded in Column H + Column I. The Days Away rate includes cases recorded in Column H. For further information on injury and illness incidence rates, please visit the Bureau of Labor Statistics’ webpage at http://www.bls.gov/iif/osheval.htm

State Participation: Not all state plan states participate in the ODI. The following states did not participate in the 2010 ODI (collection of CY 2009 data), so establishment data is not available for these states: Alaska; Oregon; Puerto Rico; South Carolina; Washington; Wyoming.
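
The rate calculation quoted in the description can be written out as a small sketch. The function and parameter names (`incidence_rate`, `osha_rates`, `col_g` through `col_j`) are hypothetical and chosen for this example; only the formula and the column groupings come from the description above.

```python
def incidence_rate(cases: float, hours_worked: float) -> float:
    """(Number of injuries and illnesses x 200,000) / employee hours worked."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return cases * 200_000 / hours_worked


def osha_rates(col_g: int, col_h: int, col_i: int, col_j: int,
               hours_worked: float) -> dict:
    """Compute the three rates from OSHA Form 300 column counts."""
    return {
        # Total Case Rate: all recorded cases (Columns G + H + I + J).
        "total_case_rate": incidence_rate(col_g + col_h + col_i + col_j, hours_worked),
        # Days Away/Restricted/Transfer: Columns H + I.
        "dart_rate": incidence_rate(col_h + col_i, hours_worked),
        # Days Away: Column H only.
        "days_away_rate": incidence_rate(col_h, hours_worked),
    }


# Made-up establishment: 6 recorded cases over 200,000 employee hours.
rates = osha_rates(col_g=0, col_h=2, col_i=1, col_j=3, hours_worked=200_000)
print(rates)
```

With exactly 200,000 hours worked, each rate equals the corresponding case count, which makes the example easy to check by hand.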

Data Dictionary

Key	Type	Example Value
year	Integer	`2002`
address.city	String	`"Cherry Hill"`
address.state	String	`"NJ"`
address.street	String	`"100 Dobbs Ln Ste 102"`
address.zip	Integer	`8034`
business.name	String	`"United States Cold Storage"`
business.second name	String	`"US Cold"`
industry.division	String	`"Transportation, Communications, Electric, Gas, And Sanitary Services"`
industry.id	Integer	`4222`
industry.label	String	`"Refrigerated Warehousing and Storage"`
industry.major_group	String	`"Motor Freight Transportation And Warehousing"`
statistics.days away	Float	`0.0`
statistics.days away/restricted/transfer	Float	`0.0`
statistics.total case rate	Float	`0.0`
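
Because `address.zip` is stored as an integer, ZIP codes that begin with zero lose their leading digit: the example value `8034` corresponds to the five-digit ZIP code 08034. A minimal sketch of restoring the padded form follows; `format_zip` is a hypothetical helper name, not part of the dataset or of any library.

```python
def format_zip(zip_value: int) -> str:
    """Render an integer ZIP code as a zero-padded five-character string."""
    if not 0 <= zip_value <= 99_999:
        raise ValueError("ZIP code must fit in five digits")
    return f"{zip_value:05d}"


print(format_zip(8034))
```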

Acknowledgement

Original Data

CORGIS Dataset Project

Photo by National Cancer Institute on Unsplash

Places - United States of America
public.opendatasoft.com
data.smartidf.services
+1 more
csv, excel, geojson +1
Updated Jun 6, 2024
Cite
(2024). Places - United States of America [Dataset]. https://public.opendatasoft.com/explore/dataset/georef-united-states-of-america-place/
Explore at:
Available download formats: geojson, csv, json, excel
Dataset updated
Jun 6, 2024
License
https://en.wikipedia.org/wiki/Public_domain
Area covered
United States
Description
This dataset is part of the Geographical repository maintained by Opendatasoft. It contains data for places and equivalent entities in the United States of America. This layer includes both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. Processors and tools are using this data. Enhancements: add ISO 3166-3 codes; simplify geometries to provide better performance across the services; add administrative hierarchy.
Dataset of books called American Indian and African American people,...
workwithdata.com
Updated Apr 17, 2025
+ more versions
Cite
Work With Data (2025). Dataset of books called American Indian and African American people, communities, and interactions : an annotated bibliography [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=American+Indian+and+African+American+people%2C+communities%2C+and+interactions+%3A+an+annotated+bibliography
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is American Indian and African American people, communities, and interactions : an annotated bibliography. It features 7 columns including author, publication date, language, and book publisher.
United States COVID-19 Community Levels by County
healthdata.gov
data.virginia.gov
+1 more
application/rdfxml +5
Updated Mar 8, 2022
+ more versions
Cite
data.cdc.gov (2022). United States COVID-19 Community Levels by County [Dataset]. https://healthdata.gov/dataset/United-States-COVID-19-Community-Levels-by-County/nn5b-j5u9
Explore at:
Available download formats: application/rssxml, json, tsv, csv, xml, application/rdfxml
Dataset updated
Mar 8, 2022
Dataset provided by
data.cdc.gov
Area covered
United States
Description
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.

This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.

The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.

Using these data, the COVID-19 community level was classified as low, medium, or high.
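
The classification logic described above can be sketched as follows. The description here does not state any numeric cutoffs, so every threshold in the code is an assumption based on CDC's published community-level framework and should be checked against CDC documentation before use; the function and parameter names are hypothetical and do not correspond to column names in the dataset.

```python
LEVELS = ["low", "medium", "high"]


def community_level(cases_per_100k: float,
                    admissions_per_100k: float,
                    pct_beds_covid: float) -> str:
    """Return 'low', 'medium', or 'high' from the three 7-day metrics.

    All cutoffs below are ASSUMED values, not taken from this dataset page.
    """
    if cases_per_100k < 200:
        # Assumed cutoffs when new cases are below 200 per 100,000.
        admissions = 0 if admissions_per_100k < 10 else 1 if admissions_per_100k < 20 else 2
        beds = 0 if pct_beds_covid < 10 else 1 if pct_beds_covid < 15 else 2
    else:
        # Assumed cutoffs at 200 or more new cases per 100,000:
        # the lowest possible level in this band is 'medium'.
        admissions = 1 if admissions_per_100k < 10 else 2
        beds = 1 if pct_beds_covid < 10 else 2
    # The level is the higher of the admissions and inpatient-bed indicators.
    return LEVELS[max(admissions, beds)]


print(community_level(cases_per_100k=150, admissions_per_100k=12, pct_beds_covid=5))
```

Taking the maximum of the two indicator ranks reflects the "higher of the new admissions and inpatient beds metrics" rule, while the case rate only selects which set of cutoffs applies.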

COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.

For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.

Archived Data Notes:

This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.

March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previously released.

March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.

March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.

March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.

March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).

March 31, 2022: Data values for columns, “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added since the week of 3/3/2022, thus the values were previously missing for records released the week prior.

April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.

April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials t
Places
catalog.data.gov
datasets.ai
+1 more
Updated Sep 20, 2024
Cite
United States Census Bureau (USCB) (Point of Contact) (2024). Places [Dataset]. https://catalog.data.gov/dataset/places2
Dataset updated
Sep 20, 2024
Dataset provided by
United States Census Bureau (http://census.gov/)
Description
The Places dataset was published on August 31, 2022 from the United States Census Bureau (USCB) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. 
CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. The boundaries of most incorporated places in this shapefile are as of January 1, 2022, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2020 Census, but some CDPs were added or updated through the 2022 BAS as well.

Cite

Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datagov/usa-names

USA Name Data

USA Name Data (BigQuery Dataset)

Explore at:

Available download formats: zip (0 bytes)

Dataset updated

Feb 12, 2019

Dataset provided by

Data.gov (https://data.gov/)

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

United States

Description

Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
