5 datasets found

US state_trends.csv
kaggle.com
Updated Jan 18, 2024
Cite
ANKITHA SRIDHAR (2024). US state_trends.csv [Dataset]. http://doi.org/10.34740/kaggle/dsv/7426536
Unique identifier
https://doi.org/10.34740/kaggle/dsv/7426536
Dataset updated
Jan 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
ANKITHA SRIDHAR
Area covered
United States
Description
This dataset, named "state_trends.csv," contains information about different U.S. states. Let's break down the attributes and understand what each column represents:

state: The name of the U.S. state.

state_code: The two-letter postal code abbreviation for the state.

population: The population of the state.

sq_miles: The total land area of the state in square miles.

pop_density: Population density, which is the number of people per square mile.

region: The geographical region of the United States to which the state belongs (e.g., South, West).

psych_region: A description of the psychological region based on personality traits.

psy_reg: A shortened version of the psychological region.

extraversion: A measure of the state's population tendency toward extraversion.

agreeableness: A measure of the state's population tendency toward agreeableness.

conscientiousness: A measure of the state's population tendency toward conscientiousness.

neuroticism: A measure of the state's population tendency toward neuroticism.

openness: A measure of the state's population tendency toward openness.

data_science: A score related to the state's interest or proficiency in the field of data science.

artificial_intelligence: A score related to the state's interest or proficiency in artificial intelligence.

machine_learning: A score related to the state's interest or proficiency in machine learning.

data_analysis: A score related to the state's interest or proficiency in data analysis.

business_intelligence: A score related to the state's interest or proficiency in business intelligence.

spreadsheet: A score related to the state's interest or proficiency in spreadsheet usage.

statistics: A score related to the state's interest or proficiency in statistics.

art: A score related to the state's interest or involvement in the field of art.

dance: A score related to the state's interest or involvement in dance.

museum: A score related to the state's interest or presence of museums.

basketball: A score related to the state's interest or involvement in basketball.

football: A score related to the state's interest or involvement in football.

baseball: A score related to the state's interest or involvement in baseball.

soccer: A score related to the state's interest or involvement in soccer.

hockey: A score related to the state's interest or involvement in hockey.

has_nba: Indicates whether the state has a National Basketball Association (NBA) team (Yes/No).

has_nfl: Indicates whether the state has a National Football League (NFL) team (Yes/No).

has_mlb: Indicates whether the state has a Major League Baseball (MLB) team (Yes/No).

has_mls: Indicates whether the state has a Major League Soccer (MLS) team (Yes/No).

has_nhl: Indicates whether the state has a National Hockey League (NHL) team (Yes/No).

has_any: Indicates whether the state has any of the mentioned professional sports teams (Yes/No).

In summary, this dataset provides a variety of information about U.S. states, including demographic data, geographical region, psychological region, personality traits, and scores related to interests or proficiencies in various fields such as data science, art, and sports.
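Because pop_density is described as people per square mile, it should be recoverable from population and sq_miles. The following is a minimal sketch of that consistency check, not the author's method. It assumes the file is a comma-separated table with the column names listed above and that the has_* columns hold the literal strings Yes and No; the sample rows and the tolerance value are invented for illustration.

```python
import csv
import io

# Hypothetical two-row sample; the values are made up for illustration
# and are NOT taken from state_trends.csv.
SAMPLE = """state,state_code,population,sq_miles,pop_density,has_nba,has_nfl
Examplia,EX,1000000,50000,20.0,Yes,No
Testland,TL,250000,10000,25.0,No,No
"""

def load_states(text):
    """Parse CSV text into dicts, converting numeric and Yes/No columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["population"] = int(row["population"])
        row["sq_miles"] = float(row["sq_miles"])
        row["pop_density"] = float(row["pop_density"])
        # Assumption: flags are stored as "Yes"/"No" strings.
        for col in ("has_nba", "has_nfl"):
            row[col] = row[col].strip().lower() == "yes"
        rows.append(row)
    return rows

def density_mismatches(rows, tolerance=0.5):
    """Return state codes whose pop_density differs from population / sq_miles."""
    return [
        r["state_code"]
        for r in rows
        if abs(r["population"] / r["sq_miles"] - r["pop_density"]) > tolerance
    ]

states = load_states(SAMPLE)
print(density_mismatches(states))
```

With a real copy of the file, replacing io.StringIO(text) with an opened file handle would run the same check over all rows.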
Metadata record for: A deep database of medical abbreviations and acronyms...
springernature.figshare.com
txt
Updated May 31, 2023
Cite
Scientific Data Curation Team (2023). Metadata record for: A deep database of medical abbreviations and acronyms for natural language processing [Dataset]. http://doi.org/10.6084/m9.figshare.14068949.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14068949.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Scientific Data Curation Team
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset contains key characteristics about the data described in the Data Descriptor A deep database of medical abbreviations and acronyms for natural language processing. Contents:

1. Human-readable metadata summary table in CSV format
2. Machine-readable metadata file in JSON format
United States Baby Names Count
kaggle.com
Updated Dec 4, 2023
Cite
The Devastator (2023). United States Baby Names Count [Dataset]. https://www.kaggle.com/datasets/thedevastator/united-states-baby-names-count/data
Dataset updated
Dec 4, 2023
Dataset provided by
Kaggle
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
United States Baby Names Dataset

By Amber Thomas [source]

About this dataset

The data is based on a complete sample of records on Social Security card applications as of March 2021 and is presented in three main files: baby-names-national.csv, baby-names-state.csv, and baby-names-territories.csv. These files contain names given to babies at the national level (50 states and the District of Columbia), the state level (individual states), and the territory level (American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and U.S. Virgin Islands), respectively.

Each entry includes a state_abb or territory_code attribute, which is the abbreviation or code of the state or territory where the baby was born. The sex attribute records the sex of the baby (male or female), and year is the year of birth.

The name attribute is the given name of the newborn. The count attribute is the number of babies who received that name for a given state or territory, sex, and year.

All included names are at least two characters long, a restriction intended to maintain data quality.

How to use the dataset

- Understanding the Columns

The dataset consists of multiple columns with specific information about each baby name entry. Here are the key columns in this dataset:

state_abb: The abbreviation of the state or territory where the baby was born.

sex: The gender of the baby.

year: The year in which the baby was born.

name: The given name of the baby.

count: The number of babies with a specific name born in a certain state, gender, and year.

- Exploring National Data

To analyze national trends or overall popularity across all states and years: a) Focus on baby-names-national.csv. b) Use columns like name, sex, year, and count to study trends over time.

- Analyzing State-Level Data

To examine specific states' data: a) Use the baby-names-state.csv file. b) Filter the data to the desired states using state_abb column values. c) Combine the analysis with other attributes, such as sex and year, for more detailed insights.
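The state-level filtering described above can be sketched as follows. This is an illustrative example, not part of the dataset's documentation: it assumes baby-names-state.csv is comma-separated with the columns state_abb, sex, year, name, and count, and that sex is coded as F and M. The rows are invented placeholders.

```python
import csv
import io
from collections import Counter

# Made-up rows in the documented column layout; not real counts.
SAMPLE_STATE = """state_abb,sex,year,name,count
AK,F,2000,Anna,12
AK,F,2000,Beth,7
AK,M,2000,Carl,9
AL,F,2000,Anna,30
AK,F,2001,Anna,10
"""

def top_names(text, state, sex, n=2):
    """Sum counts per name for one state and sex, then return the n largest."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        if row["state_abb"] == state and row["sex"] == sex:
            totals[row["name"]] += int(row["count"])
    return totals.most_common(n)

print(top_names(SAMPLE_STATE, "AK", "F"))
```

Adding a year argument to the filter would narrow the result to a single birth year.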

- Understanding Territory Data

For insights into United States territories (American Samoa, Guam, Northern Mariana Islands, Puerto Rico, U.S. Virgin Islands): a) Use baby-names-territories.csv. b) Analyze it on the same principles as the state-level data, while taking territory-specific factors into account.

- Gender-Specific Analysis

You can study names' popularity specifically among males or females by filtering the data using the sex column. This will allow you to explore gender-specific naming trends and preferences.

- Identifying Regional Patterns

To identify naming patterns in specific regions: a) Analyze state-level or territory-level data. b) Look for variations in name popularity across different states or territories.

- Analyzing Name Popularity over Time

Track the popularity of specific names over time using the name, year, and count columns. This can help uncover trends, fluctuations, and changes in names' usage and popularity.
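As a rough sketch of tracking one name over time, the function below totals count per year for a given name and sex. It assumes a national file with the columns sex, year, name, and count and the codes F and M for sex; both are assumptions about the file layout, and the rows are invented.

```python
import csv
import io
from collections import defaultdict

# Made-up national-level rows; not real counts.
SAMPLE_NATIONAL = """sex,year,name,count
F,1999,Anna,100
F,2000,Anna,120
M,2000,Anna,5
F,2001,Anna,90
F,2000,Beth,40
"""

def yearly_counts(text, name, sex):
    """Return {year: total count} for one name and sex, ordered by year."""
    by_year = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        if row["name"] == name and row["sex"] == sex:
            by_year[int(row["year"])] += int(row["count"])
    return dict(sorted(by_year.items()))

series = yearly_counts(SAMPLE_NATIONAL, "Anna", "F")
peak_year = max(series, key=series.get)
print(series, peak_year)
```

Raw counts depend on the number of births in each year, so comparing a name's share of all births in a year may be a fairer measure of popularity than the count alone.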

- Comparing Names and Variations

Use this

Research Ideas

Tracking Popularity Trends: This dataset can be used to analyze the popularity of baby names over time. By examining the count of babies with a specific name born in different years, trends and shifts in naming preferences can be identified.

Gender Analysis: The dataset includes information on the gender of each baby. It can be used to study gender patterns and differences in naming choices. For example, it would be possible to compare the frequency and popularity of certain names among males and females.

Regional Variations: With state abbreviations provided, it is possible to explore regional variations in baby naming trends within the United States. Researchers could examine how certain names are more popular in, or unique to, specific states or territories, highlighting cultural or geographical factors that influence naming choices.

Acknowledgements

If you use this dataset in your research, please credit the original a...
Data from: HarDWR - Harmonized Water Rights Records
osti.gov
Updated Apr 25, 2024
+ more versions
Cite
USDOE Office of Science (SC), Biological and Environmental Research (BER) (2024). HarDWR - Harmonized Water Rights Records [Dataset]. http://doi.org/10.57931/2341234
Unique identifier
https://doi.org/10.57931/2341234
Dataset updated
Apr 25, 2024
Dataset provided by
United States Department of Energy (http://energy.gov/)
MultiSector Dynamics - Living, Intuitive, Value-adding, Environment
Description
For a detailed description of the database of which this record is only one part, please see the HarDWR meta-record.

Here we present a new dataset of western U.S. water rights records. This dataset provides consistent unique identifiers for each spatial unit of water management across the domain, unique identifiers for each water right record, and a consistent categorization scheme that puts each water right record into one of 7 broad use categories. These data were instrumental in conducting a study of the multi-sector dynamics of intersectoral water allocation changes through water markets (Grogan et al., in review). Specifically, the data were formatted for use as input to a process-based hydrologic model, WBM, with a water rights module (Grogan et al., in review). While this specific study motivated the development of the database presented here, water management in the U.S. West is a rich area of study (e.g., Anderson and Woosly, 2005; Tidwell, 2014; Null and Prudencio, 2016; Carney et al, 2021), so releasing this database publicly with documentation and usage notes will enable other researchers to do further work on water management in the U.S. West. The raw downloaded data for each state is described in Lisk et al. (in review), as well as here.

The dataset is a series of files organized by state sub-directories. The first two characters of each file name are the abbreviation of the state whose data the file contains (shown as XX below). The text after the abbreviation describes the contents of the file. Each file type is described in detail below.

XXFullHarmonizedRights.csv: A file of the combined groundwater and surface water records for each state. Essentially, this file is the merging of XXGroundwaterHarmonizedRights.csv and XXSurfaceWaterHarmonizedRights.csv by state. The column headers for this type of file are:
state - The name of the state the data comes from.
FIPS - The two-digit numeric state ID code.
waterRightID - The unique identifying ID of the water right, the same identifier as its state uses.
priorityDate - The priority date associated with the right.
origWaterUse - The original stated water use(s) from the state.
waterUse - The water use category under the unified use categories established here.
source - Whether the right is for surface water or groundwater.
basinNum - The alpha-numeric identifier of the WMA the record belongs to.
CFS - The maximum flow of the allocation in cubic feet per second (ft³ s⁻¹).

Arizona is unique among the states, as its surface and groundwater resources are managed with two different sets of boundaries. So, for Arizona, the basinNum column is missing and instead there are two columns:
surBasinNum - The alpha-numeric identifier of the surface water WMA the record belongs to.
grdBasinNum - The alpha-numeric identifier of the groundwater WMA the record belongs to.

XXStatePOD.shp: A shapefile which identifies the location of the Points of Diversion for the state's water rights. It should be noted that not all water right records in XXFullHarmonizedRights.csv have coordinates, and therefore some may be missing from this file.

XXStatePOU.shp: A shapefile which contains the area(s) in which each water right is claimed to be used. Currently, only Idaho and Washington provided valid data to include within this file.

XXGroundwaterHarmonizedRights.csv: A file which contains only harmonized groundwater rights collected from each state. See XXFullHarmonizedRights.csv for more details on how the data is formatted.

XXSurfaceWaterHarmonizedRights.csv: A file which contains only harmonized surface water rights collected from each state. See XXFullHarmonizedRights.csv for more details on how the data is formatted.

Additionally, one file, stateWMALabels.csv, is not stored within a sub-directory. While we have referred to the spatial boundaries that each state uses to manage its water resources as WMAs, this term is not shared across all states. This file lists the proper name for each boundary set, by state.

For those who may be interested in exploring our code in more depth, we are also making available an internal data file for convenience. The file is in .RData format and contains everything described above as well as some minor additional objects used within the code calculating the cumulative curves. For completeness, here is a detailed description of the objects which can be found within the .RData file:

states: A character vector containing the names of the states for which data was collected. More importantly, the index of the state name is also the index at which that state's data can be found in the list objects that follow. For example, if California is the third index in this object, the data for California will also be in the third index of each accompanying list.

rightsByState_ground: A list of data frames with the cleaned groundwater rights collected from each state. This object holds the data that is exported to create the XXGroundwaterHarmonizedRights.csv files.

rightsByState_surface: A list of data frames with the cleaned surface water rights collected from each state. This object holds the data that is exported to create the XXSurfaceWaterHarmonizedRights.csv files.

fullRightsRecs: A list of the combined groundwater and surface water records for each state. This object holds the data that is exported to create the XXFullHarmonizedRights.csv files.

projProj: The spatial projection used for map creation in the beginning of the project. Specifically, the World Geodetic System (WGS84) as a coordinate reference system (CRS) string in PROJ.4 format.

wmaStateLabel: The name and/or abbreviation for what each state legally calls their WMAs.

h2oUseByState: A list of spatial polygon data frames which contain the area(s) in which each water right is claimed to be used. It should be noted that not all water right records have a listed area(s) of use in this object. Currently, only Idaho and Washington provided valid data to be included in this object.

h2oDivByState: A list of spatial points data frames which identifies the location of the Point of Diversion for the state's water rights. It should be noted that not all water right records have a listed Point of Diversion in this object.

spatialWMAByState: A list of spatial polygon data frames which contain the spatial WMA boundaries for each state. The only data contained within the table are identifiers for each polygon. It is worth reiterating that Arizona is the only state in which the surface and groundwater WMA boundaries are not the same.

wmaIDByState: A list which contains the unique ID values of the WMAs for each state.

plottingDim: A character vector used to inform mapping functions for internal map making. Each state is classified as either "tall" or "wide", to maximize space on a typical 8x11 page.

The code related to the creation of this dataset can be viewed within HarDWR GitHub Repository/dataHarmonization.
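Because Arizona's files replace basinNum with surBasinNum and grdBasinNum, code that reads the harmonized CSVs has to handle both layouts. The sketch below shows one way to do that and to total CFS by waterUse. It is not taken from the HarDWR repository: the function names are hypothetical, the use-category labels, source labels, date format, and possibility of an empty CFS field are assumptions, and every record is an invented placeholder.

```python
import csv
import io

# Made-up records in the documented XXFullHarmonizedRights.csv layout.
# IDs, dates, use categories, and flows are placeholders, not real data.
STANDARD = """state,FIPS,waterRightID,priorityDate,origWaterUse,waterUse,source,basinNum,CFS
Examplestate,99,R-1,1950-01-01,IRR,irrigation,surface,B01,2.5
Examplestate,99,R-2,1975-06-30,DOM,domestic,groundwater,B01,
Examplestate,99,R-3,1980-03-15,IRR,irrigation,groundwater,B02,1.5
"""

# Arizona-style layout: two basin columns instead of basinNum.
ARIZONA_STYLE = """state,FIPS,waterRightID,priorityDate,origWaterUse,waterUse,source,surBasinNum,grdBasinNum,CFS
Arizona,04,A-1,1960-01-01,IRR,irrigation,surface,S07,G03,4.0
"""

def read_rights(text):
    """Parse harmonized-rights CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def basin_ids(record):
    """Return (surface_basin, groundwater_basin) for either column layout."""
    if "basinNum" in record:
        # One shared set of boundaries: the same WMA applies to both sources.
        return record["basinNum"], record["basinNum"]
    return record["surBasinNum"], record["grdBasinNum"]

def cfs_by_use(records):
    """Total CFS per waterUse category, skipping records with an empty CFS."""
    totals = {}
    for record in records:
        if not record["CFS"]:
            continue  # assumption: a missing flow appears as an empty field
        use = record["waterUse"]
        totals[use] = totals.get(use, 0.0) + float(record["CFS"])
    return totals

standard = read_rights(STANDARD)
arizona = read_rights(ARIZONA_STYLE)
print(cfs_by_use(standard), basin_ids(arizona[0]))
```

Checking for the basinNum key per record, rather than testing the state name, keeps the function usable on records merged from several states.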
Project Tycho Level 1 data: Counts of multiple diseases reported in UNITED...
zenodo.org
data.niaid.nih.gov
json, xml, zip
Updated Jul 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem Van Panhuis; Anne Cross; Donald Burke (2024). Project Tycho Level 1 data: Counts of multiple diseases reported in UNITED STATES OF AMERICA, 1916-2011 [Dataset]. http://doi.org/10.5281/zenodo.12608992
Available download formats: zip, xml, json
Unique identifier
https://doi.org/10.5281/zenodo.12608992
Dataset updated
Jul 1, 2024
Dataset provided by
Project Tycho
Authors
Willem Van Panhuis; Anne Cross; Donald Burke
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
1916 - 2011
Area covered
United States
Description
Project Tycho data include counts of infectious disease cases or deaths per time interval. A count is equivalent to a data point. Project Tycho level 1 data include data counts that have been standardized for a specific, published, analysis. Standardization of level 1 data included representing various types of data counts into a common format and excluding data counts that are not required for the intended analysis. In addition, external data such as population data may have been integrated with disease data to derive rates or for other applications.
Version 1.0.0 of level 1 data includes counts at the state level for smallpox, polio, measles, mumps, rubella, hepatitis A, and whooping cough and at the city level for diphtheria. The time period of data varies per disease somewhere between 1916 and 2011. This version includes cases as well as incidence rates per 100,000 population based on historical population estimates. These data have been used by investigators at the University of Pittsburgh to estimate the impact of vaccination programs in the United States, published in the New England Journal of Medicine: http://www.nejm.org/doi/full/10.1056/NEJMms1215400. See this paper for additional methods and detail about the origin of level 1 version 1.0.0 data.
Level 1 version 1.0.0 data is represented in a CSV file with 7 columns:
epi_week: a six-digit number that represents the year and epidemiological week for which disease cases or deaths were reported (yyyyww)
state: the two-letter postal code abbreviation of the state for which a count has been reported
loc: the name of a state or city for which a count has been reported, capitalized
loc_type: the type of location (STATE or CITY) for which a count has been reported
disease: the disease for which a count has been reported: HEPATITIS A, MEASLES, MUMPS, PERTUSSIS, POLIO, RUBELLA, SMALLPOX, or DIPHTHERIA
cases: the number of cases reported for the specified disease, epidemiological week, and location
incidence_per_100000: the number of cases per 100,000 people, computed using historical population counts for cities and states as reported by the US Census Bureau
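The yyyyww encoding of epi_week and the per-100,000 incidence figure can be illustrated with a short sketch. It assumes the CSV uses the seven column names listed above; the function names are hypothetical and the rows and population figure are invented, not Project Tycho data.

```python
import csv
import io

# Made-up rows in the documented seven-column layout; not real counts.
SAMPLE_TYCHO = """epi_week,state,loc,loc_type,disease,cases,incidence_per_100000
192801,AL,ALABAMA,STATE,MEASLES,10,0.40
192802,AL,ALABAMA,STATE,MEASLES,15,0.60
192901,AL,ALABAMA,STATE,MEASLES,4,0.15
192801,NY,NEW YORK,CITY,DIPHTHERIA,3,0.05
"""

def split_epi_week(epi_week):
    """Split a yyyyww value into (year, week) integers."""
    value = str(epi_week)
    if len(value) != 6 or not value.isdigit():
        raise ValueError(f"expected six digits (yyyyww), got {epi_week!r}")
    return int(value[:4]), int(value[4:])

def incidence_per_100000(cases, population):
    """Cases per 100,000 people for a given population estimate."""
    return cases * 100_000 / population

def annual_cases(text, disease, loc_type="STATE"):
    """Sum weekly cases per epidemiological year for one disease."""
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["disease"] != disease or row["loc_type"] != loc_type:
            continue
        year, _week = split_epi_week(row["epi_week"])
        totals[year] = totals.get(year, 0) + int(row["cases"])
    return totals

print(split_epi_week("192801"))
print(annual_cases(SAMPLE_TYCHO, "MEASLES"))
```

Note that the year taken from epi_week is the epidemiological year, which can differ from the calendar year for weeks that straddle January 1.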