The Facility Registry System (FRS) identifies facilities, sites, or places subject to environmental regulation or of environmental interest to EPA programs or delegated states. Using vigorous verification and data management procedures, FRS integrates facility data from program national systems, state master facility records, tribal partners, and other federal agencies and provides the Agency with a centrally managed, single source of comprehensive and authoritative information on facilities.
http://opendatacommons.org/licenses/dbcl/1.0/
A dataset containing the latitude and longitude of the 29 Indian states.
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Data on cumulative coronavirus cases and deaths can be found in two files for states and counties.
Each row of data reports cumulative counts based on our best reporting up to the moment we publish an update. We do our best to revise earlier entries in the data when we receive new information.
Both files contain FIPS codes, a standard geographic identifier, to make it easier for an analyst to combine this data with other data sets like a map file or population data.
State-level data can be found in the us-states.csv file.
date,state,fips,cases,deaths
2020-01-21,Washington,53,1,0
...
County-level data can be found in the us-counties.csv file.
date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
...
In some cases, the geographies where cases are reported do not map to standard county boundaries. See the list of geographic exceptions for more detail on these.
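Because every row holds a cumulative count, analysts often need per-day increases instead. The following is a minimal sketch, assuming the us-counties.csv column layout shown above. The second sample row is invented for illustration, and the function name `daily_new_cases` is hypothetical. Rows without a FIPS code (the geographic exceptions) are skipped in this sketch.

```python
import csv
import io
from collections import defaultdict

# Sample in the us-counties.csv layout. The first data row comes from the
# description above; the second row is made up for illustration only.
SAMPLE = """date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
2020-01-22,Snohomish,Washington,53061,3,0
"""


def daily_new_cases(csv_text):
    """Convert cumulative case counts into per-day increases, keyed by county FIPS."""
    by_fips = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["fips"]:
            # Geographic exceptions may lack a FIPS code; this sketch skips them.
            continue
        # Zero-pad so codes survive tools that strip leading zeros.
        fips = row["fips"].zfill(5)
        by_fips[fips].append((row["date"], int(row["cases"])))

    result = {}
    for fips, series in by_fips.items():
        series.sort()  # ISO dates sort chronologically as plain strings
        previous = 0
        increments = []
        for date, cumulative in series:
            increments.append((date, cumulative - previous))
            previous = cumulative
        result[fips] = increments
    return result


daily = daily_new_cases(SAMPLE)
```

Since earlier entries may be revised, a difference can come out negative; how to treat such values is left to the analyst.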
This dataset contains COVID-19 data for the United States of America made available by The New York Times on GitHub at https://github.com/nytimes/covid-19-data
2015-2016 NSDUH State Estimates – Individual Excel and CSV Files by Outcome
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a detailed overview of the population statistics for each U.S. state for the years 2023 and 2024. It includes the population count, growth rate, percentage of the U.S. population, and population density per square mile.
The State of the Union Address (S.O.T.U) is an annual message delivered by the President of the United States to a joint session of the United States Congress at the beginning of each calendar year in office. The message typically includes a budget message and an economic report of the nation, and also allows the President to propose a legislative agenda and national priorities.
This dataset is a CSV file with columns President, Year, Title, and Text. The Text column contains a list of string-formatted sentences that make up the text of each S.O.T.U.
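A short sketch of how the Text column might be turned back into a list of sentences. It assumes the list is stored as a Python-style list literal inside the CSV cell, which the description does not confirm. The sample row and the function name `parse_sentences` are hypothetical.

```python
import ast
import csv
import io

# Invented sample row. The real file's cell encoding is an assumption here.
SAMPLE = """President,Year,Title,Text
Example President,1900,Example Address,"['First sentence.', 'Second sentence.']"
"""


def parse_sentences(cell):
    """Parse a cell holding a list literal; fall back to the raw text otherwise."""
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return [cell]
    if isinstance(value, list):
        return [str(item) for item in value]
    return [cell]


rows = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    row["sentences"] = parse_sentences(row["Text"])
    rows.append(row)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for cells of unknown origin.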
Thanks Wikidata! - Data sourced from wikidata pages: https://www.wikidata.org/w/index.php?title=Q28371311&oldid=992890506
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Dataset of license plate recognition
Dataset offers 89,986 images of vehicles featuring license plates from the USA, making it an excellent resource for tasks involving OCR (Optical Character Recognition), license plate identification, and vehicle registration data extraction. Each image is accompanied by a CSV file that provides the corresponding plate text and country code, ideal for developing and testing text recognition systems. With this dataset, researchers and developers can… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/united-states-license-plate-dataset.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This spreadsheet dataset (.csv file) contains annual modeled output of land-use and land-cover change transitions in square kilometers (km2) by specified transition group, scenario, timestep, WEAP hydrologic zone, and 4 sub-regions within the broader California Central Valley, modeled using the LUCAS ST-SIM for the period 2011-2101 across 5 future scenarios. Four of the scenarios were developed as part of the Central Valley Landscape Conservation Project. The 4 original scenarios include a Bad-Business-As-Usual (BBAU; high water availability, poor management), California Dreamin’ (DREAM; high water availability, good management), Central Valley Dustbowl (DUST; low water availability, poor management), and Everyone Equally Miserable (EEM; low water availability, good management). These scenarios represent alternative plausible futures, capturing a range of climate variability, land management activities, and habitat restoration goals. We parameterized our models based on close inte ...
The raw data for this paper were received from individual states as PDF or Excel files. (For each state there might be several PDF or Excel files for each year.) In the data we uploaded to GitHub, we consolidated these raw data (the various PDF and Excel files) into a single CSV file and created a standardized waste outcome, specifically state-generated municipal solid waste (MSW) disposal. In the README file, we include more details regarding all the other supporting data and code we have used.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Daily baseflow data, along with input datasets for 1661 basins for the hydrological years from 1981 to 2022, can be downloaded in CSV format from the DeepBase repository on FigShare. The baseflow data files for the basins are zipped into archives named ‘Daily_Baseflow_Cluster[cluster_number].zip’, corresponding to their respective clusters. All the static inputs for the 1661 basins are provided in a CSV file named ‘Static_Inputs.csv’. The statistical attributes of the static inputs, calculated for each cluster, are provided in the file ‘14Clusters_statistics.csv’. All the dynamic forcings for the 1661 basins are provided in CSV files named ‘Daymet_[basin_id].csv’ and are zipped into an archive named ‘Daily_DayMet_Forcings.zip’. The USGS gauge IDs of the training basins (referred to as gauged basins) are provided in ‘530basins_ids.txt’. The associated shapefiles for each cluster, including the basin polygons, are provided in ‘DeepBase_Clusters.zip’, and a PDF version of the cluster map, ‘DeepBase_Clusters_map.pdf’, is also accessible via the DeepBase repository.
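The naming patterns above can be expanded programmatically. This is a small sketch that assumes the bracketed parts are placeholders replaced by the bare cluster number or basin ID, and that clusters are numbered 1 through 14 (inferred from the name ‘14Clusters_statistics.csv’, not stated outright). The function names and the example basin ID are hypothetical.

```python
def baseflow_archive_name(cluster_number):
    """File name of the baseflow archive for one cluster."""
    return f"Daily_Baseflow_Cluster{cluster_number}.zip"


def forcing_file_name(basin_id):
    """File name of the Daymet forcing CSV for one basin."""
    return f"Daymet_{basin_id}.csv"


# Assumption: clusters are numbered 1..14.
expected_archives = [baseflow_archive_name(n) for n in range(1, 15)]

# "12345678" is an invented basin ID used only for illustration.
example_forcing = forcing_file_name("12345678")
```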
The dataset contains information on the relationships between selected territorial elements and units of territorial registration. Data is provided in seven CSV files for the whole Czech Republic. File adresni-mista-vazby-cr.csv contains links of address points to the following elements: street, municipality part, town district (MOMC), Prague city district (MOP), town district of Prague (SPRAVOBV), municipality, municipality with an authorized municipal office (POU), municipality with extended competence (ORP), higher territorial self-governing entity (VÚSC) and election district (VO). File vazby-cr.csv contains links between the elements municipality part, municipality, POU, ORP, VUSC, cohesion region (REGSOUDR) up to the element of state. File vazby-hlm-praha.csv contains the modularity of elements in the city of Prague: MOMC, SPRAVOBV, municipality, POU, ORP, VUSC, REGSOUDR and state. File vazby-katastr-uzemi-cr.csv contains the modularity of basic urban units (ZSJ) into cadastral units (KATUZ) and municipalities. File vazby-momc-statutarni-mesta.csv contains the modularity of territorial elements in territorially structured statutory cities: MOMC, MOP, municipality, POU, ORP, VUSC, REGSOUDR and state. File vazby-okresy-cr.csv contains links between the elements municipality part, municipality, county, region (old, defined in 1960) and state. File vazby-ulice-obce-s-ulicni-siti.csv contains links of streets to the municipality. The dataset is provided as Open Data (licence CC-BY 4.0). Data is based on RÚIAN (Register of Territorial Identification, Addresses and Real Estates). Files are created on the first day of each month with data valid as of the last day of the previous month. The whole dataset is compressed (ZIP) for downloading. More information is in Act No. 111/2009 Coll., on the Basic Registers, and in Decree No. 359/2011 Coll., on the Basic Register of Territorial Identification, Addresses and Real Estates.
A dataset within the Harmonized Database of Western U.S. Water Rights (HarDWR). For a detailed description of the database, please see the meta-record v2.0.

Changelog
v2.0 - Switched source data from collecting records from each state independently to using the WestDAAT dataset
v1.0 - Initial public release

Description
In order to hold a water right in the western United States, an entity (e.g., an individual, corporation, municipality, sovereign government, or non-profit) must register a physical document with the state's water regulatory agency. State water agencies each maintain their own database containing all registered water right documents within the state, along with relevant metadata such as the point of diversion and place of use of the water. All western U.S. states have digitized their individual water rights databases, as well as geospatial data defining the areas in which water rights are managed. Each state maintains and provides its own water rights data in accordance with individual state regulations and standards. In addition, while all states make their water rights publicly available, each provides its records in unique formats, meaning that file types, field availability, and terms vary from state to state. This creates additional challenges for managing resources that cross state lines and for conducting consistent multi-state water analyses. For the first version of HarDWR, we collected the water rights databases from 11 Western States of the United States. In order to perform regional analyses with the collected data, the raw records had to be harmonized into one single format. The Water Data Exchange (WaDE) is a program dedicated to the sharing of water-related data for the Western U.S. in a singular consistent format.
Created by the Western States Water Council (WSWC) to facilitate the collection and dissemination of water data among WSWC's member states and the public, WaDE provides an important service for those interested in water resource planning and management in their focus region. Of the services which WaDE provides, one of the most interesting is the WestDAAT dataset, which is a collection of water rights data provided by the 18 WSWC member states that has been standardized into a single format, much like we had done on a more limited scale with HarDWR v1. For this version of HarDWR we decided to use WestDAAT, specifically a snapshot created in February 2024, as our water rights source data. A full explanation of the benefits gained from this switch can be found in the description of the updated Harmonized Water Rights Records v2.0, but in short it has allowed us to focus more of our efforts on answering research questions and gaining a more realistic understanding of how water rights are allocated. For more information on how the data for WestDAAT was collected, please see the WaDE data summary.

Terms of Use
While WaDE works directly with the state agencies to collect and standardize the water rights records, the ultimate authority for the water rights data remains the individual states. Each state, and their respective water right authorities, have made their water right records available for non-commercial reference uses. In addition, the states make no guarantees as to the completeness, accuracy, or timeliness of their respective databases, let alone the modifications which we, the authors of this paper, have made to the collected records. None of the states should be held liable for any use of this data outside of its intended use. As several of the states update their water rights databases daily, the information provided here is not the latest possible, and should not be used for legal purposes. WestDAAT itself has irregular updates.
Additional questions about the data the source states provided should be directed to the respective state agencies (see the methods.csv and organization.csv files described below). In addition, although the data presented here was not collected directly from the states, several states requested specifically worded disclaimers when sharing their data. These disclaimers are included here as an acknowledgement of where the water rights data is primarily sourced.

Colorado: "The data made available here has been modified for use from its original source, which is the State of Colorado. THE STATE OF COLORADO MAKES NO REPRESENTATIONS OR WARRANTY AS TO THE COMPLETENESS, ACCURACY, TIMELINESS, OR CONTENT OF ANY DATA MADE AVAILABLE THROUGH THIS SITE. THE STATE OF COLORADO EXPRESSLY DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. The data is subject to change as modifications and updates are complete. It is understood that the information contained in the Web feed is being used at one's own risk."

Montana: "The Montana State Library provides this product/service for informational purposes only. The Library did not produce it for, nor is it suitable for legal, engineering, or surveying purposes. Consumers of this information should review or consult the primary data and information sources to ascertain the viability of the information for their purposes. The Library provides these data in good faith but does not represent or warrant its accuracy, adequacy, or completeness. In no event shall the Library be liable for any incorrect results or analysis; any direct, indirect, special, or consequential damages to any party; or any lost profits arising out of or in connection with the use or the inability to use the data or the services provided. The Library makes these data and services available as a convenience to the public, and for no other purpose.
The Library reserves the right to change or revise published data and/or services at any time."

Oregon: "This product is for informational purposes and may not have been prepared for, or be suitable for legal, engineering, or surveying purposes. Users of this information should review or consult the primary data and information sources to ascertain the usability of the information."

File Descriptions
The unmodified February 2024 WestDAAT snapshot is composed of nine files. Below is a brief description of each file, as well as how they were utilized for HarDWR.

WaDEDataDictionaryTerms.xlsx: As the file's name implies, this is a data dictionary for all of the below named files. This file describes the column names for each of the following files, with the exception of citation.txt, which does not have any columns. The descriptions for each file are divided by tab, with the same name as their associated file, within this document.

allocationamount.csv: The "main" file of the group, it contains the water right records for each state. Of particular note, each water right is broken down into one or more water allocations. Allocations may be withdrawn from one or more locations, and multiple allocations may be associated with a particular location. This is a more subtle and realistic representation of how water is used than what was available in the first version of HarDWR. For the records from some states, this can mean that multiple allocations listed under a single right will appear as rows within this file.

citation.txt: A combination of contact information for WaDE personnel, a disclaimer about how the data should be used, and guidelines for citing WestDAAT.

methods.csv: A file describing the source and method by which WaDE collected water rights data from each state.

organization.csv: A file listing the water rights authoritative agencies for each state.
sites.csv: This file provides the geographic, and other, descriptors of the physical location of allocations, called 'sites'. To reiterate, it is possible for one allocation to be associated with multiple sites, as well as one site to be associated with multiple allocations. The two descriptors which we were most interested in were the site's coordinates and whether the site was classified as a Point of Diversion (POD) or a Place of Use (POU). As a general rule, PODs are geographic points, while POUs are areas typically represented as property boundaries or irregularly shaped polygons.

sites_pouGeometry.csv: For those allocations with a POU site, this file contains the defining points for the associated polygons.

variables.csv: A file describing the units in which an allocation's water amount is reported within WestDAAT. This information is essentially a repeat of the 'AllocationFlow_CFS' and 'AllocationVolume_AF' columns within allocationamount.csv, at least for our purposes.

watersources: This file describes the source of water from which each site extracts. For our purposes, this table was used to determine whether the water came from Surface Water, Groundwater, or Unspecified Water.
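The many-to-many relationship between allocations and sites can be made concrete with two lookup tables. This is only an illustrative sketch: the identifiers below are invented, the function name `index_links` is hypothetical, and the actual column names that link allocationamount.csv to sites.csv are defined in WaDEDataDictionaryTerms.xlsx rather than here.

```python
from collections import defaultdict

# Invented (allocation_id, site_id) pairs. Real identifiers and the columns
# that carry them are not given in this description.
LINKS = [
    ("alloc-1", "site-A"),
    ("alloc-1", "site-B"),
    ("alloc-2", "site-B"),
]


def index_links(links):
    """Build both directions of a many-to-many allocation/site relationship."""
    sites_by_allocation = defaultdict(set)
    allocations_by_site = defaultdict(set)
    for allocation_id, site_id in links:
        sites_by_allocation[allocation_id].add(site_id)
        allocations_by_site[site_id].add(allocation_id)
    return dict(sites_by_allocation), dict(allocations_by_site)


sites_by_allocation, allocations_by_site = index_links(LINKS)
```

Keeping both directions avoids rescanning the link list when moving from a site to its allocations or the reverse.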
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Veteran data in .csv files. Includes population/demographic data of age distribution, period of service, income, and education. Also includes population projections. Compares Connecticut to national data.
Postal Codes Dataset for United States, US including name of the city, town, or place, various administrative divisions and alternative city names.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Programming Languages Infrastructure as Code (PL-IaC) enables IaC programs written in general-purpose programming languages like Python and TypeScript. The currently available PL-IaC solutions are Pulumi and the Cloud Development Kits (CDKs) of Amazon Web Services (AWS) and Terraform. This dataset provides metadata and initial analyses of all public GitHub repositories in August 2022 with an IaC program, including their programming languages, applied testing techniques, and licenses. Further, we provide a shallow copy of the head state of those 7104 repositories whose licenses permit redistribution. The dataset is available under the Open Data Commons Attribution License (ODC-By) v1.0. Contents:
metadata.zip: The dataset metadata and analysis results as CSV files.
scripts-and-logs.zip: Scripts and logs of the dataset creation.
LICENSE: The Open Data Commons Attribution License (ODC-By) v1.0 text.
README.md: This document.
redistributable-repositiories.zip: Shallow copies of the head state of all redistributable repositories with an IaC program.

This artifact is part of the ProTI Infrastructure as Code testing project: https://proti-iac.github.io.

Metadata
The dataset's metadata comprises three tabular CSV files containing metadata about all analyzed repositories, IaC programs, and testing source code files.

repositories.csv:
ID (integer): GitHub repository ID
url (string): GitHub repository URL
downloaded (boolean): Whether cloning the repository succeeded
name (string): Repository name
description (string): Repository description
licenses (string, list of strings): Repository licenses
redistributable (boolean): Whether the repository's licenses permit redistribution
created (string, date & time): Time of the repository's creation
updated (string, date & time): Time of the last update to the repository
pushed (string, date & time): Time of the last push to the repository
fork (boolean): Whether the repository is a fork
forks (integer): Number of forks
archive (boolean): Whether the repository is archived
programs (string, list of strings): Project file path of each IaC program in the repository

programs.csv:
ID (string): Project file path of the IaC program
repository (integer): GitHub repository ID of the repository containing the IaC program
directory (string): Path of the directory containing the IaC program's project file
solution (string, enum): PL-IaC solution of the IaC program ("AWS CDK", "CDKTF", "Pulumi")
language (string, enum): Programming language of the IaC program (enum values: "csharp", "go", "haskell", "java", "javascript", "python", "typescript", "yaml")
name (string): IaC program name
description (string): IaC program description
runtime (string): Runtime string of the IaC program
testing (string, list of enum): Testing techniques of the IaC program (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
tests (string, list of strings): File paths of IaC program's tests

testing-files.csv:
file (string): Testing file path
language (string, enum): Programming language of the testing file (enum values: "csharp", "go", "java", "javascript", "python", "typescript")
techniques (string, list of enum): Testing techniques used in the testing file (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
keywords (string, list of enum): Keywords found in the testing file (enum values: "/go/auto", "/testing/integration", "@AfterAll", "@BeforeAll", "@Test", "@aws-cdk", "@aws-cdk/assert", "@pulumi.runtime.test", "@pulumi/", "@pulumi/policy", "@pulumi/pulumi/automation", "Amazon.CDK", "Amazon.CDK.Assertions", "Assertions_", "HashiCorp.Cdktf", "IMocks", "Moq", "NUnit", "PolicyPack(", "ProgramTest", "Pulumi", "Pulumi.Automation", "PulumiTest", "ResourceValidationArgs", "ResourceValidationPolicy", "SnapshotTest()", "StackValidationPolicy", "Testing", "Testing_ToBeValidTerraform(", "ToBeValidTerraform(", "Verifier.Verify(", "WithMocks(", "[Fact]", "[TestClass]", "[TestFixture]", "[TestMethod]", "[Test]", "afterAll(", "assertions", "automation", "aws-cdk-lib", "aws-cdk-lib/assert", "aws_cdk", "aws_cdk.assertions", "awscdk", "beforeAll(", "cdktf", "com.pulumi", "def test_", "describe(", "github.com/aws/aws-cdk-go/awscdk", "github.com/hashicorp/terraform-cdk-go/cdktf", "github.com/pulumi/pulumi", "integration", "junit", "pulumi", "pulumi.runtime.setMocks(", "pulumi.runtime.set_mocks(", "pulumi_policy", "pytest", "setMocks(", "set_mocks(", "snapshot", "software.amazon.awscdk.assertions", "stretchr", "test(", "testing", "toBeValidTerraform(", "toMatchInlineSnapshot(", "toMatchSnapshot(", "to_be_valid_terraform(", "unittest", "withMocks(")
program (string): Project file path of the testing file's IaC program

Dataset Creation
scripts-and-logs.zip contains all scripts and logs of the creation of this dataset.
In it, executions/executions.log documents the commands that generated this dataset in detail. On a high level, the dataset was created as follows:
1. A list of all repositories with a PL-IaC program configuration file was created using search-repositories.py (documented below). The execution took two weeks due to the non-deterministic nature of GitHub's REST API, causing excessive retries.
2. A shallow copy of the head of all repositories was downloaded using download-repositories.py (documented below).
3. Using analysis.ipynb, the repositories were analyzed for the programs' metadata, including the used programming languages and licenses.
4. Based on the analysis, all repositories with at least one IaC program and a redistributable license were packaged into redistributable-repositiories.zip, excluding any node_modules and .git directories.

Searching Repositories
The repositories are searched through search-repositories.py and saved in a CSV file. The script takes these arguments in the following order:
1. GitHub access token.
2. Name of the CSV output file.
3. Filename to search for.
4. File extensions to search for, separated by commas.
5. Min file size for the search (for all files: 0).
6. Max file size for the search or * for unlimited (for all files: *).

Pulumi projects have a Pulumi.yaml or Pulumi.yml (case-sensitive file name) file in their root folder, i.e., (3) is Pulumi and (4) is yml,yaml. https://www.pulumi.com/docs/intro/concepts/project/
AWS CDK projects have a cdk.json (case-sensitive file name) file in their root folder, i.e., (3) is cdk and (4) is json. https://docs.aws.amazon.com/cdk/v2/guide/cli.html
CDK for Terraform (CDKTF) projects have a cdktf.json (case-sensitive file name) file in their root folder, i.e., (3) is cdktf and (4) is json. https://www.terraform.io/cdktf/create-and-deploy/project-setup

Limitations
The script uses the GitHub code search API and inherits its limitations:
Only forks with more stars than the parent repository are included.
Only the repositories' default branches are considered.
Only files smaller than 384 KB are searchable.
Only repositories with fewer than 500,000 files are considered.
Only repositories that have had activity or have been returned in search results in the last year are considered.
More details: https://docs.github.com/en/search-github/searching-on-github/searching-code

The results of the GitHub code search API are not stable. However, the generally more robust GraphQL API does not support searching for files in repositories: https://stackoverflow.com/questions/45382069/search-for-code-in-github-using-graphql-v4-api

Downloading Repositories
download-repositories.py downloads all repositories in CSV files generated through search-repositories.py and generates an overview CSV file of the downloads. The script takes these arguments in the following order:
1. Name of the repositories CSV files generated through search-repositories.py, separated by commas.
2. Output directory to download the repositories to.
3. Name of the CSV output file.

The script only downloads a shallow recursive copy of the HEAD of the repo, i.e., only the main branch's most recent state, including submodules, without the rest of the git history. Each repository is downloaded to a subfolder named by the repository's ID.
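The argument order for search-repositories.py can be summarized as a small helper that assembles the command line for each PL-IaC solution. This is a sketch only: the filename and extension values come from the description above, while running the script through a `python` interpreter, the helper name `build_search_command`, and the token and output file names are assumptions made for illustration.

```python
# Search filename (argument 3) and extensions (argument 4) per solution,
# as stated in the description of search-repositories.py.
SEARCH_TARGETS = {
    "Pulumi": ("Pulumi", "yml,yaml"),
    "AWS CDK": ("cdk", "json"),
    "CDKTF": ("cdktf", "json"),
}


def build_search_command(solution, token, output_csv, min_size="0", max_size="*"):
    """Assemble the six positional arguments in the documented order."""
    filename, extensions = SEARCH_TARGETS[solution]
    # Assumption: the script is launched with the `python` interpreter.
    return [
        "python", "search-repositories.py",
        token, output_csv, filename, extensions, min_size, max_size,
    ]


# "TOKEN" and "pulumi.csv" are placeholder values, not names from the dataset.
command = build_search_command("Pulumi", "TOKEN", "pulumi.csv")
```

Passing the list to `subprocess.run` without a shell keeps `*` from being expanded as a glob.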
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor A rasterized building footprint dataset for the United States. Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
Annual and time-period fire statistics in CSV format for the AOIs of the NWCC active forecast stations. The statistics are based on NIFC historical and current fire perimeters and MTBS burn severity data. This release contains NIFC data from 1996 to current (July 10, 2025) and MTBS data from 1996 to 2022. Annual statistics were generated for the period 1996 to 2025. Time-period statistics were generated from 1998 to 2022 at 5-year intervals. The time periods are: 2018-2022 (last 5 years), 2013-2022 (last 10 years), 2008-2022 (last 15 years), 2003-2022 (last 20 years), and 1998-2022 (last 25 years).
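The listed time periods follow a simple pattern: each window ends in 2022 and grows by five years. A brief sketch of that arithmetic, with a hypothetical helper name `trailing_periods`:

```python
def trailing_periods(end_year, step=5, count=5):
    """Return (start, end) year pairs for windows of step, 2*step, ... years."""
    # A window of n years ending in end_year starts at end_year - n + 1.
    return [(end_year - step * k + 1, end_year) for k in range(1, count + 1)]


periods = trailing_periods(2022)
```

Both endpoints are inclusive, which is why the 5-year window starts in 2018 and not 2017.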