100+ datasets found

EPA FRS Facilities Combined File CSV Download for the State of Arkansas
catalog.data.gov
Updated Nov 29, 2020
Cite
U.S. EPA Office of Environmental Information (OEI) - Office of Information Collection (OIC) (2020). EPA FRS Facilities Combined File CSV Download for the State of Arkansas [Dataset]. https://catalog.data.gov/dataset/epa-frs-facilities-combined-file-csv-download-for-the-state-of-arkansas
Dataset updated
Nov 29, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
Arkansas
Description
The Facility Registry System (FRS) identifies facilities, sites, or places subject to environmental regulation or of environmental interest to EPA programs or delegated states. Using vigorous verification and data management procedures, FRS integrates facility data from program national systems, state master facility records, tribal partners, and other federal agencies and provides the Agency with a centrally managed, single source of comprehensive and authoritative information on facilities.
Indian Latitude and Longitude
kaggle.com
Updated Jan 29, 2021
Cite
Anurag Peddi (2021). Indian Latitude and Longitude [Dataset]. https://www.kaggle.com/datasets/anurag1817/indian-latlong
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 29, 2021
Dataset provided by
Kaggle
Authors
Anurag Peddi
License
http://opendatacommons.org/licenses/dbcl/1.0/
Area covered
India
Description
A dataset consisting of the latitude and longitude of the 29 Indian states.
EPA FRS Facilities Combined File CSV Download for the State of Texas
catalog.data.gov
Updated Nov 29, 2020
Cite
U.S. EPA Office of Environmental Information (OEI) - Office of Information Collection (OIC) (2020). EPA FRS Facilities Combined File CSV Download for the State of Texas [Dataset]. https://catalog.data.gov/dataset/epa-frs-facilities-combined-file-csv-download-for-the-state-of-texas
Dataset updated
Nov 29, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
Texas
Description
The Facility Registry System (FRS) identifies facilities, sites, or places subject to environmental regulation or of environmental interest to EPA programs or delegated states. Using vigorous verification and data management procedures, FRS integrates facility data from program national systems, state master facility records, tribal partners, and other federal agencies and provides the Agency with a centrally managed, single source of comprehensive and authoritative information on facilities.
The New York Times Coronavirus (Covid-19) Cases and Deaths in the United...
data.amerigeoss.org
csv
Updated Mar 30, 2023
Cite
UN Humanitarian Data Exchange (2023). The New York Times Coronavirus (Covid-19) Cases and Deaths in the United States [Dataset]. https://data.amerigeoss.org/sl/dataset/nyt-covid-19-data
Available download formats: csv
Dataset updated
Mar 30, 2023
Dataset provided by
UN Humanitarian Data Exchange
Area covered
United States
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

United States Data

Data on cumulative coronavirus cases and deaths can be found in two files for states and counties.

Each row of data reports cumulative counts based on our best reporting up to the moment we publish an update. We do our best to revise earlier entries in the data when we receive new information.
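
Because each row holds a cumulative count, per-day figures have to be derived by differencing consecutive rows. The following is a minimal sketch of that step, not part of the original release; the function name and the sample numbers are hypothetical. Note that a revision to an earlier entry can make a derived daily value negative.

```python
def daily_from_cumulative(cumulative):
    """Convert cumulative counts into per-period increments.

    The first increment is the first cumulative value itself. An increment
    can be negative when an earlier entry was revised downward.
    """
    daily = []
    previous = 0
    for value in cumulative:
        daily.append(value - previous)
        previous = value
    return daily


# Hypothetical cumulative case counts for one state on consecutive days.
cumulative_cases = [1, 1, 2, 5, 4, 9]
new_cases = daily_from_cumulative(cumulative_cases)
print(new_cases)
```

The increments always sum back to the last cumulative value, which is a cheap sanity check after differencing.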

Both files contain FIPS codes, a standard geographic identifier, to make it easier for an analyst to combine this data with other data sets like a map file or population data.

State-Level Data

State-level data can be found in the us-states.csv file.

date,state,fips,cases,deaths
2020-01-21,Washington,53,1,0
...
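
As an illustration of the layout above, here is a small sketch that parses the sample row with Python's standard `csv` module. The helper name `read_state_rows` is an assumption for this example. The FIPS code is kept as text and padded to two digits, since state FIPS codes are two-digit identifiers rather than quantities.

```python
import csv
import io
from datetime import date

# The header and first row shown in the description above.
STATE_SAMPLE = "date,state,fips,cases,deaths\n2020-01-21,Washington,53,1,0\n"


def read_state_rows(text):
    """Parse state-level rows into typed dictionaries."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": date.fromisoformat(record["date"]),
            "state": record["state"],
            "fips": record["fips"].zfill(2),  # identifier, so keep as text
            "cases": int(record["cases"]),
            "deaths": int(record["deaths"]),
        })
    return rows


state_rows = read_state_rows(STATE_SAMPLE)
print(state_rows[0])
```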

County-Level Data

County-level data can be found in the us-counties.csv file.

date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
...

In some cases, the geographies where cases are reported do not map to standard county boundaries. See the list of geographic exceptions for more detail on these.
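
When combining county rows with other data by FIPS code, rows that do not correspond to a standard county need to be set aside rather than silently dropped. The sketch below assumes such rows can be recognized by a FIPS field that is not a five-digit code; that is an assumption for illustration, and the second sample row is hypothetical. For matched rows, the first two digits of the county FIPS code give the state FIPS code.

```python
import csv
import io

COUNTY_SAMPLE = (
    "date,county,state,fips,cases,deaths\n"
    "2020-01-21,Snohomish,Washington,53061,1,0\n"
    # Hypothetical row without a FIPS code, standing in for an exception.
    "2020-03-01,Example Area,Example State,,2,0\n"
)


def split_by_fips(text):
    """Separate rows with a five-digit county FIPS code from the rest."""
    matched, unmatched = [], []
    for record in csv.DictReader(io.StringIO(text)):
        fips = record["fips"].strip()
        if len(fips) == 5 and fips.isdigit():
            record["state_fips"] = fips[:2]  # state prefix of the county code
            matched.append(record)
        else:
            unmatched.append(record)
    return matched, unmatched


matched_rows, unmatched_rows = split_by_fips(COUNTY_SAMPLE)
print(len(matched_rows), len(unmatched_rows))
```

Keeping the unmatched rows in their own list makes it possible to review them against the list of geographic exceptions before deciding how to handle each one.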

GitHub Repository

This dataset contains COVID-19 data for the United States of America made available by The New York Times on GitHub at https://github.com/nytimes/covid-19-data
2015-2016 NSDUH State Estimates – Individual Excel and CSV Files by Outcome
catalog.data.gov
odgavaprod.ogopendata.com
Updated Sep 7, 2025
Cite
Substance Abuse and Mental Health Services Administration (2025). 2015-2016 NSDUH State Estimates – Individual Excel and CSV Files by Outcome [Dataset]. https://catalog.data.gov/dataset/2015-2016-nsduh-state-estimates-individual-excel-and-csv-files-by-outcome
Dataset updated
Sep 7, 2025
Dataset provided by
Substance Abuse and Mental Health Services Administration (https://www.samhsa.gov/)
Description
2015-2016 NSDUH State Estimates – Individual Excel and CSV Files by Outcome
US States Ranked by Population 2024
kaggle.com
Updated Jul 4, 2024
Cite
Ibrar Ali (2024). US States Ranked by Population 2024 [Dataset]. https://www.kaggle.com/datasets/dataanalyst001/us-states-ranked-by-population-2024
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 4, 2024
Dataset provided by
Kaggle
Authors
Ibrar Ali
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
This dataset provides a detailed overview of the population statistics for each U.S. state for the years 2023 and 2024. It includes the population count, growth rate, percentage of the U.S. population, and population density per square mile.
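
The derived columns named above can be reproduced from raw counts. The sketch below uses the conventional definitions (year-over-year relative change, share of the national total, people per square mile); whether the dataset computes them exactly this way is an assumption, and all numbers are hypothetical.

```python
def growth_rate(previous, current):
    """Relative change from the previous year to the current year."""
    return (current - previous) / previous


def share_of_total(state_population, national_population):
    """Fraction of the national population living in the state."""
    return state_population / national_population


def density_per_sq_mile(population, area_sq_miles):
    """People per square mile."""
    return population / area_sq_miles


# Hypothetical figures for a single state.
pop_2023 = 1_000_000
pop_2024 = 1_010_000
national_2024 = 100_000_000
area_sq_miles = 50_000

print(f"growth: {growth_rate(pop_2023, pop_2024):.2%}")
print(f"share: {share_of_total(pop_2024, national_2024):.2%}")
print(f"density: {density_per_sq_mile(pop_2024, area_sq_miles):.1f} per sq mi")
```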
All U.S State Of The Union Speeches (1790-2019)
kaggle.com
zip
Updated Sep 11, 2019
Cite
jyron (2019). All U.S State Of The Union Speeches (1790-2019) [Dataset]. https://www.kaggle.com/datasets/jyronw/us-state-of-the-union-addresses-1790-2019
Available download formats: zip (3645771 bytes)
Dataset updated
Sep 11, 2019
Authors
jyron
Area covered
United States
Description
Context

The State of the Union Address (S.O.T.U) is an annual message delivered by the President of the United States to a joint session of the United States Congress at the beginning of each calendar year in office. The message typically includes a budget message and an economic report of the nation, and also allows the President to propose a legislative agenda and national priorities.

Content

This dataset is a CSV file with columns President, Year, Title, and Text. The Text column contains a list of string-formatted sentences that make up the text of each S.O.T.U.
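
A sketch of how such a file might be read follows. It assumes the Text column stores each list as a Python-style list literal (for example `['First.', 'Second.']`), which the description suggests but does not confirm; the sample row and the helper names are hypothetical. `ast.literal_eval` is used because it parses literals without executing code.

```python
import ast
import csv
import io


def build_sample():
    """Create a one-row CSV in the assumed layout (hypothetical content)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["President", "Year", "Title", "Text"])
    writer.writerow([
        "Example President",
        "1900",
        "Example Address",
        str(["First sentence.", "Second sentence."]),
    ])
    return buffer.getvalue()


def read_speeches(text):
    """Parse rows and turn the Text column back into a list of sentences."""
    speeches = []
    for record in csv.DictReader(io.StringIO(text)):
        speeches.append({
            "president": record["President"],
            "year": int(record["Year"]),
            "title": record["Title"],
            "sentences": ast.literal_eval(record["Text"]),
        })
    return speeches


speeches = read_speeches(build_sample())
print(speeches[0]["year"], len(speeches[0]["sentences"]))
```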

Acknowledgements

Thanks Wikidata! - Data sourced from wikidata pages: https://www.wikidata.org/w/index.php?title=Q28371311&oldid=992890506

Inspiration

How does presidential popularity relate to S.O.T.U sentiment analysis for a given year?

How has the vocabulary of presidents changed throughout the 200+ year document history?

Determine the significant historical events occurring during a given year based on the address of that year, or of future/preceding years.
Coronavirus (Covid-19) Data in the United States
github.com
openicpsr.org
csv
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
Available download formats: csv
Dataset provided by
New York Times
License
https://github.com/nytimes/covid-19-data/blob/master/LICENSE
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
united-states-license-plate-dataset
huggingface.co
Updated Jul 1, 2025
Cite
Unidata (2025). united-states-license-plate-dataset [Dataset]. https://huggingface.co/datasets/UniDataPro/united-states-license-plate-dataset
Dataset updated
Jul 1, 2025
Authors
Unidata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Dataset of license plate recognition

Dataset offers 89,986 images of vehicles featuring license plates from the USA, making it an excellent resource for tasks involving OCR (Optical Character Recognition), license plate identification, and vehicle registration data extraction. Each image is accompanied by a CSV file that provides the corresponding plate text and country code, ideal for developing and testing text recognition systems. With this dataset, researchers and developers can… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/united-states-license-plate-dataset.
State Class Transition Spreadsheet (Area of Land Transition into Each Class...
data.usgs.gov
Updated Jun 24, 2021
Cite
Tamara Wilson; Elliott Matchett; Kristin Byrd; Erin Conlisk; Matthew Reiter; Lorraine Flint; Alan Flint; Monica Moritsch; Cynthia Wallace (2021). State Class Transition Spreadsheet (Area of Land Transition into Each Class per Year, per Scenario) [Dataset]. http://doi.org/10.5066/P9BSZM8R
Unique identifier
https://doi.org/10.5066/P9BSZM8R
Dataset updated
Jun 24, 2021
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
Tamara Wilson; Elliott Matchett; Kristin Byrd; Erin Conlisk; Matthew Reiter; Lorraine Flint; Alan Flint; Monica Moritsch; Cynthia Wallace
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Time period covered
2011 - 2101
Description
This spreadsheet dataset (.csv file) contains annual modeled output of land-use and land-cover change transitions in square kilometers (km2) by specified transition group, scenario, timestep, WEAP hydrologic zone, and 4 sub-regions within the broader California Central Valley, modeled using the LUCAS ST-SIM for the period 2011-2101 across 5 future scenarios. Four of the scenarios were developed as part of the Central Valley Landscape Conservation Project. The 4 original scenarios include a Bad-Business-As-Usual (BBAU; high water availability, poor management), California Dreamin’ (DREAM; high water availability, good management), Central Valley Dustbowl (DUST; low water availability, poor management), and Everyone Equally Miserable (EEM; low water availability, good management). These scenarios represent alternative plausible futures, capturing a range of climate variability, land management activities, and habitat restoration goals. We parameterized our models based on close inte ...
EPA FRS Facilities Single File CSV Download for the State of Wisconsin
catalog.data.gov
Updated Nov 29, 2020
Cite
U.S. EPA Office of Environmental Information (OEI) - Office of Information Collection (OIC) (2020). EPA FRS Facilities Single File CSV Download for the State of Wisconsin [Dataset]. https://catalog.data.gov/dataset/epa-frs-facilities-single-file-csv-download-for-the-state-of-wisconsin
Dataset updated
Nov 29, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
Wisconsin
Description
The Facility Registry System (FRS) identifies facilities, sites, or places subject to environmental regulation or of environmental interest to EPA programs or delegated states. Using vigorous verification and data management procedures, FRS integrates facility data from program national systems, state master facility records, tribal partners, and other federal agencies and provides the Agency with a centrally managed, single source of comprehensive and authoritative information on facilities.
Data for: Of the first five US states with food waste bans, Massachusetts...
datadryad.org
search.dataone.org
zip
Updated Aug 27, 2024
Cite
Fiorentia Zoi Anglou; Robert Evan Sanders; Ioannis Stamatopoulos (2024). Data for: Of the first five US states with food waste bans, Massachusetts alone has reduced landfill waste [Dataset]. http://doi.org/10.5061/dryad.bzkh189h4
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.bzkh189h4
Dataset updated
Aug 27, 2024
Dataset provided by
Dryad
Authors
Fiorentia Zoi Anglou; Robert Evan Sanders; Ioannis Stamatopoulos
Time period covered
Jan 4, 2024
Area covered
Massachusetts, United States
Description
The raw data for this paper were received from individual states as PDF or Excel files. (For each state there might be several PDF or Excel files for each year.) In the data we uploaded to GitHub, we consolidated these raw files (the various PDF and Excel files) into a single CSV file and created a standardized waste outcome: state-generated municipal solid waste (MSW) disposal. The README file gives more details on the other supporting data and code we used.
DeepBase: A Deep Learning-based Daily Baseflow Data across the United States...
springernature.figshare.com
txt
Updated Jan 8, 2025
Cite
Parnian Ghaneei; Hamid Moradkhani (2025). DeepBase: A Deep Learning-based Daily Baseflow Data across the United States [Dataset]. http://doi.org/10.6084/m9.figshare.27312927.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.27312927.v1
Dataset updated
Jan 8, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Parnian Ghaneei; Hamid Moradkhani
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Daily baseflow data, along with input datasets for 1661 basins for the hydrological years from 1981 to 2022, can be downloaded in CSV format from the DeepBase repository on FigShare. The baseflow data files for the basins are zipped into archives named ‘Daily_Baseflow_Cluster[cluster_number].zip’, corresponding to their respective clusters. All the static inputs for the 1661 basins are provided in a CSV file named ‘Static_Inputs.csv’. The statistical attributes for the static inputs, calculated for each cluster, are provided in the file ‘14Clusters_statistics.csv’. All the dynamic forcings for the 1661 basins are provided in CSV files named ‘Daymet_[basin_id].csv’, zipped into an archive named ‘Daily_DayMet_Forcings.zip’. The USGS gauge IDs of the training basins (referred to as gauged basins) are provided in ‘530basins_ids.txt’. The associated shapefiles for each cluster, including the polygons of the basins, titled ‘DeepBase_Clusters.zip’, along with the PDF version of the cluster map, titled ‘DeepBase_Clusters_map.pdf’, are accessible via the DeepBase repository.
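
The file-naming patterns above can be turned into small helpers for locating files after download. This is only a sketch: the function names are hypothetical, the basin ID shown is made up, and the description does not say how cluster numbers are written or where they start, so the values passed in are assumptions.

```python
def baseflow_archive_name(cluster_number):
    """Archive name following 'Daily_Baseflow_Cluster[cluster_number].zip'."""
    return f"Daily_Baseflow_Cluster{cluster_number}.zip"


def daymet_file_name(basin_id):
    """Forcing file name following 'Daymet_[basin_id].csv'."""
    return f"Daymet_{basin_id}.csv"


# Hypothetical cluster number and basin ID, used only for illustration.
print(baseflow_archive_name(3))
print(daymet_file_name("01234567"))
```

Basin IDs are passed as strings so that any leading zeros in a gauge ID survive.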
Hierarchy of addresses RÚIAN data distributed by the country in the CSV...
geoportal.gov.cz
data.gov.cz
xml
Updated Sep 19, 2025
Cite
(2025). Hierarchy of addresses RÚIAN data distributed by the country in the CSV format [Dataset]. https://geoportal.gov.cz/php/micka/record/basic/CZ-00025712-CUZK_SERIES-MD_RUIAN-CSV-HIE-ST?dlang=eng
Available download formats: xml
Dataset updated
Sep 19, 2025
Variables measured
https://www.slovnikcuzk.eu/termin.php?tid=3663&l=obec, https://www.slovnikcuzk.eu/termin.php?tid=4007&l=adresa, https://www.slovnikcuzk.eu/termin.php?tid=3930&l=cast-obce, http://inspire.ec.europa.eu/metadata-codelist/SpatialScope/national, https://www.slovnikcuzk.eu/termin.php?tid=2057&l=popisne-cislo-budovy, https://www.slovnikcuzk.eu/termin.php?tid=2022&l=evidencni-cislo-budovy, https://www.slovnikcuzk.eu/termin.php?tid=2050&l=orientacni-cislo-budovy, https://www.slovnikcuzk.eu/termin.php?tid=3782&l=uzemne-spravni-jednotka, https://www.slovnikcuzk.eu/termin.php?tid=1236&l=definicni-bod--reprezentacni-bod--centroid
Description
The dataset contains information on the relationships between selected territorial elements and units of territorial registration. Data is provided in seven CSV files for the whole Czech Republic.

File adresni-mista-vazby-cr.csv contains links of address points to the following elements: street, municipality part, town district (MOMC), Prague city district (MOP), town district of Prague (SPRAVOBV), municipality, municipality with an authorized municipal office (POU), municipality with extended competence (ORP), higher territorial self-governing entity (VÚSC) and election district (VO).
File vazby-cr.csv contains links between the elements municipality part, municipality, POU, ORP, VUSC, cohesion region (REGSOUDR) up to the element of state.
File vazby-hlm-praha.csv contains modularity of elements in the city of Prague: MOMC, SPRAVOBV, municipality, POU, ORP, VUSC, REGSOUDR and state.
File vazby-katastr-uzemi-cr.csv contains modularity of basic urban units (ZSJ) into cadastral units (KATUZ) and municipalities.
File vazby-momc-statutarni-mesta.csv contains modularity of territorial elements in territorially structured statutory cities: MOMC, MOP, obec, POU, ORP, VUSC, REGSOUDR and state.
File vazby-okresy-cr.csv contains links between the elements municipality part, municipality, county, region (old, defined in 1960) and state.
File vazby-ulice-obce-s-ulicni-siti.csv contains links of streets to the municipality.

The dataset is provided as Open Data (licence CC-BY 4.0). Data is based on RÚIAN (Register of Territorial Identification, Addresses and Real Estates). Files are created on the first day of each month with data valid as of the last day of the previous month. The whole dataset is compressed (ZIP) for downloading. For more, see Act No. 111/2009 Coll., on the Basic Registers, and Decree No. 359/2011 Coll., on the Basic Register of Territorial Identification, Addresses and Real Estates.
HarDWR - Raw Water Rights Records
osti.gov
Updated Oct 31, 2020
Cite
Caccese, Robert; Fisher-Vanden, Karen; Fowler, Lara; Grogan, Danielle; Lammers, Richard; Lisk, Matthew; Olmstead, Sheila; Peklak, Darrah; Zheng, Jiameng; Zuidema, Shan (2020). HarDWR - Raw Water Rights Records [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2475305
Dataset updated
Oct 31, 2020
Dataset provided by
USDOE Office of Science (SC), Biological and Environmental Research (BER)
MultiSector Dynamics - Living, Intuitive, Value-adding, Environment
Authors
Caccese, Robert; Fisher-Vanden, Karen; Fowler, Lara; Grogan, Danielle; Lammers, Richard; Lisk, Matthew; Olmstead, Sheila; Peklak, Darrah; Zheng, Jiameng; Zuidema, Shan
Description
A dataset within the Harmonized Database of Western U.S. Water Rights (HarDWR). For a detailed description of the database, please see the meta-record v2.0.

Changelog

v2.0 - Switched source data from collecting records from each state independently to using the WestDAAT dataset
v1.0 - Initial public release

Description

In order to hold a water right in the western United States, an entity (e.g., an individual, corporation, municipality, sovereign government, or non-profit) must register a physical document with the state's water regulatory agency. State water agencies each maintain their own database containing all registered water right documents within the state, along with relevant metadata such as the point of diversion and place of use of the water. All western U.S. states have digitized their individual water rights databases, as well as geospatial data defining the areas in which water rights are managed. Each state maintains and provides its own water rights data in accordance with individual state regulations and standards. In addition, while all states make their water rights publicly available, each provides its records in unique formats, meaning that file types, field availability, and terms vary from state to state. This leads to additional challenges in managing resources that cross state lines, or in conducting consistent multi-state water analyses. For the first version of HarDWR, we collected the water rights databases from 11 Western States of the United States. In order to perform regional analyses with the collected data, the raw records had to be harmonized into one single format. The Water Data Exchange (WaDE) is a program dedicated to the sharing of water-related data for the Western U.S. in a singular consistent format.
Created by the Western States Water Council (WSWC) to facilitate the collection and dissemination of water data among WSWC's member states and the public, WaDE provides an important service for those interested in water resource planning and management in their focus region. Of the services which WaDE provides, one of the most interesting is the WestDAAT dataset, which is a collection of water rights data provided by the 18 WSWC member states that have been standardized into a single format, much like we had done on a more limited scale with HarDWR v1. For this version of HarDWR we decided to use WestDAAT, specifically a snapshot created in February 2024, as our water rights source data. A full explanation of the benefits gained from this switch can be found in the description of the updated Harmonized Water Rights Records v2.0, but in short it has allowed us to focus more of our efforts on answering research questions and gaining a more realistic understanding of how water rights are allocated. For more information on how the data for WestDAAT was collected, please see the WaDE data summary.

Terms of Use

While WaDE works directly with the state agencies to collect and standardize the water rights records, the ultimate authority for the water rights data remains the individual states. Each state, and their respective water right authorities, have made their water right records available for non-commercial reference uses. In addition, the states make no guarantees as to the completeness, accuracy, or timeliness of their respective databases, let alone the modifications which we, the authors of this paper, have made to the collected records. None of the states should be held liable for using this data outside of its intended use. As several of the states update their water rights databases daily, the information provided here is not the latest possible, and should not be used for legal purposes. WestDAAT itself has irregular updates.
Additional questions about the data the source states provided should be directed to the respective state agencies (see methods.csv and organization.csv files described below). In addition, although the data presented here was not collected directly from the states, several states requested specifically worded disclaimers when sharing their data. These disclaimers are included here as an acknowledgement of where the water rights data is primarily sourced.

Colorado: "The data made available here has been modified for use from its original source, which is the State of Colorado. THE STATE OF COLORADO MAKES NO REPRESENTATIONS OR WARRANTY AS TO THE COMPLETENESS, ACCURACY, TIMELINESS, OR CONTENT OF ANY DATA MADE AVAILABLE THROUGH THIS SITE. THE STATE OF COLORADO EXPRESSLY DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS OR IMPLIED, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. The data is subject to change as modifications and updates are complete. It is understood that the information contained in the Web feed is being used at one's own risk."

Montana: "The Montana State Library provides this product/service for informational purposes only. The Library did not produce it for, nor is it suitable for legal, engineering, or surveying purposes. Consumers of this information should review or consult the primary data and information sources to ascertain the viability of the information for their purposes. The Library provides these data in good faith but does not represent or warrant its accuracy, adequacy, or completeness. In no event shall the Library be liable for any incorrect results or analysis; any direct, indirect, special, or consequential damages to any party; or any lost profits arising out of or in connection with the use or the inability to use the data or the services provided. The Library makes these data and services available as a convenience to the public, and for no other purpose.
The Library reserves the right to change or revise published data and/or services at any time."

Oregon: "This product is for informational purposes and may not have been prepared for, or be suitable for legal, engineering, or surveying purposes. Users of this information should review or consult the primary data and information sources to ascertain the usability of the information."

File Descriptions

The unmodified February 2024 WestDAAT snapshot is composed of nine files. Below is a brief description of each file, as well as how they were utilized for HarDWR.

WaDEDataDictionaryTerms.xlsx: As the file's name implies, this is a data dictionary for all of the files named below. This file describes the column names for each of the following files, with the exception of citation.txt, which does not have any columns. The descriptions for each file are divided by tab, with the same name as their associated file, within this document.
allocationamount.csv: The "main" file of the group, it contains the water right records for each state. Of particular note, each water right is broken down into one or more water allocations. Allocations may be withdrawn from one or more locations, and multiple allocations may be associated with a particular location. This is a more subtle and realistic representation of how water is used than what was available in the first version of HarDWR. For the records from some states, this can mean that multiple allocations listed under a single right will appear as rows within this file.
citation.txt: A combination of contact information for WaDE personnel, a disclaimer about how the data should be used, and guidelines for citing WestDAAT.
methods.csv: A file describing the source and method by which WaDE collected water rights data from each state.
organization.csv: A file listing the water rights authoritative agencies for each state.
sites.csv: This file provides the geographic and other descriptors of the physical location of allocations, called 'sites'. To reiterate, it is possible for one allocation to be associated with multiple sites, as well as for one site to be associated with multiple allocations. The two descriptors which we were most interested in were the site's coordinates and whether the site was classified as a Point of Diversion (POD) or a Place of Use (POU). As a general rule, PODs are geographic points, while POUs are areas typically represented as property boundaries or irregularly shaped polygons.
sites_pouGeometry.csv: For those allocations with a POU site, this file contains the defining points for the associated polygons.
variables.csv: A file describing the units in which an allocation's water amount is reported within WestDAAT. This information is essentially a repeat of the 'AllocationFlow_CFS' and 'AllocationVolume_AF' columns within allocationamount.csv, at least for our purposes.
watersources: This file describes the source of water from which each site extracts. For our purposes, this table was used to determine whether the water came from Surface Water, Groundwater, or Unspecified Water.
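
The many-to-many relationship between allocations and sites described above can be represented with a lookup in each direction. The sketch below is illustrative only: the identifiers and the pairing list are hypothetical, and the real column names that link allocationamount.csv to sites.csv are defined in WaDEDataDictionaryTerms.xlsx, not here.

```python
from collections import defaultdict

# Hypothetical (allocation_id, site_id) pairs. The real identifiers and
# column names are defined in WaDEDataDictionaryTerms.xlsx, not here.
ALLOCATION_SITE_LINKS = [("A1", "S1"), ("A1", "S2"), ("A2", "S2")]

# Hypothetical site classifications: Point of Diversion or Place of Use.
SITE_TYPE = {"S1": "POD", "S2": "POU"}


def index_links(pairs):
    """Build lookups in both directions for a many-to-many relationship."""
    sites_by_allocation = defaultdict(set)
    allocations_by_site = defaultdict(set)
    for allocation_id, site_id in pairs:
        sites_by_allocation[allocation_id].add(site_id)
        allocations_by_site[site_id].add(allocation_id)
    return sites_by_allocation, allocations_by_site


sites_by_allocation, allocations_by_site = index_links(ALLOCATION_SITE_LINKS)

# Site types reachable from allocation A1.
a1_site_types = sorted(SITE_TYPE[s] for s in sites_by_allocation["A1"])
print(a1_site_types)
```

Indexing both directions up front avoids rescanning the pair list each time an analysis needs either the sites of an allocation or the allocations of a site.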
US Department of Veterans Affairs - State Summary_Connecticut
datalumos.org
delimited
Updated Apr 29, 2025
Cite
United States Department of Veterans Affairs (2025). US Department of Veterans Affairs - State Summary_Connecticut [Dataset]. http://doi.org/10.3886/E228163V1
Available download formats: delimited
Unique identifier
https://doi.org/10.3886/E228163V1
Dataset updated
Apr 29, 2025
Dataset authored and provided by
United States Department of Veterans Affairs (http://va.gov/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2019 - 2021
Area covered
Connecticut
Description
Veteran data in .csv files. Includes population/demographic data of age distribution, period of service, income, and education. Also includes population projections. Compares Connecticut to national data.
Postal Codes Dataset for United States, US
datahub.io
csv
Updated Oct 1, 2024
Cite
(2024). Postal Codes Dataset for United States, US [Dataset]. https://datahub.io/logistics/postal-codes-us
Available download formats: csv
Dataset updated
Oct 1, 2024
Area covered
United States
Description
Postal Codes Dataset for United States, US including name of the city, town, or place, various administrative divisions and alternative city names.
PIPr: A Dataset of Public Infrastructure as Code Programs
data.niaid.nih.gov
zenodo.org
Updated Nov 28, 2023
Cite
Salvaneschi, Guido (2023). PIPr: A Dataset of Public Infrastructure as Code Programs [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8262770
Explore at:
Dataset updated
Nov 28, 2023
Dataset provided by
Sokolowski, Daniel
Spielmann, David
Salvaneschi, Guido
License
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
Programming Languages Infrastructure as Code (PL-IaC) enables IaC programs written in general-purpose programming languages like Python and TypeScript. The currently available PL-IaC solutions are Pulumi and the Cloud Development Kits (CDKs) of Amazon Web Services (AWS) and Terraform. This dataset provides metadata and initial analyses of all public GitHub repositories in August 2022 with an IaC program, including their programming languages, applied testing techniques, and licenses. Further, we provide a shallow copy of the head state of those 7104 repositories whose licenses permit redistribution. The dataset is available under the Open Data Commons Attribution License (ODC-By) v1.0. Contents:

metadata.zip: The dataset metadata and analysis results as CSV files.
scripts-and-logs.zip: Scripts and logs of the dataset creation.
LICENSE: The Open Data Commons Attribution License (ODC-By) v1.0 text.
README.md: This document.
redistributable-repositiories.zip: Shallow copies of the head state of all redistributable repositories with an IaC program.

This artifact is part of the ProTI Infrastructure as Code testing project: https://proti-iac.github.io.

Metadata

The dataset's metadata comprises three tabular CSV files containing metadata about all analyzed repositories, IaC programs, and testing source code files.

repositories.csv:

ID (integer): GitHub repository ID
url (string): GitHub repository URL
downloaded (boolean): Whether cloning the repository succeeded
name (string): Repository name
description (string): Repository description
licenses (string, list of strings): Repository licenses
redistributable (boolean): Whether the repository's licenses permit redistribution
created (string, date & time): Time of the repository's creation
updated (string, date & time): Time of the last update to the repository
pushed (string, date & time): Time of the last push to the repository
fork (boolean): Whether the repository is a fork
forks (integer): Number of forks
archive (boolean): Whether the repository is archived
programs (string, list of strings): Project file path of each IaC program in the repository

programs.csv:

ID (string): Project file path of the IaC program
repository (integer): GitHub repository ID of the repository containing the IaC program
directory (string): Path of the directory containing the IaC program's project file
solution (string, enum): PL-IaC solution of the IaC program ("AWS CDK", "CDKTF", "Pulumi")
language (string, enum): Programming language of the IaC program (enum values: "csharp", "go", "haskell", "java", "javascript", "python", "typescript", "yaml")
name (string): IaC program name
description (string): IaC program description
runtime (string): Runtime string of the IaC program
testing (string, list of enum): Testing techniques of the IaC program (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
tests (string, list of strings): File paths of IaC program's tests

testing-files.csv:

file (string): Testing file path
language (string, enum): Programming language of the testing file (enum values: "csharp", "go", "java", "javascript", "python", "typescript")
techniques (string, list of enum): Testing techniques used in the testing file (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
keywords (string, list of enum): Keywords found in the testing file (enum values: "/go/auto", "/testing/integration", "@AfterAll", "@BeforeAll", "@Test", "@aws-cdk", "@aws-cdk/assert", "@pulumi.runtime.test", "@pulumi/", "@pulumi/policy", "@pulumi/pulumi/automation", "Amazon.CDK", "Amazon.CDK.Assertions", "Assertions_", "HashiCorp.Cdktf", "IMocks", "Moq", "NUnit", "PolicyPack(", "ProgramTest", "Pulumi", "Pulumi.Automation", "PulumiTest", "ResourceValidationArgs", "ResourceValidationPolicy", "SnapshotTest()", "StackValidationPolicy", "Testing", "Testing_ToBeValidTerraform(", "ToBeValidTerraform(", "Verifier.Verify(", "WithMocks(", "[Fact]", "[TestClass]", "[TestFixture]", "[TestMethod]", "[Test]", "afterAll(", "assertions", "automation", "aws-cdk-lib", "aws-cdk-lib/assert", "aws_cdk", "aws_cdk.assertions", "awscdk", "beforeAll(", "cdktf", "com.pulumi", "def test_", "describe(", "github.com/aws/aws-cdk-go/awscdk", "github.com/hashicorp/terraform-cdk-go/cdktf", "github.com/pulumi/pulumi", "integration", "junit", "pulumi", "pulumi.runtime.setMocks(", "pulumi.runtime.set_mocks(", "pulumi_policy", "pytest", "setMocks(", "set_mocks(", "snapshot", "software.amazon.awscdk.assertions", "stretchr", "test(", "testing", "toBeValidTerraform(", "toMatchInlineSnapshot(", "toMatchSnapshot(", "to_be_valid_terraform(", "unittest", "withMocks(")
program (string): Project file path of the testing file's IaC program

Dataset Creation

scripts-and-logs.zip contains all scripts and logs of the creation of this dataset. In it, executions/executions.log documents the commands that generated this dataset in detail. On a high level, the dataset was created as follows:

1. A list of all repositories with a PL-IaC program configuration file was created using search-repositories.py (documented below). The execution took two weeks due to the non-deterministic nature of GitHub's REST API, causing excessive retries.
2. A shallow copy of the head of all repositories was downloaded using download-repositories.py (documented below).
3. Using analysis.ipynb, the repositories were analyzed for the programs' metadata, including the used programming languages and licenses.
4. Based on the analysis, all repositories with at least one IaC program and a redistributable license were packaged into redistributable-repositiories.zip, excluding any node_modules and .git directories.

Searching Repositories

The repositories are searched through search-repositories.py and saved in a CSV file. The script takes these arguments in the following order:

1. GitHub access token.
2. Name of the CSV output file.
3. Filename to search for.
4. File extensions to search for, separated by commas.
5. Min file size for the search (for all files: 0).
6. Max file size for the search or * for unlimited (for all files: *).

Pulumi projects have a Pulumi.yaml or Pulumi.yml (case-sensitive file name) file in their root folder, i.e., (3) is Pulumi and (4) is yml,yaml. https://www.pulumi.com/docs/intro/concepts/project/
AWS CDK projects have a cdk.json (case-sensitive file name) file in their root folder, i.e., (3) is cdk and (4) is json. https://docs.aws.amazon.com/cdk/v2/guide/cli.html
CDK for Terraform (CDKTF) projects have a cdktf.json (case-sensitive file name) file in their root folder, i.e., (3) is cdktf and (4) is json. https://www.terraform.io/cdktf/create-and-deploy/project-setup

Limitations

The script uses the GitHub code search API and inherits its limitations:

Only forks with more stars than the parent repository are included.
Only the repositories' default branches are considered.
Only files smaller than 384 KB are searchable.
Only repositories with fewer than 500,000 files are considered.
Only repositories that have had activity or have been returned in search results in the last year are considered.

More details: https://docs.github.com/en/search-github/searching-on-github/searching-code

The results of the GitHub code search API are not stable. However, the generally more robust GraphQL API does not support searching for files in repositories: https://stackoverflow.com/questions/45382069/search-for-code-in-github-using-graphql-v4-api

Downloading Repositories

download-repositories.py downloads all repositories in CSV files generated through search-repositories.py and generates an overview CSV file of the downloads. The script takes these arguments in the following order:

1. Name of the repositories CSV files generated through search-repositories.py, separated by commas.
2. Output directory to download the repositories to.
3. Name of the CSV output file.

The script only downloads a shallow recursive copy of the HEAD of the repo, i.e., only the main branch's most recent state, including submodules, without the rest of the git history. Each repository is downloaded to a subfolder named by the repository's ID.
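The search and download steps described above can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual scripts: the helper names `search_args` and `shallow_clone_command` are hypothetical, and the use of `git clone --depth 1 --recurse-submodules --shallow-submodules` is an assumption about how a shallow recursive copy could be obtained, not a statement of what download-repositories.py runs internally.

```python
import sys
from pathlib import Path


def search_args(token, out_csv, filename, extensions, min_size="0", max_size="*"):
    """Build an argv list for search-repositories.py in the documented order:
    token, output CSV, filename, comma-separated extensions, min size, max size."""
    return [
        sys.executable, "search-repositories.py",
        token, out_csv, filename, ",".join(extensions),
        str(min_size), str(max_size),
    ]


def shallow_clone_command(repo_url, repo_id, out_dir):
    """Build a git command for a shallow, recursive clone of the default branch
    into a subfolder named by the repository's ID (assumed equivalent of the
    described download behavior)."""
    target = Path(out_dir) / str(repo_id)
    return [
        "git", "clone",
        "--depth", "1",              # only the most recent state, no history
        "--recurse-submodules",      # include submodules
        "--shallow-submodules",      # keep submodules shallow as well
        repo_url, str(target),
    ]
```

For example, `search_args("TOKEN", "pulumi.csv", "Pulumi", ["yml", "yaml"])` corresponds to the Pulumi case above, where argument (3) is Pulumi and (4) is yml,yaml. Either list could be passed to `subprocess.run` to execute the command.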
Metadata record for: A rasterized building footprint dataset for the United...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
Scientific Data Curation Team (2023). Metadata record for: A rasterized building footprint dataset for the United States [Dataset]. http://doi.org/10.6084/m9.figshare.12444776.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.12444776.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Scientific Data Curation Team
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
United States
Description
This dataset contains key characteristics about the data described in the Data Descriptor A rasterized building footprint dataset for the United States. Contents:

1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
AOI polygon fire statistics CSV files
nwcc-nrcs.hub.arcgis.com
Updated Nov 24, 2024
Cite
USDA NRCS ArcGIS Online (2024). AOI polygon fire statistics CSV files [Dataset]. https://nwcc-nrcs.hub.arcgis.com/datasets/6928853d9d7c450a84984d4c66f95e9c
Explore at:
Dataset updated
Nov 24, 2024
Dataset provided by
Natural Resources Conservation Service (http://www.nrcs.usda.gov/)
United States Department of Agriculture (http://usda.gov/)
Authors
USDA NRCS ArcGIS Online
Description
Annual and time-period fire statistics in CSV format for the AOIs of the NWCC active forecast stations. The statistics are based on NIFC historical and current fire perimeters and MTBS burn severity data. This release contains NIFC data from 1996 to the present (July 10, 2025) and MTBS data from 1996 to 2022. Annual statistics were generated for 1996 to 2025. Time-period statistics were generated for 1998 to 2022 in 5-year increments. The time periods are: 2018-2022 (last 5 years), 2013-2022 (last 10 years), 2008-2022 (last 15 years), 2003-2022 (last 20 years), and 1998-2022 (last 25 years).
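The five time periods listed above follow a simple rule: each ends in 2022 and reaches back a multiple of five years, counting both end years. A minimal sketch of that arithmetic, where the function name `trailing_periods` is hypothetical and not part of the dataset:

```python
def trailing_periods(end_year, step=5, count=5):
    """Return inclusive (start, end) year pairs covering the last
    step, 2*step, ..., count*step years up to end_year."""
    return [(end_year - step * k + 1, end_year) for k in range(1, count + 1)]


for start, end in trailing_periods(2022):
    # end - start + 1 is the number of years in the inclusive range
    print(f"{start}-{end} (last {end - start + 1} years)")
```

The `+ 1` accounts for the inclusive range: the last 5 years ending in 2022 start in 2018, not 2017.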
