These geospatial data and their accompanying report outline many areas of coal in the United States beneath more than 3,000 ft of overburden. Based on depth, these areas may be targets for injection and storage of supercritical carbon dioxide. Additional areas where coal exists beneath more than 1,000 ft of overburden are also outlined; these may be targets for geologic storage of carbon dioxide in conjunction with enhanced coalbed methane production. These areas of deep coal were compiled as polygons into a shapefile for use in a geographic information system (GIS). The coal-bearing formation names, coal basin or field names, geographic provinces, coal ranks, coal geologic ages, and estimated individual coalbed thicknesses (if known) of the coal-bearing formations were included. An additional point shapefile, coal_co2_projects.shp, contains the locations of pilot projects for carbon dioxide injection into coalbeds. This report is not a comprehensive study of deep coal in the United States. Some areas of deep coal were excluded based on geologic or data-quality criteria, while others may be absent from the literature and still others may have been overlooked by the authors.
[1] Status is determined using the baseline, final, and target value. The statuses used in Healthy People 2020 were:
1 - Target met or exceeded—One of the following applies: (i) At baseline, the target was not met or exceeded, and the most recent value was equal to or exceeded the target. (The percentage of targeted change achieved was equal to or greater than 100%.); (ii) The baseline and most recent values were equal to or exceeded the target. (The percentage of targeted change achieved was not assessed.)
2 - Improved—One of the following applies: (i) Movement was toward the target, standard errors were available, and the percentage of targeted change achieved was statistically significant; (ii) Movement was toward the target, standard errors were not available, and the objective had achieved 10% or more of the targeted change.
3 - Little or no detectable change—One of the following applies: (i) Movement was toward the target, standard errors were available, and the percentage of targeted change achieved was not statistically significant; (ii) Movement was toward the target, standard errors were not available, and the objective had achieved less than 10% of the targeted change; (iii) Movement was away from the baseline and target, standard errors were available, and the percent change relative to the baseline was not statistically significant; (iv) Movement was away from the baseline and target, standard errors were not available, and the objective had moved less than 10% relative to the baseline; (v) No change was observed between the baseline and the final data point.
4 - Got worse—One of the following applies: (i) Movement was away from the baseline and target, standard errors were available, and the percent change relative to the baseline was statistically significant; (ii) Movement was away from the baseline and target, standard errors were not available, and the objective had moved 10% or more relative to the baseline.
5 - Baseline only—The objective only had one data point, so progress toward target attainment could not be assessed. Note that if additional data points did not meet the criteria for statistical reliability, data quality, or confidentiality, the objective was categorized as baseline only.
6 - Informational—A target was not set for this objective, so progress toward target attainment could not be assessed.
[2] The final value is generally based on data available on the Healthy People 2020 website as of January 2020. For objectives that are continuing into Healthy People 2030, more recent data are available on the Healthy People 2030 website: https://health.gov/healthypeople.
[3] For objectives that moved toward their targets, movement toward the target was measured as the percentage of targeted change achieved (unless the target was already met or exceeded at baseline):
Percentage of targeted change achieved = (Final value - Baseline value) / (HP2020 target - Baseline value) * 100
[4] For objectives that were not improving, did not meet or exceed their targets, and did not move towards their targets, movement away from the baseline was measured as the magnitude of the percent change from baseline:
Magnitude of percent change from baseline = |Final value - Baseline value| / Baseline value * 100
[5] Statistical significance was tested when the objective had a target, at least two data points (of unequal value), and available standard errors of the data. A normal distribution was assumed. All available digits were used to test statistical significance. Statistical significance of the percentage of targeted change achieved or the magnitude of the percentage change from baseline was assessed at the 0.05 level using a normal one-sided test.
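To make the arithmetic in notes [3]-[5] concrete, here is a minimal Python sketch of the three computations. The function names are ours, and the significance test is a simplification (a one-sided z-test on the baseline-to-final change using the two standard errors), not the official Healthy People procedure:

```python
from math import sqrt
from statistics import NormalDist

def pct_targeted_change(baseline, final, target):
    # Note [3]: percentage of targeted change achieved.
    return (final - baseline) / (target - baseline) * 100

def pct_change_from_baseline(baseline, final):
    # Note [4]: magnitude of percent change from baseline.
    return abs(final - baseline) / baseline * 100

def moved_significantly(baseline, final, se_baseline, se_final, alpha=0.05):
    # Note [5], simplified: one-sided normal test on the change between
    # baseline and final, assuming independent estimates with known SEs.
    z = abs(final - baseline) / sqrt(se_baseline ** 2 + se_final ** 2)
    return 1 - NormalDist().cdf(z) < alpha
```

For example, an objective with baseline 20, final value 26, and target 30 has achieved (26 - 20) / (30 - 20) * 100 = 60% of the targeted change, which would count as "Improved" if statistically significant or if no standard errors were available.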
[6] For more information on the Healthy People 2020 methodology for measuring progress toward target attainment and the elimination of health disparities, see: Healthy People Statistical Notes, no 27; available from: https://www.cdc.gov/nchs/data/sta
Facebook received 73,390 user data requests from federal agencies and courts in the United States during the second half of 2023. The social network produced some user data in 88.84 percent of requests from U.S. federal authorities. The United States accounts for the largest share of Facebook user data requests worldwide.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of 22 datasets of 50+ requirements each, expressed as user stories.
The dataset has been created by gathering data from web sources, and we are not aware of license agreements or intellectual property rights on the requirements/user stories. The curator took the utmost diligence in minimizing the risks of copyright infringement by using non-recent data that is less likely to be critical, by sampling a subset of the original requirements collection, and by qualitatively analyzing the requirements. In case of copyright infringement, please contact the dataset curator (Fabiano Dalpiaz, f.dalpiaz@uu.nl) to discuss the possibility of removing that dataset [see Zenodo's policies].
The datasets were originally used to conduct experiments on ambiguity detection with the REVV-Light tool: https://github.com/RELabUU/revv-light
This collection was originally published in Mendeley Data: https://data.mendeley.com/datasets/7zbk8zsd8y/1
The following text provides a description of the datasets, including links to the systems and websites, when available. The datasets are organized by macro-category and then by identifier.
g02-federalspending.txt
(2018) originates from early data in the Federal Spending Transparency project, which pertains to the website used to share publicly the spending data of the U.S. government. The website was created because of the Digital Accountability and Transparency Act of 2014 (DATA Act). The specific dataset pertains to a system called DAIMS or Data Broker, where DAIMS stands for DATA Act Information Model Schema. The sample that was gathered refers to a sub-project related to allowing the government to act as a data broker, thereby providing data to third parties. The data for the Data Broker project is currently not available online, although the backend seems to be hosted on GitHub under a CC0 1.0 Universal license. Current and recent snapshots of federal-spending-related websites, including many more projects than the one described in the shared collection, can be found here.
g03-loudoun.txt
(2018) is a set of requirements extracted from a document by Loudoun County, Virginia, that describes the to-be user stories and use cases for a land management readiness assessment system called Loudoun County LandMARC. The source document can be found here; it is part of the Electronic Land Management System and EPlan Review Project - RFP RFQ issued in March 2018. More information about the overall LandMARC system and services can be found here.
g04-recycling.txt
(2017) concerns a web application where recycling and waste disposal facilities can be searched for and located. The application operates through the visualization of a map that the user can interact with. The dataset was obtained from a GitHub repository and is at the basis of a student project on website design; the code is available (no license).
g05-openspending.txt
(2018) is about the OpenSpending project (www), a project of the Open Knowledge Foundation which aims at transparency about how local governments spend money. At the time of the collection, the data was retrieved from a Trello board that is currently unavailable. The sample focuses on publishing, importing, and editing datasets, and on how the data should be presented. Currently, OpenSpending is managed via a GitHub repository which contains multiple sub-projects with unknown licenses.
g11-nsf.txt
(2018) refers to a collection of user stories from the NSF Site Redesign & Content Discovery project, which originates from a publicly accessible GitHub repository (GPL 2.0 license). In particular, the user stories refer to an early version of the NSF's website. The user stories can be found as closed issues.
g08-frictionless.txt
(2016) regards the Frictionless Data project, which offers an open source framework for building data infrastructures, to be used by researchers, data scientists, and data engineers. Links to the many projects within Frictionless Data are on GitHub (with a mix of Unlicense and MIT licenses) and the web. The specific set of user stories was collected in 2016 by GitHub user @danfowler and is stored in a Trello board.
g14-datahub.txt
(2013) concerns the open source project DataHub, which is currently developed via a GitHub repository (the code has an Apache License 2.0). DataHub is a data discovery platform that has been developed over multiple years. The specific dataset is an initial set of user stories, which can be dated back to 2013 thanks to a comment therein.
g16-mis.txt
(2015) is a collection of user stories that pertains to a repository for researchers and archivists. The source of the dataset is a public Trello repository. Although the user stories do not have explicit links to projects, it can be inferred that they originate from a project related to the Duke University library.
g17-cask.txt
(2016) refers to the Cask Data Application Platform (CDAP). CDAP is an open source application platform (GitHub, under Apache License 2.0) that can be used to develop applications within the Apache Hadoop ecosystem, an open-source framework for distributed processing of large datasets. The user stories are extracted from a document with requirements regarding dataset management for Cask 4.0, covering the scenarios, the user stories, and a design for their implementation. The raw data is available in the following environment.
g18-neurohub.txt
(2012) is concerned with the NeuroHub platform, a neuroscience data management, analysis, and collaboration platform that allows researchers in neuroscience to collect, store, and share data with colleagues or with the research community. The user stories were collected at a time when NeuroHub was still a research project sponsored by the UK Joint Information Systems Committee (JISC). For information about the research project from which the requirements were collected, see the following record.
g22-rdadmp.txt
(2018) is a collection of user stories from the Research Data Alliance's working group on DMP Common Standards. Their GitHub repository contains a collection of user stories that were created by asking the community to suggest functionality that should be part of a website that manages data management plans. Each user story is stored as an issue on the GitHub page.
g23-archivesspace.txt
(2012-2013) refers to ArchivesSpace: an open source web application for managing archives information. The application is designed to support core functions in archives administration such as accessioning; description and arrangement of processed materials including analog, hybrid, and born-digital content; management of authorities and rights; and reference service. The application supports collection management through collection management records, tracking of events, and a growing number of administrative reports.
This layer shows population broken down by race and Hispanic origin. This is shown by tract, county, and state centroids. This service is updated annually to contain the most currently released American Community Survey (ACS) 5-year data, and contains estimates and margins of error. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis.
Techsalerator’s Business Funding Data for North America is an extensive and insightful resource designed for businesses, investors, and financial analysts who need a deep understanding of the North American funding landscape. This dataset meticulously captures and categorizes critical information about the funding activities of companies across the continent, providing valuable insights into the financial health and investment trends within various sectors.
What the Dataset Includes:
- Funding Rounds: Detailed records of funding rounds for companies in North America, including the size of the round, the date it occurred, and the stage of investment (Seed, Series A, Series B, etc.).
- Investment Sources: Information on the sources of investment, such as venture capital firms, private equity investors, angel investors, and corporate investors.
- Financial Milestones: Key financial achievements and benchmarks reached by companies, including valuation increases, revenue milestones, and profitability metrics.
- Sector-Specific Data: Insights into how different sectors are performing, with data segmented by industry verticals such as technology, healthcare, finance, and consumer goods.
- Geographic Breakdown: An overview of funding trends and activities specific to each North American country, allowing users to identify regional patterns and opportunities.
Countries Included in the Dataset: Antigua and Barbuda, Bahamas, Barbados, Belize, Canada, Costa Rica, Cuba, Dominica, Dominican Republic, El Salvador, Grenada, Guatemala, Haiti, Honduras, Jamaica, Mexico, Nicaragua, Panama, Saint Kitts and Nevis, Saint Lucia, Saint Vincent and the Grenadines, Trinidad and Tobago, and the United States.
Benefits of the Dataset:
- Informed Decision-Making: Investors and analysts can use the data to make well-informed investment decisions by understanding funding trends and financial health across different regions and sectors.
- Strategic Planning: Businesses can leverage the insights to identify potential investors, benchmark against industry peers, and plan their funding strategies effectively.
- Market Analysis: The dataset helps in analyzing market dynamics, identifying emerging sectors, and spotting investment opportunities across North America.
Techsalerator’s Business Funding Data for North America is a vital tool for anyone involved in the financial and investment sectors, offering a granular view of the funding landscape and enabling more strategic and data-driven decisions.
This description provides a more detailed view of what the dataset offers and highlights the relevance and benefits for various stakeholders.
The City Health Dashboard presents city- and/or census tract-level data for over 970 cities across the United States to describe population health within local contexts. Metrics included in the dashboard encompass five broad domains: health outcomes, social and economic factors, health behavior, physical environment, and clinical care.
The underlying data originates from a combination of publicly-available and private sources, including the U.S. Census Bureau, Centers for Disease Control, Environmental Protection Agency, Federal Bureau of Investigation, American Medical Association, ParkServe®, and Walk Score®.
An up-to-date list of all cities in the Dashboard may be found here.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a compilation of processed data on citations and references for research papers, including author, institution, and open access information, for a selected sample of academics analysed using Microsoft Academic Graph (MAG) data and CORE. The data for this dataset was collected between December 2019 and January 2020. Six countries (Austria, Brazil, Germany, India, Portugal, United Kingdom and United States) were the focus of the six questions which make up this dataset. There is one csv file per country and per question (36 files in total). More details about the creation of this dataset are available in the public ON-MERRIT D3.1 deliverable report.
The dataset is a combination of two different data sources: one part is a dataset created by analysing promotion policies across the target countries, while the second part is a set of data points for understanding publishing behaviour. To facilitate the analysis, the dataset is organised in the following seven folders:
PRT: The file "PRT_policies.csv" contains the information extracted from promotion, review and tenure (PRT) policies.
Q1: What % of papers coming from a university are Open Access?
- Dataset name format: oa_status_countryname_papers.csv
- Dataset contents: Open Access (OA) status of all papers of all the universities listed in the Times Higher Education World University Rankings (THEWUR) for the given country. A paper is marked OA if there is at least one OA link available. OA links are collected using the CORE Discovery API.
- Important considerations: Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. The service used to recognise whether a paper is OA, CORE Discovery, does not contain entries for all paper ids in MAG; this implies that some records will have neither a true nor a false value for the is_OA field. Only records marked true for the is_OA field can be said to be OA; others, with false or no value for is_OA, have unknown status (i.e. not necessarily closed access).
Q2: How are papers, published by the selected universities, distributed across the three scientific disciplines of our choice?
- Dataset name format: fsid_countryname_papers.csv
- Dataset contents: For the given country, all papers for all the universities listed in THEWUR, with the field of study they belong to.
- Important considerations: MAG can associate a paper with multiple fieldofstudyid values; if a paper belongs to more than one of our fieldofstudyids, separate records were created for the paper with each of those fieldofstudyids. MAG assigns each fieldofstudyid to a paper with a score; we preserve only records whose score is more than 0.5 for any fieldofstudyid they belong to. Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.
Q3: What is the gender distribution in authorship of papers published by the universities?
- Dataset name format: author_gender_countryname_papers.csv
- Dataset contents: All papers with their author names for all the universities listed in THEWUR.
- Important considerations: When there are multiple collaborators (authors) on the same paper, only the records for collaborators from within the selected universities are preserved. An external script was executed to determine the gender of the authors; the script is available here.
Q4: Distribution of staff seniority (number of years from first publication to last publication) in the given university.
- Dataset name format: author_ids_countryname_papers.csv
- Dataset contents: For a given country, all papers for authors, with their publication year, for all the universities listed in THEWUR.
- Important considerations: When there are multiple collaborators (authors) on the same paper, only the records for collaborators from within the selected universities are preserved. Calculating staff seniority can be achieved in various ways; the most straightforward option is to calculate it as academic_age = MAX(year) - MIN(year) for each authorid.
Q5: Citation counts (incoming) for OA vs non-OA papers published by the university.
- Dataset name format: cc_oa_countryname_papers.csv
- Dataset contents: OA status and OA links for all papers of all the universities listed in THEWUR and, for each of those papers, the count of incoming citations available in MAG.
- Important considerations: CORE Discovery was used to establish the OA status of papers. Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Only records marked true for the is_OA field can be said to be OA; others, with false or no value for is_OA, have unknown status (i.e. not necessarily closed access).
Q6: Count of OA vs non-OA references (outgoing) for all papers published by universities.
- Dataset name format: rc_oa_countryname_papers.csv
- Dataset contents: Counts of all OA and unknown papers referenced by all papers published by all the universities listed in THEWUR.
- Important considerations: CORE Discovery was used to establish the OA status of referenced papers. Papers with multiple authorship are preserved only once towards each of the distinct institutions their authors may belong to. Papers with authorship from multiple universities are counted once towards each of the universities concerned.
Additional files:
- fieldsofstudy_mag.csv: a dump of the fieldsofstudy table of MAG, mapping each id to its actual field of study name.
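As a worked example of the seniority calculation described under Q4, here is a short pandas sketch. The filename follows the documented author_ids_countryname_papers.csv pattern, and the column names authorid and year are assumptions based on the description above:

```python
import pandas as pd

# Illustrative filename; column names "authorid" and "year" are assumed
# from the Q4 description and may differ in the released files.
df = pd.read_csv("author_ids_austria_papers.csv")

# academic_age = MAX(year) - MIN(year) for each authorid.
academic_age = df.groupby("authorid")["year"].agg(lambda y: y.max() - y.min())
academic_age.name = "academic_age"
print(academic_age.describe())
```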
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: State and Local Government Employment and Payroll Data: U.S. and States: 2017 - 2024
Table ID: GOVSTIMESERIES.GS00EP01
Survey/Program: Public Sector
Year: 2024
Dataset: PUB Public Sector Annual Surveys and Census of Governments
Source: U.S. Census Bureau, Public Sector
Release Date: 2025-03-27
Release Schedule: The Annual Survey of Public Employment & Payroll occurs every year, except in Census years. Data are typically released yearly in the first quarter. There is approximately one year between the reference period and data release. Revisions to published data occur annually for the next two years. Census of Governments years (those ending in '2' and '7') may have slightly later releases due to extended processing time.
Dataset Universe
Census of Governments - Organization (CG): The universe of this file is all federal, state, and local government units in the United States. In addition to the federal government and the 50 state governments, the Census Bureau recognizes five basic types of local governments: County, Municipal, Township, Special District, and School District. Of these five types, three are categorized as General Purpose governments: county, municipal, and township governments are readily recognized and generally present no serious problem of classification. However, legislative provisions for school district and special district governments are diverse, and these two types are categorized as Special Purpose governments. Numerous single-function and multiple-function districts, authorities, commissions, boards, and other entities, which have varying degrees of autonomy, exist in the United States. The basic pattern of these entities varies widely from state to state, and various classes of local governments within a particular state also differ in their characteristics. Refer to the Individual State Descriptions report for an overview of all government entities authorized by state. The Public Use File provides a listing of all independent government units and dependent school districts active as of fiscal year ending June 30, 2024.
The Annual Surveys of Public Employment & Payroll (EP) and State and Local Government Finances (LF): The target population consists of all 50 state governments, the District of Columbia, and a sample of local governmental units (counties, cities, townships, special districts, school districts). In years ending in '2' and '7' the entire universe is canvassed; in intervening years, a sample of the target population is surveyed. Additional details on sampling are available in the survey methodology descriptions for those years.
The Annual Survey of Public Pensions (PP): The target population consists of state- and locally-administered defined benefit funds and systems of all 50 state governments, the District of Columbia, and a sample of local governmental units (counties, cities, townships, special districts, school districts). In years ending in '2' and '7' the entire universe is canvassed; in intervening years, a sample of the target population is surveyed. Additional details on sampling are available in the survey methodology descriptions for those years.
The Annual Surveys of State Government Finance (SG) and State Government Tax Collections (TC): The target population consists of all 50 state governments. No local governments are included. For the purpose of Census Bureau statistics, the term "state government" refers not only to the executive, legislative, and judicial branches of a given state; it also includes agencies, institutions, commissions, and public authorities that operate separately or somewhat autonomously from the central state government but over whose activities the state government maintains administrative or fiscal control, as defined by the Census Bureau. Additional details are available in the survey methodology description.
The Annual Survey of School System Finances (SS): The Annual Survey of School System Finances targets all public school systems providing elementary and/or secondary education in all 50 states and the District of Columbia.
Methodology
Data Items and Other Identifying Records: Full-time and part-time employment; full-time and part-time payroll; part-time hours worked (prior to 2019); full-time equivalent employment; total full-time and part-time employment; total full-time and part-time payroll. Definitions can be found by clicking on the column header in the table or by accessing the Glossary. For detailed information, see the Government Finance and Employment Classification Manual.
Unit(s) of Observation: The basic reporting unit is the governmental unit, defined as an organized entity which, in addition to having governmental character, has sufficient discretion in the management of its own affairs to distinguish it as separate from the administrative structure of any other governmental unit. The reporting units for the Annual Survey of School System Finances are public school systems...
Listing of SONYMA target areas by US Census Bureau census tract or block numbering area (BNA). The State of New York Mortgage Agency (SONYMA) targets specific areas designated as ‘areas of chronic economic distress’ for its homeownership lending programs. Each state designates ‘areas of chronic economic distress’ with the approval of the US Secretary of Housing and Urban Development (HUD). SONYMA identifies its target areas using US Census Bureau census tracts and block numbering areas, both of which subdivide individual counties. SONYMA also relates each of its single-family mortgages to a specific census tract or block numbering area. New York State identifies ‘areas of chronic economic distress’ using census tract numbers. 26 US Code § 143 (current through Pub. L. 114-38) defines the criteria that the Secretary of Housing and Urban Development uses in approving designations of ‘areas of chronic economic distress’ as: (i) the condition of the housing stock, including the age of the housing and the number of abandoned and substandard residential units; (ii) the need of area residents for owner-financing under this section, as indicated by low per capita income, a high percentage of families in poverty, a high number of welfare recipients, and high unemployment rates; (iii) the potential for use of owner-financing under this section to improve housing conditions in the area; and (iv) the existence of a housing assistance plan which provides a displacement program and a public improvements and services program. The US Census Bureau’s decennial census last took place in 2010 and will take place again in 2020. While the state designates ‘areas of chronic economic distress,’ the US Department of Housing and Urban Development must approve the designation. The designation takes place after the decennial census.
This dataset contains the cumulative estimated health benefits due to reductions in fine particulate matter, i.e. particles less than 2.5 micrometers in diameter (PM2.5), from 2023 to 2050. The Mayor’s Office of Management and Budget (OMB) obtained the health-events-avoided values through collaboration with the NYC Health Department. The reductions in PM2.5 are the same reductions found in the "Forecasted Emissions and PM2.5 Reductions from City Actions" dataset. For any additional detail, please refer to section 6 of the New York City Climate Budgeting Technical Appendices (https://www.nyc.gov/assets/omb/downloads/pdf/exec24-nyccbta.pdf). This dataset will be updated once a year, with the Executive Budget.
You can find the complete collection of Climate Budget data by clicking here.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As global emissions and temperatures continue to rise, global climate models offer projections of how the climate will change in years to come. These model projections can be used for a variety of end uses to better understand how current systems will be affected by the changing climate. While climate models produce projections for every individual year, using a single year may not be representative because there may be outlier years. It can also be useful to represent a multi-year period with a single year of data. Both items are currently addressed when working with past weather data by using a Typical Meteorological Year (TMY) methodology. This methodology works by statistically selecting representative months from a number of years and appending these months to achieve a single representative year for a given period. In this analysis, the TMY methodology is used to develop Future Typical Meteorological Years (fTMY) using climate model projections. The resulting set of fTMY data is then formatted into EnergyPlus weather (epw) files that can be used for building simulation to estimate the impact of climate scenarios on the built environment.
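As an illustration of the month-selection step, the sketch below scores each candidate month with a Finkelstein-Schafer-style statistic (mean absolute difference between the candidate month's empirical CDF and the long-term CDF) and keeps the closest year. Real TMY procedures weight several weather variables, so treat this single-variable version as a simplification, not the procedure used for these files:

```python
import numpy as np

def fs_statistic(candidate, pool):
    """Mean absolute difference between the candidate month's empirical CDF
    and the long-term (all-years) CDF, evaluated at the pooled values."""
    grid = np.sort(pool)
    ecdf_candidate = np.searchsorted(np.sort(candidate), grid, side="right") / len(candidate)
    ecdf_pool = np.arange(1, len(pool) + 1) / len(pool)
    return float(np.mean(np.abs(ecdf_candidate - ecdf_pool)))

def pick_typical_year(daily, month):
    """daily[year][month] -> array of daily values for one weather variable;
    return the year whose month best matches the long-term distribution."""
    pool = np.concatenate([daily[yr][month] for yr in daily])
    return min(daily, key=lambda yr: fs_statistic(daily[yr][month], pool))

# A typical year is then assembled by concatenating the winning month from
# each calendar month: [pick_typical_year(daily, m) for m in range(12)]
```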
This dataset contains the cross-climate-model version of the fTMY files for 3,281 US counties in the continental United States. The data for each county is derived from six different global climate models (GCMs) from the 6th Phase of the Coupled Model Intercomparison Project (CMIP6): ACCESS-CM2, BCC-CSM2-MR, CNRM-ESM2-1, MPI-ESM1-2-HR, MRI-ESM2-0, and NorESM2-MM. The six climate models were statistically downscaled for 1980–2014 in the historical period and 2015–2100 in the future period under the SSP585 scenario using the methodology described in Rastogi et al. (2022). Additionally, hourly data was derived from the daily downscaled output using the Mountain Microclimate Simulation Model (MTCLIM; Thornton and Running, 1999). The shared socioeconomic pathway (SSP) used for this analysis was SSP 5 and the representative concentration pathway (RCP) used was RCP 8.5. For more information about SSPs and RCPs, see O'Neill et al. (2020).
Please be aware that in cases where a location contains multiple .EPW files, it indicates that there are multiple weather data collection points within that location.
More information about the six selected CMIP6 GCMs:
ACCESS-CM2 -
http://dx.doi.org/10.1071/ES19040
BCC-CSM2-MR -
https://doi.org/10.5194/gmd-14-2977-2021
CNRM-ESM2-1-
https://doi.org/10.1029/2019MS001791
MPI-ESM1-2-HR -
https://doi.org/10.5194/gmd-12-3241-2019
MRI-ESM2-0 -
https://doi.org/10.2151/jmsj.2019-051
NorESM2-MM -
https://doi.org/10.5194/gmd-13-6165-2020
Additional references:
O'Neill, B. C., Carter, T. R., Ebi, K., et al. (2020). Achievements and Needs for the Climate Change Scenario Framework. Nat. Clim. Chang. 10, 1074–1084. https://doi.org/10.1038/s41558-020-00952-0
Rastogi, D., Kao, S.-C., and Ashfaq, M. (2022). How May the Choice of Downscaling Techniques and Meteorological Reference Observations Affect Future Hydroclimate Projections? Earth's Future, 10, e2022EF002734. https://doi.org/10.1029/2022EF002734
Thornton, P. E. and Running, S. W. (1999). An Improved Algorithm for Estimating Incident Daily Solar Radiation from Measurements of Temperature, Humidity and Precipitation. Agricultural and Forest Meteorology, 93, 211-228.
Please cite the following if this data is used in any research or project:
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New (2023). “Multi-Model Future Typical Meteorological (fTMY) Weather Files for nearly every US County.” The 3rd ACM International Workshop on Big Data and Machine Learning for Smart Buildings and Cities and BuildSys '23: The 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Istanbul, Turkey, November 15-16, 2023. DOI: 10.1145/3600100.3626637
Cross-Model Version:
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (Cross-Model Version-SSP1-RCP2.6)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10719204, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (Cross-Model Version-SSP2-RCP4.5)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10719178, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (Cross-Model Version-SSP3-RCP7.0)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10698921, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2023). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County (Cross-Model version-SSP5-RCP8.5)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10420668, Dec 2023. [Data]
Model-specific Version:
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (West and Midwest - SSP1-RCP2.6)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10729277, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (East and South - SSP1-RCP2.6)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10729279, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (West and Midwest - SSP2-RCP4.5)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10729223, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (East and South - SSP2-RCP4.5)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10729201, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (West and Midwest - SSP3-RCP7.0)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10729157, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2024). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County in CONUS (East and South - SSP3-RCP7.0)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.10729199, Feb 2024. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2023). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County (East and South – SSP5-RCP8.5)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.8335814, Sept 2023. [Data]
Shovan Chowdhury, Fengqi Li, Avery Stubbings, Joshua R. New, Deeksha Rastogi, and Shih-Chieh Kao (2023). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation for every US County (West and Midwest – SSP5-RCP8.5)." ORNL internal Scientific and Technical Information (STI) report, doi: 10.5281/zenodo.8338548, Sept 2023. [Data]
Representative Cities Version:
Bass, Brett, New, Joshua R., Rastogi, Deeksha and Kao, Shih-Chieh (2022). "Future Typical Meteorological Year (fTMY) US Weather Files for Building Simulation (1.0) [Data set]." Zenodo, doi.org/10.5281/zenodo.6939750, Aug. 2022. [Data]
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is the "development dataset" for the DCASE 2024 Challenge Task 2.
The data consists of the normal/anomalous operating sounds of seven types of real/toy machines. Each recording is a single-channel, 10-second audio clip that includes both a machine's operating sound and environmental noise. The following seven types of real/toy machines are used in this task:
ToyCar
ToyTrain
Fan
Gearbox
Bearing
Slide rail
Valve
Overview of the task
Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.
This task is the follow-up from DCASE 2020 Task 2 to DCASE 2023 Task 2. The task this year is to develop an ASD system that meets the following five requirements.
1. Train a model using only normal sound (unsupervised learning scenario). Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.
2. Detect anomalies regardless of domain shifts (domain generalization task). In real-world cases, the operational states of a machine or the environmental noise can change, causing domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques to handle these domain shifts. This requirement is the same as in DCASE 2022 Task 2 and DCASE 2023 Task 2.
3. Train a model for a completely new machine type. For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should have the ability to train models without additional hyperparameter tuning. This requirement is the same as in DCASE 2023 Task 2.
4. Train a model using a limited number of machines from its machine type. While sounds from multiple machines of the same machine type can be used to enhance detection performance, it is often the case that only a limited number of machines are available for a machine type. In such a case, the system should be able to train models using a few machines from a machine type. This requirement is the same as in DCASE 2023 Task 2.
5. Train a model both with and without attribute information. While additional attribute information can help enhance detection performance, we cannot always obtain such information. Therefore, the system must work well both when attribute information is available and when it is not.
The last requirement is newly introduced in DCASE 2024 Task 2.
Definition
We first define key terms used in this task: "machine type," "section," "source domain," "target domain," and "attributes."
"Machine type" indicates the type of machine, which in the development dataset is one of seven: fan, gearbox, bearing, slide rail, valve, ToyCar, and ToyTrain.
A section is defined as a subset of the dataset for calculating performance metrics.
The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.
Attributes are parameters that define states of machines or types of noise. For several machine types, the attributes are hidden.
Dataset
This dataset consists of seven machine types. For each machine type, one section is provided, and the section is a complete set of training and test data. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training, (ii) ten clips of normal sounds in the target domain for training, and (iii) 100 clips each of normal and anomalous sounds for the test. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.
File names and attribute csv files
File names and attribute csv files provide reference labels for each clip. The given reference labels for each training/test clip include machine type, section index, normal/anomaly information, and attributes regarding conditions other than normal/anomaly. The machine type is given by the directory name. The section index is given by the respective file name. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are also given by the respective file names. Note that for machine types whose attribute information is hidden, the attribute information in the file names is labeled only as "noAttributes". Attribute csv files allow easy access to the attributes that cause domain shifts. In these files, the file names, the names of the parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:
[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
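A small parsing sketch for rows in this format, assuming a header-less csv laid out exactly as above (adjust if the released files include a header row):

```python
import csv

def load_attributes(path):
    """Map each filename to its {domain-shift parameter: value} pairs."""
    attrs = {}
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row or not row[0]:
                continue
            filename, rest = row[0], row[1:]
            # Pair d1p with d1v, d2p with d2v, ...; machine types with hidden
            # attributes have blank fields, which are skipped here.
            attrs[filename] = {p: v for p, v in zip(rest[::2], rest[1::2]) if p}
    return attrs
```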
Recording procedure
Normal/anomalous operating sounds of machines and their related equipment are recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings from a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers describing the details of the recording procedure by the submission deadline.
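The exact mixing procedure is described in the task papers; as a rough illustration of the idea only, mixing a machine recording with factory noise at a chosen signal-to-noise ratio can be sketched as follows (our simplification, not the official pipeline):

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Add noise to a signal, scaled so the mixture has the given SNR (dB)."""
    noise = noise[: len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that p_signal / (scale^2 * p_noise) = 10^(snr_db/10).
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise
```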
Directory structure
/dev_data
  /fan
  /gearbox
  /bearing
  /slider (here, "slider" means "slide rail")
  /ToyCar
  /ToyTrain
  /valve
Baseline system
The baseline system is available on the GitHub repository. The baseline systems provide a simple entry-level approach that gives reasonable performance on the Task 2 dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
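For orientation, entry-level ASD baselines of this kind typically train an autoencoder on normal sounds only and use reconstruction error as the anomaly score. The schematic sketch below illustrates that idea; it does not claim to match the official baseline's features or architecture:

```python
import numpy as np

def anomaly_score(autoencoder, frames):
    """Mean squared reconstruction error over a clip's feature frames.

    `autoencoder` is any callable trained on normal data that maps a feature
    matrix (e.g. log-mel frames) to a reconstruction of the same shape.
    """
    reconstruction = autoencoder(frames)
    return float(np.mean((frames - reconstruction) ** 2))

# A clip is flagged anomalous when its score exceeds a threshold chosen on
# normal training data, e.g. a high percentile of the training scores.
```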
Condition of use
This dataset was created jointly by Hitachi, Ltd. and NTT Corporation and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Citation
Contact
If there is any problem, please contact us:
Tomoya Nishida, tomoya.nishida.ax@hitachi.com
Keisuke Imoto, keisuke.imoto@ieee.org
Noboru Harada, noboru@ieee.org
Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp
Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unemployment Rate in the United States increased to 4.30 percent in August from 4.20 percent in July of 2025. This dataset provides the latest reported value for - United States Unemployment Rate - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
About the Dataset
This data set contains claims information for meal reimbursement for sites participating in CACFP as child centers for the program year 2024-2025. This includes Child Care Centers, At-Risk Centers, Head Start sites, Outside School Hours sites, and Emergency Shelters. The CACFP program year begins October 1 and ends September 30.
This dataset only includes claims submitted by CACFP sites operating as child centers. Sites can participate in multiple CACFP sub-programs. Each record (row) represents monthly meals data for a single site and a single CACFP center sub-program.
To filter data for a specific CACFP center program, select "View Data" to open the Exploration Canvas filter tools and select the program(s) of interest from the Program field. A filtering tutorial can be found here.
For meals data on CACFP participants operating as Day Care Homes, Adult Day Care Centers, or child care centers for previous program years, please refer to the corresponding “Child and Adult Care Food Programs (CACFP) – Meal Reimbursement” dataset for that sub-program available on the State of Texas Open Data Portal.
An overview of all CACFP data available on the Texas Open Data Portal can be found at our TDA Data Overview - Child and Adult Care Food Programs page.
An overview of all TDA Food and Nutrition data available on the Texas Open Data Portal can be found at our TDA Data Overview - Food and Nutrition Open Data page.
More information about accessing and working with TDA data on the Texas Open Data Portal can be found on the SquareMeals.org website on the TDA Food and Nutrition Open Data page.
About Dataset Updates
TDA aims to post new program year data by December 15 of the active program year. Participants have 60 days to file monthly reimbursement claims. Dataset updates will occur daily until 90 days after the close of the program year. After 90 days from the close of the program year, the dataset will be updated at six months and one year from the close of the program year before becoming archived. Archived datasets will remain published but will not be updated. Any data posted during the active program year is subject to change.
About the Agency
The Texas Department of Agriculture administers 12 U.S. Department of Agriculture nutrition programs in Texas including the National School Lunch and School Breakfast Programs, the Child and Adult Care Food Programs (CACFP), and the summer meal programs. TDA’s Food and Nutrition division provides technical assistance and training resources to partners operating the programs and oversees the USDA reimbursements they receive to cover part of the cost associated with serving food in their facilities. By working to ensure these partners serve nutritious meals and snacks, the division adheres to its mission — Feeding the Hungry and Promoting Healthy Lifestyles.
For more information on these programs, please visit our website.
"
NSP3 funds, authorized under the Dodd-Frank Wall Street Reform and Consumer Protection Act (Dodd-Frank Act) of 2010, provide a third round of neighborhood stabilization grants to all states and select governments on a formula basis. Unlike NSP2, grantees are not constrained to census tract boundaries and can define their target areas more precisely around their intended target area.
Wildlife Restoration funds (manufacturers' federal excise taxes), generated from the sale of firearms, ammunition, and archery equipment, support the construction, operation, and maintenance of over 850 public target ranges in the United States. This represents a significant investment in safe, structured environments where the public may participate in all kinds of target shooting. In the last six months, 9 new ranges have been under construction and 8 ranges have been upgraded or expanded. This dataset is a compilation of the firearms, archery, and combined (firearms & archery) ranges that have received funding through the Wildlife Restoration Act. The data is provided by each Region every six months to fulfill a Director's Request. Contact: Elena Campbell (elena_campbell@fws.gov)
Target USA Cup is a premier youth soccer tournament in North America, hosting 1,200 teams from around the world. The organization is based in Blaine, Minnesota, and is committed to providing a unique and exciting experience for young soccer players. With a focus on competition, recreation, and development, Target USA Cup offers a range of activities and opportunities for players to grow their skills and passion for the game.
The tournament takes place over two weekends in July, featuring a variety of games, matches, and events for players of all ages and skill levels. Target USA Cup is supported by a range of sponsors, including Target, MNUFC, and Gatorade, among others. With its commitment to excellence and reputation for delivering a top-notch experience, Target USA Cup is a must-attend event for any young soccer player or fan.
MealMe provides comprehensive grocery and retail SKU-level product data, including real-time pricing, from the top 100 retailers in the USA and Canada. Our proprietary technology ensures accurate and up-to-date insights, empowering businesses to excel in competitive intelligence, pricing strategies, and market analysis.
Retailers Covered: MealMe’s database includes detailed SKU-level data and pricing from leading grocery and retail chains such as Walmart, Target, Costco, Kroger, Safeway, Publix, Whole Foods, Aldi, ShopRite, BJ’s Wholesale Club, Sprouts Farmers Market, Albertsons, Ralphs, Pavilions, Gelson’s, Vons, Shaw’s, Metro, and many more. Our coverage spans the most influential retailers across North America, ensuring businesses have the insights needed to stay competitive in dynamic markets.
Key Features:
- SKU-Level Granularity: Access detailed product-level data, including product descriptions, categories, brands, and variations.
- Real-Time Pricing: Monitor current pricing trends across major retailers for comprehensive market comparisons.
- Regional Insights: Analyze geographic price variations and inventory availability to identify trends and opportunities.
- Customizable Solutions: Tailored data delivery options to meet the specific needs of your business or industry.
Use Cases:
- Competitive Intelligence: Gain visibility into pricing, product availability, and assortment strategies of top retailers like Walmart, Costco, and Target.
- Pricing Optimization: Use real-time data to create dynamic pricing models that respond to market conditions.
- Market Research: Identify trends, gaps, and consumer preferences by analyzing SKU-level data across leading retailers.
- Inventory Management: Streamline operations with accurate, real-time inventory availability.
- Retail Execution: Ensure on-shelf product availability and compliance with merchandising strategies.
Industries Benefiting from Our Data:
- CPG (Consumer Packaged Goods): Optimize product positioning, pricing, and distribution strategies.
- E-commerce Platforms: Enhance online catalogs with precise pricing and inventory information.
- Market Research Firms: Conduct detailed analyses to uncover industry trends and opportunities.
- Retailers: Benchmark against competitors like Kroger and Aldi to refine assortments and pricing.
- AI & Analytics Companies: Fuel predictive models and business intelligence with reliable SKU-level data.
Data Delivery and Integration: MealMe offers flexible integration options, including APIs and custom data exports, for seamless access to real-time data. Whether you need large-scale analysis or continuous updates, our solutions scale with your business needs.
Why Choose MealMe?
- Comprehensive Coverage: Data from the top 100 grocery and retail chains in North America, including Walmart, Target, and Costco.
- Real-Time Accuracy: Up-to-date pricing and product information ensures a competitive edge.
- Customizable Insights: Tailored datasets aligned with your specific business objectives.
- Proven Expertise: Trusted by diverse industries for delivering actionable insights.
MealMe empowers businesses to unlock their full potential with real-time, high-quality grocery and retail data. For more information or to schedule a demo, contact us today!
Lending Club offers peer-to-peer (P2P) loans through a technological platform for various personal finance purposes and is today one of the companies that dominate the US P2P lending market. The original dataset is publicly available on Kaggle and corresponds to all the loans issued by Lending Club between 2007 and 2018. The present version of the dataset is intended for constructing a granting model, that is, a model designed to make decisions on whether to grant a loan based on information available at the time of the loan application. Consequently, our dataset contains only a selection of variables from the original one: those known at the moment the loan request is made. Furthermore, the target variable of a granting model represents the final status of the loan, which is either "default" or "fully paid". Thus, we filtered out from the original dataset all the loans in transitory states. Our dataset comprises 1,347,681 records or obligations (approximately 60% of the original) and was also cleaned for completeness and consistency (less than 1% of our dataset was filtered out).
TARGET VARIABLE
The dataset includes a target variable based on the final resolution of the credit: the default category corresponds to the event charged off and the non-default category to the event fully paid. Other values of the loan status variable are not considered, since the target represents the state of the loan at the end of the considered time window; thus, there are no loans in transitory states. The original dataset includes the variable "loan status", which contains several categories ('Fully Paid', 'Current', 'Charged Off', 'In Grace Period', 'Late (31-120 days)', 'Late (16-30 days)', 'Default'). In our dataset, we keep only loans that reached a final resolution (fully paid or charged off) and transform this variable into a binary variable called "Default", with a 0 for fully paid loans and a 1 for defaulted loans.
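A minimal pandas sketch of this filtering and binarization (the file name is illustrative; loan_status is the original Kaggle column described above):

```python
import pandas as pd

# Illustrative file name for the original Kaggle export.
loans = pd.read_csv("lending_club_loans.csv", low_memory=False)

# Keep only loans that reached a final state and binarize the target:
# 0 = fully paid, 1 = defaulted (charged off).
final = loans[loans["loan_status"].isin(["Fully Paid", "Charged Off"])].copy()
final["Default"] = (final["loan_status"] == "Charged Off").astype(int)
```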
EXPLANATORY VARIABLES
The explanatory variables that we use correspond only to the information available at the time of the application. Variables such as the interest rate, grade, or subgrade are generated by the company as a result of its credit risk assessment process, so they were filtered out of the dataset: they must not be considered in risk models that predict default at the granting stage.
FULL LIST OF VARIABLES
Loan identification variables:
id: Loan id (unique identifier).
issue_d: Month and year in which the loan was approved.
Quantitative variables:
revenue: Borrower's self-declared annual income during registration.
dti_n: Indebtedness ratio for obligations excluding mortgage. Monthly information. This ratio has been calculated considering the indebtedness of the whole group of applicants. It is estimated as the ratio calculated using the co-borrowers’ total payments on the total debt obligations divided by the co-borrowers’ combined monthly income.
loan_amnt: Amount of credit requested by the borrower.
fico_n: Defined between 300 and 850, reported by Fair Isaac Corporation as a risk measure based on historical credit information reported at the time of application. This value has been calculated as the average of the variables “fico_range_low” and “fico_range_high” in the original dataset.
experience_c: Binary variable that indicates whether the borrower is new to the entity. This variable is constructed from the credit date of the previous obligation in LC and the credit date of the current obligation; if the difference between dates is positive, it is not considered as a new experience with LC.
Categorical variables:
emp_length: Categorical variable with the employment length of the borrower (includes a no-information category).
purpose: Credit purpose category for the loan request.
home_ownership_n: Homeownership status provided by the borrower in the registration process. Categories defined by LC: “mortgage”, “rent”, “own”, “other”, “any”, “none”. We merged the categories “other”, “any” and “none” as “other”.
addr_state: Borrower's residence state from the USA.
zip_code: Zip code of the borrower's residence.
Textual variables:
title: Title of the credit request description provided by the borrower.
desc: Description of the credit request provided by the borrower.
We cleaned the textual variables. First, we removed all those descriptions that contained the default description provided by Lending Club on its web form ("Tell your story. What is your loan for?"). Moreover, we removed the prefix "Borrower added on DD/MM/YYYY >" from the descriptions to avoid any temporal background in them. Finally, as these descriptions came from a web form, we substituted all the HTML entities with their corresponding characters (e.g. "&amp;" was substituted by "&", "&lt;" was substituted by "<", etc.).
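A sketch of this clean-up using the Python standard library; the placeholder text is quoted from the description above, while the exact date format in the prefix (and hence the regex) is our assumption:

```python
import html
import re

DEFAULT_TEXT = "Tell your story. What is your loan for?"
# Assumed date format: 2- or 4-digit year, e.g. "Borrower added on 03/12/14 >".
PREFIX = re.compile(r"Borrower added on \d{2}/\d{2}/\d{2,4} >\s*")

def clean_description(text):
    """Apply the three clean-up steps described above to one description."""
    if text is None or DEFAULT_TEXT in text:
        return None  # drop descriptions left at the web form's default
    text = PREFIX.sub("", text)         # remove the temporal prefix
    return html.unescape(text).strip()  # "&amp;" -> "&", "&lt;" -> "<", ...
```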
RELATED WORKS
This dataset has been used in the following academic articles:
Sanz-Guerrero, M., Arroyo, J. (2024). Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P Lending. arXiv preprint arXiv:2401.16458. https://doi.org/10.48550/arXiv.2401.16458
Ariza-Garzón, M.J., Arroyo, J., Caparrini, A., Segovia-Vargas, M.J. (2020). Explainability of a machine learning granting scoring model in peer-to-peer lending. IEEE Access 8, 64873 - 64890. https://doi.org/10.1109/ACCESS.2020.2984412