Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of 22 data sets of 50+ requirements each, expressed as user stories.
The dataset has been created by gathering data from web sources, and we are not aware of license agreements or intellectual property rights on the requirements / user stories. The curator exercised the utmost diligence in minimizing the risks of copyright infringement by using non-recent data that is less likely to be critical, by sampling a subset of the original requirements collection, and by qualitatively analyzing the requirements. In case of copyright infringement, please contact the dataset curator (Fabiano Dalpiaz, f.dalpiaz@uu.nl) to discuss the possibility of removal of that dataset [see Zenodo's policies].
The data sets were originally used to conduct experiments about ambiguity detection with the REVV-Light tool: https://github.com/RELabUU/revv-light
This collection was originally published in Mendeley Data: https://data.mendeley.com/datasets/7zbk8zsd8y/1
The following text provides a description of the datasets, including links to the systems and websites, when available. The datasets are organized by macro-category and then by identifier.
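Each .txt file appears to contain one user story per line, typically in the common "As a ..., I want ..., so that ..." template (an assumption about the exact wording, which may vary per file). A minimal Python sketch for parsing such a file:

```python
# Hedged sketch: assumes one user story per line in the
# "As a <role>, I want <goal>, so that <benefit>" template.
import re

STORY_RE = re.compile(
    r"As an?\s+(?P<role>.+?),\s*I (?:want|need|am able)\s*(?:to\s+)?(?P<goal>.+?)"
    r"(?:,?\s*so that\s+(?P<benefit>.+))?$",
    re.IGNORECASE,
)

def parse_stories(path):
    stories = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = STORY_RE.match(line.strip())
            if m:
                stories.append(m.groupdict())
    return stories

# Example (file name from the collection):
# for story in parse_stories("g02-federalspending.txt"):
#     print(story["role"], "->", story["goal"])
```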
g02-federalspending.txt
(2018) originates from early data in the Federal Spending Transparency project, which pertains to the website used to publicly share the spending data of the U.S. government. The website was created because of the Digital Accountability and Transparency Act of 2014 (DATA Act). The specific dataset pertains to a system called DAIMS (DATA Act Information Model Schema), also referred to as the Data Broker. The sample that was gathered refers to a sub-project related to allowing the government to act as a data broker, thereby providing data to third parties. The data for the Data Broker project is currently not available online, although the backend seems to be hosted on GitHub under a CC0 1.0 Universal license. Current and recent snapshots of federal spending related websites, including many more projects than the one described in the shared collection, can be found here.
g03-loudoun.txt
(2018) is a set of requirements extracted from a document by Loudoun County, Virginia, that describes the to-be user stories and use cases for a system for land management readiness assessment called Loudoun County LandMARC. The source document can be found here and is part of the Electronic Land Management System and EPlan Review Project - RFP RFQ issued in March 2018. More information about the overall LandMARC system and services can be found here.
g04-recycling.txt
(2017) concerns a web application where recycling and waste disposal facilities can be searched and located. The application operates through the visualization of a map that the user can interact with. The dataset was obtained from a GitHub website and it is at the basis of a students' project on website design; the code is available (no license).
g05-openspending.txt
(2018) is about the OpenSpending project (www), a project of the Open Knowledge Foundation which aims at transparency about how local governments spend money. At the time of the collection, the data was retrieved from a Trello board that is currently unavailable. The sample focuses on publishing, importing, and editing datasets, and on how the data should be presented. Currently, OpenSpending is managed via a GitHub repository which contains multiple sub-projects with unknown licenses.
g11-nsf.txt
(2018) is a collection of user stories for the NSF Site Redesign & Content Discovery project, which originates from a publicly accessible GitHub repository (GPL 2.0 license). In particular, the user stories refer to an early version of the NSF's website. The user stories can be found as closed Issues.
g08-frictionless.txt
(2016) regards the Frictionless Data project, which offers an open source dataset for building data infrastructures, to be used by researchers, data scientists, and data engineers. Links to the many projects within the Frictionless Data project are on GitHub (with a mix of Unlicense and MIT license) and the web. The specific set of user stories was collected in 2016 by GitHub user @danfowler and is stored in a Trello board.
g14-datahub.txt
(2013) concerns the open source project DataHub, which is currently developed via a GitHub repository (the code has Apache License 2.0). DataHub is a data discovery platform which has been developed over multiple years. The specific data set is an initial set of user stories, which we can date back to 2013 thanks to a comment therein.
g16-mis.txt
(2015) is a collection of user stories that pertains to a repository for researchers and archivists. The source of the dataset is a public Trello repository. Although the user stories do not have explicit links to projects, it can be inferred that the stories originate from some project related to the library of Duke University.
g17-cask.txt
(2016) refers to the Cask Data Application Platform (CDAP). CDAP is an open source application platform (GitHub, under Apache License 2.0) that can be used to develop applications within the Apache Hadoop ecosystem, an open-source framework for distributed processing of large datasets. The user stories are extracted from a document containing requirements on dataset management for Cask 4.0, including the scenarios, user stories, and a design for the implementation of these user stories. The raw data is available in the following environment.
g18-neurohub.txt
(2012) is concerned with the NeuroHub platform, a neuroscience data management, analysis, and collaboration platform for researchers in neuroscience to collect, store, and share data with colleagues or with the research community. The user stories were collected at a time when NeuroHub was still a research project sponsored by the UK Joint Information Systems Committee (JISC). For information about the research project from which the requirements were collected, see the following record.
g22-rdadmp.txt
(2018) is a collection of user stories from the Research Data Alliance's working group on DMP Common Standards. Their GitHub repository contains a collection of user stories that were created by asking the community to suggest functionality that should be part of a website that manages data management plans. Each user story is stored as an issue on the group's GitHub page.
g23-archivesspace.txt
(2012-2013) refers to ArchivesSpace: an open source web application for managing archives information. The application is designed to support core functions in archives administration such as accessioning; description and arrangement of processed materials including analog, hybrid, and born-digital content; management of authorities and rights; and reference service. The application supports collection management through collection management records, tracking of events, and a growing number of administrative reports. ArchivesSpace is open source and its
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of the project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.
GitHub page: https://github.com/soarsmu/NICHE
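As a hedged illustration of how the label file could be used, the sketch below assumes pandas and hypothetical column names (a "label" column with values such as "engineered"); the actual schema should be checked in the repository first.

```python
# Minimal sketch for loading NICHE.csv; column names are assumptions.
import pandas as pd

niche = pd.read_csv("NICHE.csv")
print(niche.columns.tolist())  # inspect the actual schema before filtering

# Hypothetical column/value names for illustration:
engineered = niche[niche["label"] == "engineered"]
print(f"{len(engineered)} engineered projects out of {len(niche)}")
```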
https://brightdata.com/license
Use our constantly updated Walmart products dataset to get a complete snapshot of new products, categories, pricing, and consumer reviews. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases: Identify product inventory gaps and increased demand for certain products, analyze consumer sentiment, and define a pricing strategy by locating similar products and categories among your competitors. The dataset includes all major data points: product, SKU, GTIN, currency, timestamp, price, and more. Get your Walmart dataset today!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Kaggle Datasets For Traffic is a dataset for object detection tasks - it contains Traffic Sign annotations for 8,122 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
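For downloading via the Roboflow Python package, a minimal sketch is shown below; the API key, workspace and project slugs, version number, and export format are placeholders to be taken from this dataset's Roboflow page.

```python
# Hedged sketch using the roboflow package; all identifiers are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project-slug")
dataset = project.version(1).download("coco")  # export format, e.g. "coco" or "yolov8"
print(dataset.location)  # local folder containing images and annotations
```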
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
A. SUMMARY This dataset is used to report on public dataset access and usage within the open data portal. Each row sums the number of users who access a dataset each day, grouped by access type (API Read, Download, Page View, etc.).
B. HOW THE DATASET IS CREATED This dataset is created by joining two internal analytics datasets generated by the SF Open Data Portal. We remove non-public information during the process.
C. UPDATE PROCESS This dataset is scheduled to update every 7 days via ETL.
D. HOW TO USE THIS DATASET This dataset can help you identify stale datasets, highlight the most popular datasets and calculate other metrics around the performance and usage in the open data portal.
Please note a special call-out for two fields:
- "derived": This field shows if an asset is an original source (derived = "False") or if it is made from another asset through filtering (derived = "True"). Essentially, it indicates whether or not the asset is derived from another source.
- "provenance": This field shows if an asset is "official" (created by someone in the City of San Francisco) or "community" (created by a member of the community, not official). All community assets are derived, as members of the community cannot add data to the open data portal.
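A minimal sketch of computing the usage metrics mentioned above with pandas; the file name and column names ("dataset_name", "date", "access_type", "count") are assumptions, not the portal's documented schema.

```python
# Hedged sketch: identify popular and potentially stale datasets.
import pandas as pd

usage = pd.read_csv("dataset_usage.csv", parse_dates=["date"])

# Most popular datasets by total page views.
top = (usage[usage["access_type"] == "Page View"]
       .groupby("dataset_name")["count"].sum()
       .sort_values(ascending=False)
       .head(10))
print(top)

# Potentially stale datasets: no recorded access in the last 90 days.
last_seen = usage.groupby("dataset_name")["date"].max()
stale = last_seen[last_seen < usage["date"].max() - pd.Timedelta(days=90)]
print(stale)
```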
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time-coincident load, wind, and solar data including actual and probabilistic forecast datasets at 5-min resolution for ERCOT, MISO, NYISO, and SPP. Wind and solar profiles are supplied for existing sites as well as planned sites based on interconnection queue projects as of 2021. For ERCOT, actuals are provided for 2017 and 2018 and forecasts for 2018; for the remaining ISOs, actuals are provided for 2018 and 2019 and forecasts for 2019. These datasets were produced by NREL as part of the ARPA-E PERFORM project, an ARPA-E funded program that uses time-coincident power and load data and seeks to develop innovative management systems that represent the relative delivery risk of each asset and balance the collective risk of all assets across the grid. For more information on the datasets and methods used to generate them see https://github.com/PERFORM-Forecasts/documentation.
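As a hedged illustration of pairing actuals with forecasts, the sketch below assumes the 5-minute series have been exported to CSV files with "timestamp" and "value" columns; the file names and columns are placeholders rather than the official layout described in the documentation.

```python
# Hedged sketch: mean absolute forecast error on merged 5-minute series.
import pandas as pd

actual = pd.read_csv("ercot_load_actual_2018.csv", parse_dates=["timestamp"])
forecast = pd.read_csv("ercot_load_forecast_2018.csv", parse_dates=["timestamp"])

merged = actual.merge(forecast, on="timestamp", suffixes=("_actual", "_forecast"))
mae = (merged["value_actual"] - merged["value_forecast"]).abs().mean()
print(f"Mean absolute forecast error: {mae:.2f}")
```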
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
AIT Log Data Sets
This repository contains synthetic log data suitable for evaluation of intrusion detection systems. The logs were collected from four independent testbeds that were built at the Austrian Institute of Technology (AIT) following the approach by Landauer et al. (2020) [1]. Please refer to the paper for more detailed information on automatic testbed generation and cite it if the data is used for academic publications. In brief, each testbed simulates user accesses to a webserver that runs Horde Webmail and OkayCMS. The duration of the simulation is six days. On the fifth day (2020-03-04) two attacks are launched against each web server.
The archive AIT-LDS-v1_0.zip contains the directories "data" and "labels".
The data directory is structured as follows. Each directory mail.
Setup details of the web servers:
OS: Debian Stretch 9.11.6
Services:
Apache2
PHP7
Exim 4.89
Horde 5.2.22
OkayCMS 2.3.4
Suricata
ClamAV
MariaDB
Setup details of user machines:
OS: Ubuntu Bionic
Services:
Chromium
Firefox
User host machines are assigned to web servers in the following way:
mail.cup.com is accessed by users from host machines user-{0, 1, 2, 6}
mail.spiral.com is accessed by users from host machines user-{3, 5, 8}
mail.insect.com is accessed by users from host machines user-{4, 9}
mail.onion.com is accessed by users from host machines user-{7, 10}
The following attacks are launched against the web servers (different starting times for each web server, please check the labels for exact attack times):
Attack 1: multi-step attack with sequential execution of the following attacks:
nmap scan
nikto scan
smtp-user-enum tool for account enumeration
hydra brute force login
webshell upload through Horde exploit (CVE-2019-9858)
privilege escalation through Exim exploit (CVE-2019-10149)
Attack 2: webshell injection through malicious cookie (CVE-2019-16885)
Attacks are launched from the following user host machines. In each of the corresponding directories user-
user-6 attacks mail.cup.com
user-5 attacks mail.spiral.com
user-4 attacks mail.insect.com
user-7 attacks mail.onion.com
The log data collected from the web servers includes
Apache access and error logs
syscall logs collected with the Linux audit daemon
suricata logs
exim logs
auth logs
daemon logs
mail logs
syslogs
user logs
Note that due to their large size, the audit/audit.log files of each server were compressed in a .zip archive. If these logs are needed for analysis, they must first be unzipped.
Labels are organized in the same directory structure as logs. Each file contains two labels for each log line separated by a comma, the first one based on the occurrence time, the second one based on similarity and ordering. Note that this does not guarantee correct labeling for all lines and that no manual corrections were conducted.
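A minimal sketch for pairing log lines with their labels, based on the description above (one comma-separated label pair per log line, with labels mirroring the log directory structure); the example paths and the label vocabulary are placeholders to be checked against the archive.

```python
# Hedged sketch: iterate over log lines together with their two labels
# (first based on occurrence time, second based on similarity and ordering).
def labeled_lines(log_path, label_path):
    with open(log_path, errors="replace") as logs, open(label_path) as labels:
        for line, label in zip(logs, labels):
            time_label, similarity_label = label.strip().split(",", 1)
            yield line.rstrip("\n"), time_label, similarity_label

# Example with placeholder paths mirroring the data/labels structure:
# for i, (line, t_lbl, s_lbl) in enumerate(
#         labeled_lines("data/mail.cup.com/apache2/access.log",
#                       "labels/mail.cup.com/apache2/access.log")):
#     print(t_lbl, s_lbl, line)
#     if i >= 5:
#         break
```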
Version history and related data sets:
AIT-LDS-v1.0: Four datasets, logs from single host, fine-granular audit logs, mail/CMS.
AIT-LDS-v1.1: Removed carriage return of line endings in audit.log files.
AIT-LDS-v2.0: Eight datasets, logs from all hosts, system logs and network traffic, mail/CMS/cloud/web.
Acknowledgements: Partially funded by the FFG projects INDICAETING (868306) and DECEPT (873980), and the EU project GUARD (833456).
If you use the dataset, please cite the following publication:
[1] M. Landauer, F. Skopik, M. Wurzenberger, W. Hotwagner and A. Rauber, "Have it Your Way: Generating Customized Log Datasets With a Model-Driven Simulation Testbed," in IEEE Transactions on Reliability, vol. 70, no. 1, pp. 402-415, March 2021, doi: 10.1109/TR.2020.3031317. [PDF]
Introducing Job Posting Datasets: Uncover labor market insights!
Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.
Job Posting Datasets Source:
Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.
Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.
StackShare: Access StackShare datasets to make data-driven technology decisions.
Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.
Choose your preferred dataset delivery options for convenience:
Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.
Why Choose Oxylabs Job Posting Datasets:
Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.
Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.
Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.
Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.
Pricing Options:
Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In practical media distribution systems, visual content usually undergoes multiple stages of quality degradation along the delivery chain, but the pristine source content is rarely available at most quality monitoring points along the chain to serve as a reference for quality assessment. As a result, full-reference (FR) and reduced-reference (RR) image quality assessment (IQA) methods are generally infeasible. Although no-reference (NR) methods are readily applicable, their performance is often not reliable. On the other hand, intermediate references of degraded quality are often available, e.g., at the input of video transcoders, but how to make the best use of them in proper ways has not been deeply investigated.

This database is associated with a research project whose main goal is to make one of the first attempts to establish a new IQA paradigm named degraded-reference IQA (DR IQA). We initiate work on DR IQA by restricting ourselves to a two-stage distortion pipeline. Most IQA research projects rely on the availability of appropriate quality-annotated datasets. However, we find that only a few small-scale subject-rated datasets of multiply distorted images exist at the moment. These datasets contain a few hundred images and include the LIVE Multiply Distorted (LIVE MD), Multiply Distorted IVL (MD IVL), and LIVE Wild Compressed (LIVE WCmp) databases. Such small-scale data is not only insufficient to develop robust machine learning based IQA models; it is also not enough to perform multiple distortion behavior analysis, i.e., to study how multiple distortions behave in conjunction with each other when impacting visual content simultaneously. Surprisingly, such detailed analysis is lacking even for the case of two simultaneous distortions.

We address the above-mentioned and other issues in our research project titled Degraded Reference Image Quality Assessment. As part of this project, we address the scarcity of data by constructing two large-scale datasets called DR IQA database Version 1 (V1) and DR IQA database Version 2 (V2). Each of these datasets contains 34 pristine reference (PR) images, 1,122 singly distorted degraded reference (DR) images, and 31,790 multiply distorted final distorted (FD) images, making them the largest datasets constructed in this particular area of IQA to date. These datasets formed the basis of the multiple distortion behavior analysis and DR IQA model development conducted in the above-mentioned project. We hope that the IQA research community will find them useful. Here we are releasing DR IQA database V2, while DR IQA database V1 has been separately released, also on IEEE DataPort. If you use this database in your research, please cite the following paper (details about the DR IQA project can also be found in this paper): S. Athar and Z. Wang, "Degraded Reference Image Quality Assessment," accepted for publication in IEEE Transactions on Image Processing, 2022.
https://creativecommons.org/publicdomain/zero/1.0/
The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank
This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.
For more information, see the World Bank website.
Fork this kernel to get started with this dataset.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population
http://data.worldbank.org/data-catalog/ed-stats
https://cloud.google.com/bigquery/public-data/world-bank-education
Citation: The World Bank: Education Statistics
Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @till_indeman from Unsplash.
Of total government spending, what percentage is spent on education?
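As a hedged illustration of answering this question from the BigQuery copy of the data: the table path and the indicator code below (SE.XPD.TOTL.GB.ZS, the World Bank indicator for education's share of total government expenditure) are assumptions that should be verified against the dataset's actual schema.

```python
# Hedged sketch: query the education-spending indicator from BigQuery.
# The table name, column names, and indicator code are assumptions to verify.
from google.cloud import bigquery

client = bigquery.Client()  # requires Google Cloud credentials

query = """
SELECT country_name, year, value
FROM `bigquery-public-data.world_bank_intl_education.international_education`
WHERE indicator_code = 'SE.XPD.TOTL.GB.ZS'
ORDER BY country_name, year
"""

df = client.query(query).to_dataframe()
print(df.head())
```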
Abstract copyright UK Data Service and data collection copyright owner.
Beginning March 1, 2022, the "COVID-19 Case Surveillance Public Use Data" will be updated on a monthly basis. This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.
CDC has three COVID-19 case surveillance datasets:
COVID-19 Case Surveillance Public Use Data with Geography: Public use, patient-level dataset with clinical data (including symptoms), demographics, and county and state of residence. (19 data elements)
COVID-19 Case Surveillance Public Use Data: Public use, patient-level dataset with clinical and symptom data and demographics, with no geographic data. (12 data elements)
COVID-19 Case Surveillance Restricted Access Detailed Data: Restricted access, patient-level dataset with clinical and symptom data, demographics, and state and county of residence. Access requires a registration process and a data use agreement. (32 data elements)
The following apply to all three datasets:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data cells are suppressed to protect individual privacy.
The datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the previously updated datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
Datasets are created using CDC’s operational Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://wwwn.cdc.gov/nndss/data-collection.html
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html
Overview
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020 to clarify the interpretation of antigen detection tests and serologic test results within the case classification. The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily.
I am new to Kaggle. I have uploaded this project since I chose this topic for my final-year project. No platform other than Kaggle would be better for sharing my work.
The dataset was collected over a 2-month period in India. It has 400 rows with 25 features such as red blood cells, pedal edema, and sugar. The aim is to classify whether a patient has chronic kidney disease or not. The classification is based on an attribute named 'classification', which is either 'ckd' (chronic kidney disease) or 'notckd'. I've performed cleaning of the dataset, which includes mapping the text to numbers and some other changes. After the cleaning I've done some EDA (Exploratory Data Analysis), then divided the dataset into training and testing sets and applied the models on them. The classification results were not very satisfactory initially, so instead of dropping the rows with NaN values I've used a lambda function to replace them with the mode of each column. After that I've divided the dataset again into training and testing sets and applied the models on them. This time the results are better, and the random forest and decision tree are the best performers with an accuracy of 1.0 and 0 misclassifications. The performance of the classification is measured by printing the confusion matrix, classification report, and accuracy.
The dataset can be downloaded from https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease
I want to understand the approach to data science projects and work on different projects to expand my knowledge.
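A minimal sketch of the workflow described above (mapping text to numbers, mode imputation instead of dropping rows, train/test split, random forest, and the usual metrics), assuming the data has been exported to a CSV named kidney_disease.csv; the column names and cleaning details are illustrative rather than the exact steps used in the notebook.

```python
# Hedged sketch of the CKD classification workflow; file and column names are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

df = pd.read_csv("kidney_disease.csv")

# Map the target text labels to numbers.
df["classification"] = df["classification"].str.strip().map({"ckd": 1, "notckd": 0})

# Replace missing values with the mode of each column instead of dropping rows.
df = df.apply(lambda col: col.fillna(col.mode()[0]))

# Encode any remaining text columns as categorical codes.
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype("category").cat.codes

X, y = df.drop(columns=["classification"]), df["classification"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
pred = model.predict(X_test)

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
print("Accuracy:", accuracy_score(y_test, pred))
```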
This dataset represents the extent of urbanization (for the year indicated) predicted by the model SLEUTH, developed by Dr. Keith C. Clarke, at the University of California, Santa Barbara, Department of Geography and modified by David I. Donato of the United States Geological Survey (USGS) Eastern Geographic Science Center (EGSC). Further model modification and implementation was performed at the Biodiversity and Spatial Information Center at North Carolina State University. Purpose: Urban growth probability extents throughout the 21st century for the Southeast Regional Assessment Project, which encompasses the states of Alabama, Florida, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, Tennessee and Virginia and parts of the states of Arkansas, Illinois, Indiana, Louisiana, Maryland, Missouri, Ohio and West Virginia. Credit: Southeast Regional Assessment Project; Biodiversity and Spatial Information Center, North Carolina State University, Raleigh, North Carolina 27695, Curtis M. Belyea. Use Limitation: This data set is not intended for site-specific analyses. Interpretations derived from its use are suited for regional and planning purposes only. These data are not intended to be used at scales larger than 1:100,000. Acknowledgment of Biodiversity and Spatial Analysis Center at North Carolina State University is appreciated.
This dataset combines the work of several different projects to create a seamless data set for the contiguous United States. Data from four regional Gap Analysis Projects and the LANDFIRE project were combined to make this dataset. In the northwestern United States (Idaho, Oregon, Montana, Washington and Wyoming) data in this map came from the Northwest Gap Analysis Project. In the southwestern United States (Colorado, Arizona, Nevada, New Mexico, and Utah) data used in this map came from the Southwest Gap Analysis Project. The data for Alabama, Florida, Georgia, Kentucky, North Carolina, South Carolina, Mississippi, Tennessee, and Virginia came from the Southeast Gap Analysis Project and the California data was generated by the updated California Gap land cover project. The Hawaii Gap Analysis project provided the data for Hawaii. In areas of the country (central U.S., Northeast, Alaska) that have not yet been covered by a regional Gap Analysis Project, data from the LANDFIRE project was used.

Similarities in the methods used by these projects made possible the combining of the data they derived into one seamless coverage. They all used multi-season satellite imagery (Landsat ETM+) from 1999-2001 in conjunction with digital elevation model (DEM) derived datasets (e.g. elevation, landform) to model natural and semi-natural vegetation. Vegetation classes were drawn from NatureServe's Ecological System Classification (Comer et al. 2003) or classes developed by the Hawaii Gap project. Additionally, all of the projects included land use classes that were employed to describe areas where natural vegetation has been altered. In many areas of the country these classes were derived from the National Land Cover Dataset (NLCD). For the majority of classes and, in most areas of the country, a decision tree classifier was used to discriminate ecological system types. In some areas of the country, more manual techniques were used to discriminate small patch systems and systems not distinguishable through topography.

The data contains multiple levels of thematic detail. At the most detailed level natural vegetation is represented by NatureServe's Ecological System classification (or in Hawaii the Hawaii GAP classification). These most detailed classifications have been crosswalked to the five highest levels of the National Vegetation Classification (NVC): Class, Subclass, Formation, Division and Macrogroup. This crosswalk allows users to display and analyze the data at different levels of thematic resolution. Developed areas, or areas dominated by introduced species, timber harvest, or water are represented by other classes, collectively referred to as land use classes; these land use classes occur at each of the thematic levels.

Raster data in both ArcGIS Grid and ERDAS Imagine format is available for download at http://gis1.usgs.gov/csas/gap/viewer/land_cover/Map.aspx Six layer files are included in the download packages to assist the user in displaying the data at each of the thematic levels in ArcGIS. In addition to the raster datasets the data is available in Web Mapping Services (WMS) format for each of the six NVC classification levels (Class, Subclass, Formation, Division, Macrogroup, Ecological System) at the following links.
http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Class_Landuse/MapServer
http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Subclass_Landuse/MapServer
http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Formation_Landuse/MapServer
http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Division_Landuse/MapServer
http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_NVC_Macrogroup_Landuse/MapServer
http://gis1.usgs.gov/arcgis/rest/services/gap/GAP_Land_Cover_Ecological_Systems_Landuse/MapServer
The Utility Energy Registry (UER) is a database platform that provides streamlined public access to aggregated community-scale energy data. The UER is intended to promote and facilitate community-based energy planning and energy use awareness and engagement. On April 19, 2018, the New York State Public Service Commission (PSC) issued the Order Adopting the Utility Energy Registry under regulatory CASE 17-M-0315. The order requires utilities and CCA administrators under its regulation to develop and report community energy use data to the UER. This dataset includes electricity and natural gas usage data reported by utilities at the county level. Other UER datasets include energy use data reported at the city, town, and village level and at the ZIP code level. Data in the UER can be used for several important purposes such as planning community energy programs, developing community greenhouse gas emissions inventories, and relating how certain energy projects and policies may affect a particular community. It is important to note that the data are subject to privacy screening and fields that fail the privacy screen are withheld. The New York State Energy Research and Development Authority (NYSERDA) offers objective information and analysis, innovative programs, technical expertise, and support to help New Yorkers increase energy efficiency, save money, use renewable energy, and reduce reliance on fossil fuels. To learn more about NYSERDA’s programs, visit nyserda.ny.gov or follow us on X, Facebook, YouTube, or Instagram.
Our NFL Data product offers extensive access to historic and current National Football League statistics and results, available in multiple formats. Whether you're a sports analyst, data scientist, fantasy football enthusiast, or a developer building sports-related apps, this dataset provides everything you need to dive deep into NFL performance insights.
Key Benefits:
Comprehensive Coverage: Includes historic and real-time data on NFL stats, game results, team performance, player metrics, and more.
Multiple Formats: Datasets are available in various formats (CSV, JSON, XML) for easy integration into your tools and applications.
User-Friendly Access: Whether you are an advanced analyst or a beginner, you can easily access and manipulate data to suit your needs.
Free Trial: Explore the full range of data with our free trial before committing, ensuring the product meets your expectations.
Customizable: Filter and download only the data you need, tailored to specific seasons, teams, or players.
API Access: Developers can integrate real-time NFL data into their apps with API support, allowing seamless updates and user engagement.
Use Cases:
Fantasy Football Players: Use the data to analyze player performance, helping to draft winning teams and make better game-day decisions.
Sports Analysts: Dive deep into historical and current NFL stats for research, articles, and game predictions.
Developers: Build custom sports apps and dashboards by integrating NFL data directly through API access.
Betting & Prediction Models: Use data to create accurate predictions for NFL games, helping sportsbooks and bettors alike.
Media Outlets: Enhance game previews, post-game analysis, and highlight reels with accurate, detailed NFL stats.
Our NFL Data product ensures you have the most reliable, up-to-date information to drive your projects, whether it's enhancing user experiences, creating predictive models, or simply enjoying in-depth football analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this data set, 6 objects, including 2 targets and 4 non-targets, lie on a sea-sand bottom. In this experiment, the transmitted signal is a wide-band linear frequency modulated pulse (WLFM) covering the frequency range 5-110 kHz. Targets lying on the bottom are rotated 180 degrees with 1-degree accuracy via an electromotor, and backscattered echoes are recorded at a range of 10 meters from the target. A fine dataset plays a key role in sonar target classification. Given the massive raw data obtained from the previous stage, a heavy computational load is expected. To reduce the computational burden of classification and feature extraction, it is essential to detect targets within the total received data; to this end, the intensity of the received signal is used. Multi-path propagation, secondary reflections, and reverberation due to the shallowness of the region must also be considered. The researchers attempt to eliminate these artifacts after the detection stage and before feature extraction by using a matched filter.
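A minimal matched-filter sketch in Python, illustrating the detection idea described above (cross-correlating the received signal with a replica of the transmitted WLFM pulse); the sampling rate, pulse duration, and simulated echo below are assumptions for illustration, not parameters of this dataset.

```python
# Hedged matched-filter sketch on a simulated echo; all parameters are illustrative.
import numpy as np
from scipy.signal import chirp, correlate

fs = 1_000_000                       # assumed sampling rate: 1 MHz
t = np.arange(0, 1e-3, 1 / fs)       # assumed 1 ms pulse
tx = chirp(t, f0=5e3, t1=t[-1], f1=110e3)   # LFM replica sweeping 5-110 kHz

# Simulated received signal: delayed, attenuated echo buried in noise.
rx = 0.1 * np.random.randn(20_000)
delay = 4_000
rx[delay:delay + tx.size] += 0.3 * tx

# Matched filtering = cross-correlation with the transmitted replica;
# the peak of the output marks the echo arrival (target detection).
mf = correlate(rx, tx, mode="valid")
print("estimated echo delay (samples):", int(np.argmax(np.abs(mf))))
```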
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Final Datasets is a dataset for object detection tasks - it contains annotations for 1,600 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).