Evaluate a natural language code generation model on real data science pedagogical notebooks! Data Science Problems (DSP) includes well-posed data science problems in Markdown along with unit tests to verify correctness and a Docker environment for reproducible execution. About one third of the notebooks in this benchmark also include data dependencies, so the benchmark can not only test a model's ability to chain together complex tasks but also evaluate the solutions on real data! See our paper Training and Evaluating a Jupyter Notebook Data Science Assistant for more details about state-of-the-art results and other properties of the dataset.
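To illustrate how unit tests can verify a generated solution, here is a minimal sketch. It assumes each problem pairs a model-generated code cell with a test cell made of plain `assert` statements. The helpers `passes_tests` and `pass_rate` and the example problem are hypothetical. DSP itself runs notebooks inside its Docker environment, whereas this sketch executes code in-process with no isolation, so it only shows the idea.

```python
def passes_tests(solution_src: str, test_src: str) -> bool:
    """Run a generated solution followed by its unit tests in a fresh namespace.

    Returns True only if both execute without raising. Hypothetical helper:
    unlike a Docker-based harness, no sandboxing is applied.
    """
    namespace: dict = {}
    try:
        exec(solution_src, namespace)  # define the candidate's functions/variables
        exec(test_src, namespace)      # a failing assert raises AssertionError
    except Exception:                  # also catches SyntaxError in the solution
        return False
    return True


def pass_rate(results: list[bool]) -> float:
    """Fraction of problems whose generated solution passed all tests."""
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Hypothetical problem: "Write a function mean(xs) returning the arithmetic mean."
    tests = "assert mean([1, 2, 3]) == 2\nassert mean([4]) == 4"
    good = "def mean(xs):\n    return sum(xs) / len(xs)"
    bad = "def mean(xs):\n    return sum(xs)"
    results = [passes_tests(good, tests), passes_tests(bad, tests)]
    print(results, pass_rate(results))  # [True, False] 0.5
```

Keeping the solution and its tests in one namespace mirrors how a notebook's later cells see variables defined by earlier ones.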
This statistic displays the results of a survey on the share of individuals in Italy expressing concerns about the privacy of their personal data on the internet in 2016. The survey found that 91.5 percent of respondents believed that using the internet exposes them to being tracked and followed, while 66.3 percent stated that privacy was not a real problem.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Jira is an issue tracking system that supports software companies (among other types of companies) with managing their projects, community, and processes. This dataset is a collection of public Jira repositories downloaded from the internet using the Jira API V2. We collected data from 16 public Jira repositories containing 1,822 projects and 2.7 million issues. The data also include historical records of 32 million changes, 8 million comments, and 1 million issue links that connect the issues in complex ways. This artefact repository contains the data as a MongoDB dump, the scripts used to download the data, the scripts used to interpret the data, and qualitative work conducted to make the data more approachable.
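Because the issue links connect issues in complex ways, one way to explore them is as an undirected graph. The minimal sketch below assumes each link has already been exported from the MongoDB dump as a `(source key, target key, link type)` tuple. That tuple layout, the issue keys, and the `linked_components` helper are hypothetical and do not reflect the dump's actual schema. The helper groups linked issues into connected components with breadth-first search.

```python
from collections import defaultdict, deque


def linked_components(links):
    """Group issues into connected components, ignoring link direction and type."""
    graph = defaultdict(set)
    for source, target, _link_type in links:
        graph[source].add(target)
        graph[target].add(source)

    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search from an unvisited issue collects its whole component.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            issue = queue.popleft()
            component.add(issue)
            for neighbour in graph[issue]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical link records, not taken from the dataset.
links = [
    ("PROJ-1", "PROJ-2", "Blocks"),
    ("PROJ-2", "PROJ-3", "Duplicate"),
    ("PROJ-7", "PROJ-8", "Relates"),
]
print(sorted(len(c) for c in linked_components(links)))  # [2, 3]
```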
This statistic shows the countries where American and European organizations faced regulatory challenges involving cross-border data issues in 2019. In the survey, 24 percent of respondents said they had faced such a challenge in the United States.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
## Overview
Problem is a dataset for object detection tasks - it contains Problem annotations for 2,923 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This data set contains information on possible data issues caused by external drivers (e.g. tethered balloon artefacts in the observations) related to the MOSAiC Cloudnet data set. Flagged data must be handled with care and should be excluded from statistical analyses. The issue flags are identified from tethered balloon operation periods and from observations by experienced MOSAiC staff.
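Excluding flagged data before computing statistics might look like the minimal sketch below. It assumes the flags have been converted to `(start, end)` time intervals and the observations to `(timestamp, value)` pairs. The interval representation, the `exclude_flagged` helper, and all values are hypothetical and do not describe the data set's actual file format.

```python
from datetime import datetime
from statistics import mean


def exclude_flagged(samples, flagged_intervals):
    """Drop samples whose timestamp falls inside any flagged [start, end) interval."""
    return [
        (ts, value)
        for ts, value in samples
        if not any(start <= ts < end for start, end in flagged_intervals)
    ]


# Hypothetical observations and one tethered-balloon flag period.
samples = [
    (datetime(2020, 4, 1, 10, 0), 1.0),
    (datetime(2020, 4, 1, 11, 0), 9.0),  # inside the flagged period
    (datetime(2020, 4, 1, 12, 0), 3.0),
]
flags = [(datetime(2020, 4, 1, 10, 30), datetime(2020, 4, 1, 11, 30))]

clean = exclude_flagged(samples, flags)
print(mean(value for _, value in clean))  # 2.0
```

Without the exclusion, the flagged 9.0 would pull the mean up to about 4.33, which is the kind of bias the flags are meant to prevent.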
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
United States SBOI: sa: Most Pressing Problem: A Year Ago: Others data was reported at 5.000 % in Mar 2025, a decrease from the previous figure of 6.000 % for Feb 2025. The series is updated monthly and has a median of 7.000 % from Jan 2014 to Mar 2025, with 131 observations. It reached an all-time high of 11.000 % in May 2023 and a record low of 3.000 % in Jul 2024. The series remains active in CEIC and is reported by the National Federation of Independent Business. It is categorized under Global Database’s United States – Table US.S042: NFIB Index of Small Business Optimism. [COVID-19-IMPACT]
MIT License: https://opensource.org/licenses/MIT
GIRT-Data is the first and largest dataset of issue report templates (IRTs) in both YAML and Markdown format. This dataset and its corresponding open-source crawler tool are intended to support research in this area and to encourage more developers to use IRTs in their repositories. The stable version of the dataset covers 1,084,300 repositories, of which 50,032 support IRTs.
For more details see the GitHub page of the dataset: https://github.com/kargaranamir/girt-data
The dataset was accepted at the MSR 2023 conference under the title "GIRT-Data: Sampling GitHub Issue Report Templates".
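Telling the two IRT formats apart can be sketched from file paths alone. The minimal example below assumes GitHub's conventional `.github/ISSUE_TEMPLATE/` folder, where `.md` files are Markdown templates, `.yml`/`.yaml` files are YAML issue forms, and `config.yml` configures the template chooser rather than being a template. Legacy single-file templates are ignored. The `template_format` helper is hypothetical and is not the GIRT-Data crawler's actual logic.

```python
from pathlib import PurePosixPath


def template_format(path: str):
    """Classify a repository file path as a YAML or Markdown issue report template.

    Returns "yaml", "markdown", or None. Hypothetical helper based on the
    .github/ISSUE_TEMPLATE/ folder convention; not the GIRT-Data crawler's code.
    """
    p = PurePosixPath(path)
    if p.parent.as_posix() != ".github/ISSUE_TEMPLATE":
        return None
    if p.name == "config.yml":  # template-chooser configuration, not a template
        return None
    suffix = p.suffix.lower()
    if suffix in (".yml", ".yaml"):
        return "yaml"
    if suffix == ".md":
        return "markdown"
    return None


paths = [
    ".github/ISSUE_TEMPLATE/bug_report.yml",
    ".github/ISSUE_TEMPLATE/feature_request.md",
    ".github/ISSUE_TEMPLATE/config.yml",
    "README.md",
]
print([template_format(p) for p in paths])  # ['yaml', 'markdown', None, None]
```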
This table shows the 10 most frequently recorded incident problem types as recorded by communications personnel for each fiscal year presented.
Housing Problems by Type of Issue and Community, 2019
MIT License: https://opensource.org/licenses/MIT
IntelligentMonitor: Empowering DevOps Environments With Advanced Monitoring and Observability aims to improve monitoring and observability in complex, distributed DevOps environments by leveraging machine learning and data analytics. This repository contains a sample implementation of the IntelligentMonitor system proposed in the research paper, presented and published as part of the 11th International Conference on Information Technology (ICIT 2023).
If you use this dataset and code, or any modified part of them, in a publication, please cite the following paper:
P. Thantharate, "IntelligentMonitor: Empowering DevOps Environments with Advanced Monitoring and Observability," 2023 International Conference on Information Technology (ICIT), Amman, Jordan, 2023, pp. 800-805, doi: 10.1109/ICIT58056.2023.10226123.
For questions or research queries, please reach out via email.
Abstract: In the dynamic field of software development, DevOps has become a critical tool for enhancing collaboration, streamlining processes, and accelerating delivery. However, monitoring and observability within DevOps environments pose significant challenges, often leading to delayed issue detection, inefficient troubleshooting, and compromised service quality. These issues stem from the complex and ever-changing nature of DevOps environments, where traditional monitoring tools often fall short, creating blind spots that can conceal performance issues or system failures. This research addresses these challenges by proposing an innovative approach to improve monitoring and observability in DevOps environments. Our solution, IntelligentMonitor, leverages real-time data collection, intelligent analytics, and automated anomaly detection powered by advanced technologies such as machine learning and artificial intelligence. The experimental results demonstrate that IntelligentMonitor effectively manages data overload, reduces alert fatigue, and improves system visibility, thereby enhancing performance and reliability. For instance, the average CPU usage across all components showed a decrease of 9.10%, indicating improved CPU efficiency. Similarly, memory utilization and network traffic showed an average increase of 7.33% and 0.49%, respectively, suggesting more efficient use of resources. By providing deep insights into system performance and facilitating rapid issue resolution, this research contributes to the DevOps community by offering a comprehensive solution to one of its most pressing challenges. This fosters more efficient, reliable, and resilient software development and delivery processes.
Components
The key components that would need to be implemented are:
Implementation Details
The core of the implementation would involve the following:
- Setting up the data collection pipelines.
- Building and training anomaly detection ML models on historical data.
- Developing a real-time data processing pipeline.
- Creating an alerting framework that ties into the ML models.
- Building visualizations and dashboards.
The code would need to handle scaled-out, distributed execution for production environments.
Proper code documentation, logging, and testing would be added throughout the implementation.
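The paper's ML-based anomaly detection is not reproduced here. As a stand-in illustration of how an alerting step can sit on top of a metrics stream, the minimal sketch below uses a rolling z-score over a CPU-usage series, which is a simple statistical baseline and not IntelligentMonitor's model. A sample is flagged when it deviates from the mean of the preceding window by more than a threshold number of standard deviations. The window size, threshold, `rolling_zscore_alerts` helper, and sample values are all hypothetical.

```python
from statistics import mean, pstdev


def rolling_zscore_alerts(values, window=5, threshold=3.0):
    """Return indices of samples that deviate strongly from the preceding window.

    A sample is flagged when |x - mean| > threshold * std of the previous
    `window` samples. Windows with zero spread are skipped to avoid dividing by zero.
    """
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            continue
        if abs(values[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts


# Hypothetical CPU-usage percentages sampled once per minute.
cpu = [41.0, 43.0, 42.0, 40.0, 44.0, 42.0, 95.0, 43.0]
print(rolling_zscore_alerts(cpu))  # [6]
```

A learned model would replace the z-score test, but the surrounding loop (score each new sample against recent history, emit an alert index) stays the same shape.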
Usage Examples
Usage examples could include:
References
The implementation would follow the details provided in the original research paper: P. Thantharate, "IntelligentMonitor: Empowering DevOps Environments with Advanced Monitoring and Observability," 2023 International Conference on Information Technology (ICIT), Amman, Jordan, 2023, pp. 800-805, doi: 10.1109/ICIT58056.2023.10226123.
Any additional external libraries or sources used would be properly cited.
Tags - DevOps, Software Development, Collaboration, Streamlini...
The ICOW project is collecting data on territorial issues in all regions of the world from 1816 onward, compiled into several related data files. The first file identifies territorial claims, or explicit claims by official government representatives of at least two sovereign nation-states to the same piece of territory, and includes basic claim-level information such as the overall beginning and ending of the claim and the form of claim termination. The second file is organized by claim-dyad-years and includes one data point for each year of each claimant dyad, with information on details such as the characteristics of the claimed territory. The third and final data file covers attempts to settle these territorial claims through bilateral negotiations or with third-party assistance (good offices, mediation, inquiry, conciliation, arbitration, or adjudication), and includes details such as the beginning, ending, and effectiveness of each settlement attempt.
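To make the relationship between the first two files concrete, the minimal sketch below expands a claim-level record into claim-dyad-year rows, one per year of each claimant dyad. The field names (`claim_id`, `begin`, `end`, `claimants`), the `claim_dyad_years` helper, and the example claim are hypothetical, not ICOW's variable names. For simplicity it also assumes every pair of claimants is active for the whole claim period, whereas real dyads may enter and exit at different times.

```python
from itertools import combinations


def claim_dyad_years(claim):
    """Expand one claim-level record into (claim_id, state_a, state_b, year) rows."""
    rows = []
    # Each unordered pair of claimant states forms one dyad.
    for state_a, state_b in combinations(sorted(claim["claimants"]), 2):
        for year in range(claim["begin"], claim["end"] + 1):  # inclusive end year
            rows.append((claim["claim_id"], state_a, state_b, year))
    return rows


# Hypothetical claim among three states lasting three years.
claim = {"claim_id": 1, "begin": 1900, "end": 1902, "claimants": ["B", "A", "C"]}
rows = claim_dyad_years(claim)
print(len(rows))  # 9: three dyads x three years
print(rows[0])    # (1, 'A', 'B', 1900)
```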
Problems reported, comments and satisfaction surveys submitted by the general public through focused citizen engagement applications.
This dataset was created by DimaVinn
http://rightsstatements.org/vocab/InC/1.0/
The dataset and source code for the paper "Automating Intention Mining".
The code is based on dennybritz's implementation of Yoon Kim's paper Convolutional Neural Networks for Sentence Classification.
By default, the code uses TensorFlow 0.12. Other TensorFlow versions may report errors because some APIs are incompatible.
Run 'online_prediction.py' to input any sentence and see the classification result produced by a pre-trained CNN model. The model is trained on all sentences of the four GitHub projects.
Run 'play.py' to get the evaluation results of cross-project prediction. Please check the code for details of the configuration. By default, it uses the four GitHub projects as training data to predict the sentences in the DECA dataset; in this setting, the categories 'aspect evaluation' and 'others' are dropped because the DECA dataset does not contain them.
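The category-dropping step for cross-project prediction can be sketched as a simple filter. This minimal example assumes labelled sentences are `(sentence, category)` pairs. The `align_to_target` helper, the example sentences, and the target label set are hypothetical; only 'aspect evaluation' and 'others' come from the description above, and this is not the actual code in `play.py`.

```python
def align_to_target(train, target_categories):
    """Keep only training sentences whose category exists in the target dataset.

    `train` is a list of (sentence, category) pairs; categories missing from
    the target (e.g. 'aspect evaluation' and 'others' for DECA) are dropped.
    """
    allowed = set(target_categories)
    return [(text, cat) for text, cat in train if cat in allowed]


# Hypothetical GitHub training sentences and a hypothetical target label set.
train = [
    ("It crashes when I click save.", "problem discovery"),
    ("Could you add dark mode?", "feature request"),
    ("The new API feels much cleaner.", "aspect evaluation"),
    ("Thanks!", "others"),
]
target_categories = ["problem discovery", "feature request"]
print(align_to_target(train, target_categories))
```

Filtering before training keeps the classifier from predicting labels that can never be correct on the target dataset.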
The New Security Issues, State and Local Governments tables (1.45) are updated monthly. Data were previously published in the Supplement to the Federal Reserve Bulletin, which ceased publication in December 2008. Data sources have included: Mergent, beginning November 2011; Securities Data Company, from January 1990 to October 2011; and Investment Dealers Digest before then.
The Intervention Issue Resolved or Closed dataset contains data to support analysis of Intervention records that were not resolved in a timely manner. It contains the following data: all Intervention records whose resolution is Resolved prior to Intervention, Resolution Closed - Inactivity, Resolution Closed - Prolonged Duration, Resolutions Closed - Escalated to ALJ, Resolved With Conference, or Resolved without Conference (excluding records from deleted cases); Received date; Case Type; Case Status; Payment Status; Extent of Injury; Injury Type; Office of the User who closed the Intervention; User who closed the Intervention; and the RCE of the case at the time the Intervention record was closed. Performance goal: Intervention issue resolved or closed within 90 days; target 85% (OPS Plan Goal Language).
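The OPS Plan goal can be checked with simple date arithmetic. The minimal sketch below assumes each Intervention record has been reduced to an `(opened, closed)` date pair and treats "within 90 days" as at most 90 days. The field layout, the `share_within_target` helper, and the dates are hypothetical.

```python
from datetime import date

TARGET_SHARE = 0.85  # OPS Plan goal: 85% resolved or closed within 90 days
MAX_DAYS = 90


def share_within_target(records):
    """Fraction of (opened, closed) date pairs closed within MAX_DAYS days."""
    if not records:
        return 0.0
    on_time = sum(1 for opened, closed in records if (closed - opened).days <= MAX_DAYS)
    return on_time / len(records)


# Hypothetical Intervention records: (opened, closed).
records = [
    (date(2024, 1, 2), date(2024, 2, 15)),  # 44 days
    (date(2024, 1, 10), date(2024, 5, 1)),  # 112 days
    (date(2024, 3, 1), date(2024, 3, 20)),  # 19 days
    (date(2024, 4, 5), date(2024, 7, 4)),   # 90 days
]
share = share_within_target(records)
print(f"{share:.0%} within {MAX_DAYS} days; target met: {share >= TARGET_SHARE}")
# 75% within 90 days; target met: False
```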
The Department of Housing Preservation and Development (HPD) records complaints that are made by the public for conditions which violate the New York City Housing Maintenance Code (HMC) or the New York State Multiple Dwelling Law (MDL).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Score on Action When a Problem Arises represents a measurement of how establishments respond to issues during the production process, encompassing actions taken to rectify problems and prevent future occurrences.