Evaluate a natural language code generation model on real data science pedagogical notebooks! Data Science Problems (DSP) includes well-posed data science problems in Markdown, along with unit tests to verify correctness and a Docker environment for reproducible execution. About 1/3 of the notebooks in this benchmark also include data dependencies, so the benchmark can not only test a model's ability to chain together complex tasks but also evaluate the solutions on real data! See our paper "Training and Evaluating a Jupyter Notebook Data Science Assistant" for more details about state-of-the-art results and other properties of the dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Problem is a dataset for object detection tasks - it contains Problem annotations for 2,923 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This page contains the files necessary to reproduce all the empirical analysis found in the Journal of Elections, Public Opinion and Parties article.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The focus of this dataset is to provide an open-loop solution for a stochastic problem with imperfect state information and chance constraints adjusted by an optimal gain.
Renewable energy resources, including solar and wind energy, play a significant role in sustainable energy systems. However, the inherent uncertainty and intermittency of renewable generation pose challenges to the safe and efficient operation of power systems. Recognizing the importance of short-term (hours ahead) renewable generation forecasting in power systems operation, it becomes crucial to address the potential inaccuracies in these forecasts. To systematically evaluate the performance of controllers in the presence of imperfect forecasts, we generate synthetic forecasts using actual renewable generation profiles (one from solar and one from wind). These synthetic forecasts incorporate different levels of statistical error, allowing us to control and manipulate the accuracy of the predictions. The primary objective is to employ synthetic forecasts with controlled yet realistic error levels to systematically investigate how controllers adapt to variations in forecast accuracy, providing valuable insights into their robustness and effectiveness under real-world conditions.
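The idea of a synthetic forecast with a controlled error level can be sketched as follows. This is only an illustration: the description does not state the error model, so zero-mean Gaussian noise scaled by the mean of the profile is an assumption, and the function names and the sample profile are hypothetical.

```python
import math
import random


def synthetic_forecast(actual, error_level, seed=0):
    """Return a noisy copy of `actual` whose error grows with `error_level`.

    Assumption: zero-mean Gaussian error with standard deviation
    error_level * mean(actual). The dataset may use a different model.
    """
    rng = random.Random(seed)
    sigma = error_level * sum(actual) / len(actual)
    # Generation cannot be negative, so clip each perturbed value at zero.
    return [max(0.0, a + rng.gauss(0.0, sigma)) for a in actual]


def rmse(forecast, actual):
    """Root-mean-square error between a forecast and the actual profile."""
    squared = [(f - a) ** 2 for f, a in zip(forecast, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical hourly solar profile in MW, for illustration only.
actual = [0.0, 0.0, 5.0, 20.0, 45.0, 60.0, 55.0, 30.0, 10.0, 0.0]

for level in (0.0, 0.1, 0.3):
    forecast = synthetic_forecast(actual, level)
    print(f"error level {level:.1f}: RMSE = {rmse(forecast, actual):.2f} MW")
```

Using the same seed for every error level keeps the noise pattern fixed and changes only its magnitude, which makes it easier to attribute differences in controller behaviour to forecast accuracy alone.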
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Problem folders containing all the input files necessary to reproduce the computations behind the results in the Reduced Order Models chapter of N.C. Clementi's PhD thesis.
The Department of Housing Preservation and Development (HPD) records complaints that are made by the public for conditions which violate the New York City Housing Maintenance Code (HMC) or the New York State Multiple Dwelling Law (MDL).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States SBOI: sa: Most Pressing Problem: A Year Ago: Others data was reported at 5.000 % in Mar 2025. This records a decrease from the previous number of 6.000 % for Feb 2025. United States SBOI: sa: Most Pressing Problem: A Year Ago: Others data is updated monthly, averaging 7.000 % from Jan 2014 (Median) to Mar 2025, with 131 observations. The data reached an all-time high of 11.000 % in May 2023 and a record low of 3.000 % in Jul 2024. United States SBOI: sa: Most Pressing Problem: A Year Ago: Others data remains in active status in CEIC and is reported by the National Federation of Independent Business. The data is categorized under Global Database’s United States – Table US.S042: NFIB Index of Small Business Optimism. [COVID-19-IMPACT]
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All the randomly generated problems in this data set involve a number A of aircraft passing through a square multi-sector area (MSA) of side 600 km. This MSA is composed of four square adjacent sectors of side 300 km. The aircraft use four different flight levels that belong to the same MSA. The aircraft trajectories are randomly generated in such a way that all aircraft are either flying from bottom to upper MSA borders, or from left to right borders. Taking the origin at the bottom left corner of the MSA, the distance between the first waypoint and the origin is randomly generated using the continuous uniform distribution U[75 km, 595 km]. Each trajectory is composed of three waypoints located on the MSA edges. The first waypoint is located on either the bottom or the left MSA border. The other two waypoints are generated randomly along the opposing sector borders using a uniform distribution. The cruise speeds of the aircraft are randomly generated using the continuous uniform distribution U[458 knots, 506 knots]. The time at which the aircraft enters the MSA follows the continuous uniform distribution U[20 min, 90 min]. The flight level used for each trajectory is randomly generated using a discrete uniform distribution U{1, K}. A constant flight level is used by 90% of the aircraft. The others undergo one flight level change at the internal boundary. For these aircraft, the second flight level is randomly generated using U{1, K} while excluding the first sector flight level.
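The sampling of the scalar attributes described above can be sketched as follows. This is a hedged illustration rather than the generator used for the dataset: the `Aircraft` class, the function name, and the equal probability of the two directions are assumptions, and the second and third waypoints are left out because the text does not fully specify their geometry.

```python
import random
from dataclasses import dataclass


@dataclass
class Aircraft:
    direction: str            # "bottom_to_top" or "left_to_right"
    first_waypoint_km: float  # distance of the first waypoint from the origin
    speed_knots: float        # cruise speed
    entry_time_min: float     # time at which the aircraft enters the MSA
    flight_levels: tuple      # one level, or two if the level changes once


def generate_aircraft(num_levels, rng):
    """Sample one aircraft with the distributions given in the description."""
    # Assumption: both directions are equally likely; the text does not say.
    direction = rng.choice(["bottom_to_top", "left_to_right"])
    first_level = rng.randint(1, num_levels)  # discrete U{1, K}
    levels = (first_level,)
    # 90% keep a constant level; the rest change once at the internal boundary.
    if num_levels > 1 and rng.random() >= 0.9:
        others = [k for k in range(1, num_levels + 1) if k != first_level]
        levels = (first_level, rng.choice(others))
    return Aircraft(
        direction=direction,
        first_waypoint_km=rng.uniform(75.0, 595.0),
        speed_knots=rng.uniform(458.0, 506.0),
        entry_time_min=rng.uniform(20.0, 90.0),
        flight_levels=levels,
    )


rng = random.Random(42)
fleet = [generate_aircraft(num_levels=4, rng=rng) for _ in range(200)]
for aircraft in fleet[:3]:
    print(aircraft)
```

Passing an explicit `random.Random` instance keeps instance generation reproducible for a given seed.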
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset has two files. One file contains student data, with marks reflecting how each student performed on each question (physics concept). The other file contains an analysis of how students followed each of the seven PS steps.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Descriptive statistics of sexual violence victim-survivors in the Crime Survey for England and Wales (CSEW) and Rape Crisis England & Wales (RCEW) datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is supplementary data to "Parameter Estimation for Water Distribution Networks with Multiple Head Loss Formulae" in the ASCE Journal of Water Resources Planning and Management. Any use of this dataset must credit the authors.
BWFLnet is an operational network in Bristol, UK, operated by Bristol Water. The data provided is the product of a long-term research partnership between Bristol Water and Infrasense Labs at Imperial College London. All data provided is genuine recorded data, with locations and names anonymised. The authors hope that the publication of this dataset can be a useful contribution to hydraulic model calibration as well as to wider research in the water distribution sector.
Peer-to-Peer (P2P) networks are gaining increasing popularity in many distributed applications such as file-sharing, network storage, web caching, searching and indexing of relevant documents and P2P network-threat analysis. Many of these applications require scalable analysis of data over a P2P network. This paper starts by offering a brief overview of distributed data mining applications and algorithms for P2P environments. Next it discusses some of the privacy concerns with P2P data mining and points out the problems of existing privacy-preserving multi-party data mining techniques. It further points out that most of the nice assumptions of these existing privacy preserving techniques fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game and points out some recent results.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MATLAB code for the examples.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Score on Action When a Problem Arises represents a measurement of how establishments respond to issues during the production process, encompassing actions taken to rectify problems and prevent future occurrences.
http://rightsstatements.org/vocab/InC/1.0/
The dataset and source code for the paper "Automating Intention Mining".
The code is based on dennybritz's implementation of Yoon Kim's paper Convolutional Neural Networks for Sentence Classification.
By default, the code uses TensorFlow 0.12. Other versions of TensorFlow may report errors because some APIs are incompatible.
By running 'online_prediction.py', you can input any sentence and check the classification result produced by a pre-trained CNN model. The model uses all sentences of the four GitHub projects as training data.
By running 'play.py', you can get the evaluation result of cross-project prediction. Please check the code for more details of the configuration. By default, it uses the four GitHub projects as training data to predict the sentences in the DECA dataset. In this setting, the categories 'aspect evaluation' and 'others' are dropped, since the DECA dataset does not contain these two categories.
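The category filtering used in the cross-project setting can be pictured with a small sketch. This is not the code in 'play.py': the function name, the sample sentences, and the label 'feature request' are hypothetical and serve only to show the idea of removing the two categories that the DECA dataset lacks before training.

```python
# Categories absent from the DECA dataset, as stated in the description.
DROPPED_CATEGORIES = {"aspect evaluation", "others"}


def drop_categories(samples, dropped=DROPPED_CATEGORIES):
    """Keep only (sentence, label) pairs whose label is not in `dropped`."""
    return [(sentence, label) for sentence, label in samples if label not in dropped]


# Hypothetical training samples, for illustration only.
training = [
    ("Please add dark mode.", "feature request"),
    ("The new layout looks great.", "aspect evaluation"),
    ("Thanks!", "others"),
]

filtered = drop_categories(training)
print(filtered)
```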
This statistic displays the results of a survey on the share of individuals expressing privacy concerns regarding their personal data on the internet in Italy in 2016. During the survey period, **** percent of the respondents reported that using the internet exposes everyone to being tracked and followed, while **** percent stated that privacy was not a real problem.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains the instances used in the paper "Exact algorithms for a parallel machine scheduling problem with workforce and contiguity constraints" by Giulia Caselli, Maxence Delorme, Manuel Iori, and Carlo Alberto Magni.
Complete dataset used in the research study on Gaming and Problem-Solving: Enhancing Critical Thinking by Dr. Daniel Hall
Prognostics, which deals with predicting remaining useful life of components, subsystems, and systems, is a key technology for systems health management that leads to improved safety and reliability with reduced costs. The prognostics problem is often approached from a component-centric view. However, in most cases, it is not specifically component lifetimes that are important, but, rather, the lifetimes of the systems in which these components reside. The system-level prognostics problem can be quite difficult due to the increased scale and scope of the prognostics problem and the relative lack of scalability and efficiency of typical prognostics approaches. In order to address these issues, we develop a distributed solution to the system-level prognostics problem, based on the concept of structural model decomposition. The system model is decomposed into independent submodels. Independent local prognostics subproblems are then formed based on these local submodels, resulting in a scalable, efficient, and flexible distributed approach to the system-level prognostics problem. We provide a formulation of the system-level prognostics problem and demonstrate the approach on a four-wheeled rover simulation testbed. The results show that the system-level prognostics problem can be accurately and efficiently solved in a distributed fashion.