This statistic represents the pain points for data storage in 2016 and 2017, according to IT decision-makers. It reveals that 47 percent of IT decision-makers had problems that stemmed from the growth of data and capacity.
The statistic shows the problems caused by poor quality data for enterprises in North America, according to a survey of North American IT executives conducted by 451 Research in 2015. As of 2015, 44 percent of respondents indicated that having poor quality data can result in extra costs for the business.
MATH is a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for GSM8K
Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
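For illustration only (not part of the dataset card above), GSM8K can typically be loaded with the Hugging Face `datasets` library; the configuration name and field names below follow the dataset page linked above:

```python
# Illustrative loading snippet (assumes the `datasets` library is installed;
# the "main" configuration and question/answer fields follow the dataset page).
from datasets import load_dataset

gsm8k = load_dataset("openai/gsm8k", "main")   # splits: train (~7.5K) and test (~1.3K)
example = gsm8k["train"][0]

print(example["question"])                     # the word problem
print(example["answer"])                       # step-by-step solution; final result after "####"

# The final numeric answer can be split off for exact-match evaluation.
final_answer = example["answer"].split("####")[-1].strip()
print(final_answer)
```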
A challenge set for elementary-level Math Word Problems (MWP). An MWP consists of a short Natural Language narrative that describes a state of the world and poses a question about some unknown quantities.
The examples in SVAMP test a model across different aspects of solving MWPs: 1) Is the model question sensitive? 2) Does the model have robust reasoning ability? 3) Is it invariant to structural alterations?
With the creation of the first drug court in Miami-Dade County, Florida in 1989, problem-solving courts emerged as an innovative effort to close the revolving door of recidivism. Designed to target the social and psychological problems underlying certain types of criminal behavior, the problem-solving model boasts a community-based, therapeutic approach. As a result of the anecdotal successes of early drug courts, states expanded the problem-solving court model by developing specialized courts or court dockets to address a number of social problems. Although the number and types of problem-solving courts have been expanding, formal research and statistical information regarding the operations and models of these programs have not grown at the same rate. Multiple organizations have started mapping the variety of problem-solving courts in the country; however, a national catalogue of problem-solving court infrastructure is lacking. As evidence of this, different counts of problem-solving courts have been offered by different groups, and a likely part of the discrepancy lies in disagreements about how to define and identify a problem-solving court. What is known about problem-solving courts is therefore limited to evaluation or outcome analyses of specific court programs. In 2010, the Bureau of Justice Statistics awarded the National Center for State Courts a grant to develop accurate and reliable national statistics regarding problem-solving court operations, staffing, and participant characteristics. The NCSC, with assistance from the National Drug Court Institute (NDCI), produced the resulting Census of Problem-Solving Courts, which captures information on over 3,000 problem-solving courts that were operational in 2012.
Each R script replicates all of the example code from one chapter of the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
The Department of Housing Preservation and Development (HPD) records complaints that are made by the public for conditions which violate the New York City Housing Maintenance Code (HMC) or the New York State Multiple Dwelling Law (MDL).
Evaluate a natural language code generation model on real data science pedagogical notebooks! Data Science Problems (DSP) includes well-posed data science problems in Markdown along with unit tests to verify correctness and a Docker environment for reproducible execution. About 1/3 of the notebooks in this benchmark also include data dependencies, so this benchmark can not only test a model's ability to chain together complex tasks, but also evaluate the solutions on real data! See our paper, Training and Evaluating a Jupyter Notebook Data Science Assistant, for more details about state-of-the-art results and other properties of the dataset.
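As a purely hypothetical sketch of the task format (made-up problem and test, not an actual DSP record), a benchmark entry of this kind pairs a Markdown problem statement with a unit test that the generated solution must pass:

```python
# Purely illustrative, not taken from the DSP benchmark: a Markdown problem
# statement and the kind of unit test that could verify a generated solution.
PROBLEM_MD = """
### Problem
Given a pandas DataFrame `df` with a numeric column `price`, return the mean
price rounded to two decimals.
"""

def solution(df):
    # What the code generation model is asked to produce from PROBLEM_MD.
    return round(df["price"].mean(), 2)

def test_solution():
    # The hidden check that would be executed in the reproducible environment.
    import pandas as pd
    df = pd.DataFrame({"price": [1.0, 2.0, 4.0]})
    assert solution(df) == 2.33
```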
SSA's basic IT Service Management tool, used to identify and track authorized changes to the Production IT environment; identify and track Incidents and Problems within that environment; support Service Desk interactions with internal users; and manage and track IT assets and configuration items within the Agency's CMDB. It runs on Hewlett Packard's Service Manager software.
Peer-to-Peer (P2P) networks are gaining increasing popularity in many distributed applications such as file-sharing, network storage, web caching, searching and indexing of relevant documents, and P2P network-threat analysis. Many of these applications require scalable analysis of data over a P2P network. This paper starts by offering a brief overview of distributed data mining applications and algorithms for P2P environments. Next it discusses some of the privacy concerns with P2P data mining and points out the problems of existing privacy-preserving multi-party data mining techniques. It further points out that most of the nice assumptions of these existing privacy-preserving techniques fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game and points out some recent results.
The statistic shows the problems that organizations face when using big data technologies worldwide as of 2017. Around 53 percent of respondents stated that inadequate analytical know-how was a major problem that their organization faced when using big data technologies as of 2017.
https://doi.org/10.4121/resource:terms_of_use
Log of Volvo IT problem management (closed problems). Parent item: BPI Challenge 2013, Logs of Volvo IT incident and problem management.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All the randomly generated problems in this data set involve a number A of aircraft passing through a square multi-sector area (MSA) of side 600 km. This MSA is composed of four square adjacent sectors of side 300 km. The aircraft use four different flight levels that belong to the same MSA. The aircraft trajectories are randomly generated in such a way that all aircraft are either flying from bottom to upper MSA borders, or from left to right borders. Taking the origin at the bottom left corner of the MSA, the distance between the first waypoint and the origin is randomly generated using the continuous uniform distribution U[75 km, 595 km]. Each trajectory is composed of three waypoints located on the MSA edges. The first waypoint is located on either the bottom or the left MSA border. The other two waypoints are generated randomly along the opposing sector borders using a uniform distribution. The cruise speeds of the aircraft are randomly generated using the continuous uniform distribution U[458 knots, 506 knots]. The time at which the aircraft enters the MSA follows the continuous uniform distribution U[20 min, 90 min]. The flight level used for each trajectory is randomly generated using a discrete uniform distribution U{1, K}. A constant flight level is used by 90% of the aircraft. The others undergo one flight level change at the internal boundary. For these aircraft, the second flight level is randomly generated using U{1, K} while excluding the first sector flight level.
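A minimal generator sketch in Python, assuming K = 4 flight levels and one particular reading of "along the opposing sector borders" (internal-boundary and exit waypoints drawn uniformly over the full 600 km edge); the function and field names are made up, while the distributions follow the description above:

```python
# Hypothetical sketch of the random problem generator described above.
# Names and record layout are assumptions; distributions follow the text.
import random

MSA_SIDE = 600.0     # km, side of the multi-sector area
SECTOR_SIDE = 300.0  # km, side of each of the four sectors
K = 4                # number of flight levels (four levels are stated above)

def generate_problem(num_aircraft, seed=None):
    rng = random.Random(seed)
    aircraft = []
    for _ in range(num_aircraft):
        entry_offset = rng.uniform(75.0, 595.0)            # U[75 km, 595 km] from the origin
        if rng.random() < 0.5:                             # bottom-to-top flight
            waypoints = [(entry_offset, 0.0),
                         (rng.uniform(0.0, MSA_SIDE), SECTOR_SIDE),
                         (rng.uniform(0.0, MSA_SIDE), MSA_SIDE)]
        else:                                              # left-to-right flight
            waypoints = [(0.0, entry_offset),
                         (SECTOR_SIDE, rng.uniform(0.0, MSA_SIDE)),
                         (MSA_SIDE, rng.uniform(0.0, MSA_SIDE))]
        level = rng.randint(1, K)                          # U{1, K}
        levels = [level, level]
        if rng.random() >= 0.90:                           # 10% change level at the internal boundary
            levels[1] = rng.choice([l for l in range(1, K + 1) if l != level])
        aircraft.append({
            "waypoints_km": waypoints,
            "speed_kt": rng.uniform(458.0, 506.0),         # U[458 kt, 506 kt]
            "entry_time_min": rng.uniform(20.0, 90.0),     # U[20 min, 90 min]
            "flight_levels": levels,
        })
    return aircraft
```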
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets presented here were partially used in "Formulation and MIP-heuristics for the lot sizing and scheduling problem with temporal cleanings" (Toscano, A., Ferreira, D., Morabito, R., Computers & Chemical Engineering) [1], in "A decomposition heuristic to solve the two-stage lot sizing and scheduling problem with temporal cleaning" (Toscano, A., Ferreira, D., Morabito, R., Flexible Services and Manufacturing Journal) [2], and in "A heuristic approach to optimize the production scheduling of fruit-based beverages" (Toscano et al., Gestão & Produção, 2020) [3]. In fruit-based production processes, there are two production stages: preparation tanks and production lines. This production process has some process-specific characteristics, such as temporal cleanings and synchrony between the two production stages, which make optimized production planning and scheduling even more difficult. In this sense, some papers in the literature have proposed different methods to solve this problem. To the best of our knowledge, there are no standard datasets used by researchers in the literature to verify the accuracy and performance of proposed methods or to serve as a benchmark for other researchers considering this problem. The authors have been using small datasets that do not satisfactorily represent different production scenarios. Since demand in the beverage sector is seasonal, a wide range of scenarios enables us to evaluate the effectiveness of the methods proposed in the scientific literature for solving real instances of the problem. The datasets presented here include data based on real data collected from five beverage companies. We present four datasets that are specifically constructed assuming a scenario of restricted capacity and balanced costs. These datasets are supplementary data for the paper submitted to Data in Brief [4].
[1] Toscano, A., Ferreira, D., Morabito, R., Formulation and MIP-heuristics for the lot sizing and scheduling problem with temporal cleanings, Computers & Chemical Engineering 142 (2020) 107038. doi: 10.1016/j.compchemeng.2020.107038.
[2] Toscano, A., Ferreira, D., Morabito, R., A decomposition heuristic to solve the two-stage lot sizing and scheduling problem with temporal cleaning, Flexible Services and Manufacturing Journal 31 (2019) 142-173. doi: 10.1007/s10696-017-9303-9.
[3] Toscano, A., Ferreira, D., Morabito, R., Trassi, M. V. C., A heuristic approach to optimize the production scheduling of fruit-based beverages. Gestão & Produção, 27(4), e4869, 2020. doi: 10.1590/0104-530X4869-20.
[4] Piñeros, J., Toscano, A., Ferreira, D., Morabito, R., Datasets for lot sizing and scheduling problems in the fruit-based beverage production process. Data in Brief (2021).
Problems reported, comments and satisfaction surveys submitted by the general public through focused citizen engagement applications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The collection of datasets in Table 1 is extended, and their more meaningful, and thus recommended, descriptions based on multiplicative means and multiplicative standard errors or standard deviations are given. Some comparisons appear to be of interest. Necessarily, arithmetic means exceed multiplicative ones, starting from some 15% for small s* values around 1.7 up to more than sevenfold for s* > 7. The lower limits of the 95% ranges, relative to the means, turn increasingly negative as s* grows for the classical version, but remain positive and get smaller for the multiplicative description. Turning to upper limits, the multiplicative limit exceeds the additive one by some 17% for s* = 1.7. With s* = 2.5, the difference is about 25%. For s* = 4.2, there is no difference, and for s* = 7, the additive limit is only half the multiplicative one.
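As a hedged, synthetic illustration of the two descriptions being compared (not data from Table 1): the additive description uses the arithmetic mean and standard deviation with the 95% range mean ± 1.96·SD, while the multiplicative description uses the geometric mean x* and multiplicative standard deviation s* with the 95% range running from x*/s*^1.96 to x*·s*^1.96:

```python
# A minimal sketch, assuming log-normally distributed synthetic data; it contrasts
# the additive description (arithmetic mean, SD) with the multiplicative one
# (geometric mean x*, multiplicative SD s*).
import numpy as np

x = np.random.default_rng(0).lognormal(mean=1.0, sigma=0.8, size=1000)

mean, sd = x.mean(), x.std(ddof=1)
additive_95 = (mean - 1.96 * sd, mean + 1.96 * sd)          # lower limit may turn negative

log_x = np.log(x)
gm = np.exp(log_x.mean())                                   # multiplicative (geometric) mean x*
s_star = np.exp(log_x.std(ddof=1))                          # multiplicative standard deviation s*
multiplicative_95 = (gm / s_star**1.96, gm * s_star**1.96)  # always positive

print(f"additive:       mean={mean:.2f}, sd={sd:.2f}, 95% range={additive_95}")
print(f"multiplicative: x*={gm:.2f}, s*={s_star:.2f}, 95% range={multiplicative_95}")
```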
These interview data are part of the project "Looking for data: information seeking behaviour of survey data users", a study of secondary data users' information-seeking behaviour. The overall goal of this study was to create evidence of actual information practices of users of one particular retrieval system for social science data in order to inform the development of research data infrastructures that facilitate data sharing. In the project, data were collected based on a mixed methods design. The research design included a qualitative study in the form of expert interviews and, building on the results found therein, a quantitative web survey of secondary survey data users.

For the qualitative study, expert interviews with six reference persons of a large social science data archive were conducted. They were interviewed in their role as intermediaries who provide guidance for secondary users of survey data. The knowledge from their reference work was expected to provide a condensed view of the goals, practices, and problems of people who are looking for survey data. The anonymized transcripts of these interviews are provided here. They can be reviewed or reused upon request. The survey dataset from the quantitative study of secondary survey data users is downloadable through this data archive after registration.

The core result of the Looking for data study is that community involvement plays a pivotal role in survey data seeking. The analyses show that survey data communities are an important determinant in survey data users' information seeking behaviour and that community involvement facilitates data seeking and has the capacity of reducing problems or barriers.

The qualitative part of the study was designed and conducted using constructivist grounded theory methodology as introduced by Kathy Charmaz (2014). In line with grounded theory methodology, the interviews did not follow a fixed set of questions, but were conducted based on a guide that included areas of exploration with tentative questions. This interview guide can be obtained together with the transcript. For the Looking for data project, the data were coded and scrutinized by constant comparison, as proposed by grounded theory methodology. This analysis resulted in core categories that make up the "theory of problem-solving by community involvement". This theory was exemplified in the quantitative part of the study. For this exemplification, the following hypotheses were drawn from the qualitative study:
(1) The data seeking hypotheses: (1a) When looking for data, information seeking through personal contact is used more often than impersonal ways of information seeking. (1b) Ways of information seeking (personal or impersonal) differ with experience.
(2) The experience hypotheses: (2a) Experience is positively correlated with having ambitious goals. (2b) Experience is positively correlated with having more advanced requirements for data. (2c) Experience is positively correlated with having more specific problems with data.
(3) The community involvement hypothesis: Experience is positively correlated with community involvement.
(4) The problem solving hypothesis: Community involvement is positively correlated with problem solving strategies that require personal interactions.
When data and analytics leaders throughout Europe and the United States were asked what the top challenges were with using data to drive business value at their companies, 41 percent indicated that the lack of analytical skills among employees was the top challenge as of 2021. Other challenges with using data included data democratization and organizational silos.
This data corresponds to the data and experiments described in Section 5 of the following paper: "Two-sided profile-based optimality in the stable marriage problem" by Frances Cooper and David Manlove.
The paper is located at: https://arxiv.org/abs/1905.06626
The data is located at: https://doi.org/10.5281/zenodo.2542703
The software is located at: https://doi.org/10.5281/zenodo.2545798
See the README for more information.
Version 1.0.2 updates:
* Updated README