The statistic shows the problems that organizations face when using big data technologies worldwide as of 2017. Around ** percent of respondents stated that inadequate analytical know-how was a major problem that their organization faced when using big data technologies as of 2017.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Hello colleagues,
Here is the data scraped from the Art of Problem Solving website. The dataset contains 8818 solutions for 3711 problems, each with the extracted answer (and the answer letter (A, B, C, D, E) if the problem was posed as a test with answer choices).
In my opinion this data could be really helpful for us in different applications:
- Benchmarking different LLMs
- Prompt engineering (including few-shot pipelines)
- SFT tasks
- ...
The following pages were fully parsed:
- AMC 8 / AJHSME Problems and Solutions
- AMC 10 Problems and Solutions
- AMC 12 Problems and Solutions
- AHSME Problems and Solutions
- AIME Problems and Solutions
- USAMO Problems and Solutions
- USAJMO Problems and Solutions
Important note: so far I have processed about half of the solutions, namely those that give the answer in \boxed{} or in a nearly boxed form. The other solutions need more careful, manual answer extraction, so stay tuned for further updates of the dataset.
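As an illustration of the kind of extraction described in the note above, here is a minimal Python sketch that pulls the contents of \boxed{} out of a solution string and looks for an answer letter. The function names `extract_boxed` and `answer_letter` are hypothetical and this is not the author's actual pipeline; it only assumes that answers appear as \boxed{...} with balanced braces and that letters appear as (A) to (E).

```python
import re
from typing import Optional

BOXED = "\\boxed{"  # literal marker searched for in the solution text


def extract_boxed(solution: str) -> list:
    """Return the contents of every \\boxed{...}, allowing nested braces."""
    answers = []
    pos = solution.find(BOXED)
    while pos != -1:
        start = pos + len(BOXED)
        depth = 1
        i = start
        # Walk forward until the brace opened by \boxed{ is closed again.
        while i < len(solution) and depth > 0:
            if solution[i] == "{":
                depth += 1
            elif solution[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced braces: leave this one for manual extraction
        answers.append(solution[start:i - 1])
        pos = solution.find(BOXED, i)
    return answers


def answer_letter(answer: str) -> Optional[str]:
    """Return the first letter written as (A)..(E), or None if there is none."""
    match = re.search(r"\(([A-E])\)", answer)
    return match.group(1) if match else None
```

A solution ending in `$\boxed{\textbf{(B)}\ 12}$` would yield the boxed string `\textbf{(B)}\ 12` and the letter `B`. Escaped braces such as `\{` are not handled by this sketch, which is one reason some solutions still need manual review.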
❤️ I hope that this dataset can help us to beat the 20/50 score on public LB.
The statistic shows the problems caused by poor quality data for enterprises in North America, according to a survey of North American IT executives conducted by 451 Research in 2015. As of 2015, ** percent of respondents indicated that having poor quality data can result in extra costs for the business.
In pursuing multiple goals, enterprises are often limited by resource allocation and scheduling, which must maximize the use of resources and minimize cost. Managers must therefore be careful when planning schedules. The extension of gig scheduling is simply an extension of enterprise problems, which is the topic most scholars discuss. The problem discussed here is the Job Shop Problem; its data are all integers. The first line gives the number of jobs and the number of machines, the remaining lines are all processing times, and the final answer we solve for is the makespan! 💯
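The file layout described above can be read with a few lines of Python. This is only a sketch under an assumption the description does not confirm: that each line after the header holds one job's processing times, one integer per machine. The names `parse_instance` and `job_length_lower_bound` are hypothetical.

```python
def parse_instance(text: str):
    """Parse 'jobs machines' on the first line, then one row of integer times per job.

    Assumed layout (not confirmed by the dataset description): row j lists the
    processing times of job j, one value per machine.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    n_jobs, n_machines = (int(token) for token in rows[0])
    times = [[int(token) for token in row] for row in rows[1:]]
    if len(times) != n_jobs or any(len(row) != n_machines for row in times):
        raise ValueError("instance body does not match the header line")
    return n_jobs, n_machines, times


def job_length_lower_bound(times) -> int:
    """A simple lower bound on the total schedule length.

    The operations of one job run one after another, so no schedule can
    finish before the longest job's total processing time.
    """
    return max(sum(row) for row in times)
```

For the text `"2 3\n3 2 2\n2 1 4\n"` the parser returns 2 jobs, 3 machines, and the two rows of times; the longest job takes 7 time units, so no schedule can be shorter than that.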
When data and analytics leaders throughout Europe and the United States were asked what the top challenges were with using data to drive business value at their companies, ** percent indicated that the lack of analytical skills among employees was the top challenge as of 2021. Other challenges with using data included data democratization and organizational silos.
A common problem in clinical trials is the missing data that occurs when patients do not complete the study and drop out without further measurements. Missing data cause the usual statistical analysis of complete or all available data to be subject to bias. There are no universally applicable methods for handling missing data. We recommend the following: (1) Report reasons for dropouts and proportions for each treatment group; (2) Conduct sensitivity analyses to encompass different scenarios of assumptions and discuss consistency or discrepancy among them; (3) Pay attention to minimize the chance of dropouts at the design stage and during trial monitoring; (4) Collect post-dropout data on the primary endpoints, if at all possible; and (5) Consider the dropout event itself an important endpoint in studies with many.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Students in statistics, data science, analytics, and related fields study the theory and methodology of data-related topics. Some, but not all, are exposed to experiential learning courses that cover essential parts of the life cycle of practical problem-solving. Experiential learning enables students to convert real-world issues into solvable technical questions and effectively communicate their findings to clients. We describe several experiential learning course designs in statistics, data science, and analytics curricula. We present findings from interviews with faculty from the U.S., Europe, and the Middle East and surveys of former students. We observe that courses featuring live projects and coaching by experienced faculty have a high career impact, as reported by former participants. However, such courses are labor-intensive for both instructors and students. We give estimates of the required effort to deliver courses with live projects and the perceived benefits and tradeoffs of such courses. Overall, we conclude that courses offering live-project experiences, despite being more time-consuming than traditional courses, offer significant benefits for students regarding career impact and skill development, making them worthwhile investments. Supplementary materials for this article are available online.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A dataset of 100,000 diverse problems from international math olympiads and competitions (AIME, IMO, etc.).
You can use it, for example, for RAG systems or simply to fine-tune a model. If you like it, please upvote. Enjoy working with this data!
Peer-to-Peer (P2P) networks are gaining increasing popularity in many distributed applications such as file-sharing, network storage, web caching, searching and indexing of relevant documents and P2P network-threat analysis. Many of these applications require scalable analysis of data over a P2P network. This paper starts by offering a brief overview of distributed data mining applications and algorithms for P2P environments. Next it discusses some of the privacy concerns with P2P data mining and points out the problems of existing privacy-preserving multi-party data mining techniques. It further points out that most of the nice assumptions of these existing privacy preserving techniques fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game and points out some recent results.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains information about a number of participants (participants.csv) in a workshop who need to be assigned to a number of rooms (rooms.csv).
Restrictions:
1.- The workshop has 5 different activities.
2.- Each participant has indicated their first, second and third preferences for the activities available (Priority1, Priority2 and Priority3 columns in participants.csv).
3.- Participants are part of teams (Team column in participants.csv) and should be assigned together.
4.- Each activity lasts for half a day, and each participant will take part in one activity in the morning and one activity in the afternoon.
5.- Each room must contain the SAME activity in the morning and in the afternoon.
Requirements:
A.- Define how each participant should be assigned, as a csv file in the format Name;ActivityAM;RoomAM, ActivityPM;RoomPM
B.- Maximize the number of people getting their 1st and 2nd preferences.
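One way to check a candidate assignment against the preference objective is sketched below in Python. The Priority1 and Priority2 column names come from the description; the `Name` column, the function name `preference_counts`, and the reading of the objective (counting participants who receive both their first and second preferences across the two half-day slots, and separately those who receive at least one) are assumptions.

```python
def preference_counts(participants, assignment):
    """Count how well an assignment matches stated preferences.

    participants: list of dicts with 'Name', 'Priority1', 'Priority2' keys
                  ('Name' is an assumed column name).
    assignment:   dict mapping a name to (morning_activity, afternoon_activity).
    Returns (got_both, got_at_least_one).
    """
    got_both = 0
    got_at_least_one = 0
    for person in participants:
        wanted = {person["Priority1"], person["Priority2"]}
        received = set(assignment[person["Name"]])
        matched = len(wanted & received)
        if matched == 2:
            got_both += 1
        if matched >= 1:
            got_at_least_one += 1
    return got_both, got_at_least_one
```

The team and room restrictions are not checked here; a full solution would also need to verify that teammates share the same activities and that each room hosts a single activity all day.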
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
City Problems is a dataset for object detection tasks - it contains City Test annotations for 878 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The dataset is a comprehensive collection of mathematical word problems spanning multiple domains with rich metadata and natural language variations. The problems contain 1 - 5 steps of mathematical operations that are specifically designed to encourage showing work and maintaining appropriate decimal precision throughout calculations.
The available data is a sample of 1,000 problems. Commercial options are available to procure datasets of 100,000 to 10 million problems, or to license the templating system that created the data, for orders of magnitude more data or for customizations such as the number of mathematical steps involved and the addition of domains. Contact hello@cephalopod.studio for more information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Problem is a dataset for object detection tasks - it contains Problem annotations for 2,923 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
Simple data created for practicing regression problems. It consists of three columns: Price, Feature 1 and Feature 2. Try to predict Price using Feature 1 and Feature 2. The data is clean, so no data cleaning is required.
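A baseline for this kind of task is an ordinary least-squares fit of the price on the two features. The sketch below uses only the Python standard library and solves the normal equations directly; the function name `fit_linear` and the row layout `(feature1, feature2, price)` are assumptions for illustration, not part of the dataset.

```python
def fit_linear(rows):
    """Least-squares fit of price = b0 + b1*feature1 + b2*feature2.

    rows: iterable of (feature1, feature2, price) tuples (assumed layout).
    Returns (b0, b1, b2).
    """
    xs = [(1.0, f1, f2) for f1, f2, _ in rows]
    ys = [price for _, _, price in rows]
    # Augmented normal equations: (X^T X | X^T y), a 3x4 matrix.
    a = [
        [sum(x[i] * x[j] for x in xs) for j in range(3)]
        + [sum(x[i] * y for x, y in zip(xs, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; the fit is not unique")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


def predict(coefficients, feature1, feature2):
    """Apply fitted coefficients to one observation."""
    b0, b1, b2 = coefficients
    return b0 + b1 * feature1 + b2 * feature2
```

In practice a library such as scikit-learn or NumPy would usually be preferred; the explicit version is shown only to make the computation visible.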
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
All the randomly generated problems in this data set involve a number A of aircraft passing through a square multi-sector area (MSA) of side 600 km. This MSA is composed of four square adjacent sectors of side 300 km. The aircraft use four different flight levels that belong to the same MSA. The aircraft trajectories are randomly generated in such a way that all aircraft are either flying from bottom to upper MSA borders, or from left to right borders. Taking the origin at the bottom left corner of the MSA, the distance between the first waypoint and the origin is randomly generated using the continuous uniform distribution U[75 km, 595 km]. Each trajectory is composed of three waypoints located on the MSA edges. The first waypoint is located on either the bottom or the left MSA border. The other two waypoints are generated randomly along the opposing sector borders using a uniform distribution. The cruise speeds of the aircraft are randomly generated using the continuous uniform distribution U[458 knots, 506 knots]. The time at which the aircraft enters the MSA follows the continuous uniform distribution U[20 min, 90 min]. The flight level used for each trajectory is randomly generated using a discrete uniform distribution U{1, K}. A constant flight level is used by 90% of the aircraft. The others undergo one flight level change at the internal boundary. For these aircraft, the second flight level is randomly generated using U{1, K} while excluding the first sector flight level.
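The sampling rules above can be sketched in Python as follows. The distribution bounds and the 90% share of constant-level flights are taken from the description; the function name `sample_aircraft`, the dictionary keys, and the equal split between the two flight directions are assumptions, and the generation of the second and third waypoints is omitted.

```python
import random


def sample_aircraft(k_levels: int, rng: random.Random) -> dict:
    """Draw one aircraft according to the distributions described in the text."""
    if k_levels < 2:
        raise ValueError("a level change needs at least two flight levels")
    first_level = rng.randint(1, k_levels)  # discrete uniform U{1, K}
    if rng.random() < 0.9:
        second_level = first_level  # 90% of aircraft keep a constant level
    else:
        # The remaining aircraft change level once, excluding the first level.
        others = [lv for lv in range(1, k_levels + 1) if lv != first_level]
        second_level = rng.choice(others)
    return {
        # Assumed 50/50 split; the description does not give the proportion.
        "direction": rng.choice(["bottom_to_top", "left_to_right"]),
        "first_waypoint_km": rng.uniform(75.0, 595.0),  # distance from origin
        "speed_knots": rng.uniform(458.0, 506.0),
        "entry_time_min": rng.uniform(20.0, 90.0),
        "levels": (first_level, second_level),
    }
```

Passing a seeded generator, for example `random.Random(0)`, makes a generated problem instance reproducible.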
This table shows the 10 most frequently recorded incident problem types as recorded by communications personnel for each fiscal year presented.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality, and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), World Health Organization (WHO) and European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing pdf reports, metadata, and reference data. The combined dataset includes complete spatial data such as countries area, international number of countries, Alpha-2 code, Alpha-3 code, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections on the referenced data sets and official reports such as adjustments in the reporting dates, which suffered from a one to two days lag, removing negative values, detecting unreasonable changes in historical data in new reports and corrections on systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data for the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website. 
This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
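The paired comparison by root mean square error mentioned in the description can be illustrated with a short Python sketch. The function names and the idea of flagging negative daily values by index are assumptions for illustration; the article's own procedure may differ.

```python
import math


def rmse(series_a, series_b) -> float:
    """Root mean square error between two equally long series of daily counts."""
    if len(series_a) != len(series_b) or len(series_a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared) / len(squared))


def negative_value_days(daily_counts) -> list:
    """Return the positions of negative daily values, which cannot be real counts."""
    return [day for day, value in enumerate(daily_counts) if value < 0]
```

A large RMSE between two official sources for the same attribute points to a reporting discrepancy, such as a lag of one or two days in reporting dates, that is worth inspecting.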
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data sets of the article "Constraint-aware neural networks for Riemann problems", consisting of training and test data sets for Riemann solutions of the cubic flux model, an isothermal two-phase model, and the Euler equations for an ideal gas. You can find detailed information in the README.md.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset contains research output from studying the Transit Network Design Problem (TNDP). At a high level, the dataset includes: a novel transit network based on the Brisbane transport infrastructure, and results from the testing of new methods on the Brisbane network and existing benchmark networks (Mandl and Mumford).
This dataset contains four subsets of data, all related to Joshua Rosentreter's PhD thesis. These are outlined below:
Further descriptions of the data are contained in the subfolders within.