Subject Area: Text Mining
Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining algorithms for document classification. The documents in question were aviation safety reports that documented one or more problems that occurred during certain flights. The goal was to label the documents with respect to the types of problems that were described. This is a subset of the Aviation Safety Reporting System (ASRS) dataset, which is publicly available.
How Data Was Acquired: The data for this competition came from human-generated reports on incidents that occurred during a flight.
Sample Rates, Parameter Description, and Format: There is one document per incident. The datasets are in raw text format. All documents for each set are contained in a single file. Each row in this file corresponds to a single document. Each line begins with the document number, and a tilde separates the document number from the text itself.
Anomalies/Faults: This is a document category classification problem.
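The row format described above (document number, a tilde, then the report text) can be read with a few lines of Python. This is a minimal sketch, not the competition's own loader: the function name `parse_documents` and the sample rows are made up for illustration, and it assumes the first tilde on a line is the separator.

```python
from typing import Iterable, Iterator, Tuple


def parse_documents(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (document_number, text) pairs from tilde-separated rows."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank rows
        # Assumption: only the FIRST tilde separates the number from the text.
        doc_id, sep, text = line.partition("~")
        if not sep:
            raise ValueError(f"row has no tilde separator: {line!r}")
        yield doc_id.strip(), text.strip()


# Invented sample rows, used only to exercise the parser.
sample = [
    "1~CREW REPORTED A TRAFFIC CONFLICT ON APPROACH\n",
    "\n",
    "2~ALT DEVIATION ~ CORRECTED BY CREW\n",
]
docs = dict(parse_documents(sample))
```

Splitting on the first tilde only keeps any later tildes inside the report text intact.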
The ARPA-E Grid Optimization (GO) Competition Challenge 1, which ran from 2018 to 2019, focused on the basic Security Constrained AC Optimal Power Flow problem (SCOPF) for a single time period. The Challenge utilized sets of unique datasets generated by the ARPA-E GRID DATA program. Each dataset consisted of a collection of power system network models of different sizes with associated operating scenarios (snapshots in time defining instantaneous power demand, renewable generation, generator and line availability, etc.). The datasets were of two types: Real-Time, which included starting-point information, and Online, which did not. Week-Ahead data is also provided for some cases but was not used in the Competition. Although most datasets were synthetic and generated by GRID DATA, a few came from industry and were only used in the Final Event. All synthetic Input Data and Team Results for the GO Competition Challenge 1 for the Sandbox, Trial Events 1 to 3, and the Final Event, along with problem, format, scoring and rules descriptions, are available here. Data for industry scenarios will not be made public. Challenge 1, a minimization problem, required two computational steps. Solver 1 (Code 1) solved the base SCOPF problem under a strict wall clock time limit, as would be the case in industry, and reported the base case operating point as output, which was used to compute the Objective Function value that served as the scenario score. The feasibility of the solution was determined by Solver 2 (Code 2), which solved the power flow problem for all contingencies based on the results from Solver 1. This is not normally done in industry, so the time limits were relaxed. In fact, there were no time limits for Trial Event 1. This proved to be a mistake, with some codes running for more than 90 hours, and a time limit of 2 seconds per contingency was imposed for all other events.
Entrants were free to use their own Solver 2 or use an open-source version provided by the Competition. Containers, such as Docker, were considered to improve the portability of codes, but none that could reliably support a multi-node parallel computing environment, e.g., MPI, could be found. For more information on the competition and challenge see the "GO Competition Challenge 1 Information" and "GO Competition Challenge 1 Additional Information" resources below.
The M4 dataset is a collection of 100,000 time series used for the fourth edition of the Makridakis forecasting competition. The M4 dataset consists of time series of yearly, quarterly, monthly and other (weekly, daily and hourly) data, which are divided into training and test sets. The minimum numbers of observations in the training set are 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series. The participants were asked to produce the following numbers of forecasts beyond the available data that they had been given: six for yearly, eight for quarterly, 18 for monthly, 13 for weekly, 14 for daily and 48 for hourly series.
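The horizons and minimum training lengths listed above can be captured in two small lookup tables. The sketch below is illustrative only: the names `HORIZONS`, `MIN_TRAIN` and `split_series` are hypothetical, and it assumes the test set is simply the last h observations of a series, which the description implies but does not state.

```python
from typing import List, Sequence, Tuple

# Forecast horizons and minimum training lengths quoted in the description.
HORIZONS = {"yearly": 6, "quarterly": 8, "monthly": 18,
            "weekly": 13, "daily": 14, "hourly": 48}
MIN_TRAIN = {"yearly": 13, "quarterly": 16, "monthly": 42,
             "weekly": 80, "daily": 93, "hourly": 700}


def split_series(values: Sequence[float],
                 frequency: str) -> Tuple[List[float], List[float]]:
    """Hold out the last h points as the test set (an assumption)."""
    h = HORIZONS[frequency]
    train, test = list(values[:-h]), list(values[-h:])
    if len(train) < MIN_TRAIN[frequency]:
        raise ValueError(
            f"{frequency} series needs at least {MIN_TRAIN[frequency]} "
            f"training points, got {len(train)}"
        )
    return train, test


# A made-up yearly series of 19 points: 13 for training, 6 held out.
train, test = split_series(list(range(19)), "yearly")
```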
The M4 dataset was created by selecting a random sample of 100,000 time series from the ForeDeCk database. The selected series were then scaled to prevent negative observations and values lower than 10, thus avoiding possible problems when calculating various error measures. The scaling was performed by simply adding a constant to the series so that their minimum value was equal to 10 (29 occurrences across the whole dataset). In addition, any information that could possibly lead to the identification of the original series was removed so as to ensure the objectivity of the results. This included the starting dates of the series, which did not become available to the participants until the M4 had ended.
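The scaling step described above amounts to adding one constant per series. A minimal sketch, assuming that only series with a minimum below 10 are shifted and that all others are left unchanged; the function name `shift_to_minimum` is hypothetical and this is not the organizers' code.

```python
from typing import List, Sequence


def shift_to_minimum(series: Sequence[float], floor: float = 10.0) -> List[float]:
    """Add a constant so the smallest value equals `floor`.

    Series whose minimum is already at or above `floor` are returned as-is
    (an assumption about how the rule was applied).
    """
    lowest = min(series)
    if lowest >= floor:
        return list(series)
    offset = floor - lowest
    return [value + offset for value in series]


shifted = shift_to_minimum([-5.0, 0.0, 20.0])   # minimum moves from -5 to 10
untouched = shift_to_minimum([12.0, 30.0])      # already above the floor
```

Adding a constant keeps the differences between observations unchanged, while moving every value away from zero and from negative territory.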
Comprehensive dataset of 2 Taekwondo competition areas in Oregon, United States as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This is a multivariate data set recorded from a patient in the sleep laboratory of the Beth Israel Hospital (now the Beth Israel Deaconess Medical Center) in Boston, Massachusetts. This data set was extracted from record slp60 of the MIT-BIH Polysomnographic Database, and it was submitted to the Santa Fe Time Series Competition in 1991 by our group. The data are presented in text form and have been split into two sequential parts. Each line contains simultaneous samples of three parameters. The first column is the heart rate, the second is the chest volume (respiration force), and the third is the blood oxygen concentration (measured by ear oximetry). The sampling frequency for each measurement is 2 Hz (i.e., the time interval between measurements in successive rows is 0.5 seconds).
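Since the rows carry no timestamps, the time of each sample has to be derived from its row index and the 0.5-second interval. A minimal sketch of that, assuming whitespace-separated columns; the function name, the field names and the sample values are invented for illustration.

```python
from typing import Dict, Iterable, List

SAMPLE_INTERVAL_S = 0.5  # 2 Hz sampling, as stated in the description


def parse_samples(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Parse rows of (heart rate, chest volume, blood oxygen) samples."""
    records = []
    rows = (line for line in lines if line.strip())  # ignore blank rows
    for index, line in enumerate(rows):
        # Assumption: the three columns are separated by whitespace.
        heart_rate, chest_volume, blood_oxygen = (float(f) for f in line.split())
        records.append({
            "time_s": index * SAMPLE_INTERVAL_S,
            "heart_rate": heart_rate,
            "chest_volume": chest_volume,
            "blood_oxygen": blood_oxygen,
        })
    return records


# Invented values, used only to show the shape of the input.
sample_rows = ["76.5 8000 6300", "76.4 7400 6200", "76.3 6900 6250"]
records = parse_samples(sample_rows)
```

Because the data are split into two sequential parts, the row index of the second part would need to be offset by the length of the first to keep the time axis continuous.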
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This package contains the datasets and supplementary materials used in the IPIN 2023 Competition.
Contents
We provide additional information on the competition at: https://evaal.aaloa.org/2023/call-for-competition
Citation Policy
Please cite the following works when using the datasets included in this package:
Torres-Sospedra, J.; et al. Datasets and Supporting Materials for the IPIN 2023
Competition Track 3 (Smartphone-based, off-site), Zenodo 2023
http://dx.doi.org/10.5281/zenodo.8362205
Check the updated citation policy at: http://dx.doi.org/10.5281/zenodo.8362205
Contact
For any further questions about the database and this competition track, please contact:
Joaquín Torres-Sospedra
Centro ALGORITMI,
Universidade do Minho, Portugal
info@jtorr.es - jtorres@algoritmi.uminho.pt
Antonio R. Jiménez
Centre of Automation and Robotics (CAR)-CSIC/UPM, Spain
antonio.jimenez@csic.es
Antoni Pérez-Navarro
Faculty of Computer Sciences, Multimedia and Telecommunication, Universitat Oberta de Catalunya, Barcelona, Spain
aperezn@uoc.edu
Acknowledgements
We thank Maximilian Stahlke and Christopher Mutschler at Fraunhofer IIS, as well as Miguel Ortiz and Ziyou Li at Université Gustave Eiffel, for their invaluable support in collecting the datasets. Last but certainly not least, we thank Antonino Crivello and Francesco Potortì for their huge effort in georeferencing the competition venue and evaluation points.
We extend our appreciation to the staff at the Museum for Industrial Culture (Museum Industriekultur) for their unwavering patience and invaluable support throughout our collection days.
We are also grateful to Francesco Potortì, the ISTI-CNR team (Paolo, Michele & Filippo), and the Fraunhofer IIS team (Chris, Tobi, Max, ...) for their invaluable commitment to organizing and promoting the IPIN competition.
This work and competition belong to the IPIN 2023 Conference in Nuremberg (Germany).
Parts of this work received financial support from projects and grants:
https://www.gesis.org/en/institute/data-usage-terms
We collected data on almost the complete population of merger control decisions by the Directorate-General for Competition (DG COMP) of the European Commission. We started the data collection with the first year of common European merger control, 1990, and included all years up to 2014. This amounts to 25 years of data on European merger control. With regard to the scope of the decisions, we collected data in all cases where a legal decision document exists. This includes all cases settled in the first phase of an investigation (Art. 6(1)(a), 6(1)(b), 6(1)(c) and 6(2)) and all cases decided in the second phase of an investigation (Art. 8(1), 8(2), and 8(3)). Note that this also includes all cases settled under a ‘simplified procedure’, provided that a legal decision document exists. Furthermore, we also intended to collect data on cases that were either referred back to member states by DG COMP or aborted by the merging parties. While we have collected some data on such cases, it is not always available. Therefore, we cannot guarantee that the final dataset covers all of these cases. The level of observation is not a particular merger case but a particular product/geographic market combination concerned by a merger. In total, the final dataset contains 5,196 DG COMP merger decisions. For each of these decisions, we record a number of observations equal to the number of product/geographic markets identified in the specific transaction. Hence, the total dataset contains 31,451 observations.
Models and external data of 3rd place efficiency solution for https://www.kaggle.com/competitions/pii-detection-removal-from-educational-data competition.
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-process-external-data for links to external data and processing code
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-train for training code that generated models.
See https://www.kaggle.com/code/devinanzelmo/piidd-efficiency-3rd-inference for inference code
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and code for the paper Competition and Defaults in Online Search. This paper offers the first systematic quantitative assessment of default-option interventions designed to mitigate Google’s search dominance. By analyzing interventions in the European Economic Area, Russia, and Turkey, we find that, across all three cases, changes to default settings effectively reduced Google’s market share. The causal impact amounts to less than 1 percentage point in the EEA and over 10 percentage points in Russia and Turkey. Differences arise from intervention nuances, including the size of the targeted user group, local market characteristics, and remedy designs. We discuss the complexity of assessing the interventions’ impact on welfare deriving from quality responses.
Competition data, including nominal and faulty scenarios, for Diagnostic Problem I of the Third International Diagnostic Competition. Three file formats are provided: tab-delimited .txt files, Matlab .mat files, and tab-delimited .scn files. The scenario (.scn) files are read by the DXC framework. See the DXC'11 Industrial Track Sample Data resource page for additional documentation, including system catalogs and schematics. There were no DA entries for Diagnostic Problem II, so we are withholding the data for use in a future Diagnostic Competition.
https://the-market.us/privacy-policy/
The global Database Management System market is estimated to be valued at US$ XX.X million in 2019. The report on the Database Management System market provides qualitative as well as quantitative analysis in terms of market dynamics, competition scenarios, opportunity analysis, market growth, etc. for the forecast period up to 2029. The global database management system market is segmented on the basis of type, application, and geography.
In 2019, the North America market is valued at US$ XX.X million with an estimated market share of X.X%, and it is expected to reach US$ XX.X million and X.X% in 2029, with a CAGR of X.X% from 2020 to 2029.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This data was downloaded from the link shared on the data page of the Playground S4E06 episode. We added an id column for consistency with the competition data and uploaded the result here.
Please feel free to use this dataset as part of your pipeline.
Key links:
1. Competition - https://www.kaggle.com/competitions/playground-series-s4e6
2. Data page - https://www.kaggle.com/competitions/playground-series-s4e6/data
3. Original dataset link - https://archive.ics.uci.edu/dataset/697/predict+students+dropout+and+academic+success
This is a .csv file. Please use pandas.read_csv() or polars.scan_csv() to read in the file.
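For readers without pandas or polars installed, the same file can be read with Python's standard csv module. The sketch below also shows one way an id column could be prepended; it is only an illustration of that step, not the uploader's script. The column names, the sample rows and the starting id of 0 are all assumptions.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def add_id_column(rows: Iterable[Dict[str, str]],
                  start: int = 0) -> Iterator[Dict[str, object]]:
    """Prepend a running integer id to each row (start value is an assumption)."""
    for offset, row in enumerate(rows):
        yield {"id": start + offset, **row}


# In-memory stand-in for the real file; header and values are illustrative.
raw = io.StringIO("Age at enrollment,Target\n19,Graduate\n20,Dropout\n")
rows = list(add_id_column(csv.DictReader(raw)))
```

With a real file, `io.StringIO(...)` would be replaced by `open(path, newline="")`, where `path` points to the downloaded .csv.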
Best regards!
Comprehensive dataset of 1 Taekwondo competition area in Wisconsin, United States as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
https://data.gov.tw/license
Lists the names of the award-winning teaching plans in the National Legal Database Creative Teaching Competition.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper analyzes how the establishment of the United States Court of Appeals for the Federal Circuit in 1982 has affected the relationship between innovation and competition. Using the same dataset as Aghion et al. (Competition and innovation: an inverted-u relationship. Quarterly Journal of Economics 2005; 120(2):701-728) I find a structural break in the early 1980s. Taking this break into consideration, the inverted-U empirical relationship between innovation and competition found by Aghion et al. does not hold. In fact, I find that there is a positive innovation-competition relationship during the period 1973-1982 and no relationship at all in the 1983-1994 period.
This dataset was created by Maha Kosksi
Well 58-32 (previously labeled MU-ESW1) was drilled near Milford, Utah during Phase 2B of the FORGE Project to confirm that geothermal reservoir characteristics met requirements for the final FORGE site. Well Accord-1 was drilled decades ago for geothermal exploration purposes. While the conditions encountered in the well were not suitable for developing a conventional hydrothermal system, the information obtained suggested the region may be suitable for an enhanced geothermal system. Geophysical well logs were collected in both wells to obtain useful information regarding the nature of the subsurface materials. For the recent testing of 58-32, the Utah FORGE Project contracted with the well services company Schlumberger to collect the well logs.
https://creativecommons.org/publicdomain/zero/1.0/
The M4 competition, a continuation of the Makridakis Competitions for forecasting, was conducted in 2018. The competition covers the prediction of both Point Forecasts and Prediction Intervals.
A paper describing the competition and the various benchmarks and approaches was published in a special edition of the International Journal of Forecasting; it is available for open access and can be found here
The code for various benchmarks on this dataset can be found at the following GitHub repository
The data is available at both the GitHub link and the official website of MOFC
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Saurabh Shahane
Released under CC0: Public Domain
Extracting useful and accurate information from scanned geologic and other earth science maps is a time-consuming and laborious process involving manual human effort. To address this limitation, the USGS partnered with the Defense Advanced Research Projects Agency (DARPA) to run the AI for Critical Mineral Assessment Competition, soliciting innovative solutions for automatically georeferencing and extracting features from maps. The competition opened for registration in August 2022 and concluded in December 2022. Training and validation data from the competition are provided here, as well as competition details and baseline solutions. The data are derived from published sources and are provided to the public to support continued development of automated georeferencing and feature extraction tools. References for all maps are included with the data.