Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Akshat is new to the market and is unaware of the prices of many products. He wants to be assured of the price before making any purchase! Help Akshat predict prices for the products and derive some conclusive evidence!
The dataset contains products, images, dimensions, prices, ratings, and much more for feature engineering.
This type of business problem is typical for any new product launched into the market; sites like Trivago and Policy Bazaar compare the same product across multiple sites to arrive at a conclusive rate. Can you achieve the same?
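As a starting point, a minimal sketch of engineering one feature and fitting a regressor might look like the following; the file and column names are hypothetical, since the actual schema is not spelled out above.

```python
# Minimal price-prediction sketch; "products.csv" and all column names are
# hypothetical, not taken from the actual dataset.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("products.csv")
# Engineer a volume feature from the product dimensions.
df["volume"] = df["length"] * df["width"] * df["height"]

X = df[["volume", "rating", "num_reviews"]]
y = df["price"]

model = GradientBoostingRegressor(random_state=0)
print(cross_val_score(model, X, y, scoring="neg_mean_absolute_error"))
```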
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of the following columns:
Data description
The objective of this competition is to create a model to predict the number of retweets a tweet will get on Twitter. The data used to train the model will be approximately 2,400 tweets each from 38 major banks and mobile network operators across Africa.
A machine learning model to predict retweets would be valuable to any business that uses social media to share important information and messages with the public. This model can be used as a tool to help businesses better tailor their tweets to ensure maximum impact and outreach to clients and non-clients.
The data has been split into a test and training set.
train.json (zipped) is the dataset that you will use to train your model. This dataset includes about 2,400 consecutive tweets from each of the companies listed below, for a total of 96,562 tweets.
test_questions.json (zipped) is the dataset to which you will apply your model to test how well it performs. Use your model and this dataset to predict the number of retweets each tweet will receive. The test set consists of the consecutive tweets that followed those provided in the training set. There are at most 800 tweets per company in this test set. This dataset includes the same fields as train.json except for the retweet_count and favorite_count variables.
sample_submission.csv is a table to provide an example of what your submission file should look like.
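A minimal baseline sketch for this task could look as follows; the "text" field name is an assumption (only retweet_count and favorite_count are named above), and the submission columns should be checked against sample_submission.csv.

```python
# Baseline sketch for the retweet-prediction task; field names other than
# retweet_count/favorite_count are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PoissonRegressor

train = pd.read_json("train.json")
test = pd.read_json("test_questions.json")

vec = TfidfVectorizer(max_features=5000)
X_train = vec.fit_transform(train["text"])   # assumed field name
X_test = vec.transform(test["text"])

model = PoissonRegressor()                   # retweets are counts
model.fit(X_train, train["retweet_count"])

pd.DataFrame({"ID": test.index, "retweet_count": model.predict(X_test)}) \
    .to_csv("submission.csv", index=False)
```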
Database Contents License (DbCL) v1.0 http://opendatacommons.org/licenses/dbcl/1.0/
This dataset publishes the files that I will use for the Kaggle challenge called Predict Future Sales.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, with the continuous improvement of the financial system and the rapid development of the banking industry, competition within the banking industry has intensified. At the same time, with the rapid development of information technology and Internet technology, customers' choices of financial products are becoming increasingly diversified, while their dependence on and loyalty to banking institutions are declining, making customer churn an increasingly prominent problem for commercial banks. How to predict customer behavior and retain existing customers has become a major challenge for banks. This study therefore takes a bank's business data from the Kaggle platform as the research object, compares multiple sampling methods for balancing the data, constructs a GA-XGBoost customer churn prediction model for churn identification, and conducts an interpretability analysis of the GA-XGBoost model to provide decision support and suggestions for the banking industry in preventing customer churn. The results show that: (1) SMOTEENN is more effective than SMOTE and ADASYN in dealing with the imbalance of the banking data. (2) The F1 and AUC values of the XGBoost model improved and optimized with a genetic algorithm reach 90% and 99%, respectively, which is optimal compared to the other six machine learning models; the GA-XGBoost classifier was identified as the best solution for the customer churn problem. (3) Using Shapley values, we explain how each feature affects the model results and analyze the features with a high impact on the model's predictions, such as the total number of transactions in the past year, the amount of transactions in the past year, the number of products owned by customers, and the total sales balance. The contribution of this paper is twofold: (1) based on the accurate identification of churned customers, the study extracts useful information from the black-box model, providing a reference for commercial banks seeking to improve their service quality and retain customers; (2) it provides a reference for customer churn early-warning models in other related industries, helping the banking industry maintain customer stability, preserve market position, and reduce corporate losses.
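As a hedged illustration of the pipeline this abstract describes (SMOTEENN balancing, tuned XGBoost, Shapley-value analysis), a minimal sketch could look like the following; the file and label names are hypothetical, and a plain grid search stands in for the genetic-algorithm tuning.

```python
# Hedged sketch: "churn.csv" and the "churned" label are hypothetical, and a
# grid search stands in for the paper's genetic-algorithm hyperparameter tuning.
import pandas as pd
import shap
from imblearn.combine import SMOTEENN
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("churn.csv")
X, y = df.drop(columns=["churned"]), df["churned"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Balance only the training split, mirroring the SMOTEENN step.
X_bal, y_bal = SMOTEENN(random_state=0).fit_resample(X_tr, y_tr)

search = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    {"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1, 0.2]},
    scoring="f1",
)
search.fit(X_bal, y_bal)
model = search.best_estimator_

print("F1 :", f1_score(y_te, model.predict(X_te)))
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# Shapley values for the interpretability analysis.
shap.summary_plot(shap.TreeExplainer(model).shap_values(X_te), X_te)
```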
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The repository includes the datasets used to organise the CMI-PB 2nd Challenge. Read more here: https://www.cmi-pb.org/blog/learn-about-project/#A%20community%20prediction%20challenge
This data is related to HackerEarth's Customer Churn Rate Prediction Challenge.
It contains 3 files; see the individual files for more information about the data.
HackerEarth
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set underlying this challenge consisted of phosphoprotein and cytokine concentrations in response to 49 combinatorial perturbations of seven protein-specific inhibitors and seven stimuli.
GNU General Public License v2.0 http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
The dataset is from MachineHack.
Buyers spend a significant amount of time browsing e-commerce stores, and since the pandemic, e-commerce has seen a boom in the number of users across domains. Meanwhile, store owners are planning to attract customers using various algorithms that leverage customer behavior patterns.
Tracking customer activity is also a great way of understanding customer behavior and figuring out what can actually be done to serve customers better. Machine learning and AI have already played a significant role in designing various recommendation engines that attract customers by predicting their buying patterns.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prediction errors for spatial application.
ExploreSA: The Gawler Challenge is a global online competition from the Government of South Australia. The challenge is to identify or predict areas of potential mineralisation within the Gawler region, using any technique. This dataset contains a list of all team submissions, with links to video pitches and submitted data packages, and highlights all winners in each category for the Unearthed ExploreSA: The Gawler Challenge: https://unearthed.solutions/u/challenge/gawler-challenge.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The YJMob100K human mobility datasets (YJMob100K_dataset1.csv.gz and YJMob100K_dataset2.csv.gz) contain the movement of a total of 100,000 individuals across a 75-day period, discretized into 30-minute intervals and 500-meter grid cells. The first dataset contains the movement of 80,000 individuals across a 75-day business-as-usual period, while the second dataset contains the movement of 20,000 individuals across a 75-day period (the last 15 days of which fall during an emergency) with unusual behavior.
While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (~85 dimensional vector) as supplementary information (cell_POIcat.csv.gz). The list of 85 POI categories can be found in POI_datacategories.csv.
For details of the dataset, see Data Descriptor:
Yabe, T., Tsubouchi, K., Shimizu, T., Sekimoto, Y., Sezaki, K., Moro, E., & Pentland, A. (2024). YJMob100K: City-scale and longitudinal dataset of anonymized human mobility trajectories. Scientific Data, 11(1), 397. https://www.nature.com/articles/s41597-024-03237-9
--- Details about the Human Mobility Prediction Challenge 2023 (ended November 13, 2023) ---
The challenge takes place in a mid-sized and highly populated metropolitan area, somewhere in Japan. The area is divided into 500 meters x 500 meters grid cells, resulting in a 200 x 200 grid cell space.
The human mobility datasets (task1_dataset.csv.gz and task2_dataset.csv.gz) contain the movement of a total of 100,000 individuals across a 90-day period, discretized into 30-minute intervals and 500-meter grid cells. The first dataset contains movement during a 75-day business-as-usual period, while the second dataset contains movement during a 75-day period that includes an emergency with unusual behavior.
There are 2 tasks in the Human Mobility Prediction Challenge.
In task 1, participants are provided with the full time series data (75 days) for 80,000 individuals, and partial (only 60 days) time series movement data for the remaining 20,000 individuals (task1_dataset.csv.gz). Given the provided data, task 1 of the challenge is to predict the movement patterns of those 20,000 individuals during days 60-74. Task 2 is a similar task but uses a smaller dataset of 25,000 individuals in total, 2,500 of which have their locations during days 60-74 masked and to be predicted (task2_dataset.csv.gz).
While the name or location of the city is not disclosed, the participants are provided with points-of-interest (POIs; e.g., restaurants, parks) data for each grid cell (~85 dimensional vector) as supplementary information (which is optional for use in the challenge) (cell_POIcat.csv.gz).
For more details, see https://connection.mit.edu/humob-challenge-2023
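A naive baseline sketch for the masked-days task is given below; the column names (uid, d, t, x, y) are assumptions about the CSV schema, not details stated above.

```python
# Naive baseline: predict each user's most frequent cell per 30-minute slot.
# Column names (uid, d, t, x, y) are assumed, not confirmed by the description.
import pandas as pd

df = pd.read_csv("task1_dataset.csv.gz")

observed = df[df["d"] < 60]                  # days with known locations
masked = df[df["d"] >= 60]                   # days 60-74 to predict

# Most frequent (x, y) cell for each user and time slot.
mode_cell = (
    observed.groupby(["uid", "t"])[["x", "y"]]
    .agg(lambda s: s.mode().iloc[0])
    .reset_index()
)
pred = masked[["uid", "d", "t"]].merge(mode_cell, on=["uid", "t"], how="left")
print(pred.head())
```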
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Following the success of the first Computers in Cardiology Challenge, we are pleased to offer a new challenge from PhysioNet and Computers in Cardiology 2001. The challenge is to develop a fully automated method to predict the onset of paroxysmal atrial fibrillation/flutter (PAF), based on the ECG prior to the event. The goal of the contest is to stimulate effort and advance the state of the art in this clinically significant problem, and to foster both friendly competition and wide-ranging collaborations.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Predictive modeling for toxicity can help reduce risks in a range of applications and potentially serve as the basis for regulatory decisions. However, the utility of these predictions can be limited if the associated uncertainty is not adequately quantified. With recent studies showing great promise for deep learning-based models also for toxicity predictions, we investigate the combination of deep learning-based predictors with the conformal prediction framework to generate highly predictive models with well-defined uncertainties. We use a range of deep feedforward neural networks and graph neural networks in a conformal prediction setting and evaluate their performance on data from the Tox21 challenge. We also compare the results from the conformal predictors to those of the underlying machine learning models. The results indicate that highly predictive models can be obtained that result in very efficient conformal predictors even at high confidence levels. Taken together, our results highlight the utility of conformal predictors as a convenient way to deliver toxicity predictions with confidence, adding both statistical guarantees on the model performance as well as better predictions of the minority class compared to the underlying models.
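To make the framework concrete, here is a minimal split (inductive) conformal classification sketch; a random forest on synthetic imbalanced data stands in for the paper's deep models on the Tox21 data.

```python
# Split-conformal sketch: calibration scores yield p-values per candidate
# label; a label enters the prediction set when its p-value exceeds eps.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_fit, y_fit)

# Nonconformity: 1 - predicted probability of the true class.
cal_scores = 1 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

eps = 0.1  # significance level -> ~90% coverage guarantee
test_proba = clf.predict_proba(X_test)
for label in (0, 1):
    scores = 1 - test_proba[:, label]
    # p-value: fraction of calibration scores at least as large.
    pvals = (1 + (cal_scores[None, :] >= scores[:, None]).sum(1)) / (len(cal_scores) + 1)
    print(f"label {label} included in {np.mean(pvals > eps):.2%} of prediction sets")
```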
Our goal is to explore the feasibility and usefulness of using a combination of covering arrays and machine learning models for predicting the results of an agent-based simulation model within the vast parameter value combination space. The challenge is to select parameter values that are representative of the overall behavior of the model, so that we can train the machine learning model to correctly predict behavior on previously untested areas of the parameter space. We have chosen Wilensky's Heat Bugs model in NetLogo for our study. It is a simple model, amenable to quick data generation, with a limited number of outputs to predict, and with emergent behavior; it therefore allows exploration of this new approach. We utilize covering arrays to reduce the parameter value space systematically, run the model for each parameter set in the 2-way and 3-way covering arrays, train a random forest model on the 2-way data (33,351 parameter combinations), and test its ability to predict the outcome of the simulation on the significantly larger 3-way data that was not seen during training (3,971,955 parameter combinations).
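A sketch of the train-on-2-way / test-on-3-way setup might look as follows; the file names, output column, and random forest settings are hypothetical, and the covering arrays are assumed to be precomputed.

```python
# Train a random forest on 2-way covering-array runs, evaluate on the much
# larger 3-way runs. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

train = pd.read_csv("heatbugs_2way.csv")     # 2-way covering array runs
test = pd.read_csv("heatbugs_3way.csv")      # 3-way covering array runs

target = "mean_unhappiness"                  # hypothetical simulation output
X_cols = [c for c in train.columns if c != target]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(train[X_cols], train[target])
print("R^2 on unseen 3-way data:", r2_score(test[target], rf.predict(test[X_cols])))
```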
Accurate and interpretable forecasting models predicting spatially and temporally fine-grained changes in the numbers of intrastate conflict casualties are of crucial importance for policymakers and international non-governmental organisations (NGOs). Using a count data approach, we propose a hierarchical hurdle regression model to address the corresponding prediction challenge at the monthly PRIO-grid level. More precisely, we model the intensity of local armed conflict at a specific point in time as a three-stage process. Stages one and two of our approach estimate whether we will observe any casualties at the country- and grid-cell-level, respectively, while stage three applies a regression model for truncated data to predict the number of such fatalities conditional upon the previous two stages. Within this modelling framework, we focus on the role of governmental arms imports as a processual factor allowing governments to intensify or deter from fighting. We further argue that a grid cell's geographic remoteness is bound to moderate the effects of these military buildups. Out-of-sample predictions corroborate the effectiveness of our parsimonious and theory-driven model, which enables full transparency combined with accuracy in the forecasting process.
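As a hedged sketch of the three-stage hurdle idea (not the authors' exact model), two binary stages can gate a count regression fit on positive observations only; all file, feature, and column names below are hypothetical, and a Poisson GLM on positives approximates the truncated-count stage.

```python
# Three-stage hurdle sketch: country-level hurdle, grid-cell hurdle, then a
# count model on positive cells. Names are hypothetical illustrations.
import pandas as pd
from sklearn.linear_model import LogisticRegression, PoissonRegressor

df = pd.read_csv("prio_grid_months.csv")              # hypothetical file
feats = ["arms_imports", "remoteness", "lag_deaths"]  # hypothetical features

stage1 = LogisticRegression(max_iter=1000).fit(df[feats], df["country_any_deaths"])
stage2 = LogisticRegression(max_iter=1000).fit(df[feats], df["cell_any_deaths"])
positives = df[df["deaths"] > 0]
stage3 = PoissonRegressor().fit(positives[feats], positives["deaths"])

# Expected casualties: P(country hurdle) * P(cell hurdle) * E[count | positive].
expected = (
    stage1.predict_proba(df[feats])[:, 1]
    * stage2.predict_proba(df[feats])[:, 1]
    * stage3.predict(df[feats])
)
```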
CC0 1.0 Universal https://spdx.org/licenses/CC0-1.0.html
Prediction in natural environments is a challenging task, and there is a lack of clarity around how a myopic organism can make short-term predictions given limited data availability and cognitive resources. In this context, we may ask what kind of resources are available to the organism to help it address the challenge of short-term prediction within its own cognitive limits. We point to one potentially important resource: ordinal patterns, which are extensively used in physics but not in the study of cognitive processes. We explain the potential importance of ordinal patterns for short-term prediction, and how natural constraints imposed through (1) ordinal pattern types, (2) their transition probabilities and (3) their irreversibility signature may support short-term prediction. Having tested these ideas on a massive data set of Bitcoin prices representing a highly fluctuating environment, we provide preliminary empirical support showing how organisms characterized by bounded rationality may generate short-term predictions by relying on ordinal patterns.
Methods: The data file holds 60,000 samples of 62 minutes of trade prices, in permutation form, from the Bitcoin exchange Bitstamp.
The readme files contain the explanation of the code for the article.
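A minimal sketch of ordinal (permutation) patterns and their transition counts is given below; the price series is synthetic, not the Bitstamp data described above.

```python
# Ordinal patterns of order 3 over a price series, plus pattern frequencies
# and transition counts (the raw material for transition probabilities).
import numpy as np
from collections import Counter

def ordinal_pattern(window):
    """Rank pattern of a window, e.g. [1.2, 3.4, 2.0] -> (0, 2, 1)."""
    return tuple(np.argsort(np.argsort(window)))

prices = np.cumsum(np.random.default_rng(0).normal(size=1000))  # synthetic walk
order = 3
patterns = [ordinal_pattern(prices[i:i + order]) for i in range(len(prices) - order + 1)]

print(Counter(patterns))                            # pattern type frequencies
transitions = Counter(zip(patterns, patterns[1:]))  # pattern-to-pattern counts
print(transitions.most_common(3))
```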
CC0 1.0 Universal https://spdx.org/licenses/CC0-1.0.html
External conditions can drive biological rates in ectotherms by directly influencing body temperatures. While estimating the temperature dependence of performance traits such as growth and development rate is feasible under controlled laboratory settings, predictions in nature are difficult. One major challenge lies in translating performance under constant conditions to fluctuating environments. Using the butterfly Pieris napi as a model system, we show that development rate, an important fitness trait, can be accurately predicted in the field using models parameterized under constant laboratory temperatures. Additionally, using a factorial design, we show that accurate predictions can be made across microhabitats, but critically hinge on adequate consideration of nonlinearity in reaction norms, spatial heterogeneity in microclimate, and temporal variation in temperature. Our empirical results are also supported by a comparison of published and simulated data. In conclusion, our combined results suggest that, discounting direct effects of temperature, insect development rates are generally unaffected by thermal fluctuations. Methods: Thermal performance in development rate was measured at 8 constant temperatures in the butterfly Pieris napi. Measurements were made for eggs and larvae separately, as well as for the full ontogenetic development between oviposition and pupation (eggs and larvae combined). Thermal performance curves were fit to the data. Prediction models were parameterized based on these data and validated through field transplants. For the field transplants, microclimate temperatures were frequently sampled at multiple sites. These temperatures were used to predict development times. For comparison, weather station data were also used in the prediction model. Transplanted individuals were monitored and their development times in the field were compared to predictions.
All raw data necessary to reproduce these results are available here, and compressed to "von_Schmalensee_et_al_2021_ecol_lett_scripts_and_data.rar". Additionally, the scripts used to produce the results and the five main figures are available, with annotation. See the "0_readme.txt" file for more information, and the main manuscript and supporting information for a detailed description of the methods.
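As a hedged illustration of predicting development time under fluctuating temperatures from constant-temperature data, here is a minimal rate-summation sketch; the Briere-1 curve and its parameters are illustrative, not the values fitted in this study.

```python
# Rate summation: integrate a thermal performance curve (TPC) over a
# fluctuating temperature series. Parameters below are illustrative only.
import numpy as np

def briere1(T, a=1.6e-6, T0=5.0, Tmax=38.0):
    """Development rate (per hour) as a function of temperature (deg C)."""
    r = a * T * (T - T0) * np.sqrt(np.clip(Tmax - T, 0.0, None))
    return np.where((T > T0) & (T < Tmax), r, 0.0)

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)                       # 60 days of hourly readings
temps = 18 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)

# Development accumulates each hour; emergence when the cumulative sum hits 1.
cum_dev = np.cumsum(briere1(temps))
print("predicted development time (days):", np.argmax(cum_dev >= 1) / 24)
```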
Upon thorough analysis and research, the following factors have been identified as the critical augmented reality (AR) market challenges during the forecast period 2020-2024:
Privacy concerns over AR technology
The augmented reality (AR) market report also provides other key information, including:
CAGR of the market during the forecast period 2020-2024
Detailed information on factors that will drive augmented reality (AR) market growth during the next five years
Precise estimation of the augmented reality (AR) market size and its contribution to the parent market
Accurate predictions on upcoming trends and changes in consumer behavior
The growth of the augmented reality (AR) market across APAC, Europe, MEA, North America, and South America
A thorough analysis of the market's competitive landscape and detailed information on vendors
Comprehensive details of factors that will challenge the growth of augmented reality (AR) market vendors
This dataset was created by Rajnish Singh