8 datasets found
  1. AMEX Competition

    • kaggle.com
    Updated Nov 26, 2021
    Cite
    Hotson Honet (2021). AMEX Competition [Dataset]. https://www.kaggle.com/hotsonhonet/amex-competition/code
    Dataset updated
    Nov 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hotson Honet
    Description

    About the Competition

    • American Express and HackerEarth present “AmExpert 2021 CODELAB – Machine Learning Hackathon”, an amazing opportunity to showcase your analytical abilities and talent!
    • Get a chance to interview with American Express for analytics and machine learning roles, and to win other exciting prizes!

    Tasks

    Credit card default risk is the chance that companies or individuals will not be able to repay borrowed money on time.

    You are given relevant information about the customers of a company.

    You are required to build a machine learning model that predicts whether a customer will default on their credit card.

    Dataset description

    The dataset folder contains the following files:

    • train.csv: 45528 x 19
    • test.csv: 11383 x 18
    • sample_submission.csv: 5 x 2

    The columns provided in the dataset are as follows:

    • customer_id: unique identifier of the customer
    • name: name of the customer
    • age: age of the customer (in years)
    • gender: gender of the customer (F = female, M = male)
    • owns_car: whether the customer owns a car (Y = yes, N = no)
    • owns_house: whether the customer owns a house (Y = yes, N = no)
    • no_of_children: number of children the customer has
    • net_yearly_income: net yearly income of the customer (in USD)
    • no_of_days_employed: number of days the customer has been employed
    • occupation_type: occupation type of the customer (IT staff, Managers, Accountants, Cooking staff, etc.)
    • total_family_members: number of family members of the customer
    • migrant_worker: whether the customer is a migrant worker (1 = yes, 0 = no)
    • yearly_debt_payments: yearly debt payments of the customer (in USD)
    • credit_limit: credit limit of the customer (in USD)
    • credit_limit_used(%): percentage of the credit limit used by the customer
    • credit_score: credit score of the customer
    • prev_defaults: number of previous defaults
    • default_in_last_6months: whether the customer has defaulted in the last 6 months (1 = yes, 0 = no)
    • credit_card_default: whether the customer will default on their credit card (1 = yes, 0 = no); this is the target

    Evaluation metric

    from sklearn import metrics
    score = 100 * metrics.f1_score(actual, predicted, average="macro")
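    To make the average="macro" part concrete, here is a minimal from-scratch sketch in plain Python of what that call computes: F1 is calculated separately for each class and then averaged without weighting by class frequency. The function name macro_f1 is our own illustration; for actual scoring, the scikit-learn call above remains the reference.

```python
def macro_f1(actual, predicted):
    """Unweighted mean of per-class F1 scores, i.e. what average="macro" computes."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(actual) | set(predicted))  # every class seen in either list
    pairs = list(zip(actual, predicted))
    f1_scores = []
    for c in labels:
        tp = sum(1 for a, p in pairs if a == c and p == c)
        fp = sum(1 for a, p in pairs if a != c and p == c)
        fn = sum(1 for a, p in pairs if a == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because c occurs somewhere
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1_scores) / len(f1_scores)


score = 100 * macro_f1([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])  # 62.5
```

    On the same labels, predicting "no default" for everyone gives a macro F1 of only 0.4, because the never-predicted default class contributes an F1 of 0. This is why the metric rewards models that actually identify defaulters.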

    Result submission guidelines

    • The index is "customer_id" and the target is the "credit_card_default" column.
    • The submission file must be submitted in .csv format only.
    • The size of this submission file must be 11383 x 2.

    Note: Ensure that your submission file contains the following:

    • Correct index values as per the test file
    • Correct names of columns as provided in the sample_submission.csv file
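    A minimal sketch of writing and checking a submission with Python's standard csv module. It assumes, based on the guidelines above, that the header is customer_id,credit_card_default and that predictions are 0/1 labels; confirm the exact column names against sample_submission.csv. write_submission is a hypothetical helper, not part of any provided starter code.

```python
import csv
import io

EXPECTED_ROWS = 11383  # "The size of this submission file must be 11383 x 2."
HEADER = ["customer_id", "credit_card_default"]  # assumed; check sample_submission.csv


def write_submission(customer_ids, predictions, fh, expected_rows=EXPECTED_ROWS):
    """Validate predictions and write them as CSV to an open text file handle."""
    if len(customer_ids) != len(predictions):
        raise ValueError("customer_ids and predictions differ in length")
    if len(customer_ids) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(customer_ids)}")
    if len(set(customer_ids)) != len(customer_ids):
        raise ValueError("duplicate customer_id values")
    if not set(predictions) <= {0, 1}:
        raise ValueError("predictions must be 0/1 labels")
    writer = csv.writer(fh)
    writer.writerow(HEADER)
    writer.writerows(zip(customer_ids, predictions))


# Tiny in-memory demo with hypothetical IDs; for a real file use
# open("submission.csv", "w", newline="") and the default expected_rows.
buf = io.StringIO()
write_submission(["C1", "C2"], [0, 1], buf, expected_rows=2)
```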

    Rules:

    • Entries submitted after the contest closes will not be considered.
    • You are expected to solve the problem on your own.
    • Registering multiple user IDs leads to disqualification from the contest.
    • Use of external data is not allowed.
    • Participants must update their profile details and upload their latest CV.
    • Decisions made by American Express on the winners and runners-up are final and binding.
    • Throughout the hackathon, you are expected to respect fellow hackers and act with high integrity.
    • HackerEarth and American Express reserve the right to disqualify any participant at any stage of the competition if the participant(s) are deemed to be acting fraudulently.
    • Existing American Express employees are not allowed to participate in the competition.
    • This is an individual hackathon, not a team event.
    • Prizes will be shipped to India ...
  2. Hacklive AV

    • kaggle.com
    Updated Sep 27, 2020
    Cite
    Akash Gupta (2020). Hacklive AV [Dataset]. https://www.kaggle.com/datasets/akash14/hacklive-av/code
    Dataset updated
    Sep 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Akash Gupta
    Description

    Context

    AV HackLive - Guided Community Hackathon!

    Content

    Data science competitions can be daunting for someone who has never participated in one. Some of them have hundreds of competitors with top-notch industry knowledge and strong track records in such hackathons.

    As a result, many beginners are apprehensive about getting started with these hackathons.

    The three most commonly asked questions are:

    • Is it even worth it if I have minimal chance of winning?
    • How do I start?
    • How can I improve my rank in the future?

    Let’s answer the first question before we go further.

  3. OEMC Hackathon 2023: Global FAPAR Modeling Dataset (including raster data)

    • zenodo.org
    • data.niaid.nih.gov
    csv, png, sh, zip
    Updated Sep 28, 2024
    Cite
    Julia Hackländer; Tomislav Hengl (2024). OEMC Hackathon 2023: Global FAPAR Modeling Dataset (including raster data) [Dataset]. http://doi.org/10.5281/zenodo.13852127
    Available download formats: png, zip, csv, sh
    Dataset updated
    Sep 28, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julia Hackländer; Tomislav Hengl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset organized by the Open-Earth-Monitor (OEMC) project within the context of Hackathon 2023.

    The dataset contains monthly mean FAPAR values aggregated per ground station. FAPAR (fraction of absorbed photosynthetically active radiation) is the fraction of incoming photosynthetically active radiation that is absorbed by vegetation, and ranges from 0 to 1. It is a measure of vegetation health and ecosystem functioning, and a key parameter in light-use-efficiency models of primary productivity.

    For each monthly FAPAR value, a set of covariates (features) was extracted from 32 raster spatial layers, including satellite images (spectral bands and indices), temperature images (land surface temperature), climate images (precipitation) and a digital terrain model (slope and elevation). The features are organized in columns: unique data points in time are identified by the sample_id column, and data points belonging to the same location are identified by the station column.
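    Because rows that share a station come from the same location, a random row-wise split can place near-identical samples in both the training and validation folds. One common precaution (our assumption; the description does not prescribe a validation scheme) is a group k-fold split that keeps each station in a single fold. The sketch below is a stdlib-only version of the idea with a hypothetical group_kfold helper; scikit-learn's GroupKFold provides a maintained implementation.

```python
from collections import defaultdict


def group_kfold(stations, n_splits=5):
    """Yield (train_idx, val_idx) index lists; each station lands in exactly one fold."""
    by_station = defaultdict(list)
    for i, s in enumerate(stations):
        by_station[s].append(i)
    if not 1 <= n_splits <= len(by_station):
        raise ValueError("n_splits must be between 1 and the number of stations")
    folds = [[] for _ in range(n_splits)]
    # Greedy balancing: largest stations first, each into the currently smallest fold
    for s in sorted(by_station, key=lambda st: len(by_station[st]), reverse=True):
        min(folds, key=len).extend(by_station[s])
    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val


stations = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4]  # hypothetical values of the station column
splits = list(group_kfold(stations, n_splits=2))
```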

    Column names:

    • sample_id: unique identifier of datapoint
    • station: ground station number
    • fapar: monthly mean FAPAR
    • month: month of measurement
    • modis_{..}: NDVI, EVI, reflectance bands 1 (red), 2 (near-infrared), 3 (blue), and 7 (mid-infrared) based on MOD13Q1
    • modis_lst_day_p{..}: Daytime land surface temperature at the 5th, 50th and 95th percentiles, based on MOD11A2
    • modis_lst_night_p{..}: Nighttime land surface temperature at the 5th, 50th and 95th percentiles, based on MOD11A2
    • wv_yearly_p{..}: Water vapour aggregated yearly at the 25th, 50th and 75th percentiles, derived from MCD19A2
    • wv_monthly_lt_p{..}: Long-term monthly water vapour at the 25th, 50th and 75th percentiles, based on MCD19A2
    • wv_monthly_lt_sd: Standard deviation of long-term monthly water vapour, based on MCD19A2
    • wv_monthly_ts_raw: Water vapour monthly time series based on MCD19A2
    • wv_monthly_ts_smooth: Water vapour monthly time series smoothed using the Whittaker method based on MCD19A2
    • accum_pr_monthly: Monthly accumulated precipitation based on CHELSA timeseries
    • dtm_{..}: Several DTM derivatives (Elevation, Slope, aspect (sine, cosine), curvature (up- and downslope), openness (negative, positive), compound topographic index (cti), valley bottom flatness (vbf)) based on MERIT DEM

    Files

    • train.csv: Training set with 3,461 rows and 36 columns, including sample id (sample_id - index column), ground station (station), reference month (month), measured FAPAR (fapar), and 32 features / covariates
    • test.csv: Test set with 4,939 rows and 34 columns, including sample id (sample_id - index column), ground station (station), reference month (month) and 32 features / covariates
    • sample_submission.csv: a sample submission file with 4,939 rows and 2 columns, including sample id (sample_id - index column) and measured FAPAR (fapar)
  4. OEMC Hackathon 2023: EU Land Cover Classification Dataset

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    application/gzip, png, csv
    Updated Jul 11, 2024
    Cite
    Leandro Parente; Martijn Witjes; Hengl Tomislav (2024). OEMC Hackathon 2023: EU Land Cover Classification Dataset [Dataset]. http://doi.org/10.5281/zenodo.8306554
    Available download formats: application/gzip, png, csv
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Leandro Parente; Martijn Witjes; Hengl Tomislav
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset organized by the Open-Earth-Monitor (OEMC) project within the context of Hackathon 2023.

    The dataset (both train and test) was produced by stratified sampling of the ground-truth data provided by the LUCAS survey, funded by the European Commission. The target variable uses the level-3 land cover classes of the harmonized legend, resulting in 72 classes distributed over 5 years (2006, 2009, 2012, 2015, 2018):

    All samples were overlaid with 416 raster spatial layers, including satellite images (spectral bands and indices), temperature images (land surface temperature), climate images (precipitation, air temperature), accessibility and distance maps (highways, water bodies, burned areas), a digital terrain model (slope and elevation) and other existing maps (population count and snow cover). The resulting values were organized in columns, one per spatial layer, which together form the feature space available for ML modeling.
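    The stratified sampling mentioned above can be illustrated with a small sketch. The strata actually used by the organizers are not described here, so this example assumes, purely for illustration, that each stratum is a (land cover class, survey year) pair and that the same fraction is drawn from each; stratified_sample and the record fields are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw roughly the same fraction of records from every stratum key(record)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)
    sample = []
    for stratum in sorted(strata, key=str):  # deterministic stratum order
        members = strata[stratum]
        n = max(1, round(fraction * len(members)))  # keep at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical points: 10 of class "A" and 4 of class "B", all from 2018
points = [{"cls": "A", "year": 2018, "id": i} for i in range(10)]
points += [{"cls": "B", "year": 2018, "id": i} for i in range(4)]
test_set = stratified_sample(points, key=lambda p: (p["cls"], p["year"]), fraction=0.5)
```

    Sampling within each stratum preserves the class and year proportions of the source data, so rare classes are not crowded out of the smaller split by chance.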

    Column names:

    Each column name is formed by six metadata fields separated by underscores (_):

    • Example: red_landsat.glad.ard_p50_30m_jun25_sep12
    • Metadata fields:
      • F1 - Variable name: red
      • F2 - Variable procedure including product name: landsat.glad.ard
      • F3 - Position in the probability distribution: p50
      • F4 - Spatial resolution: 30m
      • F5 - Start date: jun25
      • F6 - End date: sep12
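    Since every feature name follows this six-field pattern, it can be split mechanically. The sketch below assumes that no individual field contains an underscore (true for the example above, but not verified for all 416 columns) and rejects names that do not split into exactly six parts. It buckets columns by the (F1, F2) pair, the fields the description uses for its thematic groups; the mapping from those pairs to the six groups themselves is not reproduced here. LayerName, parse_column and group_by_theme are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class LayerName(NamedTuple):
    variable: str    # F1, e.g. "red"
    procedure: str   # F2, e.g. "landsat.glad.ard"
    percentile: str  # F3, e.g. "p50"
    resolution: str  # F4, e.g. "30m"
    start: str       # F5, e.g. "jun25"
    end: str         # F6, e.g. "sep12"


def parse_column(name):
    """Split a feature column name into its six metadata fields."""
    parts = name.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 '_'-separated fields in {name!r}, got {len(parts)}")
    return LayerName(*parts)


def group_by_theme(columns):
    """Bucket column names by (F1, F2)."""
    groups = defaultdict(list)
    for col in columns:
        meta = parse_column(col)
        groups[(meta.variable, meta.procedure)].append(col)
    return dict(groups)


meta = parse_column("red_landsat.glad.ard_p50_30m_jun25_sep12")
# LayerName(variable='red', procedure='landsat.glad.ard', percentile='p50',
#           resolution='30m', start='jun25', end='sep12')
```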

    Column description:

    All the columns can be grouped into six thematic groups according to F1 and F2:

  5. House Price Prediction Challenge

    • kaggle.com
    zip
    Updated Oct 1, 2020
    Cite
    Jaswinder Singh (2020). House Price Prediction Challenge [Dataset]. https://www.kaggle.com/jassican/house-price-prediction-challenge-machine-hack
    Available download formats: zip (2,233,190 bytes)
    Dataset updated
    Oct 1, 2020
    Authors
    Jaswinder Singh
    Description

    Context

    Overview: Welcome to the House Price Prediction Challenge, where you will test your regression skills by designing an algorithm to accurately predict house prices in India. Accurately predicting house prices can be a daunting task. Buyers are not concerned only with the size (square feet) of a house; various other factors play a key role in deciding the price of a house or property, and it can be extremely difficult to identify the right set of attributes that explain buyer behavior. This dataset was collected from various property aggregators across India. In this competition, given 12 influencing factors, your role as a data scientist is to predict prices as accurately as possible.

    This competition also leaves a lot of room for feature engineering and for mastering advanced regression techniques such as Random Forests, deep neural networks, and various other ensembling techniques.

    Content

    • Train.csv: 29,451 rows x 12 columns
    • Test.csv: 68,720 rows x 11 columns
    • Sample Submission: acceptable submission format (.csv/.xlsx file with 68,720 rows)

    Columns Description

    • POSTED_BY: category marking who listed the property
    • UNDER_CONSTRUCTION: under construction or not
    • RERA: RERA-approved or not
    • BHK_NO: number of rooms
    • BHK_OR_RK: type of property
    • SQUARE_FT: total area of the house in square feet
    • READY_TO_MOVE: category marking ready to move or not
    • RESALE: category marking resale or not
    • ADDRESS: address of the property
    • LONGITUDE: longitude of the property
    • LATITUDE: latitude of the property

    Evaluation

    What is the metric in this competition, and how is the leaderboard calculated? Submissions are evaluated using the RMSLE (Root Mean Squared Logarithmic Error) metric, which can be computed with:

    import numpy as np
    from sklearn.metrics import mean_squared_log_error
    rmsle = np.sqrt(mean_squared_log_error(actual, predicted))

    This hackathon has public and private leaderboards. The public leaderboard is evaluated on 30% of the test data; the private leaderboard, made available at the end of the hackathon, is evaluated on 100% of the test data.
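    For reference, RMSLE is the square root of the mean squared difference between log(1 + predicted) and log(1 + actual). The plain-Python sketch below mirrors that formula (the rmsle name is ours, and rejecting negative inputs is a choice made for this sketch); scikit-learn's mean_squared_log_error is likewise based on log(1 + x).

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, computed on log(1 + x)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if min(actual) < 0 or min(predicted) < 0:
        raise ValueError("this sketch only accepts non-negative values")
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Both predictions are off by a factor of 10 in (1 + x), so each costs log(10)
err = rmsle([99, 999], [9, 99])
```

    Because errors are measured on a log scale, predicting 9 for a true value of 99 costs exactly as much as predicting 99 for 999: the metric penalizes relative rather than absolute error (approximately, because of the +1). A common practice is therefore to fit models on log(1 + price) and invert predictions with exp(x) - 1.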

    Acknowledgements

    This data was shared by MachineHack, where you can participate in the hackathon and submit your own predictions. Link to the MachineHack hackathon: https://www.machinehack.com/hackathons/house_price_prediction_beat_the_benchmark/overview

  6. Enhanced Pizza Sales Data (2024–2025)

    • kaggle.com
    Updated May 12, 2025
    Cite
    akshay gaikwad (2025). Enhanced Pizza Sales Data (2024–2025) [Dataset]. https://www.kaggle.com/datasets/akshaygaikwad448/pizza-delivery-data-with-enhanced-features/code
    Dataset updated
    May 12, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    akshay gaikwad
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a realistic and structured pizza sales dataset covering the time span from 2024 to 2025. Whether you're a beginner in data science, a student working on a machine learning project, or an experienced analyst looking to test out time series forecasting and dashboard building, this dataset is for you.

    📁 What’s Inside? The dataset contains rich details from a pizza business, including:

    ✅ Order Dates & Times
    ✅ Pizza Names & Categories (Veg, Non-Veg, Classic, Gourmet, etc.)
    ✅ Sizes (Small, Medium, Large, XL)
    ✅ Prices
    ✅ Order Quantities
    ✅ Customer Preferences & Trends

    It is neatly organized in Excel format and easy to use with tools like Python (pandas), Power BI, Excel, or Tableau.

    💡 Why Use This Dataset? This dataset is ideal for:

    📈 Sales Analysis & Reporting
    🧠 Machine Learning Models (demand forecasting, recommendations)
    📅 Time Series Forecasting
    📊 Data Visualization Projects
    🍽️ Customer Behavior Analysis
    🛒 Market Basket Analysis
    📦 Inventory Management Simulations

    🧠 Perfect For:

    • Data Science Beginners & Learners
    • BI Developers & Dashboard Designers
    • MBA Students (Marketing, Retail, Operations)
    • Hackathons & Case Study Competitions

    pizza, sales data, excel dataset, retail analysis, data visualization, business intelligence, forecasting, time series, customer insights, machine learning, pandas, beginner friendly

  7. AmExpert 2021 – Machine Learning Hackathon

    • kaggle.com
    Updated Nov 11, 2021
    Cite
    Aditya Sharma (2021). AmExpert 2021 – Machine Learning Hackathon [Dataset]. https://www.kaggle.com/adityasharma95/amexpert-2021-machine-learning-hackathon/discussion
    Dataset updated
    Nov 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aditya Sharma
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction:

    American Express and Analytics Vidhya present “AmExpert 2021 – Machine Learning Hackathon”, an amazing opportunity to showcase your analytical abilities and talent!

    Get a taste of the kind of challenges we face here at American Express on a day-to-day basis.

    Acknowledgements:

    https://datahack.analyticsvidhya.com/contest/amexpert-2021-machine-learning-hackathon/

    Problem Statement:

    XYZ Bank is a mid-sized private bank that offers a variety of banking products, such as savings accounts, current accounts, investment products, credit products, and home loans.

    The bank wants to predict the next set of products for a set of customers in order to optimize its marketing and communication campaigns.

    The data available in this problem contains the following information:

    • User demographic details: Gender, Age, Vintage, Customer Category, etc.
    • Current product holdings
    • Product holding in the next 6 months (train dataset only)

    Here, our task is to predict the next set of products (up to 3 products) for a set of customers (test data) based on their demographics and current product holdings.

    Data Description:

    Train csv -

    Customer_ID - Unique ID for the customer

    Gender - Gender of the Customer

    Age - Age of the Customer (in Years)

    Vintage - Vintage for the Customer (In Months)

    Is_Active - Activity Index, 0 : Less frequent customer, 1 : More frequent customer

    City_Category - Encoded Category of customer's city

    Customer_Category - Encoded Category of the customer

    Product_Holding_B1 - Current Product Holding (Encoded)

    Product_Holding_B2 - Product Holding in next six months (Encoded) - Target Column

    Test csv -

    Customer_ID - Unique ID for the customer

    Gender - Gender of the Customer

    Age - Age of the Customer (in Years)

    Vintage - Vintage for the Customer (In Months)

    Is_Active - Activity Index, 0 : Less frequent customer, 1 : More frequent customer

    City_Category - Encoded Category of customer's city

    Customer_Category - Encoded Category of the customer

    Product_Holding_B1 - Current Product Holding (Encoded)

    Evaluation Metric:

    The evaluation metric is Mean Average Precision at K (MAP@K) with K = 3. MAP is a well-known metric for evaluating ranked retrieval results. For example, suppose we recommend 3 products to a given customer and only the 1st and 3rd are correct. The relevance vector is then 1, 0, 1.

    In this case (relevance × precision at each position):

    • Position 1: 1 × 1/1 = 1
    • Position 2: 0 × 1/2 = 0
    • Position 3: 1 × 2/3 = 0.67

    Average Precision = (1 + 0 + 0.67)/3 ≈ 0.556. MAP@3 is the mean of the average precision over all customers.
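    A minimal sketch of AP@K and MAP@K in plain Python. It normalizes each customer's score by min(number of true products, K), the common Kaggle-style definition; the worked example above divides by 3, which matches this when the customer's true set has three or more products. The official scorer's exact normalization is not stated here, so treat it as an assumption. Product codes such as "P1" are hypothetical.

```python
def apk(actual, predicted, k=3):
    """Average precision at k for one customer's ranked recommendations."""
    if not actual:
        return 0.0
    predicted = predicted[:k]
    hits, score = 0, 0.0
    for i, p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:  # count each correct product once
            hits += 1
            score += hits / (i + 1)  # precision at position i + 1
    return score / min(len(actual), k)


def mapk(actuals, predictions, k=3):
    """Mean of apk over all customers."""
    return sum(apk(a, p, k) for a, p in zip(actuals, predictions)) / len(actuals)


# The document's example: 1st and 3rd recommendations correct
actual = ["P1", "P3", "P5"]
recommended = ["P1", "P2", "P3"]
print(round(apk(actual, recommended), 3))  # 0.556
```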

  8. SIH 2024: PS with Winning Teams & Solutions

    • kaggle.com
    Updated Aug 8, 2025
    Cite
    Adharshini Kumaresan (2025). SIH 2024: PS with Winning Teams & Solutions [Dataset]. https://www.kaggle.com/datasets/adharshinikumar/sih-2024-ps-with-winning-teams-and-solutions/discussion
    Dataset updated
    Aug 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Adharshini Kumaresan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset compiles the official problem statements from Smart India Hackathon (SIH) 2024 along with the corresponding winning teams and their accepted solutions. By merging these two perspectives — what was asked, and what was delivered — this dataset provides a holistic view of India’s largest innovation challenge for students.

    You’ll find:

    ✅ Problem titles and detailed descriptions
    ✅ Categories (Hardware/Software)
    ✅ Technology domains (e.g., MedTech, Sustainability)
    ✅ Winning team details: names, institutes, city/state
    ✅ Organizing departments/ministries and nodal centers

    Whether you are a student preparing for future hackathons, a mentor guiding innovation challenges, or a policymaker interested in problem-solving trends, this dataset gives you a clear, data-backed lens on how real-world challenges are solved by India’s top student innovators.

    💡 Use Cases

    • Analyze which technology domains attracted the most winning solutions
    • Map regional innovation patterns (which states/institutes win most often)
    • Visualize which organizations posed the most impactful challenges
    • Build EDA or dashboards to track hackathon outcomes
    • Use as a reference to prepare for SIH 2025 or similar competitions

    📄 License This dataset is released under CC BY 4.0. You’re free to use, modify, and share it with attribution.


AMEX Competition

AmExpert 2021 CODELAB – Machine Learning Hackathon

3 scholarly articles cite this dataset (per Google Scholar).