Credit card default risk is the chance that a company or individual will be unable to repay borrowed money on time.
You are given relevant information about the customers of a company.
You are required to build a machine learning model that can predict whether a customer will default on their credit card.
The dataset folder contains the following files:
The columns provided in the dataset are as follows:
| Column name | Description |
| --- | --- |
| customer_id | Unique identification of a customer |
| name | Name of a customer |
| age | Age of a customer (in years) |
| gender | Gender of a customer (F means Female, M means Male) |
| owns_car | Whether a customer owns a car (Y means Yes, N means No) |
| owns_house | Whether a customer owns a house (Y means Yes, N means No) |
| no_of_children | Number of children of a customer |
| net_yearly_income | Net yearly income of a customer (in USD) |
| no_of_days_employed | Number of days the customer has been employed |
| occupation_type | Occupation type of a customer (IT staff, Managers, Accountants, Cooking staff, etc.) |
| total_family_members | Number of family members of a customer |
| migrant_worker | Whether a customer is a migrant worker (1 means Yes, 0 means No) |
| yearly_debt_payments | Yearly debt payments of a customer (in USD) |
| credit_limit | Credit limit of a customer (in USD) |
| credit_limit_used(%) | Percentage of the credit limit used by a customer |
| credit_score | Credit score of a customer |
| prev_defaults | Number of previous defaults |
| default_in_last_6months | Whether a customer has defaulted in the last 6 months (1 means Yes, 0 means No) |
| credit_card_default | Whether there will be a credit card default (1 means Yes, 0 means No) |
`score = 100 * metrics.f1_score(actual, predicted, average="macro")`
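For reference, here is a minimal, self-contained sketch of computing this macro F1 score locally with scikit-learn; the label arrays are illustrative placeholders, not real data:

```python
from sklearn import metrics

# Placeholder labels for illustration; substitute your own validation split.
actual = [0, 1, 1, 0, 1, 0]
predicted = [0, 1, 0, 0, 1, 1]

# Macro averaging computes F1 per class and averages with equal weight,
# which matters here because defaults are typically the minority class.
score = 100 * metrics.f1_score(actual, predicted, average="macro")
print(f"score = {score:.2f}")
```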
Note: Ensure that your submission file contains the following:
AV HackLive - Guided Community Hackathon!
Data Science competitions can be daunting for someone who has never participated in one. Some of them have hundreds of competitors with top-notch industry knowledge and a splendid track record in such hackathons.
As a result, many beginners are apprehensive about getting started with these hackathons.
The top 3 questions that are commonly asked:
- Is it even worth it if I have a minimal chance of winning?
- How do I start?
- How can I improve my rank in the future?

Let's answer the first question before we go further.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset organized by the Open-Earth-Monitor (OEMC) project within the context of Hackathon 2023.
The dataset contains monthly mean FAPAR values aggregated by ground station. FAPAR represents the fraction of the incoming photosynthetically active radiation that is absorbed by vegetation, and is given in the range 0-1. It is a measure of vegetation health and ecosystem functioning, and a key parameter in light use efficiency models of primary productivity.
For each monthly FAPAR value, a set of covariates / features was extracted from 32 raster spatial layers, including satellite images (spectral bands and indices), temperature images (land surface temperature), climate images (precipitation), and a digital terrain model (slope and elevation). The features are organized in columns; unique data points in time are identified by the `sample_id` column, and data points belonging to the same location are identified by `station_number`.
Column names:
- `sample_id`: unique identifier of the data point
- `station`: ground station number
- `fapar`: monthly mean FAPAR
- `month`: month of measurement
- `modis_{..}`: NDVI, EVI, and reflectance bands 1 (red), 2 (near-infrared), 3 (blue), and 7 (mid-infrared), based on MOD13Q1
- `modis_lst_day_p{..}`: daytime land surface temperature, 5th, 50th, and 95th percentiles, based on MOD11A2
- `modis_lst_night_p{..}`: nighttime land surface temperature, 5th, 50th, and 95th percentiles, based on MOD11A2
- `wv_yearly_p{..}`: water vapour aggregated yearly by the 25th, 50th, and 75th percentiles, derived from MCD19A2
- `wv_monthly_lt_p{..}`: water vapour aggregated long-term monthly by the 25th, 50th, and 75th percentiles, based on MCD19A2
- `wv_monthly_lt_sd`: long-term monthly standard deviation of water vapour, based on MCD19A2
- `wv_monthly_ts_raw`: monthly water vapour time series, based on MCD19A2
- `wv_monthly_ts_smooth`: monthly water vapour time series smoothed using the Whittaker method, based on MCD19A2
- `accum_pr_monthly`: monthly accumulated precipitation, based on the CHELSA time series
- `dtm_{..}`: several DTM derivatives (elevation, slope, aspect (sine, cosine), curvature (up- and downslope), openness (negative, positive), compound topographic index (cti), valley bottom flatness (vbf)), based on MERIT DEM

Files:
- Train: data point (`sample_id` - index column), ground station (`station`), reference month (`month`), measured FAPAR (`fapar`), and the 32 features / covariates
- Test: data point (`sample_id` - index column), ground station (`station`), reference month (`month`), and the 32 features / covariates
- Sample submission: data point (`sample_id` - index column) and measured FAPAR (`fapar`)
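As a starting point, a minimal baseline might look like the sketch below; the file names `train.csv`, `test.csv`, and `submission.csv` are assumptions and may differ from the files actually provided in the dataset folder:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumed file names; adjust to match the files actually provided.
train = pd.read_csv("train.csv", index_col="sample_id")
test = pd.read_csv("test.csv", index_col="sample_id")

# Everything except the target and the location/time identifiers is a feature.
features = [c for c in train.columns if c not in ("fapar", "station", "month")]

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(train[features], train["fapar"])

# Write predictions keyed by sample_id, mirroring the sample submission layout.
submission = pd.DataFrame({"fapar": model.predict(test[features])}, index=test.index)
submission.to_csv("submission.csv")
```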
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset organized by the Open-Earth-Monitor (OEMC) project within the context of Hackathon 2023.
The dataset (both train and test) was produced by stratified sampling of the ground-truth data provided by the LUCAS Survey, funded by the European Commission. The target land cover considers level-3 classes from the harmonized legend, resulting in 72 classes distributed over 5 years (2006, 2009, 2012, 2015, 2018):
All samples were overlaid with 416 raster spatial layers, including satellite images (spectral bands and indices), temperature images (land surface temperature), climate images (precipitation, air temperature), accessibility and distance maps (highways, water bodies, burned areas), a digital terrain model (slope and elevation), and other existing maps (population count and snow cover). The resulting values were organized in columns, one for each spatial layer, which combined represent the feature space available for ML modeling.
Column names:
The columns are formed by six metadata fields separated by `_`; see the parsing sketch below.
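As an illustration only (the column names below are hypothetical, and the exact semantics of each field are defined by the dataset documentation), the six fields can be recovered by splitting on the underscore; the first two fields, F1 and F2, determine the thematic groups listed further down:

```python
from collections import defaultdict

# Hypothetical column names, used only to demonstrate the parsing;
# real names follow the same six-field, "_"-separated convention.
columns = [
    "blue_landsat.glad.ard_p50_30m_s_20180101.20181231",
    "evi_mod13q1_p50_250m_s_20180101.20181231",
]

# Group columns by their first two metadata fields (F1 and F2).
groups = defaultdict(list)
for col in columns:
    f1, f2 = col.split("_")[:2]
    groups[(f1, f2)].append(col)

for key, cols in groups.items():
    print(key, cols)
```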
Column description:
All the columns can be aggregated into six thematic groups according to F1 and F2:
- **Satellite images:**
  - `blue_landsat.glad.ard_{..}`: Quarterly time series of the Landsat blue band (Witjes et al., 2023)
  - `blue_mod13q1_{..}`: Monthly time series of the MOD13Q1 blue band (EarthData)
  - `evi_mod13q1.stl.trend.ols.alpha_{..}`: Alpha coefficient / intercept (derived by OLS) over the deseasonalized monthly time series of the MOD13Q1 Enhanced Vegetation Index (EVI) (EarthData)
  - `evi_mod13q1.stl.trend.ols.beta_{..}`: Beta coefficient / trend (derived by OLS) over the deseasonalized monthly time series of the MOD13Q1 EVI (EarthData)
  - `evi_mod13q1.stl.trend_{..}`: Deseasonalized monthly time series (trend component of STL) of the MOD13Q1 EVI (EarthData)
  - `evi_mod13q1_{..}`: Monthly time series of the MOD13Q1 EVI (EarthData)
  - `green_landsat.glad.ard_{..}`: Quarterly time series of the Landsat green band (Witjes et al., 2023)
  - `mir_mod13q1_{..}`: Monthly time series of the MOD13Q1 mid-infrared band (EarthData)
  - `ndvi_mod13q1_{..}`: Monthly time series of the MOD13Q1 Normalized Difference Vegetation Index (NDVI) (EarthData)
  - `nir_landsat.glad.ard_{..}`: Quarterly time series of the Landsat near-infrared band (Witjes et al., 2023)
  - `nir_mod13q1_{..}`: Monthly time series of the MOD13Q1 near-infrared band (EarthData)
  - `red_landsat.glad.ard_{..}`: Quarterly time series of the Landsat red band (Witjes et al., 2023)
  - `red_mod13q1_{..}`: Monthly time series of the MOD13Q1 red band (EarthData)
  - `swir1_landsat.glad.ard_{..}`: Quarterly time series of the Landsat short-wave infrared-1 band (Witjes et al., 2023)
  - `swir2_landsat.glad.ard_{..}`: Quarterly time series of the Landsat short-wave infrared-2 band (Witjes et al., 2023)
- **Temperature images:**
  - `lst_mod11a2.daytime_{..}`: Monthly time series of MOD11A2 daytime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.daytime.{month}_{..}`: Long-term monthly aggregation (2000-2022) of MOD11A2 daytime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.daytime.trend_{..}`: Deseasonalized monthly time series (trend component of [STL](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html#statsmodels.tsa.seasonal.STL)) of MOD11A2 daytime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.daytime.trend.ols.alpha_{..}`: Alpha coefficient / intercept (derived by [OLS](https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html)) over the deseasonalized monthly time series of MOD11A2 daytime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.daytime.trend.ols.beta_{..}`: Beta coefficient / trend (derived by [OLS](https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html)) over the deseasonalized monthly time series of MOD11A2 daytime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.nighttime_{..}`: Monthly time series of MOD11A2 nighttime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.nighttime.{month}_{..}`: Long-term monthly aggregation (2000-2022) of MOD11A2 nighttime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.nighttime.trend_{..}`: Deseasonalized monthly time series (trend component of [STL](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html#statsmodels.tsa.seasonal.STL)) of MOD11A2 nighttime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.nighttime.trend.ols.alpha_{..}`: Alpha coefficient / intercept (derived by [OLS](https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html)) over the deseasonalized monthly time series of MOD11A2 nighttime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `lst_mod11a2.nighttime.trend.ols.beta_{..}`: Beta coefficient / trend (derived by [OLS](https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html)) over the deseasonalized monthly time series of MOD11A2 nighttime land surface temperature ([EarthData](https://lpdaac.usgs.gov/products/mod11a2v006/))
  - `thermal_landsat.glad.ard_{..}`: Quarterly time series of the Landsat thermal band ([Witjes et al., 2023](https://doi.org/10.7717/peerj.15478))
- **Climate layers:**
  - `accum.precipitation_chelsa.annual_{..}`: Precipitation accumulated over the entire year according to the CHELSA time series, in `mm` of water ([Karger et al., 2017](https://doi.org/10.1038/sdata.2017.122))
  - `accum.precipitation_chelsa.annual.3years.dif_{..}`: 3-year difference of the yearly accumulated precipitation according to the CHELSA time series, in `mm` of water ([Karger et al., 2017](https://doi.org/10.1038/sdata.2017.122))
  - `accum.precipitation_chelsa.annual.log.csum_{..}`: Cumulative sum, in logarithmic space, of the yearly accumulated precipitation according to the CHELSA time series ([Karger et al., 2017](https://doi.org/10.1038/sdata.2017.122))
  - `accum.precipitation_chelsa.montlhy_{..}`: Precipitation accumulated for each month according to the CHELSA time series, in `mm` of water ([Karger et al., 2017](https://doi.org/10.1038/sdata.2017.122))
  - `bioclim.var_chelsa.{variable_code}_{..}`: Bioclimatic variables derived from the monthly mean, maximum, and minimum temperature and mean precipitation values; for `variable_code` descriptions see [chelsa-climate.org](https://chelsa-climate.org/bioclim/) ([Karger et al., 2017](https://doi.org/10.1038/sdata.2017.122))
Overview
Welcome to the House Price Prediction Challenge, where you will test your regression skills by designing an algorithm that accurately predicts house prices in India. Accurately predicting house prices can be a daunting task: buyers are concerned with more than just the size (square feet) of the house, and various other factors play a key role in deciding the price of a house or property. It can be extremely difficult to figure out the right set of attributes that explain buyer behavior. This dataset was collected from various property aggregators across India. In this competition, given the 12 influencing factors, your role as a data scientist is to predict the prices as accurately as possible.
This competition also gives you plenty of room for feature engineering and for mastering advanced techniques such as Random Forests, deep neural networks, and various other ensembling methods.
Train.csv - 29451 rows x 12 columns
Test.csv - 68720 rows x 11 columns
Sample Submission - acceptable submission format (.csv/.xlsx file with 68720 rows)
POSTED_BY - Category marking who has listed the property
UNDER_CONSTRUCTION - Under construction or not
RERA - RERA approved or not
BHK_NO - Number of rooms
BHK_OR_RK - Type of property
SQUARE_FT - Total area of the house in square feet
READY_TO_MOVE - Category marking ready to move or not
RESALE - Category marking resale or not
ADDRESS - Address of the property
LONGITUDE - Longitude of the property
LATITUDE - Latitude of the property
What is the metric in this competition? How is the leaderboard calculated? Submissions will be evaluated using the RMSLE (Root Mean Squared Logarithmic Error) metric; one can use `np.sqrt(mean_squared_log_error(actual, predicted))`. This hackathon supports private and public leaderboards: the public leaderboard is evaluated on 30% of the test data, while the private leaderboard, made available at the end of the hackathon, is evaluated on 100% of the test data. A runnable version of the metric is sketched below.
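A minimal, self-contained sketch of the RMSLE computation with NumPy and scikit-learn; the prices here are illustrative placeholders:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# Placeholder prices for illustration; substitute real targets and predictions.
actual = np.array([120.0, 85.5, 310.0, 45.0])
predicted = np.array([110.0, 90.0, 295.0, 50.0])

# RMSLE works on log(1 + x), so it penalizes relative rather than absolute
# error, treating cheap and expensive houses comparably.
rmsle = np.sqrt(mean_squared_log_error(actual, predicted))
print(f"RMSLE: {rmsle:.4f}")
```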
This data was shared by MachineHack; you can participate in the hackathon and submit your own entries. Link to the MachineHack hackathon: https://www.machinehack.com/hackathons/house_price_prediction_beat_the_benchmark/overview
https://creativecommons.org/publicdomain/zero/1.0/
This is a realistic and structured pizza sales dataset covering the time span from **2024 to 2025**. Whether you're a beginner in data science, a student working on a machine learning project, or an experienced analyst looking to test out time series forecasting and dashboard building, this dataset is for you.
📁 What’s Inside? The dataset contains rich details from a pizza business including:
✅ Order Dates & Times
✅ Pizza Names & Categories (Veg, Non-Veg, Classic, Gourmet, etc.)
✅ Sizes (Small, Medium, Large, XL)
✅ Prices
✅ Order Quantities
✅ Customer Preferences & Trends
It is neatly organized in Excel format and easy to use with tools like Python (pandas), Power BI, Excel, or Tableau; a loading sketch is shown below.
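For example, a minimal pandas sketch; the file name and column names here are assumptions, so check the actual workbook and adjust accordingly:

```python
import pandas as pd

# Assumed file and column names; verify against the actual Excel workbook.
sales = pd.read_excel("pizza_sales.xlsx", parse_dates=["order_date"])

# Example analysis: total revenue per month for a quick sales trend.
sales["revenue"] = sales["price"] * sales["quantity"]
monthly = sales.groupby(sales["order_date"].dt.to_period("M"))["revenue"].sum()
print(monthly)
```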
💡 **Why Use This Dataset?** This dataset is ideal for:
📈 Sales Analysis & Reporting
🧠 Machine Learning Models (demand forecasting, recommendations)
📅 Time Series Forecasting
📊 Data Visualization Projects
🍽️ Customer Behavior Analysis
🛒 Market Basket Analysis
📦 Inventory Management Simulations
🧠 **Perfect For:**
Data Science Beginners & Learners
BI Developers & Dashboard Designers
MBA Students (Marketing, Retail, Operations)
Hackathons & Case Study Competitions
pizza, sales data, excel dataset, retail analysis, data visualization, business intelligence, forecasting, time series, customer insights, machine learning, pandas, beginner friendly
https://creativecommons.org/publicdomain/zero/1.0/
American Express and Analytics Vidhya present “AmExpert 2021 – Machine Learning Hackathon”, an amazing opportunity to showcase your analytical abilities and talent!
Get a taste of the kind of challenges we face here at American Express on a day-to-day basis.
https://datahack.analyticsvidhya.com/contest/amexpert-2021-machine-learning-hackathon/
XYZ Bank is a mid-sized private bank offering a variety of banking products, such as savings accounts, current accounts, investment products, credit products, and home loans.
The bank wants to predict the next set of products for a set of customers in order to optimize its marketing and communication campaigns.
The data available in this problem contains the following information:
* User demographic details: gender, age, vintage, customer category, etc.
* Current product holdings
* Product holding in the next 6 months (only for the train dataset)
Here, our task is to predict the next set of products (up to 3 products) for a set of customers (test data) based on their demographics and current product holdings.
Train:
Customer_ID - Unique ID for the customer
Gender - Gender of the customer
Age - Age of the customer (in years)
Vintage - Vintage of the customer (in months)
Is_Active - Activity index (0: less frequent customer, 1: more frequent customer)
City_Category - Encoded category of the customer's city
Customer_Category - Encoded category of the customer
Product_Holding_B1 - Current product holding (encoded)
Product_Holding_B2 - Product holding in the next six months (encoded) - target column
Test:
Customer_ID - Unique ID for the customer
Gender - Gender of the customer
Age - Age of the customer (in years)
Vintage - Vintage of the customer (in months)
Is_Active - Activity index (0: less frequent customer, 1: more frequent customer)
City_Category - Encoded category of the customer's city
Customer_Category - Encoded category of the customer
Product_Holding_B1 - Current product holding (encoded)
The evaluation metric is Mean Average Precision (MAP) at K (K = 3). MAP is a well-known metric used to evaluate ranked retrieval results. For example, say that for a given customer we recommended 3 products and only the 1st and 3rd products are correct. The result would look like: 1, 0, 1.
In this case, the precision at 1 is 1 * 1/1 = 1, the precision at 2 is 0 * 1/2 = 0, and the precision at 3 is 1 * 2/3 = 0.67. The average precision is therefore (1 + 0 + 0.67) / 3 = 0.556.
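A minimal sketch of this computation in Python; it divides by K (3) to match the worked example above, and the product labels are hypothetical:

```python
def average_precision_at_k(actual, predicted, k=3):
    """Average precision at k for a single customer.

    actual: set of products the customer actually acquired.
    predicted: ranked list of up to k recommended products.
    """
    score, hits = 0.0, 0
    for i, product in enumerate(predicted[:k]):
        if product in actual:
            hits += 1
            score += hits / (i + 1)  # precision at this cut-off
    # Divide by k, as in the worked example: (1 + 0 + 0.67) / 3.
    return score / k

# Worked example: 1st and 3rd recommendations correct -> about 0.556.
print(average_precision_at_k({"P1", "P3"}, ["P1", "P2", "P3"]))

# MAP@3 is simply the mean of this value over all customers.
```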
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset compiles the official problem statements from Smart India Hackathon (SIH) 2024 along with the corresponding winning teams and their accepted solutions. By merging these two perspectives — what was asked, and what was delivered — this dataset provides a holistic view of India’s largest innovation challenge for students.
You’ll find:
✅ Problem titles and detailed descriptions ✅ Categories (Hardware/Software) ✅ Technology domains (e.g., MedTech, Sustainability, etc.) ✅ Winning team details — names, institutes, city/state ✅ Organizing departments/ministries and nodal centers
Whether you are a student preparing for future hackathons, a mentor guiding innovation challenges, or a policymaker interested in problem-solving trends, this dataset gives you a clear, data-backed lens on how real-world challenges are solved by India’s top student innovators.
💡 Use Cases Analyze which technology domains attracted the most winning solutions
Map regional innovation patterns (which states/institutes are winning most often)
Visualize which organizations posed the most impactful challenges
Build EDA or dashboards to track hackathon outcomes
Use as a reference to prepare for SIH 2025 or similar competitions
📄 License This dataset is released under CC BY 4.0. You’re free to use, modify, and share it with attribution.