Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The solutions are evaluated on two criteria: predicted future Index values and energy allocated from a newly discovered star.

1. Index predictions are evaluated using the RMSE metric.
2. Energy allocation is also evaluated using the RMSE metric, and has a set of known constraints that need to be taken into account.
Every galaxy has a certain limited potential for improvement in the Index, described by the following function:

Potential for increase in the Index = -np.log(Index + 0.01) + 3

The likely increase in the Index depends on the potential for improvement and on the extra energy available, and is described by the following function:

Likely increase in the Index = extra energy * (Potential for increase in the Index)**2 / 1000
The allocation is subject to the following constraints (see the sketch after the variable table below):

- In total, there are 50000 zillion DSML available for allocation.
- No galaxy should be allocated more than 100 zillion DSML or less than 0 zillion DSML.
- Galaxies with a low existence expectancy index (below 0.7) should be allocated at least 10% of the total available energy.
| Variable | Description |
| --- | --- |
| Index | Unique index from the test dataset, in ascending order |
| pred | Prediction for the Index of interest |
| pred_opt | Optimal energy allocation |
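A minimal sketch of the two functions and the constraint checks, assuming a pandas DataFrame `df` with a `pred_opt` allocation column; the `existence_expectancy_index` column name is an assumption inferred from the description:

```python
import numpy as np
import pandas as pd

def potential_for_increase(index):
    """Limited potential for improvement in the Index."""
    return -np.log(index + 0.01) + 3

def likely_increase(extra_energy, index):
    """Likely increase in the Index for a given extra energy allocation."""
    return extra_energy * potential_for_increase(index) ** 2 / 1000

def check_constraints(df, total=50000):
    """Validate an allocation stored in df['pred_opt'] against the stated rules."""
    alloc = df["pred_opt"]
    assert alloc.sum() <= total, "more than 50000 zillion DSML allocated"
    assert alloc.between(0, 100).all(), "per-galaxy allocation outside [0, 100]"
    low_ee = df["existence_expectancy_index"] < 0.7  # hypothetical column name
    assert alloc[low_ee].sum() >= 0.10 * total, \
        "low existence expectancy galaxies received less than 10% of the total"
```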
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Hackathons are a great way for people not only to learn more about technology but also to showcase their existing skills by building projects, often in just a few hours. This dataset contains data collected from 200 participants of a hackathon conducted for high school students. Many columns have been deleted, but the remaining columns can be useful for understanding the demographics and interests of someone participating in these kinds of events.
Given below are three files that you will be using for the challenge. Download all the files. The training file has a labelled data set; the test file has only the features. Train your algorithm on the training set, make predictions on the test file, and then create a submissions.csv file that will be evaluated. You may refer to the sampleSubmission.csv file to understand the overall structure of your submission. The dataset consists of overall stats of players in ODIs only.
File descriptions:
train.csv - the training set
test.csv - the test set
sampleSubmission.csv - a sample submission file in the correct format

Data fields:
id - an anonymous id unique to the player
Name - name of the player
Age - age of the player
100s - number of centuries scored by the player
50s - number of half centuries scored by the player
6s - total number of sixes hit by the player
Balls - number of balls bowled by the player
Bat_Average - average batting score
Bowl_Strike_Rate - average number of balls bowled per wicket taken
Balls faced - number of balls faced
Economy - average number of runs conceded per over bowled
Innings - number of innings played
Overs - number of overs bowled
Maidens - overs in which no run was conceded
Runs - total runs scored by the player
Wickets - number of wickets taken
Ratings - final rating of the player
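A minimal end-to-end sketch of the expected workflow (the gradient-boosting baseline is an illustrative choice, not part of the challenge; column handling assumes the data fields listed above):

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Use the numeric player stats as features; Ratings is the target,
# id and Name are identifiers.
X = train.drop(columns=["id", "Name", "Ratings"]).select_dtypes("number").fillna(0)
y = train["Ratings"]

model = GradientBoostingRegressor().fit(X, y)
preds = model.predict(test[X.columns].fillna(0))

# Follow the structure shown by sampleSubmission.csv for the output file.
pd.DataFrame({"id": test["id"], "Ratings": preds}).to_csv("submissions.csv", index=False)
```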
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the materials used in the session "Care to Share? Investigating Open Science practices adoption among researchers: a hackathon" presented at the Dutch National Open Science Festival on 22nd October 2024.
The data files are derived from: Public Library of Science (2022) PLOS Open Science Indicators. Figshare. Dataset (version 8). https://doi.org/10.6084/m9.figshare.21687686, and contain two additional fields (Dimensions_Country and Dimensions_FoR) obtained on 15 October 2024 from Digital Science's Dimensions platform, available at https://app.dimensions.ai.
PLOS-Dataset-for-Hackathon.xlsx
Data pertaining to the PLOS corpus of articles derived from Public Library of Science (2022) PLOS Open Science Indicators. Figshare. Dataset (version 8). https://doi.org/10.6084/m9.figshare.21687686 with additional data from Dimensions.ai.
Comparator-Dataset-for-Hackathon.xlsx
Data pertaining to the Comparator corpus of articles derived from Public Library of Science (2022) PLOS Open Science Indicators. Figshare. Dataset (version 8). https://doi.org/10.6084/m9.figshare.21687686 with additional data from Dimensions.ai.
Care to share resource sheet.pdf
Document outlining the questions to be investigated during the hackathon as well as key information about the dataset.
OSI-Column-Descriptions_v3_Dec23.pdf
This file is taken from Public Library of Science (2022) PLOS Open Science Indicators. Figshare. Dataset (version 8). https://doi.org/10.6084/m9.figshare.21687686. It describes the fields used in the two data files with the exception of Dimensions_Country and Dimensions_FoR. Descriptions for these are listed in the README tabs of the data files.
Courtesy of the European Space Agency.
License: ESA CC BY-SA 3.0 IGO
etalab-2.0: https://spdx.org/licenses/etalab-2.0.html
This dataset was collected from four event-based surveillance (EBS) systems to be used in a hackathon dedicated to AMR (antimicrobial resistance) for the MOOD summer school in June 2022. The chosen EBS sources are ProMED, PADI-web, HealthMap, and MedISys. The collected data are news items dealing with epidemiological information or events. The dataset is composed of 4 sub-datasets, one for each chosen EBS source. Each sub-dataset is annotated according to 3 main classes (New Information, General Information, Not Relevant). For each news item labeled as New Information or General Information, another annotation is provided for host classification, with 7 classes (Humans, Human-animal, Animals, Human-food, Food, Environment, and All). This second annotation yields 4 further sub-datasets. The aim of the annotation task is to recognize epidemiological information related to AMR. An annotation guideline is provided in order to ensure a unified annotation and to help the annotators. This dataset can be used to train or evaluate classification approaches that automatically identify text on AMR events and types of AMR issues (e.g. animal, food) in unstructured data (e.g. news, tweets) and classify these events by relevance for epidemic intelligence purposes.
Terms of service: https://cubig.ai/store/terms-of-service
1) Data Introduction
• The hospital length of stay dataset is part of a hackathon organized by Analytics Vidhya focusing on healthcare management challenges, particularly optimizing hospital patient length of stay. The dataset includes detailed information on patient demographics, hospital attributes, and treatment details, all of which are critical for managing healthcare efficiency.

2) Data Utilization
(1) Characteristics of the hospital length of stay data:
• The dataset is structured to provide insight into the various factors that affect the length of hospital stays. It covers numerous variables, including patient age, medical conditions, previous admissions, and the type of hospital and care involved.
• It supports predictive modeling, helping hospitals improve service delivery by accurately forecasting patient stay durations and managing hospital bed occupancy and staffing needs more effectively.
(2) Uses of the hospital length of stay data:
• Hospital Management: the data can assist in strategic planning and resource allocation, helping hospitals reduce costs while maintaining high care standards.
• Research in Healthcare Systems: it serves as a foundational dataset for academic and commercial research aimed at understanding and improving healthcare system efficiency.
AV HackLive - Guided Community Hackathon!
Data Science competitions can be daunting for someone who has never participated in one. Some of them have hundreds of competitors with top-notch industry knowledge and a splendid past record in such hackathons.

Thus, a lot of beginners are apprehensive about getting started with these hackathons.
The top 3 questions that are commonly asked:
1. Is it even worth it if I have minimal chance of winning?
2. How do I start?
3. How can I improve my rank in the future?

Let's answer the first question before we go further.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract:
This dataset of breast cancer patients was obtained from the November 2017 update of the SEER Program of the NCI, which provides information on population-based cancer statistics. The dataset involved female patients with infiltrating duct and lobular carcinoma breast cancer (SEER primary sites, recode NOS histology codes 8522/3) diagnosed in 2006-2010. Patients with unknown tumor size, unknown examined regional lymph nodes (LNs), or unknown regional positive LNs, and patients whose survival time was less than 1 month, were excluded; thus, 4024 patients were ultimately included.
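A minimal sketch of the exclusion step described above (the file and column names below are hypothetical placeholders; the actual SEER field names differ):

```python
import pandas as pd

df = pd.read_csv("seer_breast_cancer.csv")  # hypothetical file name

# Exclude records with unknown tumor size or nodal counts,
# and patients who survived less than 1 month.
cohort = df[
    df["tumor_size"].notna()                 # hypothetical column names
    & df["regional_nodes_examined"].notna()
    & df["regional_nodes_positive"].notna()
    & (df["survival_months"] >= 1)
]
print(len(cohort))  # the description reports 4024 patients after exclusion
```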
Inspiration:
This dataset uploaded to U-BRITE for "AI against CANCER DATA SCIENCE HACKATHON"
https://cancer.ubrite.org/hackathon-2021/
Acknowledgements
JING TENG, January 18, 2019, "SEER Breast Cancer Data", IEEE Dataport, doi: https://dx.doi.org/10.21227/a9qy-ph35.
https://ieee-dataport.org/open-access/seer-breast-cancer-data
U-BRITE last update date: 07/21/2021
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Electricity Consumption’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/utathya/electricity-consumption on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The company Electrolysia supplies electricity to the city. It is looking to optimise its electricity production based on the historical electricity consumption of the people of Electrovania.

The company has hired you as a Data Scientist to investigate past consumption and weather information and to come up with a model that captures the trend as accurately as possible. Bear in mind that many factors affect electricity consumption and not all of them can be measured. Electrolysia has provided you with hourly data spanning five years.

For this competition, the training set comprises the first 23 days of each month, and the test set covers the 24th to the end of the month. The public leaderboard is based on the first two days of the test period, whereas the private leaderboard considers the remaining days. Your task is to predict electricity consumption on an hourly basis.

Note that you cannot use future information to model past consumption. For example, you cannot use February 2017 data to predict the last week of January 2017.
It represents a fictitious time period wherein we are to predict future electricity consumption.
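A minimal sketch of the split described above (the file name and the `datetime` and `electricity_consumption` column names are assumptions):

```python
import pandas as pd

# Hypothetical file and column names.
df = pd.read_csv("electricity_consumption.csv", parse_dates=["datetime"])
df = df.sort_values("datetime")

# Training rows: days 1-23 of each month; test rows: day 24 onward.
train = df[df["datetime"].dt.day <= 23]
test = df[df["datetime"].dt.day >= 24]

# To respect the no-future-information rule, features built for a given
# timestamp (lags, rolling means, ...) must use only earlier observations.
df["consumption_lag_24h"] = df["electricity_consumption"].shift(24)
```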
This data is from an Analytics Vidhya hackathon. The hackathon is now closed.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
This is a small dataset that is part of a larger dataset of breast cancer images. The images are mammograms.
Instructions:
One can use these images for experimentation on detection and analysis of breast cancer.
Inspiration:
This dataset uploaded to U-BRITE for "AI against CANCER DATA SCIENCE HACKATHON"
https://cancer.ubrite.org/hackathon-2021/
Acknowledgements
G R Sinha, Bhagwati Charan Patel, December 27, 2019, "Mammograms-Breast Cancer Images", IEEE Dataport, doi: https://dx.doi.org/10.21227/9f0p-qx37.
https://ieee-dataport.org/documents/mammograms-breast-cancer-images
U-BRITE last update date: 07/21/2021
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘uHack Sentiments 2.0: Decode Code Words’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/manishtripathi86/uhack-sentiments-20-decode-code-words on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The challenge here is to analyze and dive deep into natural language text (reviews) and bucket the reviews based on their topics of discussion. Furthermore, analyzing the overall sentiment will also help the business make tangible decisions.
The data set provided to you has a mix of customer reviews for products across categories and retailers. We would like you to build a model on the data that, for future reviews, predicts:

- Topics (Components, Delivery and Customer Support, Design and Aesthetics, Dimensions, Features, Functionality, Installation, Material, Price, Quality, and Usability); note that a review can talk about multiple topics.
- Overall polarity (positive/negative sentiment).

Note: The target variables are all encoded in the train dataset for convenience. Please submit the test results in the same encoded fashion so we can evaluate your results.
| Field Name | Data Type | Purpose | Variable Type |
| --- | --- | --- | --- |
| Id | Integer | Unique identifier for each review | Input |
| Review | String | Review written by customers on a retail website | Input |
| Components | String | 1: aspects related to components; 0: none | Target |
| Delivery and Customer Support | String | 1: aspects related to delivery, return, exchange, and customer support; 0: none | Target |
| Design and Aesthetics | String | 1: aspects related to design and aesthetics; 0: none | Target |
| Dimensions | String | 1: related to product dimension and size; 0: none | Target |
| Features | String | 1: related to product features; 0: none | Target |
| Functionality | String | 1: related to the working of a product; 0: none | Target |
| Installation | String | 1: related to installation of the product; 0: none | Target |
| Material | String | 1: related to the material of the product; 0: none | Target |
| Price | String | 1: related to pricing details of a product; 0: none | Target |
| Quality | String | 1: related to quality aspects of a product; 0: none | Target |
| Usability | String | 1: related to usability of a product; 0: none | Target |
| Polarity | Integer | 1: positive sentiment; 0: negative sentiment | Target |
Skills:
- Text pre-processing: lemmatization, tokenization, n-grams, and other relevant methods
- Multi-class classification, multi-label classification
- Optimizing log loss
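A minimal multi-label baseline sketch, assuming train.csv/test.csv files with the fields listed in the table above (TF-IDF plus one-vs-rest logistic regression is an illustrative choice, not the prescribed approach):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

TARGETS = ["Components", "Delivery and Customer Support", "Design and Aesthetics",
           "Dimensions", "Features", "Functionality", "Installation", "Material",
           "Price", "Quality", "Usability", "Polarity"]

train = pd.read_csv("train.csv")  # assumed file names
test = pd.read_csv("test.csv")

# One-vs-rest trains an independent binary classifier per label, matching the
# multi-label setup: a review can belong to several topics at once.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(train["Review"], train[TARGETS].astype(int))

# Per-label probabilities, suitable for a log-loss style evaluation.
probs = model.predict_proba(test["Review"])
```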
Overview

Ugam, a Merkle company, is a leading analytics and technology services company. Our customer-centric approach delivers impactful business results for large corporations by leveraging data, technology, and expertise.
We consistently deliver superior, impactful results through the right blend of human intelligence and AI. With 3300+ people spread across locations worldwide, we successfully deploy our services to create success stories across industries like Retail & Consumer Brands, High Tech, BFSI, Distribution, and Market Research & Consulting. Over the past 21 years, Ugam has been recognized by several firms including Forrester and Gartner, named the No.1 data science company in India by Analytics Insight, and certified as a Great Place to Work®.
Problem Statement: The last two decades have witnessed a significant change in how consumers purchase products and express their experience/opinions in reviews, posts, and content across platforms. These online reviews are not only useful to reflect customers’ sentiment towards a product but also help businesses fix gaps and find potential opportunities which could further influence future purchases.
Participants need to develop a machine learning model that can analyse customers' sentiments based on their reviews and feedback.
NOTE: The prize money is reserved for candidates who are willing to be interviewed or hired by Ugam. Winners are requested to come to the Machine Learning Developers Summit 2022, happening in Bangalore, to receive the prize money.
dataset link: https://machinehack.com/hackathon/uhack_sentiments_20_decode_code_words/overview
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ORD for the Sciences Hackathon - Vehicles Detection
[!CAUTION] This project is an example of a hackathon project. The quality of the data produced has not been evaluated. Its goal is to provide an example of how a dataset can be uploaded to Hugging Face.

This is an example of a hackathon project presented at the ORD for the Sciences hackathon, using the openly available pNeuma Vision dataset.

If you want to know more about the hackathon, see the EPFL pNeuma project… See the full description on the dataset page: https://huggingface.co/datasets/katospiegel/ordfts-hackathon-pneuma-vehicles-segmentation.
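The dataset can be pulled straight from the Hugging Face Hub with the `datasets` library; a minimal sketch:

```python
from datasets import load_dataset

# Load the hackathon dataset directly from the Hugging Face Hub.
ds = load_dataset("katospiegel/ordfts-hackathon-pneuma-vehicles-segmentation")
print(ds)  # shows the available splits and their features
```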
Participants needed to develop a solution that improves the effectiveness of SMS targeting, so that messages are sent only to customers who are motivated to make a purchase.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
This dataset contains information on 83 patients from India: the patients' clinical history, histopathological features, and mammograms. The distinctive aspect of this dataset lies in its collection of mammograms showing benign tumors, which can be used for the subclassification of benign tumors.
Instructions:
This dataset contains a zip folder of 80 mammograms and an Excel file holding the mammographic features, histopathological features, and clinical features of all the patients.
Inspiration:
This dataset uploaded to U-BRITE for "AI against CANCER DATA SCIENCE HACKATHON"
https://cancer.ubrite.org/hackathon-2021/
Acknowledgements
Manish Joshi, Aparna Bhale, Unmesh Takalkar, May 9, 2021, "Benign Breast Tumor Dataset", IEEE Dataport, doi: https://dx.doi.org/10.21227/6sda-hn78.
https://ieee-dataport.org/open-access/benign-breast-tumor-dataset
U-BRITE last update date: 07/09/2021
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
Microwave-based breast cancer detection is a growing field that has been investigated as a potential novel method for breast cancer detection. Breast microwave sensing (BMS) systems use low-powered, non-ionizing microwave signals to interrogate the breast tissues. While some BMS systems have been evaluated in clinical trials, many challenges remain before these systems can be used as a viable clinical option, and breast phantoms (breast models) allow for rigorous and controlled experimental investigations. This dataset, the University of Manitoba Breast Microwave Imaging Dataset (UM-BMID), contains S-parameter measurements from experimental scans of MRI-derived breast phantoms, obtained with a pre-clinical breast microwave sensing system operating over 1-8 GHz. The dataset consists of measurements from over 1250 scans of a diverse array of phantoms. The phantom array consists of phantoms of various sizes and breast densities. The .stl files used to produce the 3D-printed phantoms are also included in the dataset. We hope that this dataset can serve as a resource for researchers in breast microwave sensing to evaluate signal processing, image reconstruction, and tumour detection methods.
Inspiration:
This dataset uploaded to U-BRITE for "AI against CANCER DATA SCIENCE HACKATHON"
https://cancer.ubrite.org/hackathon-2021/
Acknowledgements
Tyson Reimer, Jordan Krenkevich, Stephen Pistorius, June 16, 2021, "University of Manitoba Breast Microwave Imaging Dataset (UM-BMID)", IEEE Dataport, doi: https://dx.doi.org/10.21227/1y0z-8t98.
https://ieee-dataport.org/open-access/university-manitoba-breast-microwave-imaging-dataset-um-bmid
U-BRITE last update date: 07/21/2021
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
GDSC 2024
This dataset contains the results from the Capgemini Global Data Science Challenge (GDSC) 2024 Arena Battles, where AI education policy experts competed to provide the best answers to questions about global education trends and literacy.

Quick Links:
- Case study
- GDSC Overview
- GDSC 7 Overview Video (Short)
- GDSC 7 Overview Video (Long)
- GDSC Website
Background
The Capgemini Global Data Science Challenge (GDSC) is an annual, purpose-driven hackathon that… See the full description on the dataset page: https://huggingface.co/datasets/Endercold/GDSC-2024.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data from the germanwatch.org site.
File descriptions
id = video id
video_duration = duration of the video
coding_standard = coding standard used for the video
width = width of the video in pixels
height = height of the video in pixels
bitrate = video bitrate
framerate = actual video frame rate
i_frames = number of I-frames in the video
p_frames = number of P-frames in the video
b_frames = number of B-frames in the video
frames = number of frames in the video
i_size = total size in bytes of the I-frames
p_size = total size in bytes of the P-frames
b_size = total size in bytes of the B-frames
size = total size of the video
coding_standard_output = output coding standard used for processing
bitrate_output = output bitrate used for processing
framerate_output = output framerate used for processing
output_width = output width in pixels used for processing
output_height = output height in pixels used for processing
allocated_memory = total memory allocated to the coding standard for processing
total_processing_time = total time taken for processing
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A collection of useful datasets extracted from https://packages.ecosyste.ms and https://repos.ecosyste.ms for use at the CZI Hackathon: Mapping the Impact of Research Software in Science.
All data is provided as NDJSON (newline-delimited JSON): each line is a valid JSON object, and objects are separated by newline characters. There are Python and R libraries for reading these files, or you can manually read each line and parse it as a single JSON object.

Each ndjson file has been compressed with gzip (actual command: `tar -czvf`) to reduce download size; the files expand to significantly larger sizes after extraction.
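A minimal sketch of reading one of the files by hand in Python, after extracting the archive (`github.ndjson` is one of the files described below):

```python
import json

records = []
with open("github.ndjson", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:  # skip any blank lines defensively
            records.append(json.loads(line))  # each line is one JSON object

print(len(records), "records loaded")
```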
Package names from cran, bioconductor, and pypi that have been parsed by the software-mentions project (data: https://datadryad.org/stash/dataset/doi:10.5061/dryad.6wwpzgn2c) are collected together with their latest release at the time of publishing, along with the names of their dependencies. Those dependency names have then also been recursively fetched, with latest release and dependencies, until the full list of transitive dependencies is included.

Note: This approach uses a simplified method of dependency resolution, always picking the latest version of each package rather than taking into account each dependency's specific version range requirements. This is primarily due to time constraints, and it allows all software ecosystems to be processed in the same way. A future improvement would be to use each package ecosystem's specific dependency resolution algorithm to compute the full transitive dependency tree for each mentioned software package.
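A minimal sketch of the simplified, latest-version-only resolution described above (the `latest_release` helper is a hypothetical stand-in for a lookup against the packages.ecosyste.ms API):

```python
from collections import deque

def latest_release(name):
    """Hypothetical lookup: return (version, [dependency names])
    for a package's latest release, e.g. via the packages.ecosyste.ms API."""
    raise NotImplementedError

def resolve_transitive(roots):
    """Breadth-first walk that always picks each package's latest release,
    ignoring version range requirements, as the note above describes."""
    resolved, queue = {}, deque(roots)
    while queue:
        name = queue.popleft()
        if name in resolved:
            continue  # already resolved; the latest version is reused everywhere
        version, deps = latest_release(name)
        resolved[name] = version
        queue.extend(deps)
    return resolved
```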
Two different approaches were taken for collecting data for referenced GitHub mentions:
1. `github.ndjson` contains metadata for each repository from GitHub, including "manifest" files: known files that hold dependency information for a project, such as requirements.txt, DESCRIPTION, and package.json, parsed using https://github.com/ecosyste-ms/bibliothecary. This may include transitive dependencies that were discovered in a `lockfile` within the repository.

2. `github_packages.ndjson` contains metadata for each package, found on any package manager, that references the GitHub URL as its repository URL, source, or homepage. These packages, like the cran and pypi data above, include the latest release and its direct dependencies. There may be more than one package for each GitHub URL, as it is a one-to-many relationship. `github_packages_with_transitive.ndjson` follows the same format but also includes the extra resolved transitive dependencies of all packages, using the same approach as the cran and pypi data above, with the same caveats.

There are also many more ecosystems referenced in these files than just cran, bioconductor, and pypi; https://packages.ecosyste.ms provides a standardized metadata format for all of them to enable comparison and to simplify automation.
If you would like any help, support or more data from Ecosyste.ms please do get in touch via email: hello@ecosyste.ms or open an issue on GitHub: https://github.com/ecosyste-ms/packages/issues