41 datasets found
  1. PROHACK Hackathon

    • kaggle.com
    zip
    Updated Jun 19, 2020
    Cite
    Andriy Samoshyn (2020). PROHACK Hackathon [Dataset]. https://www.kaggle.com/mrmorj/prohack-hackathon
    Explore at:
    Available download formats: zip (1563989 bytes)
    Dataset updated
    Jun 19, 2020
    Authors
    Andriy Samoshyn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    FAQ

    The solutions are evaluated on two criteria: predicted future Index values and allocated energy from a newly discovered star.

    1. Index predictions are evaluated using the RMSE metric.
    2. Energy allocation is also evaluated using the RMSE metric and has a set of known factors that need to be taken into account.

    Every galaxy has a certain limited potential for improvement in the index described by the following function:

    Potential for increase in the Index = -np.log(Index+0.01)+3

    Likely index increase dependent on potential for improvement and on extra energy availability is described by the following function:

    Likely increase in the Index = extra energy * Potential for increase in the Index **2 / 1000
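
    The two functions above can be sketched in Python as follows. This is a minimal illustration, not the organizers' code: the function names are hypothetical, and the second formula is read with Python operator precedence, so the potential is squared before it is multiplied by the extra energy.

```python
import numpy as np


def potential_for_increase(index):
    """Potential for increase in the Index = -np.log(Index + 0.01) + 3."""
    index = np.asarray(index, dtype=float)
    return -np.log(index + 0.01) + 3


def likely_increase(extra_energy, index):
    """Likely increase = extra energy * potential ** 2 / 1000.

    Assumption: '**' binds tighter than '*', as in Python.
    """
    extra_energy = np.asarray(extra_energy, dtype=float)
    return extra_energy * potential_for_increase(index) ** 2 / 1000


# Hypothetical example values: three galaxies with different Index values,
# each given the same amount of extra energy.
example_index = np.array([0.2, 0.7, 0.99])
example_energy = np.array([50.0, 50.0, 50.0])
example_gain = likely_increase(example_energy, example_index)
```

    Because the potential falls as the Index rises, the same amount of energy yields a larger likely increase for a galaxy with a low Index than for one with a high Index.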

    Constraints

    • In total there are 50000 zillion DSML available for allocation.
    • No galaxy should be allocated more than 100 zillion DSML or less than 0 zillion DSML.
    • Galaxies with a low existence expectancy index (below 0.7) should be allocated at least 10% of the total energy available.
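
    A candidate allocation can be checked against these constraints with a short helper. The sketch below is only an illustration: the function and constant names are hypothetical, and it assumes the 10% rule applies to the combined allocation of all galaxies whose existence expectancy index is below 0.7, which the description does not state explicitly.

```python
import numpy as np

TOTAL_ENERGY = 50_000        # zillion DSML available in total
MAX_PER_GALAXY = 100         # upper bound per galaxy, in zillion DSML
LOW_INDEX_THRESHOLD = 0.7    # "low existence expectancy index" cut-off
LOW_INDEX_SHARE = 0.10       # share of the total energy reserved for low-index galaxies


def check_allocation(allocation, existence_index):
    """Return one boolean per constraint for a proposed energy allocation."""
    allocation = np.asarray(allocation, dtype=float)
    existence_index = np.asarray(existence_index, dtype=float)
    low = existence_index < LOW_INDEX_THRESHOLD
    return {
        # Sum of all allocations must not exceed the available energy.
        "total_within_budget": bool(allocation.sum() <= TOTAL_ENERGY),
        # Each galaxy receives between 0 and 100 zillion DSML.
        "per_galaxy_bounds": bool(
            np.all((allocation >= 0) & (allocation <= MAX_PER_GALAXY))
        ),
        # Assumed reading: low-index galaxies together get at least 10% of the total.
        "low_index_share": bool(
            allocation[low].sum() >= LOW_INDEX_SHARE * TOTAL_ENERGY
        ),
    }
```

    Calling `check_allocation(pred_opt, existence_index)` on a submission's `pred_opt` column returns a dictionary that shows which constraint, if any, is violated.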

    Submit format

    Variable - Description
    Index - Unique index from the test dataset in ascending order
    pred - Prediction for the index of interest
    pred_opt - Optimal energy allocation
  2. Hackathon Participants Data

    • kaggle.com
    Updated Jun 25, 2023
    Cite
    Priyanshu Sethi (2023). Hackathon Participants Data [Dataset]. https://www.kaggle.com/datasets/priyanshusethi/high-school-hackathon-data/code
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 25, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Priyanshu Sethi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Hackathons are a great way for people not only to learn more about technology but also to showcase their existing skills by making projects, often in a few hours. This dataset contains data collected from 200 participants of a hackathon conducted for high school students. A lot of columns have been deleted, but the remaining columns can be useful for understanding the demographics and interests of someone participating in these kinds of events.

  3. ML HACK Dataset

    • kaggle.com
    Updated Nov 17, 2020
    Cite
    Abhinav Padmawar (2020). ML HACK Dataset [Dataset]. https://www.kaggle.com/abhinavpadmawar20/ml-hack-dataset/discussion
    Explore at:
    Croissant
    Dataset updated
    Nov 17, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abhinav Padmawar
    Description

    Given below are three files that you will be using for the challenge. Download all the files. The training file has a labelled data set. However, the test file shall only have the features. Work out your algorithm for the same and make predictions on the test file after which you have to create a submissions.csv file that will be evaluated. You may refer to the sample_submission.csv file in order to understand the overall structure of your submission. The dataset consists of overall stats of players in ODIs only.

    File descriptions:

    • train.csv - the training set
    • test.csv - the test set
    • sampleSubmission.csv - a sample submission file in the correct format

    Data fields

    • id - an anonymous id unique to the player
    • Name - Name of the player
    • Age - Age
    • 100s - Number of centuries of the player
    • 50s - Number of half centuries of the player
    • 6s - Total number of sixes hit by the player
    • Balls - Number of balls bowled by the player
    • Bat_Average - Average batting score
    • Bowl_Strike_Rate - Average number of balls bowled per wicket taken
    • Balls faced - Number of balls faced
    • Economy - Average number of runs conceded for each over bowled
    • Innings - Number of innings played
    • Overs - Number of overs bowled
    • Maidens - Overs when no run was conceded
    • Runs - Total runs scored by the player
    • Wickets - Number of wickets taken
    • Ratings - Final rating of the player

  4. Care to Share: Dataset and resources for Dutch National Open Science...

    • zenodo.org
    bin, pdf
    Updated Oct 21, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lauren Cadwallader; Mirela Volaj (2024). Care to Share: Dataset and resources for Dutch National Open Science Festival hackathon [Dataset]. http://doi.org/10.5281/zenodo.13960085
    Explore at:
    Available download formats: pdf, bin
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Lauren Cadwallader; Mirela Volaj
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the materials used in the session "Care to Share? Investigating Open Science practices adoption among researchers: a hackathon" presented at the Dutch National Open Science Festival on 22nd October 2024.

    The data files are derived from: Public Library of Science (2022) PLOS Open Science Indicators. Figshare. Dataset (version 8). https://doi.org/10.6084/m9.figshare.21687686. They contain two additional fields (Dimensions_Country and Dimensions_FoR) obtained on 15 October 2024 from Digital Science’s Dimensions platform, available at https://app.dimensions.ai.

    File list:

    PLOS-Dataset-for-Hackathon.xlsx

    Data pertaining to the PLOS corpus of articles derived from Public Library of Science (2022) PLOS Open Science Indicators. Figshare. Dataset (version 8). https://doi.org/10.6084/m9.figshare.21687686 with additional data from Dimensions.ai.

    Comparator-Dataset-for-Hackathon.xlsx

    Data pertaining to the Comparator corpus of articles derived from Public Library of Science (2022) PLOS Open Science Indicators. Figshare. Dataset (version 8). https://doi.org/10.6084/m9.figshare.21687686 with additional data from Dimensions.ai.

    Care to share resource sheet.pdf

    Document outlining the questions to be investigated during the hackathon as well as key information about the dataset.

    OSI-Column-Descriptions_v3_Dec23.pdf
    This file is taken from Public Library of Science (2022) PLOS Open Science Indicators. Figshare. Dataset (version 8). https://doi.org/10.6084/m9.figshare.21687686. It describes the fields used in the two data files with the exception of Dimensions_Country and Dimensions_FoR. Descriptions for these are listed in the README tabs of the data files.

  5. ESA' Mars Express orbiter telemetry data

    • kaggle.com
    Updated May 27, 2017
    Cite
    Kaggle (2017). ESA' Mars Express orbiter telemetry data [Dataset]. https://www.kaggle.com/datasets/fornaxai/dataadventures/versions/1
    Explore at:
    Croissant
    Dataset updated
    May 27, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Description

    Courtesy of the European Space Agency.

    License: ESA CC BY-SA 3.0 IGO

  6. MOOD - News AMR dataset - Hackathon 2022

    • entrepot.recherche.data.gouv.fr
    pdf, tsv
    Updated Nov 16, 2022
    + more versions
    Cite
    ARINIK Nejat; Van BORTEL Wim; BOUDOUA Bahdja; BUSANI Luca; DECOUPES Rémy; INTERDONATO Roberto; Van KLEEF Ester; KAFANDO Rodrique; ROCHE Mathieu; SYED Mehtab Alam; Maguelonne TEISSEIRE (2022). MOOD - News AMR dataset - Hackathon 2022 [Dataset]. http://doi.org/10.57745/MPNSPH
    Explore at:
    Available download formats: tsv (28823), tsv (29272), tsv (89646), tsv (30011), tsv (1036597), tsv (1034657), tsv (30633), tsv (85698), pdf (136642)
    Dataset updated
    Nov 16, 2022
    Dataset provided by
    Recherche Data Gouv
    Authors
    ARINIK Nejat; Van BORTEL Wim; BOUDOUA Bahdja; BUSANI Luca; DECOUPES Rémy; INTERDONATO Roberto; Van KLEEF Ester; KAFANDO Rodrique; ROCHE Mathieu; SYED Mehtab Alam; Maguelonne TEISSEIRE
    License

    https://spdx.org/licenses/etalab-2.0.html

    Description

    This dataset has been collected from four Epidemiological Surveillance Systems (EBS) to be used in a hackathon dedicated to AMR (antimicrobial resistance) for the MOOD summer school in June 2022. The chosen EBS sources are ProMED, PADI-web, Healthmap and MedISys. The collected data are news items dealing with epidemiological information or events. This dataset is composed of 4 sub-datasets, one for each chosen EBS. Each sub-dataset is annotated according to 3 main classes (New Information, General Information, Not Relevant). For each news item labeled as New Information or General Information, another annotation is provided concerning host classification with 7 classes (Humans, Human-animal, Animals, Human-food, Food, Environment, and All). This second annotation provided 4 sub-datasets. The aim of the annotation task is to recognize epidemiological information related to AMR. An annotation guideline is provided in order to ensure a unified annotation and to help the annotators. This dataset can be used to train or evaluate classification approaches to automatically identify text on AMR events and types of AMR issues (e.g. animal, food, etc.) in unstructured data (e.g. news, tweets) and classify these events by relevance for epidemic intelligence purposes.

  7. AV : Healthcare Analytics II Dataset

    • cubig.ai
    Updated May 2, 2025
    Cite
    CUBIG (2025). AV : Healthcare Analytics II Dataset [Dataset]. https://cubig.ai/store/products/184/av-healthcare-analytics-ii-dataset
    Explore at:
    Dataset updated
    May 2, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction

    • The hospital length of stay dataset is part of a hackathon organized by Analytics Vidhya, focusing on healthcare management challenges, particularly optimizing hospital patient length of stay. This dataset includes detailed information on patient demographics, hospital attributes, and treatment details, which are critical for managing healthcare efficiency.

    2) Data Utilization

    (1) Hospital length of stay data has the following characteristics:
    • The dataset is structured to provide insights into various factors that affect the length of hospital stays. It contains data on numerous variables including patient age, medical conditions, previous admissions, and the type of hospital and care involved.
    • It supports predictive modeling to help hospitals improve service delivery by accurately forecasting patient stay durations and managing hospital bed occupancy and staffing needs more effectively.

    (2) Hospital length of stay data can be used for:
    • Hospital Management: The data can assist in strategic planning and resource allocation, helping hospitals reduce costs while maintaining high care standards.
    • Research in Healthcare Systems: It serves as a foundational dataset for academic and commercial research aimed at understanding and improving the efficiency of healthcare systems.

  8. Hacklive AV

    • kaggle.com
    Updated Sep 27, 2020
    Cite
    Akash Gupta (2020). Hacklive AV [Dataset]. https://www.kaggle.com/datasets/akash14/hacklive-av/code
    Explore at:
    Croissant
    Dataset updated
    Sep 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Akash Gupta
    Description

    Context

    AV HackLive - Guided Community Hackathon!

    Content

    Data Science competitions can be daunting for someone who has never participated in one. Some of them have hundreds of competitors with top notch industry knowledge and splendid past record in such hackathons.

    Thus a lot of beginners are apprehensive about getting started with these hackathons.

    The top 3 questions that are commonly asked:

    • Is it even worth it if I have minimal chance of winning?
    • How do I start?
    • How can I improve my rank in the future?

    Let’s answer the first question before we go further.

  9. Seer Breast Cancer Data

    • zenodo.org
    • ieee-dataport.org
    • +2more
    Updated Jul 22, 2021
    Cite
    Zhandos Sembay (2021). Seer Breast Cancer Data [Dataset]. http://doi.org/10.5281/zenodo.5120960
    Explore at:
    Dataset updated
    Jul 22, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhandos Sembay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract:

    This dataset of breast cancer patients was obtained from the November 2017 update of the SEER Program of the NCI, which provides information on population-based cancer statistics. The dataset involved female patients with infiltrating duct and lobular carcinoma breast cancer (SEER primary sites recode NOS histology codes 8522/3) diagnosed in 2006-2010. Patients with unknown tumor size, examined regional LNs, or regional positive LNs, and patients whose survival months were less than 1 month, were excluded; thus, 4024 patients were ultimately included.

    Inspiration:

    This dataset was uploaded to U-BRITE for the "AI against CANCER DATA SCIENCE HACKATHON".

    https://cancer.ubrite.org/hackathon-2021/

    Acknowledgements

    JING TENG, January 18, 2019, "SEER Breast Cancer Data", IEEE Dataport, doi: https://dx.doi.org/10.21227/a9qy-ph35.

    https://ieee-dataport.org/open-access/seer-breast-cancer-data

    U-BRITE last update date: 07/21/2021

  10. ‘Electricity Consumption’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Electricity Consumption’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-electricity-consumption-4b9e/fdf80460/?iid=007-581&v=presentation
    Explore at:
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Electricity Consumption’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/utathya/electricity-consumption on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Company of Electrolysia supplies electricity to the city. It is looking to optimise its electricity production based on the historical electricity consumption of the people of Electrovania.

    The company has hired you as a Data Scientist to investigate the past consumption and the weather information to come up with a model that catches the trend as accurately as possible. You have to bear in mind that there are many factors that affect electricity consumption and not all can be measured. Electrolysia has provided you with hourly data spanning five years.

    For this competition, the training set comprises the first 23 days of each month and the test set runs from the 24th to the end of the month. The public leaderboard is based on the first two days of the test set, whereas the private leaderboard considers the rest of the days. Your task is to predict the electricity consumption on an hourly basis.

    Note that you cannot use future information to model past consumption. For example, you cannot use February 2017 data to predict consumption in the last week of January 2017.
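
    The split described above can be sketched with the standard library alone. This is an illustration under stated assumptions: the function name is hypothetical, the "first two days of test" are taken to be the 24th and 25th of each month, and the timestamps are generated here rather than read from the competition files.

```python
from datetime import datetime, timedelta


def split_by_day_of_month(timestamps):
    """Split hourly timestamps into train, public-test and private-test sets."""
    train, public_test, private_test = [], [], []
    for ts in timestamps:
        if ts.day <= 23:
            train.append(ts)           # first 23 days of each month
        elif ts.day <= 25:
            public_test.append(ts)     # assumed: 24th and 25th (public leaderboard)
        else:
            private_test.append(ts)    # 26th to end of month (private leaderboard)
    return train, public_test, private_test


# Hourly timestamps covering January and February 2017 (31 + 28 days).
start = datetime(2017, 1, 1)
hours = [start + timedelta(hours=h) for h in range((31 + 28) * 24)]
train, public_test, private_test = split_by_day_of_month(hours)
```

    To respect the rule about future information, a model that predicts the test days of a given month should be fitted only on rows whose timestamps precede those days.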

    Content

    It represents a fictitious time period wherein we are to predict future electricity consumption.

    Acknowledgements

    This data is from an Analytics Vidhya hackathon. The hackathon is now closed.

    --- Original source retains full ownership of the source dataset ---

  11. Mammograms-Breast Cancer Images

    • zenodo.org
    • ieee-dataport.org
    • +1more
    zip
    Updated Jul 22, 2021
    Cite
    Zhandos Sembay (2021). Mammograms-Breast Cancer Images [Dataset]. http://doi.org/10.5281/zenodo.5120965
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 22, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zhandos Sembay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT

    This is a small dataset that is part of a much larger dataset of breast cancer images. The images are mammograms.

    Instructions:

    One can use these images for experimentation on detection and analysis of breast cancer.

    Inspiration:

    This dataset was uploaded to U-BRITE for the "AI against CANCER DATA SCIENCE HACKATHON".

    https://cancer.ubrite.org/hackathon-2021/

    Acknowledgements

    G R Sinha, Bhagwati Charan Patel, December 27, 2019, "Mammograms-Breast Cancer Images", IEEE Dataport, doi: https://dx.doi.org/10.21227/9f0p-qx37.

    https://ieee-dataport.org/documents/mammograms-breast-cancer-images

    U-BRITE last update date: 07/21/2021

  12. ‘uHack Sentiments 2.0: Decode Code Words’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Dec 28, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘uHack Sentiments 2.0: Decode Code Words’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-uhack-sentiments-2-0-decode-code-words-ce3a/88e2b3fd/?iid=004-204&v=presentation
    Explore at:
    Dataset updated
    Dec 28, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘uHack Sentiments 2.0: Decode Code Words’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/manishtripathi86/uhack-sentiments-20-decode-code-words on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    The challenge here is to analyze and deep dive into the natural language text (reviews) and bucket them based on their topics of discussion. Furthermore, analyzing the overall sentiment will also help the business to make tangible decisions.

    The data set provided to you has a mix of customer reviews for products across categories and retailers. We would like you to model on the data to predict:

    • the topics of future reviews (Note: a review can talk about multiple topics)
    • the overall polarity (positive/negative sentiment)

    Train: 6136 rows x 14 columns

    Test: 2631 rows x 14 columns

    • Topics (Components, Delivery and Customer Support, Design and Aesthetics, Dimensions, Features, Functionality, Installation, Material, Price, Quality and Usability)
    • Polarity (Positive/Negative)

    Note: The target variables are all encoded in the train dataset for convenience. Please submit the test results in a similarly encoded fashion for us to evaluate your results.

    | Field Name | Data Type | Purpose | Variable type |
    | --- | --- | --- | --- |
    | Id | Integer | Unique identifier for each review | Input |
    | Review | String | Review written by customers on a retail website | Input |
    | Components | String | 1: aspects related to components; 0: None | Target |
    | Delivery and Customer Support | String | 1: some aspects related to delivery, return, exchange and customer support; 0: None | Target |
    | Design and Aesthetics | String | 1: some aspects related to components; 0: None | Target |
    | Dimensions | String | 1: related to product dimension and size; 0: None | Target |
    | Features | String | 1: related to product features; 0: None | Target |
    | Functionality | String | 1: related to working of a product; 0: None | Target |
    | Installation | String | 1: related to installation of the product; 0: None | Target |
    | Material | String | 1: related to material of the product; 0: None | Target |
    | Price | String | 1: related to pricing details of a product; 0: None | Target |
    | Quality | String | 1: related to quality aspects of a product; 0: None | Target |
    | Usability | String | 1: related to usability of a product; 0: None | Target |
    | Polarity | Integer | 1: Positive sentiment; 0: Negative sentiment | Target |

    Skills:

    • Text Pre-processing – Lemmatization, Tokenization, N-Grams and other relevant methods
    • Multi-Class Classification, Multi-label Classification
    • Optimizing Log Loss

    Overview

    Ugam, a Merkle company, is a leading analytics and technology services company. Our customer-centric approach delivers impactful business results for large corporations by leveraging data, technology, and expertise.

    We consistently deliver superior, impactful results through the right blend of human intelligence and AI. With 3300+ people spread across locations worldwide, we successfully deploy our services to create success stories across industries like Retail & Consumer Brands, High Tech, BFSI, Distribution, and Market Research & Consulting. Over the past 21 years, Ugam has been recognized by several firms including Forrester and Gartner, named the No.1 data science company in India by Analytics Insight, and certified as a Great Place to Work®.

    Problem Statement: The last two decades have witnessed a significant change in how consumers purchase products and express their experience/opinions in reviews, posts, and content across platforms. These online reviews are not only useful to reflect customers’ sentiment towards a product but also help businesses fix gaps and find potential opportunities which could further influence future purchases.

    Participants need to develop a machine learning model that can analyse customers’ sentiments based on their reviews and feedback.

    NOTE: The prize money will be for the interested candidates who are willing to get interviewed or hired by Ugam. Winners are requested to come to the Machine Learning Developers Summit 2022, happening at Bangalore, to receive the prize money.

    dataset link: https://machinehack.com/hackathon/uhack_sentiments_20_decode_code_words/overview

    --- Original source retains full ownership of the source dataset ---

  13. ordfts-hackathon-pneuma-vehicles-segmentation

    • huggingface.co
    Updated Sep 9, 2024
    Cite
    Carlos Vivar (2024). ordfts-hackathon-pneuma-vehicles-segmentation [Dataset]. http://doi.org/10.57967/hf/3028
    Explore at:
    Croissant
    Dataset updated
    Sep 9, 2024
    Authors
    Carlos Vivar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ORD for the Sciences Hackathon - Vehicles Detection

    [!CAUTION] This project is an example of a hackathon project. The quality of the data produced has not been evaluated. Its goal is to provide an example of how a dataset can be uploaded to Hugging Face.

    This is an example of a hackathon project presented to ORD for the sciences hackathon using the openly available pNeuma vision dataset.

    Go here if you wanna know more about the hackathon EPFL pNeuma project… See the full description on the dataset page: https://huggingface.co/datasets/katospiegel/ordfts-hackathon-pneuma-vehicles-segmentation.

  14. BIGTARGET hackathon

    • kaggle.com
    Updated Jul 1, 2020
    Cite
    Andrii Samoshyn (2020). BIGTARGET hackathon [Dataset]. https://www.kaggle.com/mrmorj/bigtarget/kernels
    Explore at:
    Croissant
    Dataset updated
    Jul 1, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Andrii Samoshyn
    Description

    Participants needed to develop a solution that improves the effectiveness of SMS targeting in such a way that they only send messages to customers who are motivated to make a purchase.

  15. Benign Breast Tumor Dataset

    • data.niaid.nih.gov
    • ieee-dataport.org
    Updated Jul 18, 2024
    Cite
    Zhandos Sembay (2024). Benign Breast Tumor Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5084115
    Explore at:
    Dataset updated
    Jul 18, 2024
    Dataset authored and provided by
    Zhandos Sembay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT

    This dataset contains information on 83 patients from India: the patients’ clinical history, histopathological features, and mammograms. The distinctive aspect of this dataset lies in its collection of mammograms that show benign tumors and are used in the subclassification of benign tumors.

    Instructions:

    This dataset contains a zip folder of 80 mammograms and an Excel file with the mammographic, histopathological, and clinical features of all the patients.

    Inspiration:

    This dataset was uploaded to U-BRITE for the "AI against CANCER DATA SCIENCE HACKATHON".

    https://cancer.ubrite.org/hackathon-2021/

    Acknowledgements

    Manish Joshi, Aparna Bhale, Unmesh Takalkar, May 9, 2021, "Benign Breast Tumor Dataset", IEEE Dataport, doi: https://dx.doi.org/10.21227/6sda-hn78.

    https://ieee-dataport.org/open-access/benign-breast-tumor-dataset

    U-BRITE last update date: 07/09/2021

  16. University of Manitoba Breast Microwave Imaging Dataset (UM-BMID)

    • data.niaid.nih.gov
    • ieee-dataport.org
    Updated Jul 22, 2021
    Cite
    Zhandos Sembay (2021). University of Manitoba Breast Microwave Imaging Dataset (UM-BMID) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5120980
    Explore at:
    Dataset updated
    Jul 22, 2021
    Dataset authored and provided by
    Zhandos Sembay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Manitoba
    Description

    ABSTRACT

    Microwave-based breast cancer detection is a growing field that has been investigated as a potential novel method for breast cancer detection. Breast microwave sensing (BMS) systems use low-powered, non-ionizing microwave signals to interrogate the breast tissues. While some BMS systems have been evaluated in clinical trials, many challenges remain before these systems can be used as a viable clinical option, and breast phantoms (breast models) allow for rigorous and controlled experimental investigations. This dataset, the University of Manitoba Breast Microwave Imaging Dataset (UM-BMID), contains S-parameter measurements from experimental scans of MRI-derived breast phantoms, obtained with a pre-clinical breast microwave sensing system operating over 1-8 GHz. The dataset consists of measurements from over 1250 scans of a diverse array of phantoms. The phantom array consists of phantoms of various sizes and breast densities. The .stl files used to produce the 3D-printed phantoms are also included in the dataset. We hope that this dataset can serve as a resource for researchers in breast microwave sensing to evaluate signal processing, image reconstruction, and tumour detection methods.

    Inspiration:

    This dataset was uploaded to U-BRITE for the "AI against CANCER DATA SCIENCE HACKATHON".

    https://cancer.ubrite.org/hackathon-2021/

    Acknowledgements

    Tyson Reimer, Jordan Krenkevich, Stephen Pistorius, June 16, 2021, "University of Manitoba Breast Microwave Imaging Dataset (UM-BMID)", IEEE Dataport, doi: https://dx.doi.org/10.21227/1y0z-8t98.

    https://ieee-dataport.org/open-access/university-manitoba-breast-microwave-imaging-dataset-um-bmid

    U-BRITE last update date: 07/21/2021

  17. GDSC-2024

    • huggingface.co
    Cite
    Witold Fracek, GDSC-2024 [Dataset]. https://huggingface.co/datasets/Endercold/GDSC-2024
    Explore at:
    Croissant
    Authors
    Witold Fracek
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    GDSC 2024

    This dataset contains the results from the Capgemini Global Data Science Challenge (GDSC) 2024 Arena Battles, where AI education policy experts competed to provide the best answers to questions about global education trends and literacy.

    Quick Links: Case study, GDSC Overview, GDSC 7 Overview Video (Short), GDSC 7 Overview Video (Long), GDSC Website

    Background

    The Capgemini Global Data Science Challenge (GDSC) is an annual, purpose-driven hackathon that… See the full description on the dataset page: https://huggingface.co/datasets/Endercold/GDSC-2024.

  18. World Climate Risk Index Data

    • figshare.com
    txt
    Updated Sep 18, 2019
    Cite
    cj lortie (2019). World Climate Risk Index Data [Dataset]. http://doi.org/10.6084/m9.figshare.9876413.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 18, 2019
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    cj lortie
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    World
    Description

    Data from germanwatch.org site.

  19. Datacon 2020 Dataset

    • kaggle.com
    Updated Nov 18, 2020
    Cite
    Abhinav Padmawar (2020). Datacon 2020 Dataset [Dataset]. https://www.kaggle.com/abhinavpadmawar20/haahahahahha/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 18, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Abhinav Padmawar
    Description

    File descriptions

    • train.csv : the training set
    • test.csv : the test set
    • sampleSubmission.csv : a sample submission file in the correct format

    Data fields

    • id = video id
    • video_duration = duration of video
    • coding_standard = coding standard used for the video
    • width = width of video in pixels
    • height = height of video in pixels
    • bitrate = video bitrate
    • framerate = actual video frame rate
    • i_frames = number of i-frames in the video
    • p_frames = number of p-frames in the video
    • b_frames = number of b-frames in the video
    • frames = number of frames in video
    • i_size = total size in byte of i videos
    • p_size = total size in byte of p videos
    • b_size = total size in byte of b videos
    • size = total size of video
    • coding_standard_output = output coding standard used for processing
    • bitrate_output = output bitrate used for processing
    • framerate_output = output framerate used for processing
    • output_width = output width in pixel used for processing
    • output_height = output height used in pixel for processing
    • allocated_memory = total coding standard allocated memory for processing
    • total_processing_time = total time taken for processing

  20. Package and Dependency Metadata for CZI Hackathon: Mapping the Impact of...

    • zenodo.org
    application/gzip
    Updated Oct 25, 2023
    Cite
    Andrew Nesbitt; Andrew Nesbitt (2023). Package and Dependency Metadata for CZI Hackathon: Mapping the Impact of Research Software in Science [Dataset]. http://doi.org/10.5281/zenodo.10042125
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Oct 25, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Nesbitt; Andrew Nesbitt
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    A collection of useful datasets extracted from https://packages.ecosyste.ms and https://repos.ecosyste.ms for use at the CZI Hackathon: Mapping the Impact of Research Software in Science.

    All data is provided as NDJSON (newline-delimited JSON): each line is a valid JSON object, and lines are separated by newline characters. There are Python and R libraries for reading these files, or you can manually read the file line by line and parse each line as a single JSON object.

    Each ndjson file has been compressed with gzip (actual command: `tar -czvf`) to reduce download size. The files expand to significantly larger sizes after extraction.
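
Since the stated command is `tar -czvf`, each download is presumably a gzip-compressed tar archive rather than a bare gzip stream. The sketch below reads such an archive line by line with the Python standard library only. The function name `iter_ndjson_from_targz` is hypothetical, and it assumes every regular file inside the archive is NDJSON.

```python
import json
import tarfile
from typing import Any, Iterator


def iter_ndjson_from_targz(archive_path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of every file in the archive."""
    # "r:gz" opens a gzip-compressed tar archive, which is what `tar -czvf` writes.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            stream = archive.extractfile(member)
            if stream is None:
                continue
            # Stream line by line so the expanded file never sits fully in memory.
            for raw_line in stream:
                line = raw_line.strip()
                if line:  # tolerate blank lines between records
                    yield json.loads(line)
```

Because the function is a generator, records can be filtered or counted without holding the full expanded file in memory, which matters given how much larger the files become after extraction.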

    Package Data

    Package names from cran, bioconductor and pypi that have been parsed by the software-mentions project (data: https://datadryad.org/stash/dataset/doi:10.5061/dryad.6wwpzgn2c) are collected together with their latest release at the time of publishing, along with the names of their dependencies. Those dependency names have then been fetched recursively, each with its latest release and dependencies, until the full list of transitive dependencies is included.

    Note: This approach uses a simplified method of dependency resolution, always picking the latest version of each package rather than taking into account each dependency's specific version range requirements. This is primarily due to time constraints, and it allows all software ecosystems to be processed in the same way. A future improvement would be to use each package ecosystem's specific dependency resolution algorithm to compute the full transitive dependency tree for each mentioned software package.

    GitHub Data

    Two different approaches were taken for collecting data for referenced GitHub mentions:

    1. `github.ndjson` is metadata for each repository from GitHub, including "manifest" files which are known files that contain dependency information for a project such as requirements.txt, DESCRIPTION and package.json, parsed using https://github.com/ecosyste-ms/bibliothecary, which may include transitive dependencies that have been discovered in a `lockfile` within the repository.

    2. `github_packages.ndjson` is metadata for each package that was found on any package manager that references the GitHub URL as its repository url/source/homepage. These packages, like the cran and pypi data above, include the latest release and their direct dependencies. There may be more than one package for each GitHub URL, as it is a one-to-many relationship. `github_packages_with_transitive.ndjson` follows the same format but also includes the extra resolved transitive dependencies of all packages, using the same approach as with the cran and pypi data above and with the same caveats.

    There are also many more ecosystems referenced in these files than just cran, bioconductor and pypi. https://packages.ecosyste.ms provides a standardized metadata format for all of them, which makes them easier to compare and simplifies automation.

    Contact

    If you would like any help, support or more data from Ecosyste.ms please do get in touch via email: hello@ecosyste.ms or open an issue on GitHub: https://github.com/ecosyste-ms/packages/issues

Cite
Andriy Samoshyn (2020). PROHACK Hackathon [Dataset]. https://www.kaggle.com/mrmorj/prohack-hackathon

PROHACK Hackathon

International Data Science Hackathon by McKinsey & Company

Explore at:
Available download formats: zip (1563989 bytes)
Dataset updated
Jun 19, 2020
Authors
Andriy Samoshyn
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

FAQ

The solutions are evaluated on two criteria: predicted future Index values and allocated energy from a newly discovered star.

1. Index predictions are evaluated using the RMSE metric.
2. Energy allocation is also evaluated using the RMSE metric and has a set of known factors that need to be taken into account.

Every galaxy has a certain limited potential for improvement in the index described by the following function:

Potential for increase in the Index = -np.log(Index+0.01)+3

Likely index increase dependent on potential for improvement and on extra energy availability is described by the following function:

Likely increase in the Index = extra energy * Potential for increase in the Index **2 / 1000
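
The two functions above translate directly into code. The sketch below uses `math.log` from the standard library, which is the natural logarithm and matches `np.log` for scalar inputs. It assumes the expression is read with Python operator precedence, so the potential is squared before the multiplication and division. The function names are hypothetical.

```python
import math


def potential_for_increase(index: float) -> float:
    """Potential for increase in the Index = -ln(Index + 0.01) + 3."""
    return -math.log(index + 0.01) + 3


def likely_increase(extra_energy: float, index: float) -> float:
    """Likely increase in the Index = extra energy * potential ** 2 / 1000."""
    # `**` binds tighter than `*` and `/`, so only the potential is squared.
    return extra_energy * potential_for_increase(index) ** 2 / 1000
```

As a worked case, a galaxy with Index 0.99 has potential -ln(1.0) + 3 = 3, so allocating 100 units of extra energy gives a likely increase of 100 * 9 / 1000 = 0.9. Lower Index values give a larger potential, so the same energy yields a larger likely increase.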

Constraints

• in total there are 50000 zillion DSML available for allocation
• no galaxy should be allocated more than 100 zillion DSML or less than 0 zillion DSML
• galaxies with low existence expectancy index below 0.7 should be allocated at least 10% of the total energy available
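
A candidate allocation can be checked against these constraints before submitting. The sketch below rests on two assumptions that the text does not spell out: the total budget is an upper bound on the sum of allocations, and the 10% requirement applies to the low-index galaxies collectively (at least 5000 zillion DSML in total). The constant and function names are hypothetical.

```python
from typing import Sequence

TOTAL_ENERGY = 50_000      # zillion DSML available in total
MAX_PER_GALAXY = 100       # upper bound per galaxy; the lower bound is 0
LOW_INDEX_THRESHOLD = 0.7  # galaxies below this index count as "low"
LOW_INDEX_SHARE = 0.10     # share of the total owed to low-index galaxies


def allocation_is_feasible(allocation: Sequence[float],
                           index: Sequence[float],
                           tol: float = 1e-9) -> bool:
    """Return True if the allocation satisfies all three constraints."""
    if len(allocation) != len(index):
        raise ValueError("allocation and index must have the same length")
    # Per-galaxy bounds: 0 <= allocation <= 100.
    within_bounds = all(-tol <= a <= MAX_PER_GALAXY + tol for a in allocation)
    # Assumption: the 50000 total is a cap on the summed allocation.
    within_budget = sum(allocation) <= TOTAL_ENERGY + tol
    # Assumption: low-index galaxies must jointly receive >= 10% of the total.
    low_index_energy = sum(a for a, i in zip(allocation, index)
                           if i < LOW_INDEX_THRESHOLD)
    enough_for_low = low_index_energy >= LOW_INDEX_SHARE * TOTAL_ENERGY - tol
    return within_bounds and within_budget and enough_for_low
```

The small tolerance `tol` guards against floating-point round-off when the allocation comes out of a numerical optimizer.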

Submit format

| Variable | Description |
| --- | --- |
| Index | Unique index from the test dataset in the ascending order |
| pred | Prediction for the index of interest |
| pred_opt | Optimal energy allocation |
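
The submit format can be produced with the standard `csv` module. The sketch below assumes the submission is a comma-separated file with a header row, which the description does not state. The column names are taken from the list above, and the function name `write_submission` is hypothetical.

```python
import csv
from typing import Iterable, TextIO, Tuple


def write_submission(rows: Iterable[Tuple[int, float, float]],
                     out: TextIO) -> None:
    """Write (Index, pred, pred_opt) rows, sorted by Index in ascending order."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Index", "pred", "pred_opt"])
    # The format asks for the index in ascending order, so sort before writing.
    for idx, pred, pred_opt in sorted(rows, key=lambda row: row[0]):
        writer.writerow([idx, pred, pred_opt])
```

Passing an open file handle (or an `io.StringIO` buffer) as `out` keeps the function easy to test without touching the disk.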