5 datasets found
  1. ‘Premium Bonds - High Value Winners’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Premium Bonds - High Value Winners’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-premium-bonds-high-value-winners-bdbc/85ef4531/?iid=006-369&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Premium Bonds - High Value Winners’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/samuelcortinhas/premium-bond-winners-december-2021 on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset, from https://www.nsandi.com/prize-checker/winners, contains all the high value prize winners from this month's premium bond draw. Premium bonds are a way to save money in the UK: you invest an amount up to £50,000 and every month you are entered into a draw in which you can win tax-free prizes. Every pound invested counts as one entry, so the more money you have invested, the more likely you are to win. Prizes range from £25 to £1,000,000.

    Content

    This dataset contains all the prizes worth £1,000 or more that were awarded each month from Dec 2021 to the present. It includes the prize value, bond number, total value of holdings, location and date of purchase for each winner. (Note: the data format for Dec 2021 is slightly different from the rest; the format from 2022 onwards is uniform.)
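
    As a rough illustration, a monthly file like this could be loaded and summarised with pandas; the file name and column header below are hypothetical, not taken from the dataset itself.

        import pandas as pd

        # Hypothetical file name and column header -- adjust to the actual CSV.
        winners = pd.read_csv("high_value_winners.csv")

        # Number of high-value prizes awarded at each prize value.
        print(winners["Prize value"].value_counts().sort_index())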

    Acknowledgements

    Data was collected from https://www.nsandi.com/prize-checker/winners by downloading an Excel file and converting it to a CSV.
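
    The conversion step mentioned above might look roughly like this (file names are placeholders; reading .xlsx files with pandas requires the openpyxl package):

        import pandas as pd

        # Placeholder file names for the downloaded Excel file and the CSV output.
        df = pd.read_excel("prize-winners.xlsx")
        df.to_csv("prize-winners.csv", index=False)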

    --- Original source retains full ownership of the source dataset ---

  2. ‘Wine Rating & Price’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Wine Rating & Price’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-wine-rating-price-7612/latest
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Wine Rating & Price’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/budnyak/wine-rating-and-price on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    I was looking for an educational wine dataset with understandable features, suitable for building an ML model for my first DS project. I couldn't find anything relevant, so I decided to scrape data from Vivino.com.

    Content

    The data contains four files, one for each wine style: red, white, rosé and sparkling. There is also a file with wine varieties for further analysis. Each file has 8 columns with fairly obvious names; note that NumberOfRatings is the number of people who rated the wine.
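
    A minimal sketch of combining the four style files into one table; the file names are assumptions, and only the NumberOfRatings column name is taken from the description above.

        import pandas as pd

        # Assumed file names for the four wine-style CSVs.
        styles = ["Red", "White", "Rose", "Sparkling"]
        frames = []
        for style in styles:
            df = pd.read_csv(f"{style}.csv")
            df["Style"] = style  # remember which file each row came from
            frames.append(df)

        wines = pd.concat(frames, ignore_index=True)
        print(wines.shape)
        print(wines["NumberOfRatings"].describe())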

    Inspiration

    Analyzing the data presented on Vivino.com, I noticed that there are no bottles with fewer than 25 ratings, apparently because the company considers the rating of such wines not accurate enough. So I had the idea of building an ML model to predict the rating of bottles with a small number of ratings. I implemented this idea in my project; the public notebook is also uploaded here.

    As it turned out, this problem of review distribution exists in many domains: customers are often afraid of choosing a product or service that no one has bought before, and because of this skewed distribution of customers across products, many businesses lose large amounts of money. My idea is to create a model for any such business that predicts a rating based on the other features, which can help increase demand for new but promising products.
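
    A hedged sketch of that idea: fit a regressor on wines that already have many ratings and use it to estimate ratings elsewhere. The file name and the Rating/Price column names are assumptions; NumberOfRatings comes from the description above.

        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        # Assumed file and column names.
        wines = pd.read_csv("Red.csv")
        X = wines[["Price", "NumberOfRatings"]].fillna(0)
        y = wines["Rating"]

        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X_train, y_train)
        print("Held-out R^2:", model.score(X_test, y_test))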

    --- Original source retains full ownership of the source dataset ---

  3. ‘uHack Sentiments 2.0: Decode Code Words’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Dec 28, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘uHack Sentiments 2.0: Decode Code Words’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-uhack-sentiments-2-0-decode-code-words-ce3a/88e2b3fd/?iid=004-194&v=presentation
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘uHack Sentiments 2.0: Decode Code Words’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/manishtripathi86/uhack-sentiments-20-decode-code-words on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    The challenge here is to analyze and dive deep into natural language text (reviews) and bucket the reviews based on their topics of discussion. Furthermore, analyzing the overall sentiment will help the business make tangible decisions.

    The data set provided to you has a mix of customer reviews for products across categories and retailers. We would like you to model the data to (1) bucket future reviews into their respective topics (note: a review can talk about multiple topics) and (2) predict the overall polarity (positive/negative sentiment).

    Train: 6136 rows x 14 columns

    Test: 2631 rows x 14 columns

    Topics: Components, Delivery and Customer Support, Design and Aesthetics, Dimensions, Features, Functionality, Installation, Material, Price, Quality and Usability. Polarity: Positive/Negative. Note: the target variables are all encoded in the train dataset for convenience; please submit the test results in the same encoded fashion so we can evaluate your results.

    Field Name | Data Type | Purpose | Variable type
    Id | Integer | Unique identifier for each review | Input
    Review | String | Review written by customers on a retail website | Input
    Components | String | 1: aspects related to components; 0: none | Target
    Delivery and Customer Support | String | 1: aspects related to delivery, return, exchange and customer support; 0: none | Target
    Design and Aesthetics | String | 1: aspects related to design and aesthetics; 0: none | Target
    Dimensions | String | 1: related to product dimensions and size; 0: none | Target
    Features | String | 1: related to product features; 0: none | Target
    Functionality | String | 1: related to the working of a product; 0: none | Target
    Installation | String | 1: related to installation of the product; 0: none | Target
    Material | String | 1: related to the material of the product; 0: none | Target
    Price | String | 1: related to pricing details of a product; 0: none | Target
    Quality | String | 1: related to quality aspects of a product; 0: none | Target
    Usability | String | 1: related to usability of a product; 0: none | Target
    Polarity | Integer | 1: positive sentiment; 0: negative sentiment | Target

    Skills: text pre-processing (lemmatization, tokenization, n-grams and other relevant methods), multi-class and multi-label classification, optimizing log loss.
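
    A minimal multi-label baseline along these lines (TF-IDF features with one-vs-rest logistic regression); the file names are placeholders and the column names follow the data dictionary above.

        import pandas as pd
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.multiclass import OneVsRestClassifier
        from sklearn.pipeline import make_pipeline

        # Placeholder file names; column names follow the data dictionary above.
        train = pd.read_csv("train.csv")
        targets = ["Components", "Delivery and Customer Support", "Design and Aesthetics",
                   "Dimensions", "Features", "Functionality", "Installation", "Material",
                   "Price", "Quality", "Usability", "Polarity"]

        model = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2), min_df=2),
            OneVsRestClassifier(LogisticRegression(max_iter=1000)),
        )
        model.fit(train["Review"], train[targets])

        test = pd.read_csv("test.csv")
        probabilities = model.predict_proba(test["Review"])  # per-label probabilities, e.g. for log loss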

    Overview: Ugam, a Merkle company, is a leading analytics and technology services company. Our customer-centric approach delivers impactful business results for large corporations by leveraging data, technology, and expertise.

    We consistently deliver superior, impactful results through the right blend of human intelligence and AI. With 3300+ people spread across locations worldwide, we successfully deploy our services to create success stories across industries like Retail & Consumer Brands, High Tech, BFSI, Distribution, and Market Research & Consulting. Over the past 21 years, Ugam has been recognized by several firms including Forrester and Gartner, named the No.1 data science company in India by Analytics Insight, and certified as a Great Place to Work®.

    Problem Statement: The last two decades have witnessed a significant change in how consumers purchase products and express their experience/opinions in reviews, posts, and content across platforms. These online reviews are not only useful to reflect customers’ sentiment towards a product but also help businesses fix gaps and find potential opportunities which could further influence future purchases.

    Participants need to develop a machine learning model that can analyse customers' sentiments based on their reviews and feedback.

    NOTE: The prize money is for interested candidates who are willing to be interviewed or hired by Ugam. Winners are requested to come to the Machine Learning Developers Summit 2022, happening in Bangalore, to receive the prize money.

    dataset link: https://machinehack.com/hackathon/uhack_sentiments_20_decode_code_words/overview

    --- Original source retains full ownership of the source dataset ---

  4. LTFS Data Science FinHack 3 (Analytics Vidhya)

    • kaggle.com
    Updated Feb 1, 2021
    Cite
    Parv619 (2021). LTFS Data Science FinHack 3(Analytics Vidhya) [Dataset]. https://www.kaggle.com/parv619/ltfs-data-science-finhack-3analytics-vidhya/metadata
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Parv619
    Description

    This dataset contains extracted data from LTFS Data Science FinHack 3 (Analytics Vidhya)

    LTFS Top-up loan Up-sell prediction

    A loan is when you receive money from a financial institution in exchange for future repayment of the principal plus interest. Financial institutions provide loans to industries, corporates and individuals, and the interest received on these loans is one of their main sources of income.

    A top-up loan, true to its name, is a facility for availing further funds on an existing loan. When you have a loan that has already been disbursed and is under repayment, and you need more funds, you can simply avail additional funding on the same loan, thereby minimizing the time, effort and cost of applying again.

    LTFS provides loan services to its customers and is interested in selling more of its Top-up loan services to its existing customers, so it has decided to identify when to pitch a Top-up during the original loan tenure. Correctly identifying the most suitable time to offer a top-up will ultimately lead to more disbursals and can also help LTFS beat competing offerings from other institutions.

    To understand this behaviour, LTFS has provided data for its customers containing information on whether a particular customer took the Top-up service and when they took it, represented by the target variable Top-up Month.

    You are provided with two types of information:

    1. Customer demographics: the demography table contains the target variable and demographic information, along with variables such as the frequency of the loan, the tenure of the loan, the disbursal amount and the LTV.

    2. Bureau data: behavioural and transactional attributes of the customers, such as current balance, loan amount and overdue amount, for the various tradelines of a given customer.

    As a data scientist, you are tasked by LTFS with building a model that, given the Top-up loan bucket of 128,655 customers along with their demographic and bureau data, predicts the right bucket/period for the 14,745 customers in the test data.

    Important Note

    Note that the feasibility of implementing the top solutions in a real production scenario will be considered when adjudging the winners and can change the final standings for prize eligibility.

    Data Dictionary

    Train_Data.zip: this zip file contains the train files for demography data and bureau data, as well as the data dictionary.

    Test_Data.zip: this zip file contains demography data and bureau data for a different set of customers.

    Sample Submission: this file contains the exact submission format for the predictions. Please submit a CSV file only.

    Variable | Definition
    ID | Unique identifier for a row
    Top-up Month (Target) | Bucket/period for the Top-up loan
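
    A sketch of writing a submission in that layout; the IDs and bucket labels below are dummy values purely to show the file shape.

        import pandas as pd

        # Dummy values; real IDs come from the test set and buckets from the model.
        submission = pd.DataFrame({
            "ID": [1, 2, 3],
            "Top-up Month": ["bucket_a", "bucket_b", "bucket_a"],
        })
        submission.to_csv("submission.csv", index=False)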

    How to Make a Submission?

    All submissions are to be made via the solution checker tab. For a step-by-step view of how to make a submission, check the video below.

    Evaluation

    The evaluation metric for this competition is macro_f1_score across all entries in the test set.
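
    For reference, macro F1 averages the per-class F1 scores with equal weight for every class; with scikit-learn (toy labels only):

        from sklearn.metrics import f1_score

        # Toy labels; the real score is computed on the hidden test set.
        y_true = ["bucket_a", "bucket_b", "bucket_a", "bucket_c"]
        y_pred = ["bucket_a", "bucket_a", "bucket_a", "bucket_c"]
        print(f1_score(y_true, y_pred, average="macro"))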

    Public and Private Split: the test data is further divided into Public (40%) and Private (60%) sets.

    Your initial responses will be checked and scored on the Public data. The final rankings will be based on your Private score, which will be published once the competition is over.

    Guidelines for Final Submission

    Please ensure that your final submission includes the following:

    Solution file: contains the predicted Top-up Month bucket for the test dataset (the format is given in the sample submission CSV).
    Code file: note that it is mandatory to submit your code for a valid final submission.
    Approach (doc/ppt/pdf format): please share your approach to solving the problem. It should cover a brief description of the approach you used, which data-preprocessing / feature engineering ideas really worked and how you discovered them, and what your final model looks like and how you reached it.

    How to Set Final Submission?

    Hackathon Rules

    The final standings will be based on the private leaderboard score and the presentations made in an online interview round with LTFS & Analytics Vidhya, held after the contest closes.
    Setting the final submission is recommended; without one, the submission with the best public score will be taken as the final submission.
    Use of external data is prohibited.
    You can only make 10 submissions per day.
    Entries submitted after the contest is closed will not be considered.
    The code file pertaining to your final submission is mandatory when setting the final submission.
    Throughout the hackathon, you are expected to respect fellow hackers and act with high integrity. Analytics Vidhya and LTFS hold the right to disqualify any participant at any stage of the compe...

  5. PHM data challenge 2010

    • kaggle.com
    Updated Nov 21, 2021
    Cite
    rabah ba (2021). PHM data challenge 2010 [Dataset]. https://www.kaggle.com/datasets/rabahba/phm-data-challenge-2010/code
    Available download formats: zip (6804082146 bytes)
    Authors
    rabah ba
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The PHM Data Challenge is a competition open to all potential conference attendees. This year the challenge is focused on RUL estimation for high-speed CNC milling machine cutters using dynamometer, accelerometer, and acoustic emission data.

    Both Student and Professional teams are encouraged to enter! Winners of the Student and the Professional categories who attend the conference and submit an invited paper to ijPHM on their technique will be awarded a cash prize. Top scoring participants will be invited to present at a special session of the conference.

    Participants will be scored based on their ability to estimate the remaining useful life of a 6mm ball nose tungsten carbide cutter.

    Additional information can be found on the competition blog, http://www.phmsociety.org/forum/583

    Teams

    Teams may be composed of one or more researchers. One winner from each of two categories will be determined on the basis of score. The categories are:

    Professional: open to anyone (including mixed teams)
    Student: open to any team with all members enrolled as full time students during the spring or fall 2010 semesters.
    

    Teams must declare what category they belong to when signing up. There is a cash prize of $1000 for the top entrant from each category, contingent upon:

    attending the conference
    giving an invited presentation on the winning technique
    submitting a journal-quality paper to the International Journal of Prognostics and Health Management (ijPHM) which discloses the full algorithm used.
    

    Additionally, top scoring teams will be invited to give presentations at a special session, and submit papers to ijPHM. Submission of the challenge special session papers is outside the regular paper submission process and follows its own schedule.

    The organizers of the competition reserve the right to both modify these rules and disqualify any team at their discretion.

    Registration

    Teams may register by contacting the Competition organizers with their name(s), a team alias under which the scores will be posted, affiliation(s) with address(es), a contact phone number (for verification) and the competition category (professional or student). Student teams should also send the name of their university and the semesters in which they are enrolled full-time. You will be emailed your username and password after verification.

    PLEASE NOTE: In the spirit of fair competition, we allow only one account per team. Please do not register multiple times under different user names, under fictitious names, or using anonymous accounts. Competition organizers reserve the right to delete multiple entries from the same person (or team) and/or to disqualify those who are trying to “game” the system or using fictitious identities.

    Data

    There are six individual cutter records, c1…c6. Records c1, c4 and c6 are training data, and records c2, c3 and c5 are test data.

    The data files are ~800 MB each, and were compressed using the bZip2 algorithm. If your un-zipping software complains, make sure it is bZip2-compatible. 7-Zip is Windows open-source software that works well; Linux users can use the bunzip2 command; Mac users can use Stuffit.

    Note that if you downloaded a copy of c3.zip with a wear file in it, this file is incorrect. Please discard it. The data acquisition files are OK.

    Each training record contains one “wear” file that lists the wear after each cut in 10^-3 mm, and a folder with approximately 300 individual data acquisition files (one for each cut). The data acquisition files are in .csv format, with seven columns:
    Column 1: Force (N) in X dimension
    Column 2: Force (N) in Y dimension
    Column 3: Force (N) in Z dimension
    Column 4: Vibration (g) in X dimension
    Column 5: Vibration (g) in Y dimension
    Column 6: Vibration (g) in Z dimension
    Column 7: AE-RMS (V)
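
    A minimal sketch of reading one data acquisition file with those seven columns; the path is a placeholder and the files are assumed to have no header row.

        import pandas as pd

        # Placeholder path; header row assumed absent, so column names are set here.
        columns = ["force_x", "force_y", "force_z",
                   "vib_x", "vib_y", "vib_z", "ae_rms"]
        cut = pd.read_csv("c1/c_1_001.csv", header=None, names=columns)
        print(cut.describe())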

    Some background on the apparatus and experimental setup can be found here, and in the references in that paper. The spindle speed of the cutter was 10400 RPM; feed rate was 1555 mm/min; Y depth...

