18 datasets found
  1. English indices of deprivation 2015 & 2011 census

    • kaggle.com
    Updated Nov 1, 2021
    Cite
    nm8883 (2021). English indices of deprivation 2015 & 2011 census [Dataset]. https://www.kaggle.com/nm8883/uk-census-data-with-uk-deprivation-index-2015/discussion
    Dataset updated
    Nov 1, 2021
    Dataset provided by
    Kaggle
    Authors
    nm8883
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Background

    This analysis was conducted as part of a university module to compare age with socio-economic group in the UK and investigates unemployment levels with deprivation in England.

    Content

    The dataset includes the English Indices of Deprivation 2015 and the 2011 UK census data.

    The English indices of deprivation measure relative deprivation in small areas in England called lower-layer super output areas. The Index of Multiple Deprivation is the most widely used of these indices. More information can be found on the government website. The Index of Multiple Deprivation ranks every small area in England from 1 (most deprived area) to 32,844 (least deprived area), based on the following measures:

    • Income Deprivation
    • Employment Deprivation
    • Education, Skills and Training Deprivation
    • Health Deprivation and Disability
    • Crime
    • Barriers to Housing and Services
    • Living Environment Deprivation

    By including the 2011 UK census data and a lookup table (for combining the datasets), it is possible to see how age and gender correspond to areas of deprivation.

    Acknowledgements

    All data has been made freely available by the UK Government and can be accessed here. It is strongly recommended that the guidance notes for this dataset are read before performing any analysis.
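    As a rough illustration of how a lookup table can combine the two sources, the sketch below joins census rows to deprivation ranks through an intermediate mapping. The column names (`census_area`, `lsoa_code`, `imd_rank`, `residents`) and all values are hypothetical placeholders, not taken from the actual files.

```python
def join_census_to_imd(census_rows, lookup_rows, imd_rows):
    """Attach a deprivation rank to each census row via a lookup table."""
    # Hypothetical schema: the lookup maps a census area code to an LSOA code.
    area_to_lsoa = {r["census_area"]: r["lsoa_code"] for r in lookup_rows}
    lsoa_to_rank = {r["lsoa_code"]: r["imd_rank"] for r in imd_rows}

    joined = []
    for row in census_rows:
        lsoa = area_to_lsoa.get(row["census_area"])
        rank = lsoa_to_rank.get(lsoa)
        if rank is None:
            # Skip census rows with no matching deprivation record.
            continue
        joined.append({**row, "lsoa_code": lsoa, "imd_rank": rank})
    return joined


# Made-up toy rows, for illustration only.
census = [
    {"census_area": "AREA_1", "residents": 100},
    {"census_area": "AREA_2", "residents": 250},
    {"census_area": "AREA_9", "residents": 75},  # not present in the lookup
]
lookup = [
    {"census_area": "AREA_1", "lsoa_code": "LSOA_A"},
    {"census_area": "AREA_2", "lsoa_code": "LSOA_B"},
]
imd = [
    {"lsoa_code": "LSOA_A", "imd_rank": 1},
    {"lsoa_code": "LSOA_B", "imd_rank": 32844},
]

joined_rows = join_census_to_imd(census, lookup, imd)
```

    Rows without a match are dropped here; an analysis that needs to audit coverage might instead keep them with an empty rank.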

  2. London Bike-Share Usage Dataset

    • kaggle.com
    Updated Apr 28, 2024
    Cite
    Svetlana Kalacheva (2024). London Bike-Share Usage Dataset [Dataset]. https://www.kaggle.com/datasets/kalacheva/london-bike-share-usage-dataset
    Dataset updated
    Apr 28, 2024
    Dataset provided by
    Kaggle
    Authors
    Svetlana Kalacheva
    Area covered
    London
    Description

    Context

    This dataset contains detailed records of 776,527 bicycle journeys from the Transport for London (TfL) Cycle Hire system spanning from August 1 to August 31, 2023. The TfL Cycle Hire initiative provides publicly accessible bicycles for rent across London, promoting sustainable transportation and physical fitness. This comprehensive dataset captures individual trip data, which can be utilized to analyze urban mobility patterns, station performance, and cycling preferences among London's diverse population. This dataset provides a snapshot of cycling activity during the month, including start and end details for each journey, the bicycle used, and the duration of hire.

    Dataset Usage

    The dataset can be used for:

    • Time Series Forecasting: Predict future bike rental demand based on historical usage patterns.
    • Geospatial Analysis: Map the start and end locations of trips to identify popular routes and areas with high cycling traffic.
    • Customer Behavior Analysis: Analyze the duration and frequency of rentals to understand user preferences and habits.
    • Predictive Maintenance: Use trip duration and frequency data to predict when bikes are likely to require maintenance or replacement.
    • Multivariate Analysis: Explore relationships between different variables, such as trip durations, station popularity, and time of day, to uncover underlying patterns in bike usage.

    Attribute Information

    The dataset includes the following variables for each ride:

    • Number: A unique identifier for each trip (Trip ID).
    • Start Date: The date and time when the trip began.
    • Start Station Number: The identifier for the starting station.
    • Start Station: The name of the starting station.
    • End Date: The date and time when the trip ended.
    • End Station Number: The identifier for the ending station.
    • End Station: The name of the ending station.
    • Bike Number: A unique identifier for the bicycle used.
    • Bike Model: The model of the bicycle used.
    • Total Duration: The total time duration of the trip (in a human-readable format).
    • Total Duration (ms): The total time duration of the trip in milliseconds.
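    The millisecond column is the easier of the two duration fields to compute with. A minimal sketch, assuming the values arrive as integers or numeric strings (the helper names are illustrative, not part of the dataset):

```python
from datetime import timedelta


def duration_from_ms(total_ms):
    """Convert a 'Total Duration (ms)' value into a timedelta."""
    return timedelta(milliseconds=int(total_ms))


def average_duration(ms_values):
    """Mean trip duration for an iterable of millisecond values."""
    values = [int(v) for v in ms_values]
    return timedelta(milliseconds=sum(values) / len(values))
```

    For example, `duration_from_ms("1620000")` gives a 27-minute `timedelta`, which can then be compared or summed like any other duration.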

    Source

    This dataset was sourced directly from Transport for London's official website, which provides open data to encourage public use and analysis. More details and related datasets can be found at Transport for London (TfL).

    Reference: Transport for London. (August 2023). TfL Cycle Hire Trip Data. Retrieved [Date Retrieved], from https://tfl.gov.uk/info-for/open-data-users/our-open-data.

  3. ‘E-Shop Clothing Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 11, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘E-Shop Clothing Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-e-shop-clothing-dataset-5607/latest
    Dataset updated
    Aug 11, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘E-Shop Clothing Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/adityawisnugrahas/eshop-clothing-dataset on 11 August 2021.

    --- Dataset description provided by original source is as follows ---

    Data description “e-shop clothing 2008”

    Variables:

    1. YEAR (2008)

    ========================================================

    2. MONTH -> from April (4) to August (8)

    ========================================================

    3. DAY -> day number of the month

    ========================================================

    4. ORDER -> sequence of clicks during one session

    ========================================================

    5. COUNTRY -> variable indicating the country of origin of the IP address with the following categories:

    1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (*.biz) 44-com (*.com) 45-int (*.int) 46-net (*.net) 47-org (*.org)

    ========================================================

    6. SESSION ID -> variable indicating session id (short record)

    ========================================================

    7. PAGE 1 (MAIN CATEGORY) -> concerns the main product category: 1-trousers 2-skirts 3-blouses 4-sale

    ========================================================

    8. PAGE 2 (CLOTHING MODEL) -> contains information about the code for each product (217 products)

    ========================================================

    9. COLOUR -> colour of product

    1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white

    ========================================================

    10. LOCATION -> photo location on the page, the screen has been divided into six parts:

    1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right

    ========================================================

    11. MODEL PHOTOGRAPHY -> variable with two categories:

    1-en face 2-profile

    ========================================================

    12. PRICE -> price in US dollars

    ========================================================

    13. PRICE 2 -> variable informing whether the price of a particular product is higher than the average price for the entire product category

    1-yes 2-no

    ========================================================

    14. PAGE -> page number within the e-store website (from 1 to 5)

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    I want to know how to solve this data regarding any problem (clustering, regression, classification, EDA)

    Source: https://archive.ics.uci.edu/ml/datasets/clickstream+data+for+online+shopping

    --- Original source retains full ownership of the source dataset ---

  4. Books from Blackwell's Bookshop

    • kaggle.com
    Updated Sep 21, 2022
    Cite
    Artur Sannikov (2022). Books from Blackwell's Bookshop [Dataset]. https://www.kaggle.com/arthurio/books-from-blackwells-bookshop/code
    Dataset updated
    Sep 21, 2022
    Dataset provided by
    Kaggle
    Authors
    Artur Sannikov
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Blackwell’s is a British book retailer founded in 1879, with more than 40 bookstores in the UK. They also sell books through their official site.

    Content

    Book metadata was downloaded from the official site using web scraping and an API. In the cleaned version, price columns were renamed to distinguish between prices in pounds and euros, and some columns were converted to numbers.

    Inspiration

    You can use the data to:

    • Analyze prices in different book categories (like the genre)

    • Perform sentiment analysis of blurbs and reviews

    • Predict prices based on the book’s dimensions and weight

  5. Open Postcode Elevation

    • kaggle.com
    Updated Aug 4, 2017
    Cite
    GetTheData (2017). Open Postcode Elevation [Dataset]. https://www.kaggle.com/getthedata/open-postcode-elevation/tasks
    Dataset updated
    Aug 4, 2017
    Dataset provided by
    Kaggle
    Authors
    GetTheData
    Description

    Context

    This open dataset takes each British postcode, locates the centroid, and assigns an elevation based on the nearest point on an Ordnance Survey contour line to that centroid.

    Also known as altitude, elevation is given as distance above sea level in metres.

    Content

    • postcode
    • elevation

    Acknowledgements

    Documentation and the latest version can be found at the Open Postcode Elevation homepage.

    Postcode data is from the ONS Postcode Directory.

    Elevation is from OS Terrain 50.

    Published and maintained by GetTheData.

    Licence

    Open data licensed under the Open Government Licence.

    Attribution required.

    Attribution

    • Contains OS data © Crown copyright and database right 2017
    • Contains Royal Mail data © Royal Mail copyright and database right 2017
    • Contains National Statistics data © Crown copyright and database right 2017

    Credit

    Photo by Maojin Lang on Unsplash

  6. UK Stations opening dates

    • kaggle.com
    Updated Jun 8, 2025
    Cite
    Andy Loring (2025). UK Stations opening dates [Dataset]. https://www.kaggle.com/datasets/andyloring/uk-stations-opening-dates/discussion
    Dataset updated
    Jun 8, 2025
    Dataset provided by
    Kaggle
    Authors
    Andy Loring
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United Kingdom
    Description

    The opening dates of all current UK railway stations, manually extracted from the chronology information (in PDF format) from this site: https://rchs.org.uk/railway-passenger-stations-in-great-britain-a-chronology/

    Where a station has continually existed, but moved location (along the same line), the earliest opening date is used, together with the most recent 'resiting' date. Where a station closed, but has subsequently re-opened, I have shown both the original opening date and the date since which the station has been continuously open. Where a previous station had existed (on the same line) within a mile of the current station, I have deemed the current station as being 're-opened' and resited (even if the previous station had a different name). Where a station has been temporarily closed, e.g. due to renovation or line improvements, I have treated it as being continuously open. Some stations are noted as being closed during the First World War (1914-1918); in these cases the dates are noted in their own columns, but the station is assumed to have been continuously open.

    Details of the columns are as follows (all dates are in UK format, dd/mm/yyyy):

    • Three Letter Code - the station code by which all UK railway stations can be identified (initially used as Computer Reservation System (CRS) codes). Occasionally a station has more than one CRS code, in which case a separate entry is shown per code.
    • station_name - the name commonly used for the station (although this can change or vary between listings)
    • Status - the current status of the station/record (see further details below)
    • Currently Opened - the date since which the station has been continuously open
    • Year of current opening - just the year part of the above date
    • Resited (most recent) - if the station has moved location, or been remodelled in its current location, the last time this happened
    • Replaced different station - indicator of whether the station has replaced another station with a different name
    • Originally Opened - the date the station was originally opened, if different from its currently opened date
    • Year of Original opening - the year part of the above date (where an Originally Opened date exists), or a repeat of the 'Year of current opening' value
    • Closed - the date the station closed (if it has subsequently re-opened, or it is currently closed)
    • WW1 Closed - the date a station closed during the First World War
    • WW1 Re-open - the date a station re-opened after the First World War
    • Comment - any further textual information relevant to the other columns

    The 'Status' column may have the following values:

    • Open - the station is currently operational
    • Re-Open - the station is currently operational, but had previously had a period of inactivity (or it replaced another station that was inactive)
    • Duplicate - a duplicate record for where there is more than one 'Three Letter Code' for that station
    • N/A - a station that is not on the current national rail network (but may previously have been, or may be used for special services)
    • dummy - a blank record for the year before the first of the other records (to give a zero data point)

  7. British Job Agency Employment

    • kaggle.com
    Updated Jul 27, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rahul (2018). British Job Agency Employment [Dataset]. https://www.kaggle.com/rahul025/error-detection/discussion
    Dataset updated
    Jul 27, 2018
    Dataset provided by
    Kaggle
    Authors
    Rahul
    Area covered
    United Kingdom
    Description

    Auditing and Cleansing the Job dataset

    The dataset description is shown below:

    Columns and its Description

    Id : 8 digit Id of the job advertisement,

    Title: Title of the advertised job position,

    Location: Location of the advertised job position,

    ContractType: The contract type of the advertised job position, could be full-time, part-time or non-specified,

    ContractTime: The contract time of the advertised job position, could be permanent, contract or non-specified,

    Company: Company (employer) of the advertised job position,

    Category: The Category of the advertised job position, e.g., IT jobs, Engineering Jobs, etc.

    Salary per annum: Annual Salary of the advertised job position, e.g., 80000,

    OpenDate: The opening time for applying for the advertised job position, e.g., 20120104T150000, means 3pm, 4th January 2012,

    CloseDate: The closing time for applying for the advertised job position, e.g., 20120104T150000, means 3pm, 4th January 2012,

    SourceName: The website where the job position is advertised.
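    The OpenDate and CloseDate format described above can be parsed with the standard library. The sketch below is one possible approach; the function names are illustrative, and the ordering check is only an example of an integrity constraint an audit might apply, not one stated in the task.

```python
from datetime import datetime

# Matches values such as "20120104T150000" (3pm, 4th January 2012).
JOB_TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"


def parse_job_timestamp(value):
    """Parse an OpenDate/CloseDate string; raises ValueError if malformed."""
    return datetime.strptime(value, JOB_TIMESTAMP_FORMAT)


def closes_before_open(open_value, close_value):
    """Flag rows whose closing time precedes their opening time."""
    return parse_job_timestamp(close_value) < parse_job_timestamp(open_value)
```

    A malformed value such as `"2012-01-04"` raises `ValueError`, which is one simple way to surface lexical errors in these columns.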

    In this task, you are required to inspect and audit the data (dataset1_with_error.csv) to identify the data problems, and then fix them. Generic and major data problems that could be found in the data include:

    • Lexical errors
    • Irregularities
    • Violations of the integrity constraint
    • Inconsistency

    In the end, save the error-free dataset in dataset1_solution.csv. The number of records in your solution should be the same as the number of records in the input file.

  8. Bicycle Accidents in Great Britain (1979 to 2018)

    • kaggle.com
    Updated Nov 29, 2021
    Cite
    John Harshith (2021). Bicycle Accidents in Great Britain (1979 to 2018) [Dataset]. https://www.kaggle.com/johnharshith/bicycle-accidents-in-great-britain-1979-to-2018/tasks
    Dataset updated
    Nov 29, 2021
    Dataset provided by
    Kaggle
    Authors
    John Harshith
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    Context

    This is a dataset of bicycle accidents in Great Britain from 1979 to 2018, covering details ranging from road types to the gender of casualties.

    Content

    This dataset contains data such as the accident index, number of vehicles involved, number of casualties, date and time of the accident, speed limit, road and weather conditions, day of the accident and the road type on which the accident took place. It also includes the gender of the person riding the bicycle, the severity of the accident and the age group of the victims.

    Inspiration

    Bicycle racing is recognised as an Olympic sport. Bicycle races are popular all over the world, especially in Europe. The countries most devoted to bicycle racing include Belgium, Denmark, France, Germany, Italy, the Netherlands, Spain and Switzerland. Other countries with international standing include Australia, Luxembourg, the United Kingdom, the United States and Colombia. Being a big fan of the sport, and given the number of unfortunate accidents happening across Great Britain, I was inspired to share this dataset from https://data.world/gonzandrobles/bicycleaccidentsuk, which can be referred to for detailed analysis.

  9. Great Britain Road Accidents

    • kaggle.com
    Updated Oct 1, 2020
    Cite
    Patrice C (2020). Great Britain Road Accidents [Dataset]. https://www.kaggle.com/datasets/pachriisk/great-britain-road-accidents/discussion
    Dataset updated
    Oct 1, 2020
    Dataset provided by
    Kaggle
    Authors
    Patrice C
    Area covered
    Great Britain, United Kingdom
    Description

    Context

    These files were taken from the Great Britain Road Accidents 2005_2016 published by the Department for Transport. Licensed under Open Government Licence. The dataset is maintained by Teng Li and was last updated about 3 years ago. The initial dataset is quite large so this sample was created to facilitate the completion of a course project via an open-source web application.

    The files provide detailed data about the circumstances of personal injury road accidents in Great Britain from 2005 onwards, the make of vehicles involved, and the consequential casualties. The statistics relate only to personal injury accidents on public roads that are reported to the police and subsequently recorded, using the STATS19 accident reporting form. Information on damage-only accidents with no human casualties, or on accidents on private roads or in car parks, is not included in this data.

    Content

    The complete dataset can be found at:

    https://www.kaggle.com/nichaoku/gbaccident0516

    The rows and columns of the dataset provide details of the date, time, number of accidents by severity, casualties, and conditions that may have contributed to the accidents that occurred. Details in the casualty and vehicle files can be linked to the relevant accident by the “Accident_Index” field.
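    Linking on “Accident_Index” amounts to a one-to-many join from accidents to casualties (or vehicles). A minimal sketch with made-up rows; every field other than `Accident_Index` is a hypothetical placeholder:

```python
from collections import defaultdict


def group_by_accident(rows):
    """Group casualty or vehicle rows by their Accident_Index value."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Accident_Index"]].append(row)
    return dict(grouped)


def attach_casualties(accidents, casualties):
    """Return accident rows with their linked casualty rows attached."""
    by_index = group_by_accident(casualties)
    return [
        {**accident, "casualties": by_index.get(accident["Accident_Index"], [])}
        for accident in accidents
    ]


# Toy rows for illustration only; real files have many more columns.
accidents = [{"Accident_Index": "A1"}, {"Accident_Index": "A2"}]
casualties = [
    {"Accident_Index": "A1", "casualty_ref": 1},
    {"Accident_Index": "A1", "casualty_ref": 2},
]
linked = attach_casualties(accidents, casualties)
```

    Accidents with no matching casualty rows keep an empty list rather than being dropped, so row counts in the accident file are preserved.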

    A list of the variables contained in the files is provided along with the dataset.

    Acknowledgements

    Source: https://data.gov.uk/dataset/road-accidents-safety-data

  10. 🌊 Open Flood Risk by Postcode

    • kaggle.com
    Updated Oct 4, 2023
    Cite
    mexwell (2023). 🌊 Open Flood Risk by Postcode [Dataset]. https://www.kaggle.com/datasets/mexwell/open-flood-risk-by-postcode/code
    Dataset updated
    Oct 4, 2023
    Dataset provided by
    Kaggle
    Authors
    mexwell
    Description

    Open Flood Risk by Postcode is derived from the Environment Agency's Risk of Flooding from Rivers and Sea which allocates a risk level to areas in England, UK. Using postcode data from Open Postcode Geo, each English postcode is placed in its risk area, allowing a flood risk level to be allocated to a postcode.

    Fields

    • postcode
    • FID
    • PROB_4BAND
    • SUITABILITY
    • PUB_DATE
    • RISK_FOR_INSURANCE_SOP
    • easting
    • northing
    • latitude
    • longitude

    PROB_4BAND is the flood risk level, and can be one of the following:

    • High
    • Medium
    • Low
    • Very Low
    • None

    Note that where a postcode is outside a flood risk area, some of the column values will be NULL, represented as \N in this file.
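    When loading the file, the `\N` marker needs to be turned into a real missing value. The sketch below does this with the standard `csv` module, assuming a comma-separated file with a header row; the sample rows are invented for illustration.

```python
import csv
import io

NULL_MARKER = r"\N"  # how NULL is represented in the file


def read_flood_rows(text_stream):
    """Read CSV rows as dicts, mapping the \\N marker to None."""
    reader = csv.DictReader(text_stream)
    return [
        {key: (None if value == NULL_MARKER else value) for key, value in row.items()}
        for row in reader
    ]


# Made-up sample: the second postcode lies outside any flood risk area.
sample = io.StringIO(
    "postcode,PROB_4BAND,latitude\n"
    "AB1 2CD,Low,51.5\n"
    "EF3 4GH,\\N,52.0\n"
)
flood_rows = read_flood_rows(sample)
```

    With a real file, the same function can be given an open file handle, for example `open(path, newline="")`.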

    Documentation

    You can find full documentation on the Open Flood Risk by Postcode homepage.

    Acknowledgements

    • Derived from Risk of Flooding from Rivers and Sea
    • Derived from Open Postcode Geo
    • Licensed under the OGL

    Photo by Luke Moss on Unsplash

  11. Clickstream Data for Online Shopping

    • kaggle.com
    Updated Apr 13, 2021
    Cite
    Bojan Tunguz (2021). Clickstream Data for Online Shopping [Dataset]. https://www.kaggle.com/datasets/tunguz/clickstream-data-for-online-shopping/discussion
    Dataset updated
    Apr 13, 2021
    Dataset provided by
    Kaggle
    Authors
    Bojan Tunguz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Source:

    Mariusz Łapczyński, Cracow University of Economics, Poland, lapczynm '@' uek.krakow.pl
    Sylwester Białowąs, Poznan University of Economics and Business, Poland, sylwester.bialowas '@' ue.poznan.pl

    Data Set Information:

    The dataset contains clickstream information from an online store offering clothing for pregnant women. Data are from five months of 2008 and include, among others, product category, location of the photo on the page, country of origin of the IP address and product price in US dollars.

    Attribute Information:

    The dataset contains 14 variables described in a separate file (See 'Data set description')

    Relevant Papers:

    N/A

    Citation Request:

    If you use this dataset, please cite:

    Łapczyński M., Białowąs S. (2013) Discovering Patterns of Users' Behaviour in an E-shop - Comparison of Consumer Buying Behaviours in Poland and Other European Countries, “Studia Ekonomiczne”, nr 151, “La société de l'information : perspective européenne et globale : les usages et les risques d'Internet pour les citoyens et les consommateurs”, p. 144-153

    Data description “e-shop clothing 2008”

    Variables:

    1. YEAR (2008)

    ========================================================

    2. MONTH -> from April (4) to August (8)

    ========================================================

    3. DAY -> day number of the month

    ========================================================

    4. ORDER -> sequence of clicks during one session

    ========================================================

    5. COUNTRY -> variable indicating the country of origin of the IP address with the following categories:

    1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (*.biz) 44-com (*.com) 45-int (*.int) 46-net (*.net) 47-org (*.org)

    ========================================================

    6. SESSION ID -> variable indicating session id (short record)

    ========================================================

    7. PAGE 1 (MAIN CATEGORY) -> concerns the main product category:

    1-trousers 2-skirts 3-blouses 4-sale

    ========================================================

    8. PAGE 2 (CLOTHING MODEL) -> contains information about the code for each product (217 products)

    ========================================================

    9. COLOUR -> colour of product

    1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white

    ========================================================

    10. LOCATION -> photo location on the page, the screen has been divided into six parts:

    1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right

    ========================================================

    11. MODEL PHOTOGRAPHY -> variable with two categories:

    1-en face 2-profile

    ========================================================

    12. PRICE -> price in US dollars

    ========================================================

    13. PRICE 2 -> variable informing whether the price of a particular product is higher than the average price for the entire product category

    1-yes 2-no

    ========================================================

    14. PAGE -> page number within the e-store website (from 1 to 5)

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
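    Because most variables are stored as integer codes, a small codebook makes the rows readable. The sketch below covers four of the coded variables using the labels listed above; the dictionary keys used as column names are an assumption and may not match the headers in the actual file.

```python
MAIN_CATEGORY = {1: "trousers", 2: "skirts", 3: "blouses", 4: "sale"}
LOCATION = {
    1: "top left", 2: "top in the middle", 3: "top right",
    4: "bottom left", 5: "bottom in the middle", 6: "bottom right",
}
MODEL_PHOTOGRAPHY = {1: "en face", 2: "profile"}
PRICE_2 = {1: "yes", 2: "no"}

# Assumed column names; adjust to the real header of the file.
CODEBOOK = {
    "page 1 (main category)": MAIN_CATEGORY,
    "location": LOCATION,
    "model photography": MODEL_PHOTOGRAPHY,
    "price 2": PRICE_2,
}


def decode_row(row):
    """Return a copy of row with coded columns replaced by their labels."""
    decoded = dict(row)
    for column, labels in CODEBOOK.items():
        if column in decoded:
            code = int(decoded[column])
            # Keep unexpected codes visible instead of failing.
            decoded[column] = labels.get(code, f"unknown ({code})")
    return decoded
```

    Columns not in the codebook, such as the price, pass through unchanged; COUNTRY and COLOUR could be added in the same way.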

  12. ICU availability by country and region

    • kaggle.com
    Updated Apr 27, 2020
    Cite
    saccodd (2020). ICU availability by country and region [Dataset]. https://www.kaggle.com/datasets/saccodd/icu-availability-by-country-and-region/discussion
    Dataset updated
    Apr 27, 2020
    Dataset provided by
    Kaggle
    Authors
    saccodd
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The purpose of this initiative is to build an integrated dataset on Intensive Care Units (ICUs) and their availability by country and region (at the highest regional granularity provided by the sources), using a data model standardized across countries.

    Currently, ICU data is stored in different country-specific sources, with a wide range of access points (national websites, APIs, Excel or CSV files, etc.).

    Given the current COVID-19 crisis, we believe that this information should be provided with the following:

    • common standardized structure
    • single point of access
    • open to the public

    We hope that these datasets will further benefit researchers and help us in the fight against COVID-19.

    Countries and sources:

  13. Emirates Reviews Skytrax

    • kaggle.com
    Updated Dec 14, 2022
    Cite
    Osama Alaa Mohammed (2022). Emirates Reviews Skytrax [Dataset]. https://www.kaggle.com/datasets/osamaalaa2001/emirates-reviews-skytrax
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Kaggle
    Authors
    Osama Alaa Mohammed
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset contains 2200 honest text reviews about Emirates Airline. The reviews are provided by the Skytrax platform. Skytrax is a United Kingdom-based consultancy that runs an airline and airport review and ranking site.

  14. Ballon d'Or 2024 Nominees League Stats

    • kaggle.com
    Updated Sep 15, 2024
    Cite
    Farzam Manafzadeh (2024). Ballon d'Or 2024 Nominees League Stats [Dataset]. https://www.kaggle.com/datasets/farzammanafzadeh/ballon-dor-2024-nominees-league-stats
    Dataset updated
    Sep 15, 2024
    Dataset provided by
    Kaggle
    Authors
    Farzam Manafzadeh
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This dataset contains detailed league performance statistics for the nominees of the 2024 Ballon d'Or across major European football leagues. The stats cover the 2023-2024 season and include metrics such as goals, assists, expected goals (xG), expected assists (xAG), progression metrics, and more.

    Men's Ballon d'Or 2024 Nominees:

    • Jude Bellingham (England, Real Madrid)
    • Hakan Çalhanoğlu (Turkey, Inter)
    • Dani Carvajal (Spain, Real Madrid)
    • Rúben Dias (Portugal, Manchester City)
    • Artem Dovbyk (Ukraine, Dnipro / Girona / Roma)
    • Phil Foden (England, Manchester City)
    • Alejandro Grimaldo (Spain, Bayer Leverkusen)
    • Erling Haaland (Norway, Manchester City)
    • Mats Hummels (Germany, Borussia Dortmund)
    • Harry Kane (England, Bayern Munich)
    • Toni Kroos (Germany, Real Madrid)
    • Ademola Lookman (Nigeria, Atalanta)
    • Emiliano Martínez (Argentina, Aston Villa)
    • Lautaro Martínez (Argentina, Inter)
    • Kylian Mbappé (France, Paris Saint-Germain / Real Madrid)
    • Martin Ødegaard (Norway, Arsenal)
    • Dani Olmo (Spain, Leipzig / Barcelona)
    • Cole Palmer (England, Manchester City / Chelsea)
    • Declan Rice (England, Arsenal)
    • Rodri (Spain, Manchester City)
    • Antonio Rüdiger (Germany, Real Madrid)
    • Bukayo Saka (England, Arsenal)
    • William Saliba (France, Arsenal)
    • Federico Valverde (Uruguay, Real Madrid)
    • Vinícius Júnior (Brazil, Real Madrid)
    • Vitinha (Portugal, Paris Saint-Germain)
    • Nico Williams (Spain, Athletic Club)
    • Florian Wirtz (Germany, Bayer Leverkusen)
    • Granit Xhaka (Switzerland, Bayer Leverkusen)
    • Lamine Yamal (Spain, Barcelona)

    The winner of the Men's Ballon d'Or goes to the best male player voted by a panel of soccer journalists representing the top 100 countries in the FIFA Men's Rankings.

    The Ballon d'Or ceremony will be held on Oct. 28, 2024.

    For the first time since 2003, neither Cristiano Ronaldo nor Lionel Messi was included among the nominees.

  15. Daikon (Diachronic Corpus)

    • kaggle.com
    Updated Aug 17, 2017
    Cite
    Liling Tan (2017). Daikon (Diachronic Corpus) [Dataset]. https://www.kaggle.com/datasets/alvations/daikon/versions/1
    Dataset updated
    Aug 17, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Liling Tan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The Daikon Corpus was created during the Diachronic Text Evaluation task in SemEval-2015. The task was to create a system that can date a piece of text.

    For example, given a text snippet:

    “Dictator Saddam Hussein ordered his troops to march into Kuwait. After the invasion is condemned by the UN Security Council, the US has forged a coalition with allies. Today American troops are sent to Saudi Arabia in Operation Desert Shield, protecting Saudi Arabia from possible attack.”

    The text has clear temporal evidence with reference to a

    • historical figure (“Saddam Hussein”),
    • notable organization (“UN Security Council”)
    • factual event (“Operation Desert Shield”).

    Historically, we know that

    • Saddam Hussein lived from 1937 to 2006,
    • the UN Security Council has existed since 1946,
    • Operation Desert Shield (i.e. the Gulf War) took place in 1990-1991.

    Given the temporal deixis (“today”), which indicates that the text was published during the Gulf War, we can infer that the text snippet should be dated 1990-1991.

    Content

    The Daikon Corpus is made up of articles from the British news magazine The Spectator, from 1828 to 2008.

    The corpus contains 24,280 articles with 19 million tokens; the token count is calculated by summing the number of whitespaces plus 1 for each paragraph.

    The Daikon corpus is saved in JSON format, where the outermost structure is a list and the inner data structure is a key-value dictionary/hashmap that contains:

    • url: URL where the original article resides
    • date: Date of the article
    • body: A list of paragraphs
    • title: Title of the text
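
    As a rough illustration of the layout and the token-count rule described above, here is a minimal Python sketch. The two sample records, their URLs, and the date format are made up for illustration and are not taken from the corpus; only the four keys (url, date, body, title) come from the description.

```python
import json

def count_tokens(paragraphs):
    """Token count as described above: whitespace characters plus 1, per paragraph."""
    return sum(sum(ch.isspace() for ch in p) + 1 for p in paragraphs)

# Hypothetical two-article sample in the described layout (not real corpus data).
# The date format shown here is an assumption.
sample = json.loads("""
[
  {"url": "http://example.org/a.html", "date": "1990-08-11",
   "title": "Sample A", "body": ["one two three", "four five"]},
  {"url": "http://example.org/b.html", "date": "1991-01-19",
   "title": "Sample B", "body": ["six"]}
]
""")

# The outermost structure is a list; each item is a dictionary with a list of paragraphs.
total_tokens = sum(count_tokens(article["body"]) for article in sample)
print(len(sample), "articles,", total_tokens, "tokens")
```

    For the real corpus, the inline string would be replaced by reading the downloaded JSON file with json.load.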

    Note: If the url is broken, try removing the .html suffix of the url. e.g. change

    http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house.html 
    

    to

    http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house
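
    The suffix fix in the note above could be applied across the corpus with a small helper along these lines. This is a sketch only; the function name strip_html_suffix is an illustrative choice, not part of the dataset.

```python
def strip_html_suffix(url: str) -> str:
    """Return the URL without a trailing '.html'; other URLs are returned unchanged."""
    suffix = ".html"
    return url[: -len(suffix)] if url.endswith(suffix) else url

broken = "http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house.html"
fixed = strip_html_suffix(broken)
print(fixed)
```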
    

    Citations

    Liling Tan and Noam Ordan. 2015. 
    USAAR-CHRONOS: Crawling the Web for Temporal Annotations. 
    In Proceedings of Ninth International Workshop on 
    Semantic Evaluation (SemEval 2015). Denver, USA.
    

    Task reference:

    Octavian Popescu and Carlo Strapparava. 
    SemEval 2015, Task 7: Diachronic Text Evaluation. 
    In Proceedings of Ninth International Workshop on 
    Semantic Evaluation (SemEval 2015). Denver, USA.
    

    Dataset image comes from Jonathan Pielmayer

    Inspiration

    Let's make an artificially intelligent "Flynn Carsen" !!

  16. Systimec_And_Banking_Crises

    • kaggle.com
    Updated Jun 1, 2022
    Cite
    Mohamed Abd Al-mgyd (2022). Systimec_And_Banking_Crises [Dataset]. https://www.kaggle.com/datasets/mohamedabdalmgyd/systimec-and-banking-crises
    Dataset updated
    Jun 1, 2022
    Dataset provided by
    Kaggle
    Authors
    Mohamed Abd Al-mgyd
    Description

    (Banking And Systemic Crises)

    prepared by (Mohamed Abd Al-mgyd)

    https://github.com/1145267383/Systemic-And-Banking-Crises

    Dataset

    A)20160923_global_crisis_data:

    https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx

    This data was collected over many years by Carmen Reinhart (with her coauthors Ken Rogoff, Christoph Trebesch, and Vincent Reinhart). It covers the banking crises of 70 countries from 1800 to 2016, with a total of 15,190 records and 16 variables. After cleaning and adjustment, the data comes to 8,642 records and 17 variables.

    B)Label_Country: This data describes whether each country is Developing or Developed.

    Variable: Description:

    1-Case: ID Number for Country.

    2-Cc3: ID String for Country.

    3-Country: Name of the country.

    4-Year: The date from 1800 to 2016.

    5-Banking_Crisis: Banking problems can often be traced to a decrease in the value of banks' assets:

    A) due to a collapse in real estate prices, or when bank asset values decrease substantially. B) if a government stops paying its obligations, which can trigger a sharp decline in the value of bonds.

    6-Systemic_Crisis : when many banks in a country are in serious solvency or liquidity problems at the same time—either:

    A) because they are all hit by the same outside shock. B) or because failure in one bank or a group of banks spreads to other banks in the system.

    7-Gold_Standard: The country has a crisis in the gold standard.

    8-Exch_Usd: Exchange rate of the local currency against the USD, except for the USD itself, which is quoted against the GBP.

    9-Domestic_Debt_In_Default: The country has domestic debt in default.

    10-Sovereign_External_Debt_1: Defaults and restructurings. Does not include defaults on WWI debt to the United States and United Kingdom, or post-1975 defaults on official external creditors.

    11-Sovereign_External_Debt_2: Defaults and restructurings. Does not include defaults on WWI debt to the United States and United Kingdom, but includes post-1975 defaults on official external creditors.

    12-Gdp_Weighted_Default: GDP-weighted default for the country.

    13-Inflation: Annual percentages of average consumer prices.

    14-Independence: Independence of the country.

    15-Currency_Crises: The country has a currency crisis.

    16-Inflation_Crises: The country has an inflation crisis.

    17-Level_Country: Whether the country is Developing or Developed.
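
    To show how the variables above might be used together, here is a minimal Python sketch that counts crisis years per country level. The rows are invented for illustration and are not taken from the dataset; the 0/1 coding of the crisis columns is an assumption, and only a subset of the 17 columns is included.

```python
import csv
import io

# Hypothetical rows for illustration only; values are made up, not from the dataset.
# Assumption: crisis indicators are coded as 0/1.
SAMPLE_CSV = """Case,Cc3,Country,Year,Banking_Crisis,Systemic_Crisis,Level_Country
1,AAA,Country A,1990,1,1,Developing
1,AAA,Country A,1991,0,0,Developing
2,BBB,Country B,1990,1,0,Developed
"""

def crisis_years_by_level(csv_text, column="Banking_Crisis"):
    """Count rows (country-years) where `column` equals 1, grouped by Level_Country."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[column] == "1":
            level = row["Level_Country"]
            counts[level] = counts.get(level, 0) + 1
    return counts

banking = crisis_years_by_level(SAMPLE_CSV)
systemic = crisis_years_by_level(SAMPLE_CSV, "Systemic_Crisis")
print(banking, systemic)
```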

  17. World Income Inequality Database

    • kaggle.com
    zip
    Updated Oct 20, 2020
    Cite
    Arman (2020). World Income Inequality Database [Dataset]. https://www.kaggle.com/mannmann2/world-income-inequality-database
    Explore at:
    zip (693569 bytes)
    Dataset updated
    Oct 20, 2020
    Authors
    Arman
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    Source: https://www.wider.unu.edu/database/wiid
    User Guide: https://www.wider.unu.edu/sites/default/files/WIID/PDF/WIID-User_Guide_06MAY2020.pdf

    The World Income Inequality Database (WIID) contains information on income inequality in various countries and is maintained by the United Nations University-World Institute for Development Economics Research (UNU-WIDER). The database was originally compiled during 1997-99 for the research project Rising Income Inequality and Poverty Reduction, directed by Giovanni Andrea Cornia. A revised and updated version of the database was published in June 2005 as part of the project Global Trends in Inequality and Poverty, directed by Tony Shorrocks and Guang Hua Wan. The database was revised in 2007 and a new version was launched in May 2008.

    The database contains data on inequality in the distribution of income in various countries. The central variable in the dataset is the Gini index, a measure of income distribution in a society. In addition, the dataset contains information on income shares by quintile or decile. The database contains data for 159 countries, including some historical entities. The temporal coverage varies substantially across countries. For some countries there is only one data entry; in other cases there are over 100 data points. The earliest entry is from 1867 (United Kingdom), the latest from 2003. The majority of the data (65%) cover the years from 1980 onwards. The 2008 update (version WIID2c) includes some major updates and quality improvements, in fact leading to a reduced number of variables in the new version. The new version has 334 new observations and several revisions/ corrections made in 2007 and 2008.
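
    Since the database reports both the Gini index and income shares by quintile or decile, it may help to see how the two relate. The sketch below approximates a Gini coefficient from group shares by applying the trapezoid rule to the Lorenz curve. It is an illustration under stated assumptions, not the method used by UNU-WIDER: groups are assumed to be equal-sized, inequality within each group is ignored (so the result is a lower bound), and the example shares are invented.

```python
def gini_from_shares(shares):
    """Approximate Gini (0 to 1) from income shares of equal-sized population groups.

    Uses the trapezoid rule on the Lorenz curve: G = 1 - sum((L[i-1] + L[i]) / n).
    """
    total = float(sum(shares))
    n = len(shares)
    cumulative = 0.0
    area = 0.0
    for share in sorted(shares):  # the Lorenz curve needs poorest-to-richest order
        previous = cumulative
        cumulative += share / total
        area += (previous + cumulative) / n
    return 1.0 - area

# Hypothetical quintile shares in percent (not taken from WIID).
example = gini_from_shares([5, 10, 15, 25, 45])
print(round(example, 2))
```

    The function returns a value between 0 and 1; multiply by 100 if a percentage scale is needed.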

  18. DJ Mag Top 100 History Dataset

    • kaggle.com
    zip
    Updated Aug 8, 2018
    Cite
    Koki Ando (2018). DJ Mag Top 100 History Dataset [Dataset]. https://www.kaggle.com/datasets/koki25ando/dj-mag-top-100-history-dataset
    Dataset updated
    Aug 8, 2018
    Authors
    Koki Ando
    Description

    Context

    I just wanted to share the dataset I scraped from the official DJ Mag website to create a Shiny visualization app.

    !! Dataset will be updated as soon as possible after this year's announcement on 21st October.

    Content

    DJ Magazine (aka DJ Mag) is a British monthly magazine dedicated to EDM and DJs. It was founded in 1991. Top 100 DJs is one of the magazine’s biggest properties, and it has provided a list of the world’s most popular DJs every year since 2004. The poll attracted over 1 million votes in 2015, and it is now considered one of the world’s biggest music polls.

    For more information, visit https://djmag.com/.
    Scraping Script: DJ Mag Ranking Scraping Script

    • rvest & tidyverse packages are used to collect and clean data.

    The dataset includes all the DJ Mag ranking history from 2004 to 2017.

    • Year: year
    • Ranking: Ranking number
    • DJ: DJ name
    • Change: ranking change from last year
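
    The Change column can in principle be recomputed from the other three. Here is a minimal Python sketch of that idea; the DJ names and rankings are placeholders, and the sign convention (positive means the DJ moved up the list) is an assumption, since the description does not state it.

```python
def add_change(rows):
    """Return copies of rows with a Change field: last year's ranking minus this year's.

    Change is None when the DJ has no ranking in the previous year.
    Assumption: a positive value means the DJ moved up the list.
    """
    by_dj_year = {(r["DJ"], r["Year"]): r["Ranking"] for r in rows}
    result = []
    for r in rows:
        last = by_dj_year.get((r["DJ"], r["Year"] - 1))
        change = None if last is None else last - r["Ranking"]
        result.append({**r, "Change": change})
    return result

# Placeholder rows, not real rankings.
rows = [
    {"Year": 2016, "Ranking": 3, "DJ": "DJ A"},
    {"Year": 2016, "Ranking": 1, "DJ": "DJ B"},
    {"Year": 2017, "Ranking": 1, "DJ": "DJ A"},
    {"Year": 2017, "Ranking": 4, "DJ": "DJ B"},
]
ranked = add_change(rows)
for r in ranked:
    print(r["Year"], r["DJ"], r["Ranking"], r["Change"])
```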

    Acknowledgements

    I really appreciate DJ Mag official for making dataset public and UNICEF for supporting the activity every year.

    Inspiration

    Can you find out how the EDM industry has changed?
    Please share your reports using this dataset. Your contributions are always welcome!!!


