18 datasets found

English indices of deprivation 2015 & 2011 census
kaggle.com
Updated Nov 1, 2021
Cite
nm8883 (2021). English indices of deprivation 2015 & 2011 census [Dataset]. https://www.kaggle.com/nm8883/uk-census-data-with-uk-deprivation-index-2015/discussion
Dataset updated
Nov 1, 2021
Dataset provided by
Kaggle
Authors
nm8883
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Background

This analysis was conducted as part of a university module to compare age with socio-economic group in the UK and to investigate how unemployment levels relate to deprivation in England.

Content

The dataset includes the English Indices of Deprivation 2015 and the 2011 UK census data.

The English Indices of Deprivation measure relative deprivation in small areas in England called lower-layer super output areas. The Index of Multiple Deprivation is the most widely used of these indices. More information can be found on the government website here. The Index of Multiple Deprivation ranks every small area in England from 1 (most deprived area) to 32,844 (least deprived area), combining the following domains of deprivation:

Income Deprivation

Employment Deprivation

Education, Skills and Training Deprivation

Health Deprivation and Disability

Crime

Barriers to Housing and Services

Living Environment Deprivation

By including the 2011 UK census data and a lookup table (for combining the datasets), it is possible to see how age and gender correspond to areas of deprivation.
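As a rough illustration of working with the rank, the sketch below converts an Index of Multiple Deprivation rank (1 to 32,844) into a decile, where decile 1 holds the most deprived tenth of areas. The function name and the equal-width split are assumptions made for this example; the published deciles may place boundary areas slightly differently, so check the guidance notes before relying on it.

```python
TOTAL_AREAS = 32_844  # number of ranked small areas stated in the description


def imd_decile(rank: int, total: int = TOTAL_AREAS) -> int:
    """Map a rank (1 = most deprived) to a decile (1 = most deprived 10%).

    Assumption: deciles are equal-width slices of the rank range.
    """
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be between 1 and {total}")
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    return (rank - 1) * 10 // total + 1


most_deprived = imd_decile(1)
least_deprived = imd_decile(32_844)
```

Ranks outside the valid range raise an error rather than being silently clamped, which helps catch join mistakes between the deprivation and census tables.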

Acknowledgements

All data has been made freely available by the UK Government and can be accessed here. It is strongly recommended that the guidance notes for this dataset are read before performing any analysis.
London Bike-Share Usage Dataset
kaggle.com
Updated Apr 28, 2024
Cite
Svetlana Kalacheva (2024). London Bike-Share Usage Dataset [Dataset]. https://www.kaggle.com/datasets/kalacheva/london-bike-share-usage-dataset
Dataset updated
Apr 28, 2024
Dataset provided by
Kaggle
Authors
Svetlana Kalacheva
Area covered
London
Description
Context

This dataset contains detailed records of 776,527 bicycle journeys from the Transport for London (TfL) Cycle Hire system spanning from August 1 to August 31, 2023. The TfL Cycle Hire initiative provides publicly accessible bicycles for rent across London, promoting sustainable transportation and physical fitness. This comprehensive dataset captures individual trip data, which can be utilized to analyze urban mobility patterns, station performance, and cycling preferences among London's diverse population. This dataset provides a snapshot of cycling activity during the month, including start and end details for each journey, the bicycle used, and the duration of hire.

Dataset Usage

The dataset can be used for:
- Time Series Forecasting: Predict future bike rental demands based on historical usage patterns.
- Geospatial Analysis: Map the start and end locations of trips to identify popular routes and areas with high cycling traffic.
- Customer Behavior Analysis: Analyze the duration and frequency of rentals to understand user preferences and habits.
- Predictive Maintenance: Use trip duration and frequency data to predict when bikes are likely to require maintenance or replacement.
- Multivariate Analysis: Explore relationships between different variables, such as trip durations, station popularity, and time of day, to uncover underlying patterns in bike usage.

Attribute Information

The dataset includes the following variables for each ride:
- Number: A unique identifier for each trip (Trip ID).
- Start Date: The date and time when the trip began.
- Start Station Number: The identifier for the starting station.
- Start Station: The name of the starting station.
- End Date: The date and time when the trip ended.
- End Station Number: The identifier for the ending station.
- End Station: The name of the ending station.
- Bike Number: A unique identifier for the bicycle used.
- Bike Model: The model of the bicycle used.
- Total Duration: The total time duration of the trip (in a human-readable format).
- Total Duration (ms): The total time duration of the trip in milliseconds.
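The relationship between the two duration fields can be sketched as follows. This is a minimal example that turns a millisecond count into hours, minutes and seconds; the function name and the output format are assumptions for illustration, and the dataset's own human-readable column may be formatted differently.

```python
def format_duration_ms(duration_ms: int) -> str:
    """Render a duration given in milliseconds as 'Hh Mm Ss'.

    Assumption: sub-second precision is dropped (floor to whole seconds).
    """
    if duration_ms < 0:
        raise ValueError("duration must be non-negative")
    total_seconds = duration_ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}m {seconds}s"


example = format_duration_ms(1_006_000)  # 1,006 seconds
```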

Source

This dataset was sourced directly from Transport for London's official website, which provides open data to encourage public use and analysis. More details and related datasets can be found at Transport for London (TfL).

Reference: Transport for London. (August 2023). TfL Cycle Hire Trip Data. Retrieved [Date Retrieved], from https://tfl.gov.uk/info-for/open-data-users/our-open-data.
‘E-Shop Clothing Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 11, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘E-Shop Clothing Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-e-shop-clothing-dataset-5607/latest
Dataset updated
Aug 11, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘E-Shop Clothing Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/adityawisnugrahas/eshop-clothing-dataset on 11 August 2021.

--- Dataset description provided by original source is as follows ---

Data description “e-shop clothing 2008”

Variables:

YEAR (2008)

========================================================

MONTH -> from April (4) to August (8)

========================================================

DAY -> day number of the month

========================================================

ORDER -> sequence of clicks during one session

========================================================

COUNTRY -> variable indicating the country of origin of the IP address with the following categories:

1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (.biz) 44-com (.com) 45-int (.int) 46-net (.net) 47-org (*.org)

========================================================

SESSION ID -> variable indicating session id (short record)

========================================================

PAGE 1 (MAIN CATEGORY) -> concerns the main product category: 1-trousers 2-skirts 3-blouses 4-sale

========================================================

PAGE 2 (CLOTHING MODEL) -> contains information about the code for each product (217 products)

========================================================

COLOUR -> colour of product

1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white

========================================================

LOCATION -> photo location on the page, the screen has been divided into six parts:

1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right

========================================================

MODEL PHOTOGRAPHY -> variable with two categories:

1-en face 2-profile

========================================================

PRICE -> price in US dollars

========================================================

PRICE 2 -> variable informing whether the price of a particular product is higher than the average price for the entire product category

1-yes 2-no

========================================================

PAGE -> page number within the e-store website (from 1 to 5)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I want to know how to solve this data regarding any problem (clustering, regression, classification, EDA)

Source: https://archive.ics.uci.edu/ml/datasets/clickstream+data+for+online+shopping

--- Original source retains full ownership of the source dataset ---
Books from Blackwell's Bookshop
kaggle.com
Updated Sep 21, 2022
Cite
Artur Sannikov (2022). Books from Blackwell's Bookshop [Dataset]. https://www.kaggle.com/arthurio/books-from-blackwells-bookshop/code
Dataset updated
Sep 21, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Artur Sannikov
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Blackwell’s is a British book retailer founded in 1879, with more than 40 bookstores in the UK. It also sells books through its official site.

Content

Book metadata was downloaded from the official site using web scraping and an API. In the cleaned version, price columns were renamed to distinguish between prices in pounds and euros, and some columns were converted to numbers.

Inspiration

You can use the data to:

Analyze prices in different book categories (like the genre)

Perform sentiment analysis of blurbs and reviews

Predict prices based on the book’s dimensions and weight
Open Postcode Elevation
kaggle.com
Updated Aug 4, 2017
Cite
GetTheData (2017). Open Postcode Elevation [Dataset]. https://www.kaggle.com/getthedata/open-postcode-elevation/tasks
Dataset updated
Aug 4, 2017
Dataset provided by
Kaggle
Authors
GetTheData
Description
Context

This open dataset takes each British postcode, locates the centroid, and assigns an elevation based on the nearest point on an Ordnance Survey contour line to that centroid.

Also known as altitude, elevation is given as distance above sea level in metres.

Content

postcode

elevation

Acknowledgements

Documentation and the latest version can be found at the Open Postcode Elevation homepage.

Postcode data is from the ONS Postcode Directory.

Elevation is from OS Terrain 50.

Published and maintained by GetTheData.

Licence

Open data licensed under the Open Government Licence.

Attribution required.

Attribution

Contains OS data © Crown copyright and database right 2017

Contains Royal Mail data © Royal Mail copyright and database right 2017

Contains National Statistics data © Crown copyright and database right 2017

Credit

Photo by Maojin Lang on Unsplash
UK Stations opening dates
kaggle.com
Updated Jun 8, 2025
Cite
Andy Loring (2025). UK Stations opening dates [Dataset]. https://www.kaggle.com/datasets/andyloring/uk-stations-opening-dates/discussion
Dataset updated
Jun 8, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Andy Loring
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United Kingdom
Description
The opening dates of all current UK railway stations, manually extracted from the chronology information (in PDF format) from this site: https://rchs.org.uk/railway-passenger-stations-in-great-britain-a-chronology/

Where a station has continually existed, but moved location (along the same line), the earliest opening date is used, together with the most recent 'resiting' date. Where a station closed, but has subsequently re-opened, I have shown both the original opening date and the date since which the station has been continuously open. Where a previous station had existed (on the same line) within a mile of the current station, I have deemed the current station as being 're-opened' and resited (even if the previous station had a different name). Where a station has been temporarily closed, e.g. due to renovation or line improvements, I have treated it as being continuously open. Some stations are noted as being closed during the First World War (1914-1918); in these cases the dates are noted in their own columns, but the station is assumed to have been continuously open.

Details of the columns are (all dates are in UK format dd/mm/yyyy):
- Three Letter Code - the station code by which all UK railway stations can be identified (initially used as Computer Reservation System (CRS) codes). Occasionally a station will have or use more than one CRS code, in which case a separate entry is shown per code.
- station_name - the name commonly used for the station (although this can change or vary between listings)
- Status - the current status of the station/record (see further details below)
- Currently Opened - the date since which the station has been continuously open
- Year of current opening - just the year part of the above date
- Resited (most recent) - if the station has moved location, or been remodelled in its current location, the last time this happened
- Replaced different station - indicator of whether the station has replaced another station with a different name
- Originally Opened - the date the station was originally opened, if different from its currently opened date
- Year of Original opening - the year part of the above date (where an Originally Opened date exists), or a repeat of the 'Year of current opening' value
- Closed - the date the station closed (if it has subsequently re-opened, or it is currently closed)
- WW1 Closed - the date a station closed during the First World War
- WW1 Re-open - the date a station re-opened after the First World War
- Comment - any further textual information relevant to the other columns

The 'Status' column may have the following values:
- Open - the station is currently operational
- Re-Open - the station is currently operational, but had previously had a period of inactivity (or it replaced another station that was inactive)
- Duplicate - a duplicate record for where there is more than 1 'Three Letter Code' for that station
- N/A - a station that is not on the current national rail network (but may previously have been, or may be used for special services)
- dummy - a blank record for the year before the first of the other records (to give a zero data point)
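Because all dates use the UK dd/mm/yyyy layout, a parser that assumes month-first order would misread them. The sketch below shows one way to parse such values with the standard library; the function name and the choice to treat empty cells as missing are assumptions for this example, not part of the dataset documentation.

```python
from datetime import date, datetime
from typing import Optional

UK_DATE_FORMAT = "%d/%m/%Y"  # day first, as stated in the column notes


def parse_uk_date(text: str) -> Optional[date]:
    """Parse a dd/mm/yyyy string; return None for an empty cell (assumption)."""
    text = text.strip()
    if not text:
        return None
    return datetime.strptime(text, UK_DATE_FORMAT).date()


# Illustrative value only, not taken from the dataset.
opened = parse_uk_date("04/01/1863")
```

The year component of the parsed value corresponds to what the 'Year of current opening' and 'Year of Original opening' columns hold.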
British Job Agency Employment
kaggle.com
Updated Jul 27, 2018
Cite
Rahul (2018). British Job Agency Employment [Dataset]. https://www.kaggle.com/rahul025/error-detection/discussion
Dataset updated
Jul 27, 2018
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rahul
Area covered
United Kingdom
Description
Auditing and Cleansing the Job dataset

The dataset description is shown below:

Columns and its Description

Id : 8 digit Id of the job advertisement,

Title: Title of the advertised job position,

Location: Location of the advertised job position,

ContractType: The contract type of the advertised job position, could be full-time, part-time or non-specified,

ContractTime: The contract time of the advertised job position, could be permanent, contract or non-specified,

Company: Company (employer) of the advertised job position,

Category: The Category of the advertised job position, e.g., IT jobs, Engineering Jobs, etc.

Salary per annum: Annual Salary of the advertised job position, e.g., 80000,

OpenDate: The opening time for applying for the advertised job position, e.g., 20120104T150000, means 3pm, 4th January 2012,

CloseDate: The closing time for applying for the advertised job position, e.g., 20120104T150000, means 3pm, 4th January 2012,

SourceName: The website where the job position is advertised.
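The OpenDate and CloseDate layout described above (for example 20120104T150000) can be parsed with a fixed format string. The sketch below also shows one possible integrity check, that a posting does not close before it opens; both function names are hypothetical, and the check is only one example of an audit rule, not the task's official solution.

```python
from datetime import datetime

JOB_TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"  # e.g. 20120104T150000


def parse_job_timestamp(value: str) -> datetime:
    """Parse an OpenDate/CloseDate value; raises ValueError if malformed."""
    return datetime.strptime(value, JOB_TIMESTAMP_FORMAT)


def closes_after_opening(open_date: str, close_date: str) -> bool:
    """Example integrity check: CloseDate must not precede OpenDate."""
    return parse_job_timestamp(close_date) >= parse_job_timestamp(open_date)


opened_at = parse_job_timestamp("20120104T150000")
```

A value that fails to parse raises ValueError, which is a convenient way to flag lexical errors in these two columns during an audit.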

In this task, you are required to inspect and audit the data (dataset1_with_error.csv) to identify the data problems, and then fix them. Generic and major data problems that could be found in the data include:

Lexical errors

Irregularities

Violations of the integrity constraint

Inconsistency

In the end, save the error-free dataset in dataset1_solution.csv. The number of records in your solution should be the same as the number of those in the input file.
Bicycle Accidents in Great Britain (1979 to 2018)
kaggle.com
Updated Nov 29, 2021
Cite
John Harshith (2021). Bicycle Accidents in Great Britain (1979 to 2018) [Dataset]. https://www.kaggle.com/johnharshith/bicycle-accidents-in-great-britain-1979-to-2018/tasks
Dataset updated
Nov 29, 2021
Dataset provided by
Kaggle
Authors
John Harshith
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
United Kingdom
Description

Context

This is a Dataset of bicycle accidents in Great Britain from 1979 to 2018, covering attributes ranging from road type to the gender of casualties.

Content

This Dataset contains data such as the accident index, number of vehicles involved, number of casualties, date and time of accident, speed limit, road and weather conditions, day of accident and the type of road on which the accident took place. It also includes the gender of the person riding the bicycle, the severity of the accident and the age group of the victims.

Inspiration

Bicycle racing is recognised as an Olympic sport. Bicycle races are popular all over the world, especially in Europe. The countries most devoted to bicycle racing include Belgium, Denmark, France, Germany, Italy, the Netherlands, Spain and Switzerland. Other countries with international standing include Australia, Luxembourg, United Kingdom, United States and Colombia. As a big fan of the sport, and given the number of unfortunate accidents happening across Great Britain, I was inspired to share this Dataset from the following website, https://data.world/gonzandrobles/bicycleaccidentsuk, which can be referred to for detailed analysis.
Great Britain Road Accidents
kaggle.com
Updated Oct 1, 2020
Cite
Patrice C (2020). Great Britain Road Accidents [Dataset]. https://www.kaggle.com/datasets/pachriisk/great-britain-road-accidents/discussion
Dataset updated
Oct 1, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Patrice C
Area covered
Great Britain, United Kingdom
Description
Context

These files were taken from the Great Britain Road Accidents 2005_2016 published by the Department for Transport. Licensed under Open Government Licence. The dataset is maintained by Teng Li and was last updated about 3 years ago. The initial dataset is quite large so this sample was created to facilitate the completion of a course project via an open-source web application.

The files provide detailed data about the circumstances of personal injury road accidents in Great Britain from 2005 onwards, the make of vehicles involved, and the consequential casualties. The statistics relate only to personal injury accidents on public roads that are reported to the police and subsequently recorded, using the STATS19 accident reporting form. Damage-only accidents with no human casualties, and accidents on private roads or in car parks, are not included in this data.

Content

The complete dataset can be found at:

https://www.kaggle.com/nichaoku/gbaccident0516

The rows and columns of the dataset provide details of the date, time, number of accidents by severity, casualties, and conditions that may have contributed to the accidents that occurred. Details in the casualty and vehicle files can be linked to the relevant accident by the “Accident_Index” field.

A list of the variables contained in the files is provided along with the dataset.
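Linking records through the “Accident_Index” field can be sketched as a one-to-many lookup: each accident may have several casualty (or vehicle) rows. The example below uses tiny inline CSV snippets; apart from Accident_Index, every column name and value is made up for illustration and should be replaced with the names from the variable list.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; only Accident_Index is named in the description.
ACCIDENTS_CSV = """Accident_Index,Date
A001,01/01/2005
A002,02/01/2005
"""
CASUALTIES_CSV = """Accident_Index,Casualty_Severity
A001,3
A001,2
A002,3
"""


def link_by_accident_index(accidents_file, casualties_file):
    """Return a list of (accident_row, [casualty_rows]) pairs."""
    casualties = defaultdict(list)
    for row in csv.DictReader(casualties_file):
        casualties[row["Accident_Index"]].append(row)
    linked = []
    for accident in csv.DictReader(accidents_file):
        # Accidents without matching casualty rows get an empty list.
        linked.append((accident, casualties.get(accident["Accident_Index"], [])))
    return linked


linked = link_by_accident_index(io.StringIO(ACCIDENTS_CSV), io.StringIO(CASUALTIES_CSV))
```

The same function works for the vehicle file, since it is keyed on the same field.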

Acknowledgements

Source: https://data.gov.uk/dataset/road-accidents-safety-data
🌊 Open Flood Risk by Postcode
kaggle.com
Updated Oct 4, 2023
Cite
mexwell (2023). 🌊 Open Flood Risk by Postcode [Dataset]. https://www.kaggle.com/datasets/mexwell/open-flood-risk-by-postcode/code
Dataset updated
Oct 4, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
mexwell
Description
Open Flood Risk by Postcode is derived from the Environment Agency's Risk of Flooding from Rivers and Sea which allocates a risk level to areas in England, UK. Using postcode data from Open Postcode Geo, each English postcode is placed in its risk area, allowing a flood risk level to be allocated to a postcode.

Fields

postcode

FID

PROB_4BAND

SUITABILITY

PUB_DATE

RISK_FOR_INSURANCE_SOP

easting

northing

latitude

longitude

PROB_4BAND is the flood risk level, and can be one of the following:

High

Medium

Low

Very Low

None

Note that where a postcode is outside a flood risk area, some of the column values will be NULL, represented as \N in this file.
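When loading the file, the \N marker needs to be mapped to a missing value, while the text None must survive because it is a valid PROB_4BAND level. The sketch below does this with the standard csv module; the sample rows are invented for illustration and the helper name is an assumption.

```python
import csv
import io

NULL_MARKER = r"\N"  # how NULL is written in the file, per the note above

# Invented sample rows using three of the documented field names.
SAMPLE = (
    "postcode,FID,PROB_4BAND\n"
    "ZZ1 1ZZ,\\N,None\n"
    "ZZ2 2ZZ,12345,Low\n"
)


def read_flood_rows(handle):
    """Read rows, turning the NULL marker into None and leaving 'None' text alone."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({key: (None if value == NULL_MARKER else value)
                     for key, value in row.items()})
    return rows


rows = read_flood_rows(io.StringIO(SAMPLE))
```

Handling the marker explicitly matters because some CSV readers may treat the literal text None as missing by default, which would erase a real risk level.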

Documentation

You can find full documentation on the Open Flood Risk by Postcode homepage.

Acknowledgements

Derived from Risk of Flooding from Rivers and Sea

Derived from Open Postcode Geo

Licensed under the OGL

Photo by Luke Moss on Unsplash
Clickstream Data for Online Shopping
kaggle.com
Updated Apr 13, 2021
Cite
Bojan Tunguz (2021). Clickstream Data for Online Shopping [Dataset]. https://www.kaggle.com/datasets/tunguz/clickstream-data-for-online-shopping/discussion
Dataset updated
Apr 13, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Bojan Tunguz
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Source:

Mariusz Łapczyński, Cracow University of Economics, Poland, lapczynm '@' uek.krakow.pl

Sylwester Białowąs, Poznan University of Economics and Business, Poland, sylwester.bialowas '@' ue.poznan.pl

Data Set Information:

The dataset contains information on clickstream from online store offering clothing for pregnant women. Data are from five months of 2008 and include, among others, product category, location of the photo on the page, country of origin of the IP address and product price in US dollars.

Attribute Information:

The dataset contains 14 variables described in a separate file (See 'Data set description')

Relevant Papers:

N/A

Citation Request:

If you use this dataset, please cite:

Łapczyński M., Białowąs S. (2013) Discovering Patterns of Users' Behaviour in an E-shop - Comparison of Consumer Buying Behaviours in Poland and Other European Countries, “Studia Ekonomiczne”, nr 151, “La société de l'information : perspective européenne et globale : les usages et les risques d'Internet pour les citoyens et les consommateurs”, p. 144-153

Data description “e-shop clothing 2008”

Variables:

1. YEAR (2008)

========================================================

2. MONTH -> from April (4) to August (8)

========================================================

3. DAY -> day number of the month

========================================================

4. ORDER -> sequence of clicks during one session

========================================================

5. COUNTRY -> variable indicating the country of origin of the IP address with the following categories:

1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (.biz) 44-com (.com) 45-int (.int) 46-net (.net) 47-org (*.org)

========================================================

6. SESSION ID -> variable indicating session id (short record)

========================================================

7. PAGE 1 (MAIN CATEGORY) -> concerns the main product category:

1-trousers 2-skirts 3-blouses 4-sale

========================================================

8. PAGE 2 (CLOTHING MODEL) -> contains information about the code for each product (217 products)

========================================================

9. COLOUR -> colour of product

1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white

========================================================

10. LOCATION -> photo location on the page, the screen has been divided into six parts:

1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right

========================================================

11. MODEL PHOTOGRAPHY -> variable with two categories:

1-en face 2-profile

========================================================

12. PRICE -> price in US dollars

========================================================

13. PRICE 2 -> variable informing whether the price of a particular product is higher than the average price for the entire product category

1-yes 2-no

========================================================

14. PAGE -> page number within the e-store website (from 1 to 5)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
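Several of the variables above are stored as integer codes, so a small codebook makes rows easier to read. The sketch below decodes four of them using the category labels listed in the description; the lowercase column keys and the helper name are assumptions for illustration and may not match the headers in the actual file.

```python
# Code-to-label mappings copied from the variable description above.
MAIN_CATEGORY = {1: "trousers", 2: "skirts", 3: "blouses", 4: "sale"}
LOCATION = {
    1: "top left", 2: "top in the middle", 3: "top right",
    4: "bottom left", 5: "bottom in the middle", 6: "bottom right",
}
MODEL_PHOTOGRAPHY = {1: "en face", 2: "profile"}
PRICE_2 = {1: "yes", 2: "no"}

# Hypothetical column names; adjust to the file's real headers.
CODEBOOKS = {
    "page 1 (main category)": MAIN_CATEGORY,
    "location": LOCATION,
    "model photography": MODEL_PHOTOGRAPHY,
    "price 2": PRICE_2,
}


def decode_row(row: dict) -> dict:
    """Return a copy of the row with coded columns replaced by their labels."""
    decoded = dict(row)
    for column, codebook in CODEBOOKS.items():
        if column in decoded:
            decoded[column] = codebook[int(decoded[column])]
    return decoded


example = decode_row({
    "page 1 (main category)": "3",
    "location": "5",
    "model photography": "1",
    "price 2": "2",
    "price": "38",
})
```

Columns without a codebook, such as the price, pass through unchanged.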
ICU availability by country and region
kaggle.com
Updated Apr 27, 2020
Cite
saccodd (2020). ICU availability by country and region [Dataset]. https://www.kaggle.com/datasets/saccodd/icu-availability-by-country-and-region/discussion
Dataset updated
Apr 27, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
saccodd
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Description

The purpose of this initiative is to build an integrated dataset on Intensive Care Units (ICUs) and their availability by country and region (at the highest regional granularity provided by the sources), using a data model standardized across countries.

Currently, ICU data is stored in different country-specific sources, with a wide range of access points (national websites, APIs, excel or csv files, etc.)

Given the current COVID-19 crisis, we believe that this information should be provided with the following:
* common standardized structure
* single point of access
* open to the public

We hope that these datasets will further benefit researchers and help us in the fight against COVID-19.

Countries and sources:

Italy (as of 2019 - source: Ministero della Salute)

United Kingdom

England (as of 2020 - source: NHS England)

Wales (as of 2019 - source: StatsWales)

Scotland (as of 1998, projected on 2020 whole figure - sources: Scottish Health Directorate & Herald Scotland)

Northern Ireland (as of 2020, w/o sub-area granularity - source: Derry Journal)

US (as of 2018 - source: Harvard Global Health Institute)

Spain (as of 2013 - source: Medicina Intensiva)

Germany (updated daily, not complete - source DIVI Intensiv Register)
Emirates Reviews Skytrax
kaggle.com
Updated Dec 14, 2022
Cite
Osama Alaa Mohammed (2022). Emirates Reviews Skytrax [Dataset]. https://www.kaggle.com/datasets/osamaalaa2001/emirates-reviews-skytrax
Dataset updated
Dec 14, 2022
Dataset provided by
Kaggle
Authors
Osama Alaa Mohammed
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset contains 2200 honest text reviews about Emirates Airline. The reviews come from the Skytrax platform. Skytrax is a United Kingdom-based consultancy that runs an airline and airport review and ranking site.
Ballon d'Or 2024 Nominees League Stats
kaggle.com
Updated Sep 15, 2024
Cite
Farzam Manafzadeh (2024). Ballon d'Or 2024 Nominees League Stats [Dataset]. https://www.kaggle.com/datasets/farzammanafzadeh/ballon-dor-2024-nominees-league-stats
Dataset updated
Sep 15, 2024
Dataset provided by
Kaggle
Authors
Farzam Manafzadeh
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset contains detailed league performance statistics for the nominees of the 2024 Ballon D'Or across major European football leagues. The stats cover the 2023-2024 season, showcasing metrics such as goals, assists, expected goals (xG), expected assists (xAG), progression metrics, and more.

Men's Ballon d'Or 2024 Nominees:

Jude Bellingham (England, Real Madrid)

Hakan Çalhanoğlu (Turkey, Inter)

Dani Carvajal (Spain, Real Madrid)

Rúben Dias (Portugal, Manchester City)

Artem Dovbyk (Ukraine, Dnipro / Girona / Roma)

Phil Foden (England, Manchester City)

Alejandro Grimaldo (Spain, Bayer Leverkusen)

Erling Haaland (Norway, Manchester City)

Mats Hummels (Germany, Borussia Dortmund)

Harry Kane (England, Bayern Munich)

Toni Kroos (Germany, Real Madrid)

Ademola Lookman (Nigeria, Atalanta)

Emiliano Martínez (Argentina, Aston Villa)

Lautaro Martínez (Argentina, Inter)

Kylian Mbappé (France, Paris Saint-Germain / Real Madrid)

Martin Ødegaard (Norway, Arsenal)

Dani Olmo (Spain, Leipzig / Barcelona)

Cole Palmer (England, Manchester City / Chelsea)

Declan Rice (England, Arsenal)

Rodri (Spain, Manchester City)

Antonio Rüdiger (Germany, Real Madrid)

Bukayo Saka (England, Arsenal)

William Saliba (France, Arsenal)

Federico Valverde (Uruguay, Real Madrid)

Vinícius Júnior (Brazil, Real Madrid)

Vitinha (Portugal, Paris Saint-Germain)

Nico Williams (Spain, Athletic Club)

Florian Wirtz (Germany, Bayer Leverkusen)

Granit Xhaka (Switzerland, Bayer Leverkusen)

Lamine Yamal (Spain, Barcelona)

The Men's Ballon d'Or is awarded to the best male player, as voted by a panel of soccer journalists representing the top 100 countries in the FIFA Men's Rankings.

The Ballon d'Or ceremony will be held on Oct. 28, 2024.

For the first time since 2003, neither Cristiano Ronaldo nor Lionel Messi was included among the nominees!
Daikon (Diachronic Corpus)
kaggle.com
Updated Aug 17, 2017
Cite
Liling Tan (2017). Daikon (Diachronic Corpus) [Dataset]. https://www.kaggle.com/datasets/alvations/daikon/versions/1
Dataset updated
Aug 17, 2017
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Liling Tan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The Daikon Corpus was created during the Diachronic Text Evaluation task in SemEval-2015. The task was to create a system that can date a piece of text.

For example, given a text snippet:

“Dictator Saddam Hussein ordered his troops to march into Kuwait. After the invasion is condemned by the UN Security Council, the US has forged a coalition with allies. Today American troops are sent to Saudi Arabia in Operation Desert Shield, protecting Saudi Arabia from possible attack.”

The text has clear temporal evidence with reference to a

historical figure (“Saddam Hussein”),

notable organization (“UN Security Council”)

factual event (“Operation Desert Shield”).

Historically, we know that

Saddam Hussein lived from 1937 to 2006,

the UN Security Council has existed since 1946,

Operation Desert Shield (i.e. the Gulf War) took place in 1990-1991.

Given the specific chronic deicticity (“today”), which indicates that the text was published during the Gulf War, we can infer that the text snippet should be dated 1990-1991.
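The reasoning above amounts to intersecting the time spans implied by each piece of evidence. A minimal sketch follows, assuming each clue can be reduced to a (start year, end year) pair; the `intersect` helper and the `OPEN_END` sentinel for "still exists" are illustrative names, not part of the corpus or the SemEval task.

```python
# Assumption: an open-ended span ("has existed since 1946") is capped with a sentinel year.
OPEN_END = 9999

def intersect(intervals):
    """Return the overlap of all (start, end) year spans, or None if they do not overlap."""
    start = max(lo for lo, _ in intervals)
    end = min(hi for _, hi in intervals)
    return (start, end) if start <= end else None

# Spans taken from the example text above.
evidence = {
    "Saddam Hussein": (1937, 2006),
    "UN Security Council": (1946, OPEN_END),
    "Operation Desert Shield": (1990, 1991),
}

date_range = intersect(list(evidence.values()))
print(date_range)  # (1990, 1991)
```

The narrowest clue dominates the result, which is why the named operation pins the snippet to 1990-1991.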

Content

The Daikon Corpus is made up of articles from the British Spectator news magazine from 1828 to 2008.

The corpus contains 24,280 articles with 19 million tokens; the token count is calculated by summing the number of whitespaces plus 1 for each paragraph.

The Daikon corpus is saved in JSON format, where the outermost structure is a list and the inner data structure is a key-value dictionary/hashmap that contains the:

url: URL where the original article resides

date: Date of the article

body: A list of paragraphs

title: Title of the text
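As a rough illustration of this layout and of the token-count rule described above, the sketch below parses a small inline sample rather than the real file. The sample record, its date format, and the `count_tokens` helper are assumptions made for the example; the actual file name and date format are not stated here.

```python
import json

# Hypothetical sample in the described layout: a list of dictionaries.
# The date format and the paragraph text are invented for illustration.
SAMPLE = """
[
  {
    "url": "http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house.html",
    "date": "2005-09-24",
    "title": "Doctor in the house",
    "body": ["First paragraph here.", "Second one."]
  }
]
"""

def count_tokens(article):
    """Whitespace characters plus 1, summed over every paragraph in the body."""
    return sum(sum(ch.isspace() for ch in para) + 1 for para in article["body"])

articles = json.loads(SAMPLE)
for article in articles:
    print(article["date"], article["title"], count_tokens(article))
```

For the real corpus, the same loop would run over the list returned by `json.load` on the downloaded file.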

Note: If the url is broken, try removing the .html suffix of the url. e.g. change

http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house.html

to

http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house
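The suffix fix in the note can be applied programmatically. This is a minimal sketch; `strip_html_suffix` is a hypothetical helper name, and it only rewrites the string without checking whether the resulting URL resolves.

```python
def strip_html_suffix(url):
    """Drop a trailing '.html' from the URL, leaving other URLs unchanged."""
    suffix = ".html"
    return url[:-len(suffix)] if url.endswith(suffix) else url

broken = "http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house.html"
fixed = strip_html_suffix(broken)
print(fixed)
```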

Citations

Liling Tan and Noam Ordan. 2015. USAAR-CHRONOS: Crawling the Web for Temporal Annotations. In Proceedings of Ninth International Workshop on Semantic Evaluation (SemEval 2015). Denver, USA.

Task reference:

Octavian Popescu and Carlo Strapparava. SemEval 2015, Task 7: Diachronic Text Evaluation. In Proceedings of Ninth International Workshop on Semantic Evaluation (SemEval 2015). Denver, USA.

Dataset image comes from Jonathan Pielmayer

Inspiration

Let's make an artificially intelligent "Flynn Carsen" !!
Systimec_And_Banking_Crises
kaggle.com
Updated Jun 1, 2022
Cite
Mohamed Abd Al-mgyd (2022). Systimec_And_Banking_Crises [Dataset]. https://www.kaggle.com/datasets/mohamedabdalmgyd/systimec-and-banking-crises
Dataset updated
Jun 1, 2022
Dataset provided by
Kaggle
Authors
Mohamed Abd Al-mgyd
Description
(Banking And Systemic Crises)

prepared by (Mohamed Abd Al-mgyd)

https://github.com/1145267383/Systemic-And-Banking-Crises

Web APP > https://crises.herokuapp.com/

Dataset

A) 20160923_global_crisis_data:

https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx

This data was collected over many years by Carmen Reinhart (with her coauthors Ken Rogoff, Christoph Trebesch, and Vincent Reinhart). It covers the banking crises of 70 countries from 1800 AD to 2016 AD, with a total of 15,190 records and 16 variables. After cleaning and adjustment, the data settled at 8,642 records and 17 variables.

B) Label_Country: This data describes whether each country is Developing or Developed.

Variable: Description:

1-Case: ID number for the country.

2-Cc3: ID string for the country.

3-Country: Name of the country.

4-Year: The year, from 1800 to 2016.

5-Banking_Crisis: Banking problems can often be traced to a decrease in the value of banks' assets:

A) due to a collapse in real estate prices, or when bank asset values decrease substantially. B) if a government stops paying its obligations, which can trigger a sharp decline in the value of bonds.

6-Systemic_Crisis: When many banks in a country have serious solvency or liquidity problems at the same time, either:

A) because they are all hit by the same outside shock, B) or because failure in one bank or a group of banks spreads to other banks in the system.

7-Gold_Standard: The country has a crisis in the Gold Standard.

8-Exch_Usd: Exchange rate of the local currency in USD, except for the USD itself, which is exchanged in GBP.

9-Domestic_Debt_In_Default: The country has domestic debt in default.

10-Sovereign_External_Debt_1: Defaults and restructurings. Does not include defaults on WWI debt to the United States and United Kingdom or post-1975 defaults on Official External Creditors.

11-Sovereign_External_Debt_2: Defaults and restructurings. Does not include defaults on WWI debt to the United States and United Kingdom, but includes post-1975 defaults on Official External Creditors.

12-Gdp_Weighted_Default: GDP-weighted default for the country.

13-Inflation: Annual percentages of average consumer prices.

14-Independence: Independence of the country.

15-Currency_Crises: The country has a currency crisis.

16-Inflation_Crises: The country has an inflation crisis.

17-Level_Country: Whether the country is Developing or Developed.
World Income Inequality Database
kaggle.com
zip
Updated Oct 20, 2020
Cite
Arman (2020). World Income Inequality Database [Dataset]. https://www.kaggle.com/mannmann2/world-income-inequality-database
Available download formats: zip (693569 bytes)
Dataset updated
Oct 20, 2020
Authors
Arman
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
World
Description
Source: https://www.wider.unu.edu/database/wiid

User Guide: https://www.wider.unu.edu/sites/default/files/WIID/PDF/WIID-User_Guide_06MAY2020.pdf

The World Income Inequality Database (WIID) contains information on income inequality in various countries and is maintained by the United Nations University-World Institute for Development Economics Research (UNU-WIDER). The database was originally compiled during 1997-99 for the research project Rising Income Inequality and Poverty Reduction, directed by Giovanni Andrea Cornia. A revised and updated version of the database was published in June 2005 as part of the project Global Trends in Inequality and Poverty, directed by Tony Shorrocks and Guang Hua Wan. The database was revised in 2007 and a new version was launched in May 2008.

The database contains data on inequality in the distribution of income in various countries. The central variable in the dataset is the Gini index, a measure of how unequally income is distributed in a society. In addition, the dataset contains information on income shares by quintile or decile. The database contains data for 159 countries, including some historical entities. The temporal coverage varies substantially across countries. For some countries there is only one data entry; in other cases there are over 100 data points. The earliest entry is from 1867 (United Kingdom), the latest from 2003. The majority of the data (65%) cover the years from 1980 onwards. The 2008 update (version WIID2c) includes some major updates and quality improvements, which reduced the number of variables in the new version. The new version has 334 new observations and several revisions/corrections made in 2007 and 2008.
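For readers unfamiliar with the Gini index, the sketch below computes a Gini coefficient from a list of individual incomes using the standard sorted-rank formula. It is only an illustration of the measure: WIID reports Gini values gathered from underlying sources, and the `gini` function and the toy income lists here are assumptions, not part of the database.

```python
def gini(incomes):
    """Gini coefficient in [0, 1]: 0 is perfect equality, values near 1 are extreme inequality."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one income and a positive total")
    # Rank-weighted sum over incomes sorted in ascending order (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

equal = gini([10, 10, 10, 10])      # everyone earns the same
concentrated = gini([0, 0, 0, 40])  # one person earns everything
print(equal, concentrated)
```

Multiplying the result by 100 gives the percentage scale on which Gini indices are often reported.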
DJ Mag Top 100 History Dataset
kaggle.com
zip
Updated Aug 8, 2018
Cite
Koki Ando (2018). DJ Mag Top 100 History Dataset [Dataset]. https://www.kaggle.com/datasets/koki25ando/dj-mag-top-100-history-dataset
Available download formats: zip (0 bytes)
Dataset updated
Aug 8, 2018
Authors
Koki Ando
Description
Context

I wanted to share the dataset I scraped from the official DJ Mag website to create a Shiny visualization app.

!! Dataset will be updated as soon as possible after this year's announcement on 21st October.

Content

DJ Magazine (aka DJ Mag) is a British monthly magazine dedicated to EDM and DJs. It was founded in 1991. Top 100 DJs is one of the magazine’s biggest properties, and it has provided a list of the world’s most popular DJs every year since 2004. The poll attracted over 1 million votes in 2015 and is now considered one of the world’s biggest music polls.

For more information, visit https://djmag.com/.
Scraping Script: DJ Mag Ranking Scraping Script

rvest & tidyverse packages are used to collect and clean data.

The dataset includes the full DJ Mag ranking history from 2004 to 2017.

Year: year

Ranking: Ranking number

DJ: DJ name

Change: ranking change from last year
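To make the Change field concrete, here is a minimal sketch that derives it from the other three fields. The sign convention (positive means the DJ moved up), the use of `None` for entries with no ranking in the previous year, and the placeholder DJ names are all assumptions; the dataset's own encoding of Change may differ.

```python
def add_change(rows):
    """Return copies of rows (dicts with Year, Ranking, DJ) with a derived Change field."""
    ranking_by_dj_year = {(r["DJ"], r["Year"]): r["Ranking"] for r in rows}
    result = []
    for r in rows:
        last = ranking_by_dj_year.get((r["DJ"], r["Year"] - 1))
        # Assumed convention: last year's rank minus this year's, so climbing is positive.
        change = None if last is None else last - r["Ranking"]
        result.append({**r, "Change": change})
    return result

# Placeholder rows, not taken from the dataset.
rows = [
    {"Year": 2016, "Ranking": 5, "DJ": "DJ A"},
    {"Year": 2017, "Ranking": 2, "DJ": "DJ A"},
    {"Year": 2017, "Ranking": 1, "DJ": "DJ B"},
]
ranked = add_change(rows)
for row in ranked:
    print(row)
```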

Acknowledgements

I really appreciate the official DJ Mag for making the dataset public and UNICEF for supporting the activity every year.

Inspiration

Can you find out how the EDM music industry has changed?
Please share your reports using this dataset. Your contributions are always welcome!!!