http://opendatacommons.org/licenses/dbcl/1.0/
This analysis was conducted as part of a university module. It compares age with socio-economic group in the UK and investigates how unemployment levels relate to deprivation in England.
The dataset includes the English Indices of Deprivation 2015 and the 2011 UK census data.
The English indices of deprivation measure relative deprivation in small areas in England called lower-layer super output areas. The index of multiple deprivation is the most widely used of these indices. More information can be found on the government website here. The Index of Multiple Deprivation ranks every small area in England from 1 (most deprived area) to 32,844 (least deprived area) and ranks them according to the following measures:
- Income Deprivation
- Employment Deprivation
- Education, Skills and Training Deprivation
- Health Deprivation and Disability
- Crime
- Barriers to Housing and Services
- Living Environment Deprivation

By including the 2011 UK census data and a lookup table (for combining the datasets), it is possible to see how age and gender correspond to areas of deprivation.
All data has been made freely available by the UK Government and can be accessed here. It is strongly recommended that the guidance notes for this dataset are read before performing any analysis.
This dataset contains detailed records of 776,527 bicycle journeys from the Transport for London (TfL) Cycle Hire system spanning from August 1 to August 31, 2023. The TfL Cycle Hire initiative provides publicly accessible bicycles for rent across London, promoting sustainable transportation and physical fitness. This comprehensive dataset captures individual trip data, which can be utilized to analyze urban mobility patterns, station performance, and cycling preferences among London's diverse population. This dataset provides a snapshot of cycling activity during the month, including start and end details for each journey, the bicycle used, and the duration of hire.
The dataset can be used for:
- Time Series Forecasting: Predict future bike rental demands based on historical usage patterns.
- Geospatial Analysis: Map the start and end locations of trips to identify popular routes and areas with high cycling traffic.
- Customer Behavior Analysis: Analyze the duration and frequency of rentals to understand user preferences and habits.
- Predictive Maintenance: Use trip duration and frequency data to predict when bikes are likely to require maintenance or replacement.
- Multivariate Analysis: Explore relationships between different variables, such as trip durations, station popularity, and time of day, to uncover underlying patterns in bike usage.
The dataset includes the following variables for each ride:
- Number: A unique identifier for each trip (Trip ID).
- Start Date: The date and time when the trip began.
- Start Station Number: The identifier for the starting station.
- Start Station: The name of the starting station.
- End Date: The date and time when the trip ended.
- End Station Number: The identifier for the ending station.
- End Station: The name of the ending station.
- Bike Number: A unique identifier for the bicycle used.
- Bike Model: The model of the bicycle used.
- Total Duration: The total time duration of the trip (in a human-readable format).
- Total Duration (ms): The total time duration of the trip in milliseconds.
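As a rough illustration of how the duration column might be used, the sketch below converts "Total Duration (ms)" to minutes and averages it per start station. The column names come from the list above; the sample rows and the function name `mean_minutes_by_station` are invented for illustration and are not part of the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows using the column names listed above;
# the station names and durations are invented for illustration only.
rows = [
    {"Start Station": "Station A", "Total Duration (ms)": "600000"},
    {"Start Station": "Station A", "Total Duration (ms)": "1200000"},
    {"Start Station": "Station B", "Total Duration (ms)": "300000"},
]


def mean_minutes_by_station(trips):
    """Return the mean trip duration in minutes for each start station."""
    totals = defaultdict(lambda: [0.0, 0])  # station -> [sum of minutes, trip count]
    for trip in trips:
        minutes = int(trip["Total Duration (ms)"]) / 60_000  # milliseconds -> minutes
        entry = totals[trip["Start Station"]]
        entry[0] += minutes
        entry[1] += 1
    return {station: total / count for station, (total, count) in totals.items()}


averages = mean_minutes_by_station(rows)
print(averages)
```

In practice the rows would come from the downloaded CSV (for example via `csv.DictReader`) rather than a hard-coded list.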
Source: This dataset was sourced directly from Transport for London's official website, which provides open data to encourage public use and analysis. More details and related datasets can be found at Transport for London (TfL).
Reference: Transport for London. (August 2023). TfL Cycle Hire Trip Data. Retrieved [Date Retrieved], from https://tfl.gov.uk/info-for/open-data-users/our-open-data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘E-Shop Clothing Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/adityawisnugrahas/eshop-clothing-dataset on 11 August 2021.
--- Dataset description provided by original source is as follows ---
Data description “e-shop clothing 2008”
Variables:
========================================================
1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (.biz) 44-com (.com) 45-int (.int) 46-net (.net) 47-org (*.org)
========================================================
1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white
========================================================
1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right
========================================================
1-en face 2-profile
========================================================
1-yes 2-no
========================================================
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I would like to see how this data can be used for any kind of problem (clustering, regression, classification, EDA).
Source: https://archive.ics.uci.edu/ml/datasets/clickstream+data+for+online+shopping
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
Blackwell’s is a British book retailer founded in 1879, with more than 40 bookstores in the UK. It also sells books through its official site.
Book metadata was downloaded from the official site using web scraping and the API. In the cleaned version, price columns were renamed to distinguish between prices in pounds and euros, and some columns were converted to numbers.
You can use the data to:
Analyze prices in different book categories (like the genre)
Perform sentiment analysis of blurbs and reviews
Predict prices based on the book’s dimensions and weight
This open dataset takes each British postcode, locates the centroid, and assigns an elevation based on the nearest point on an Ordnance Survey contour line to that centroid.
Also known as altitude, elevation is given as distance above sea level in metres.
Documentation and the latest version can be found at the Open Postcode Elevation homepage.
Postcode data is from the ONS Postcode Directory.
Elevation is from OS Terrain 50.
Published and maintained by GetTheData.
Open data licensed under the Open Government Licence.
Attribution required.
Photo by Maojin Lang on Unsplash
https://creativecommons.org/publicdomain/zero/1.0/
The opening dates of all current UK railway stations, manually extracted from the chronology information (in PDF format) from this site: https://rchs.org.uk/railway-passenger-stations-in-great-britain-a-chronology/
Where a station has continually existed but moved location (along the same line), the earliest opening date is used, together with the most recent 'resiting' date. Where a station closed but has subsequently re-opened, I have shown both the original opening date and the date from which the station has been open in its current period. Where a previous station had existed (on the same line) within a mile of the current station, I have deemed the current station as being 're-opened' and resited (even if the previous station had a different name). Where a station has been temporarily closed, e.g. due to renovation or line improvements, I have treated it as being continuously open. Some stations are noted as being closed during the First World War (1914-1918); where this is the case, the dates have been noted in their own columns, but the station is assumed to have been continuously open.
Details of the columns are (all dates are in UK format dd/mm/yyyy):
- Three Letter Code - the station code by which all UK railway stations can be identified (initially used as Computer Reservation System (CRS) codes); occasionally a station will have/use more than one CRS code, in which case a separate entry is shown per code
- station_name - the name commonly used for the station (although this can change or vary between listings)
- Status - the current status of the station/record (see further details below)
- Currently Opened - the date since which the station has been continuously open
- Year of current opening - just the year part of the above date
- Resited (most recent) - if the station has moved location, or been remodelled in its current location, the last time this happened
- Replaced different station - indicator of whether the station has replaced another station with a different name
- Originally Opened - the date the station was originally opened, if different from its currently opened date
- Year of Original opening - the year part of the above date (where an Originally Opened date exists), or a repeat of the 'Year of current opening' value
- Closed - the date the station closed (if it has subsequently re-opened, or it is currently closed)
- WW1 Closed - the date a station closed during the First World War
- WW1 Re-open - the date a station re-opened after the First World War
- Comment - any further textual information relevant to the other columns
The 'Status' column may have the following values:
- Open - the station is currently operational
- Re-Open - the station is currently operational, but had previously had a period of inactivity (or it replaced another station that was inactive)
- Duplicate - a duplicate record for where there is more than 1 'Three Letter Code' for that station
- N/A - a station that is not on the current national rail network (but may previously have been, or may be used for special services)
- dummy - a blank record for the year before the first of the other records (to give a zero data point)
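Because all dates are given in UK dd/mm/yyyy format, they need an explicit format string when parsed. The sketch below shows one way to do this with the standard library; the record values and the helper name `parse_uk_date` are invented for illustration, and treating an empty cell as "no date" is an assumption about how blanks appear in the file.

```python
from datetime import date, datetime
from typing import Optional


def parse_uk_date(text: str) -> Optional[date]:
    """Parse a dd/mm/yyyy string; return None for an empty cell (assumed blank marker)."""
    text = text.strip()
    if not text:
        return None
    return datetime.strptime(text, "%d/%m/%Y").date()


# Hypothetical record using column names from the list above; values are invented.
record = {
    "Status": "Re-Open",
    "Currently Opened": "03/05/1987",
    "Originally Opened": "01/07/1848",
    "Closed": "",
}

opened = parse_uk_date(record["Currently Opened"])
original = parse_uk_date(record["Originally Opened"])
closed = parse_uk_date(record["Closed"])

# The year columns described above can be re-derived from the parsed dates.
year_of_current_opening = opened.year
print(opened, original, closed, year_of_current_opening)
```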
The dataset description is shown below:
Id : 8 digit Id of the job advertisement,
Title: Title of the advertised job position,
Location: Location of the advertised job position,
ContractType: The contract type of the advertised job position, could be full-time, part-time or non-specified,
ContractTime: The contract time of the advertised job position, could be permanent, contract or non-specified,
Company: Company (employer) of the advertised job position,
Category: The Category of the advertised job position, e.g., IT jobs, Engineering Jobs, etc.
Salary per annum: Annual Salary of the advertised job position, e.g., 80000,
OpenDate: The opening time for applying for the advertised job position, e.g., 20120104T150000, means 3pm, 4th January 2012,
CloseDate: The closing time for applying for the advertised job position, e.g., 20120104T150000, means 3pm, 4th January 2012,
SourceName: The website where the job position is advertised.
In this task, you are required to inspect and audit the data (dataset1_with_error.csv) to identify the data problems, and then fix them. The generic and major data problems that could be found in the data might include:
- Lexical errors
- Irregularities
- Violations of the integrity constraint
- Inconsistency

In the end, save the error-free dataset in dataset1_solution.csv. The number of records in your solution should be the same as the number of those in the input file.
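As one possible starting point for the audit, the sketch below parses the OpenDate/CloseDate format described above (e.g. 20120104T150000) and flags rows whose closing time precedes the opening time. That ordering rule is an assumed example of an integrity constraint, not one stated in the task, and the function names are hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y%m%dT%H%M%S"  # e.g. 20120104T150000 -> 3pm, 4th January 2012


def parse_job_timestamp(text: str) -> datetime:
    """Parse an OpenDate/CloseDate value; raises ValueError on a malformed value."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def closes_before_opening(row: dict) -> bool:
    """Assumed integrity check: a job should not close before it opens."""
    return parse_job_timestamp(row["CloseDate"]) < parse_job_timestamp(row["OpenDate"])


open_date = parse_job_timestamp("20120104T150000")

# Hypothetical rows; the values are invented for illustration.
good_row = {"OpenDate": "20120104T150000", "CloseDate": "20120204T150000"}
bad_row = {"OpenDate": "20120204T150000", "CloseDate": "20120104T150000"}
print(open_date, closes_before_opening(good_row), closes_before_opening(bad_row))
```

A malformed value such as a swapped day and month would surface as a `ValueError` from `strptime`, which is one way to detect lexical errors in these two columns.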
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is a dataset of bicycle accidents in Great Britain from 1970 to 2018, covering attributes ranging from road type to the gender of casualties.
This dataset contains data such as the accident index, number of vehicles involved, number of casualties, date and time of accident, speed limit, road and weather conditions, day of accident and the road type on which the accident took place. It also includes the gender of the person riding the bicycle, the severity of the accident and the age group of the victims.
Bicycle racing is recognised as an Olympic sport. Bicycle races are popular all over the world, especially in Europe. The countries most devoted to bicycle racing include Belgium, Denmark, France, Germany, Italy, the Netherlands, Spain and Switzerland. Other countries with international standing include Australia, Luxembourg, United Kingdom, United States and Colombia. Being a big fan of the sport, and seeing the number of unfortunate accidents happening across Great Britain, inspired me to share this dataset from the following website, which can be referred to for detailed analysis: https://data.world/gonzandrobles/bicycleaccidentsuk
These files were taken from the Great Britain Road Accidents 2005_2016 published by the Department for Transport. Licensed under Open Government Licence. The dataset is maintained by Teng Li and was last updated about 3 years ago. The initial dataset is quite large so this sample was created to facilitate the completion of a course project via an open-source web application.
The files provide detailed data about the circumstances of personal injury road accidents in Great Britain from 2005 onwards, the make of vehicles involved, and the consequential casualties. The statistics relate only to personal injury accidents on public roads that are reported to the police and subsequently recorded, using the STATS19 accident reporting form. Information on damage-only accidents with no human casualties, or on accidents on private roads or car parks, is not included in this data.
The complete dataset can be found at:
https://www.kaggle.com/nichaoku/gbaccident0516
The rows and columns of the dataset provide details of the date, time, number of accidents by severity, casualties, and conditions that may have contributed to the accidents that occurred. Details in the casualty and vehicle files can be linked to the relevant accident by the “Accident_Index” field.
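The linkage through “Accident_Index” could be sketched as follows. Only the `Accident_Index` field name comes from the description; the other column names and all values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows; only "Accident_Index" is a field name taken from the text.
accidents = [
    {"Accident_Index": "A001", "Date": "01/01/2005"},
    {"Accident_Index": "A002", "Date": "02/01/2005"},
]
casualties = [
    {"Accident_Index": "A001", "Casualty_Ref": "1"},
    {"Accident_Index": "A001", "Casualty_Ref": "2"},
    {"Accident_Index": "A002", "Casualty_Ref": "1"},
]


def group_by_accident(rows):
    """Index casualty (or vehicle) rows by the accident they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Accident_Index"]].append(row)
    return grouped


casualties_by_accident = group_by_accident(casualties)

# Attach each accident's casualties; accidents with no match get an empty list.
linked = [
    {**accident, "casualties": casualties_by_accident.get(accident["Accident_Index"], [])}
    for accident in accidents
]
print([(a["Accident_Index"], len(a["casualties"])) for a in linked])
```

The same grouping function would work for the vehicle file, since it is keyed on the same field.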
A list of the variables contained in the files is provided along with the dataset.
Source: https://data.gov.uk/dataset/road-accidents-safety-data
Open Flood Risk by Postcode is derived from the Environment Agency's Risk of Flooding from Rivers and Sea which allocates a risk level to areas in England, UK. Using postcode data from Open Postcode Geo, each English postcode is placed in its risk area, allowing a flood risk level to be allocated to a postcode.
Note that where a postcode is outside a flood risk area, some of the column values will be NULL, represented as \N in this file.
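One way to handle the \N marker when loading the file is to map it to `None` while reading. The sketch below assumes a comma-separated file with a header row; the column names and sample postcodes are invented and may not match the real file.

```python
import csv
import io

NULL_MARKER = r"\N"  # the file's representation of NULL, as noted above

# Hypothetical two-row extract; the header and values are assumptions for illustration.
sample = "postcode,risk_level\nAB1 2CD,Low\nEF3 4GH,\\N\n"


def read_rows(handle):
    """Read CSV rows, replacing the \\N marker with None."""
    reader = csv.DictReader(handle)
    return [
        {key: (None if value == NULL_MARKER else value) for key, value in row.items()}
        for row in reader
    ]


rows = read_rows(io.StringIO(sample))
print(rows)
```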
You can find full documentation on the Open Flood Risk by Postcode homepage.
Derived from Risk of Flooding from Rivers and Sea
Derived from Open Postcode Geo
Licensed under the OGL
https://creativecommons.org/publicdomain/zero/1.0/
Mariusz Łapczyński, Cracow University of Economics, Poland, lapczynm '@' uek.krakow.pl
Sylwester Białowąs, Poznan University of Economics and Business, Poland, sylwester.bialowas '@' ue.poznan.pl
The dataset contains clickstream information from an online store offering clothing for pregnant women. The data are from five months of 2008 and include, among others, product category, location of the photo on the page, country of origin of the IP address and product price in US dollars.
The dataset contains 14 variables described in a separate file (See 'Data set description')
N/A
If you use this dataset, please cite:
Łapczyński M., Białowąs S. (2013) Discovering Patterns of Users' Behaviour in an E-shop - Comparison of Consumer Buying Behaviours in Poland and Other European Countries, “Studia Ekonomiczne”, nr 151, “La société de l'information : perspective européenne et globale : les usages et les risques d'Internet pour les citoyens et les consommateurs”, p. 144-153
========================================================
following categories:
1-Australia 2-Austria 3-Belgium 4-British Virgin Islands 5-Cayman Islands 6-Christmas Island 7-Croatia 8-Cyprus 9-Czech Republic 10-Denmark 11-Estonia 12-unidentified 13-Faroe Islands 14-Finland 15-France 16-Germany 17-Greece 18-Hungary 19-Iceland 20-India 21-Ireland 22-Italy 23-Latvia 24-Lithuania 25-Luxembourg 26-Mexico 27-Netherlands 28-Norway 29-Poland 30-Portugal 31-Romania 32-Russia 33-San Marino 34-Slovakia 35-Slovenia 36-Spain 37-Sweden 38-Switzerland 39-Ukraine 40-United Arab Emirates 41-United Kingdom 42-USA 43-biz (.biz) 44-com (.com) 45-int (.int) 46-net (.net) 47-org (*.org)
========================================================
1-trousers 2-skirts 3-blouses 4-sale
========================================================
(217 products)
========================================================
1-beige 2-black 3-blue 4-brown 5-burgundy 6-gray 7-green 8-navy blue 9-of many colors 10-olive 11-pink 12-red 13-violet 14-white
========================================================
1-top left 2-top in the middle 3-top right 4-bottom left 5-bottom in the middle 6-bottom right
========================================================
1-en face 2-profile
========================================================
the average price for the entire product category
1-yes 2-no
========================================================
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The purpose of this initiative is to build an integrated dataset on Intensive Care Units (ICUs) and their availability by country and region (at the highest regional granularity provided by the sources), using a data model standardized across countries.
Currently, ICU data is stored in different country-specific sources, with a wide range of access points (national websites, APIs, Excel or CSV files, etc.).
Given the current COVID-19 crisis, we believe that this information should be provided with the following:
* common standardized structure
* single point of access
* open to the public
We hope that these datasets will further benefit researchers and help us in the fight against COVID-19.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains 2200 honest text reviews about Emirates Airline. The reviews come from the Skytrax platform. Skytrax is a United Kingdom-based consultancy that runs an airline and airport review and ranking site.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains detailed league performance statistics for the nominees of the 2024 Ballon d'Or across major European football leagues. The stats cover the 2023-2024 season, showcasing metrics such as goals, assists, expected goals (xG), expected assists (xAG), progression metrics, and more.
The winner of the Men's Ballon d'Or goes to the best male player voted by a panel of soccer journalists representing the top 100 countries in the FIFA Men's Rankings.
For the first time since 2003, though, Cristiano Ronaldo and Lionel Messi were not included among the nominees!
https://creativecommons.org/publicdomain/zero/1.0/
The Daikon Corpus was created during the Diachronic Text Evaluation task in SemEval-2015. The task was to create a system that can date a piece of text.
For example, given a text snippet:
“Dictator Saddam Hussein ordered his troops to march into Kuwait. After the invasion is condemned by the UN Security Council, the US has forged a coalition with allies. Today American troops are sent to Saudi Arabia in Operation Desert Shield, protecting Saudi Arabia from possible attack.”
The text has clear temporal evidence with reference to a
Historically, we know that
Given the specific chronic deicticity (“today”) that indicates that the text is published during the Gulf War, we can conceive that the text snippet should be dated 1990-1991.
The Daikon Corpus is made up of articles from the British Spectator news magazine from 1828 to 2008.
The corpus contains 24,280 articles with 19 million tokens; the token count is calculated by summing the number of whitespaces plus 1 for each paragraph.
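The counting rule described above can be sketched in a few lines. This assumes "whitespaces" means individual whitespace characters, so a run of two spaces counts twice; the corpus authors may have counted differently, and the example paragraphs are taken from the snippet quoted earlier only for illustration.

```python
def count_tokens(paragraphs):
    """Token count as described above: whitespace characters plus one, summed over paragraphs."""
    return sum(sum(ch.isspace() for ch in paragraph) + 1 for paragraph in paragraphs)


# Two short illustrative paragraphs (4 spaces + 1, and 2 spaces + 1).
paragraphs = ["Today American troops are sent", "to Saudi Arabia"]
total = count_tokens(paragraphs)
print(total)
```

Note that this differs from `len(paragraph.split())` whenever a paragraph contains consecutive, leading or trailing whitespace.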
The Daikon corpus is saved in JSON format, where the outermost structure is a list and the inner data structure is a key-value dictionary/hashmap that contains the:
Note: If the URL is broken, try removing the .html suffix of the URL, e.g. change
http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house.html
to
http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house
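The URL fix described in the note could be applied programmatically along these lines; the helper name `strip_html_suffix` is hypothetical.

```python
def strip_html_suffix(url: str) -> str:
    """Drop a trailing .html from an archive URL, leaving other URLs unchanged."""
    suffix = ".html"
    return url[: -len(suffix)] if url.endswith(suffix) else url


fixed = strip_html_suffix(
    "http://archive.spectator.co.uk/article/24th-september-2005/57/doctor-in-the-house.html"
)
print(fixed)
```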
Liling Tan and Noam Ordan. 2015. USAAR-CHRONOS: Crawling the Web for Temporal Annotations. In Proceedings of Ninth International Workshop on Semantic Evaluation (SemEval 2015). Denver, USA.
Task reference:
Octavian Popescu and Carlo Strapparava. SemEval 2015, Task 7: Diachronic Text Evaluation. In Proceedings of Ninth International Workshop on Semantic Evaluation (SemEval 2015). Denver, USA.
Dataset image comes from Jonathan Pielmayer
Let's make an artificially intelligent "Flynn Carsen" !!
A)20160923_global_crisis_data:
https://www.hbs.edu/behavioral-finance-and-financial-stability/data/Pages/global.aspx
This data was collected over many years by Carmen Reinhart (with her coauthors Ken Rogoff, Christoph Trebesch, and Vincent Reinhart). It covers the banking crises of 70 countries, from 1800 AD to 2016 AD, with a total of 15,190 records and 16 variables. After cleaning and adjustment, the data settled at 8,642 records and 17 variables.
B) Label_Country: This data describes whether each country is Developing or Developed.
1-Case: ID Number for Country.
2-Cc3: ID String for Country.
3-Country: Name of the country.
4-Year: The date from 1800 to 2016.
5-Banking_Crisis: Banking problems can often be traced to a decrease in the value of banks' assets:
A) due to a collapse in real estate prices or when bank asset values decrease substantially. B) if a government stops paying its obligations, which can trigger a sharp decline in the value of bonds.
6-Systemic_Crisis: When many banks in a country are in serious solvency or liquidity problems at the same time, either:
A) because they are all hit by the same outside shock. B) or because failure in one bank or a group of banks spreads to other banks in the system.
7-Gold_Standard: The country has a crisis in the Gold Standard.
8-Exch_Usd: Exchange rate of the local currency in USD, except that the USD is exchanged in GBP.
9-Domestic_Debt_In_Default: The country has domestic debt in default.
10-Sovereign_External_Debt_1: Default and Restructurings, -Does not include defaults on WWI debt to United States and United Kingdom and post-1975 defaults on Official External Creditors.
11-Sovereign_External_Debt_2: Default and Restructurings, -Does not include defaults on WWI debt to United States and United Kingdom but includes post-1975 defaults on Official External Creditors.
12-Gdp_Weighted_Default: GDP-weighted default for the country.
13-Inflation: Annual percentages of average consumer prices.
14-Independence: Independence of the country.
15-Currency_Crises: The country has a currency crisis.
16-Inflation_Crises: The country has an inflation crisis.
17-Level_Country: Describes whether the country is Developing or Developed.
https://creativecommons.org/publicdomain/zero/1.0/
Source: https://www.wider.unu.edu/database/wiid
User Guide: https://www.wider.unu.edu/sites/default/files/WIID/PDF/WIID-User_Guide_06MAY2020.pdf
The World Income Inequality Database (WIID) contains information on income inequality in various countries and is maintained by the United Nations University-World Institute for Development Economics Research (UNU-WIDER). The database was originally compiled during 1997-99 for the research project Rising Income Inequality and Poverty Reduction, directed by Giovanni Andrea Cornia. A revised and updated version of the database was published in June 2005 as part of the project Global Trends in Inequality and Poverty, directed by Tony Shorrocks and Guang Hua Wan. The database was revised in 2007 and a new version was launched in May 2008.
The database contains data on inequality in the distribution of income in various countries. The central variable in the dataset is the Gini index, a measure of income distribution in a society. In addition, the dataset contains information on income shares by quintile or decile. The database contains data for 159 countries, including some historical entities. The temporal coverage varies substantially across countries. For some countries there is only one data entry; in other cases there are over 100 data points. The earliest entry is from 1867 (United Kingdom), the latest from 2003. The majority of the data (65%) cover the years from 1980 onwards. The 2008 update (version WIID2c) includes some major updates and quality improvements, in fact leading to a reduced number of variables in the new version. The new version has 334 new observations and several revisions/ corrections made in 2007 and 2008.
I just wanted to share the dataset I scraped from the official DJ Mag website to create a Shiny visualization app.
!! Dataset will be updated as soon as possible after this year's announcement on 21st October.
DJ Magazine (aka DJ Mag) is a British monthly magazine dedicated to EDM and DJs. It was founded in 1991. Top 100 DJs is one of the magazine’s biggest properties, and it has provided a list of the world’s most popular DJs every year since 2004. The poll attracted over 1 million votes in 2015, and it is now considered one of the world’s biggest music polls.
For more information, visit https://djmag.com/.
Scraping Script: DJ Mag Ranking Scraping Script
The dataset includes all the DJ Mag ranking history from 2004 to 2017.
I really appreciate the DJ Mag team for making the dataset public and UNICEF for supporting the activity every year.
Can you find out how the EDM industry has changed?
Please share your reports using this dataset. Your contributions are always welcome!!!