This dataset captures rich, first-hand consumer experiences across financial services brands and products in the UK. It includes structured review metrics (e.g. satisfaction score, NPS, value for money), natural language reviews, and advanced derived data such as sentiment scoring and thematic tags. This is an ideal resource for consultancies, researchers, and AI/ML teams aiming to analyse financial services experiences, benchmark brands, or build consumer behaviour models.
Data is collected directly from Smart Money People’s independent review platform and updated monthly. It is anonymised, GDPR-compliant, and available as review-level raw data or aggregated monthly summaries.
Variants: Date, demographic, financial sector, product category
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains 77,405 customer reviews of water bottles scraped from Flipkart, one of India’s leading e-commerce platforms. The data has been preprocessed to remove extraneous characters and duplicate entries, making it suitable for a variety of text analysis tasks. It captures essential information about each review, including product details, customer ratings, and the full review text, offering valuable insights into customer opinions and product performance.
This dataset is provided as a CSV file named cleaned_reviews.csv. It contains 77,405 individual customer reviews, organised into six distinct columns. The data structure is tabular, with each row representing a unique review.
This dataset is ideal for:
* Sentiment analysis to gauge customer emotions towards water bottle products.
* Various Natural Language Processing (NLP) tasks, such as text classification, topic modelling, and keyword extraction.
* Market research to understand consumer preferences, pain points, and satisfaction levels regarding water bottles.
* The development of recommendation systems based on product reviews and ratings.
* Gaining deeper insights into customer opinions and product performance on an e-commerce platform.
The dataset focuses on customer reviews from Flipkart, a major e-commerce platform in India, thus providing insights primarily into the Indian market. The reviews cover various water bottle products listed on the platform. A specific time range for the reviews is not available in the provided sources.
CC0
This dataset is suitable for:
* Data Scientists and Machine Learning Engineers for building and training NLP models, such as sentiment classifiers.
* Data Analysts for conducting market research, exploring customer feedback trends, and generating reports on product performance.
* Product Managers seeking to understand customer satisfaction, identify popular product features, or address common complaints related to water bottles.
* Academic Researchers interested in e-commerce review analysis, consumer behaviour, or text mining.
Original Data Source: Water bottle review dataset (FLIPKART)
Product Review Datasets: Uncover user sentiment
Harness the power of Product Review Datasets to understand user sentiment and insights deeply. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.
Data sources:
Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:
Choose from multiple data delivery options to suit your needs:
Why choose Oxylabs?
Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.
Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.
Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.
Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The CSV file contains the literature search dataset produced by the EU-funded ZOOOM project on open software, open hardware, and open data business models.
The Armed Conflict Location & Event Data Project (ACLED) is a US-registered non-profit whose mission is to provide the highest quality real-time data on political violence and demonstrations globally. The information collected includes the type of event, its date, the location, the actors involved, a brief narrative summary, and any reported fatalities. ACLED users rely on our robust global dataset to support decision-making around policy and programming, accurately analyze political and country risk, support operational security planning, and improve supply chain management.
ACLED’s transparent methodology, expert team composed of 250 individuals speaking more than 70 languages, real-time coding system, and weekly update schedule are unrivaled in the field of data collection on conflict and disorder.
Global Coverage: We track political violence, demonstrations, and strategic developments around the world, covering more than 240 countries and territories.
Published Weekly: Our data are collected in real time and published weekly. It is the only dataset of its kind to provide such a high update frequency, with peer datasets most often updating monthly or yearly.
Historical Data: Our dataset contains at least two full years of data for all countries and territories, with more extensive coverage available for multiple regions.
Experienced Researchers: Our data are coded by experienced researchers with local, country, and regional expertise and language skills.
Thorough Data Collection and Sourcing: Pulling from traditional media, reports, local partner data, and verified new media, ACLED uses a tailor-made sourcing methodology for individual regions/countries.
Extensive Review Process: Our data go through an exhaustive multi-stage quality assurance process to ensure their accuracy and reliability. This process includes both manual and automated error checking and contextual review.
Clean, Standardized, and Validated: Our data can be easily connected with internal dashboards through our API or downloaded through the Data Export Tool on our website.
Resources Available on ESRI’s Living Atlas
ACLED data are available through the Living Atlas for the most recent 12 month period. The data are mapped to the centroid of first administrative divisions (“admin1”) within countries (e.g., states, districts, provinces) and aggregated by month. Variables in the data include:
* The number of events per admin1-month, disaggregated by event type (protests, riots, battles, violence against civilians, explosions/remote violence, and strategic developments)
* A conservative estimate of reported fatalities per admin1-month
* The total number of distinct violent actors active in the corresponding admin1 for each month
This Living Atlas item is a Web Map, which provides a pre-configured view of ACLED event data in a few layers:
* ACLED Event Counts layer: events per admin1-month, styled by predominant event type for each location.
* ACLED Violent Actors layer: the number of distinct violent actors per admin1-month.
* ACLED Fatality Estimates layer: the estimated number of fatalities from political violence per admin1-month.
These layers are based on the ACLED Conflict and Demonstrations Event Data Feature Layer, which has the same data but only a basic default styling that is similar to the Event Counts layer. The Web Map layers are configured with a time-slider component to account for the multiple months of data per admin1 unit. These indicators are also available in the ACLED Conflict and Demonstrations Data Key Indicators Group Layer, which includes the same preconfigured layers but without the time-slider component or background layers.
Resources Available on the ACLED Website
The fully disaggregated dataset is available for download on ACLED's website including:
* Date (day, month, year)
* Actors, associated actors, and actor types
* Location information (ADMIN1, ADMIN2, ADMIN3, location and geo coordinates)
* A conservative fatality estimate
* Disorder type, event types, and sub-event types
* Tags further categorizing the data
* A notes column providing a narrative of the event
For more information, please see the ACLED Codebook.
To explore ACLED’s full dataset, please register on the ACLED Access Portal, following the instructions available in this Access Guide. Upon registration, you’ll receive access to ACLED data on a limited basis. Commercial users have access to 3 free data downloads company-wide with access to up to one year of historical data. Public sector users have access to 6 downloads of up to three years of historical data organization-wide. To explore options for extended access, please reach out to our Access Team (access@acleddata.com).
With an ACLED license, users can also leverage ACLED’s interactive Global Dashboard and check in for weekly data updates and analysis tracking key political violence and protest trends around the world. ACLED also has several analytical tools available such as our Early Warning Dashboard, Conflict Alert System (CAST), and Conflict Index Dashboard.
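The admin1-month aggregation described for the Living Atlas layers can be sketched roughly as follows. This is an illustration only, not ACLED's own pipeline: the field names (`date`, `admin1`, `event_type`, `actors`, `fatalities`) and the sample records are hypothetical placeholders, and the sketch counts every listed actor rather than applying ACLED's definition of a violent actor. The ACLED Codebook is the authority on the real column names.

```python
from collections import defaultdict

def aggregate_admin1_month(events):
    """Group event records by (admin1, YYYY-MM) and summarise each group."""
    cells = defaultdict(lambda: {"events": defaultdict(int),
                                 "fatalities": 0,
                                 "actors": set()})
    for event in events:
        # Assumes ISO dates ("YYYY-MM-DD"), so the first 7 characters give the month.
        key = (event["admin1"], event["date"][:7])
        cell = cells[key]
        cell["events"][event["event_type"]] += 1    # event count per event type
        cell["fatalities"] += event["fatalities"]   # summed fatality estimate
        cell["actors"].update(event["actors"])      # set keeps actors distinct
    return {
        key: {"events": dict(cell["events"]),
              "fatalities": cell["fatalities"],
              "distinct_actors": len(cell["actors"])}
        for key, cell in cells.items()
    }

# Made-up records for illustration; not real ACLED data.
events = [
    {"date": "2024-01-05", "admin1": "Region A", "event_type": "Protests",
     "actors": ["Group X"], "fatalities": 0},
    {"date": "2024-01-19", "admin1": "Region A", "event_type": "Battles",
     "actors": ["Group X", "Group Y"], "fatalities": 3},
    {"date": "2024-02-02", "admin1": "Region A", "event_type": "Protests",
     "actors": ["Group Z"], "fatalities": 0},
]
summary = aggregate_admin1_month(events)
```

Each key of `summary` is one admin1-month, which corresponds to one unit in the Event Counts, Violent Actors, and Fatality Estimates layers.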
https://creativecommons.org/publicdomain/zero/1.0/
The review dataset covers three video games: Call of Duty: Black Ops 3, Persona 5 Royal, and Counter-Strike: Global Offensive. It was obtained by scraping SteamDB [https://steamdb.info/], a large repository of game-related data such as release dates, reviews, and prices. In the initial scrape, each game has two files: customer reviews (count: 100 reviews) and price time series data.
To obtain the reviews of the selected video games, we performed web scraping using R. The customer reviews dataset contains the date each review was posted and the review text, while the price dataset contains the date of each price change and the price on that date. To clean and prepare the data, we first sectioned it in Excel. After scraping, the CSV file held each review in one row together with its date, so we split the two into separate date and review columns. The price scrape had already separated price and date, so after the split we only made sure that every file used consistent column names.
We then used R to finish the cleaning. Each game has separate files for prices and reviews. Each price file is converted into a continuous time series by carrying the most recently available price forward to each date. The price dataset is then combined with its respective review dataset in R on the common date column using a left join. The resulting dataset for each game contains four columns: game name, date, reviews, and price. From there, we allow the user to select the game they would like to view.
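The two cleaning steps described above (carrying the last known price forward to every date, then left-joining on the date column) can be sketched as below. The original work was done in R; this Python version is only an illustration, and the function names, sample prices, and sample review texts are all hypothetical. It also assumes the daily price series is the left-hand table of the join, which the description does not state explicitly.

```python
from datetime import date, timedelta

def forward_fill_prices(price_changes, start, end):
    """Expand sparse {date: price} change points into one price per day."""
    daily = {}
    current = None
    day = start
    while day <= end:
        if day in price_changes:
            current = price_changes[day]  # a new price takes effect on this date
        daily[day] = current              # stays None until the first known price
        day += timedelta(days=1)
    return daily

def left_join_on_date(game, daily_prices, reviews):
    """Keep every date of the price series; attach reviews where they exist."""
    reviews_by_date = {}
    for review_date, text in reviews:
        reviews_by_date.setdefault(review_date, []).append(text)
    rows = []
    for day, price in daily_prices.items():
        # Dates without a review still yield one row, with review set to None.
        for text in reviews_by_date.get(day, [None]):
            rows.append({"game": game, "date": day, "review": text, "price": price})
    return rows

# Made-up sample values for illustration only.
price_changes = {date(2020, 1, 1): 59.99, date(2020, 1, 4): 29.99}
reviews = [(date(2020, 1, 2), "Great story."),
           (date(2020, 1, 4), "Worth it on sale.")]

daily = forward_fill_prices(price_changes, date(2020, 1, 1), date(2020, 1, 5))
rows = left_join_on_date("Persona 5 Royal", daily, reviews)
```

Each element of `rows` has the four columns named in the text: game name, date, review, and price.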
These statistical tables are one of the results from a project undertaken by Statistics Canada on behalf of the Treasury Board Secretariat (TBS) in support of the Horizontal Innovation and Clean Technology Review. They were produced from program data provided by 22 federal government departments and Crown corporations and their subsequent integration into Statistics Canada’s Linkable File Environment (LFE), which comprises a large number of administrative and survey data linked at the enterprise level. More than 430,000 individual records were collected, from 98 program streams over the 2007-2016 period. Program streams were also grouped in seven aggregate categories: grants, repayable contributions, non-repayable contributions, conditional repayable contributions, financing, government performed services and other. Program recipients at the enterprise level (whether for-profit or public entities) were matched to Statistics Canada’s Business Register (BR), which contains all active enterprises in Canada, and then linked to the LFE using both deterministic (Business Numbers) and probabilistic techniques. A high match rate was achieved, representing 89.4% of all records and 96.6% of funds, corresponding to 88,415 unique recipient enterprises over the reference period. Relevant data for these enterprises, such as financial and employment variables, industry, location, profit and exporter status, were then extracted from the LFE.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Affordable, clean, and secure energy and energy services are essential for improving U.S. economic productivity, enhancing our quality of life, protecting our environment, and ensuring our Nation's security. To help the federal government meet these energy goals, President Obama issued a Presidential Memorandum on January 9 directing the administration to conduct a Quadrennial Energy Review (QER). As described in the President’s Climate Action Plan, this first-ever review will focus on energy infrastructure and will identify the threats, risks, and opportunities for U.S. energy and climate security, enabling the federal government to translate policy goals into a set of integrated actions. The Presidential Memorandum created an interagency task force co-chaired by the Director of the Office of Science and Technology Policy and the Special Assistant to the President for Energy and Climate Change. The Department of Energy will help coordinate interagency activities and provide policy analysis and modeling, and stakeholder engagement.
Contributor:
Richard Limeburner
Woods Hole Oceanographic Institution
Woods Hole, MA 02543
rlimeburner@whoi.edu
Dataset:
Process Study CTD data, Gulf of Maine/Georges Bank
NOTE:
These data sets should be considered preliminary. A cursory quality review has been performed and the results of this review are tabulated below on a cruise by cruise basis.
CHARLES ISELIN CRUISE 9407, CTD DATA QUALITY REVIEW
GENERAL COMMENT: THIS DATA SET IS OF POOR QUALITY, USE "CAUTION".
1..CAST SAMPLING INFORMATION EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..CASTS 96 AND 98 CONTAIN ONLY A SURFACE DATA CYCLE. FILES DELETED.
3..OXYGEN BAD ALL CASTS. REPLACED WITH -9..
4..SALINITY IS NOISY. SOME CASTS WORSE THAN OTHERS. NO ATTEMPT TO CLEAN UP. USE WITH CAUTION.
5..LIGHT TRANSMISSION. CASTS 68 - 99 SHOW A ONE VOLT OFFSET FROM CASTS 1-68. CALIBRATION PROBLEM? USE WITH CAUTION. NO ATTEMPT TO CLEAN UP SPIKES.
6..FLUORESCENCE. CASTS 68 - 99 SHOW A TWO VOLT OFFSET FROM CASTS 1 - 68. CALIBRATION PROBLEM? USE WITH CAUTION. NO ATTEMPT TO CLEAN UP SPIKES.
ENDEAVOR CRUISE 259, CTD DATA QUALITY REVIEW
1..CAST SAMPLING INFORMATION WAS EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..NO CTD CAST 20. APPEARS TO BE THE RESULT OF MIS-NUMBERING.
3..CTD CAST 13 HAS ONLY ONE DATA CYCLE, SURFACE VALUE. FILE DELETED.
4..CTD CAST 27. THE DATA REPORTED FOR CAST 27 IS THE EXACT SAME AS FOR CAST 28. THE FILE FOR CAST 27 DELETED.
5..CTD CAST 29. FLUOROMETER DATA BAD ENTIRE CAST, REPLACED WITH -9.0 BAD/MISSING DATA INDICATOR.
6..CTD CAST 30. TRANSMISSOMETER DATA, DATA SPIKE AT 274 DECIBARS. REPLACED WITH -9.0.
7..OXYGEN DISPLAYS OCCASIONAL DATA SPIKES AND SHIFTS EXTENDING OVER MORE THAN ONE DECIBAR. NO ACTION TAKEN.
ENDEAVOR CRUISE 260, CTD DATA QUALITY REVIEW
1..CAST SAMPLING INFORMATION WAS EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..CAST NUMBERS 26, 27 AND 28 MISSING.
3..OXYGEN DISPLAYS OCCASIONAL DATA SPIKES EXTENDING OVER MORE THAN ONE DECIBAR (CAST 13 AT 110 DECIBARS, AND CAST 16 AT 41 DECIBARS). NO ACTION TAKEN.
ENDEAVOR CRUISE 262, CTD DATA QUALITY REVIEW
1..CAST SAMPLING INFORMATION EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..CAST NUMBERS 20, AND 21. ALL OXYGEN VALUES BAD. REPLACED WITH -9..
3..LIGHT TRANSMISSION. ONE DECIBAR DATA SPIKES REMOVED FROM CAST 2, 5 AND 23, SURFACE VALUES. A ONE DECIBAR SPIKE REMOVED FROM CAST 21 AT 14 DECIBARS.
4..FLUORESCENCE. DATA SPIKE REMOVED FROM CAST 21 AT 14 DECIBARS.
ENDEAVOR CRUISE 264, CTD DATA QUALITY REVIEW
1..CAST SAMPLING INFORMATION EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..CAST NUMBERS 29, 30, 32, AND 34 MISSING.
3..LIGHT TRANSMISSION. ONE DECIBAR DATA SPIKES REMOVED FROM CAST 2 AT 10M, AND CAST 23 AT 88M.
ENDEAVOR CRUISE 266, CTD DATA QUALITY REVIEW
1..CAST SAMPLING INFORMATION EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..DATA FOR CAST NUMBERS 22, AND 23 MISSING.
ENDEAVOR CRUISE 267 LEG 1, CTD DATA QUALITY REVIEW
1..CAST SAMPLING INFORMATION EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..DATA FOR CAST NUMBERS 4, 98, AND 103 MISSING.
3..CAST 1 AND 81 ALL VALUES REMOVED FROM LAST DATA CYCLE DUE TO DATA SPIKES.
4..FLUORESCENCE. DATA OFF SCALE ON THE HIGH SIDE. SEVERAL CASTS EXCEED THE MAX. VALUE OF 5 VOLTS.
ENDEAVOR CRUISE 267 LEG 2, CTD DATA QUALITY REVIEW
1..CAST SAMPLING INFORMATION EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..ONE DECIBAR SALINITY SPIKES REMOVED FROM STATION 210 at 24m, 214 at 16m and 217 at 4m.
ENDEAVOR CRUISE 269, CTD DATA QUALITY REVIEW
1..CAST SAMPLING INFORMATION EDITED TO BE CONSISTENT WITH THE EVENT LOG. DATES AND TIMES ARE REPORTED AS GMT.
2..CAST 30, FLUORESCENCE DATA SPIKE AT 69 METERS REMOVED.
3..OXYGEN VERY NOISY (LARGE NUMBER OF SPIKES). NO ACTION TAKEN. USE WITH CAUTION.
updated: November 16, 2005; gfh
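Several of the notes above state that bad values were replaced with the -9.0 bad/missing data indicator. A minimal sketch of masking that indicator before computing statistics is shown below. The function names and the sample oxygen values are hypothetical, and how the CTD files are actually parsed is not covered here.

```python
MISSING = -9.0  # bad/missing data indicator named in the quality review notes

def mask_missing(values, sentinel=MISSING):
    """Replace the sentinel with None so it cannot be mistaken for a measurement."""
    return [None if value == sentinel else value for value in values]

def mean_ignoring_missing(values, sentinel=MISSING):
    """Average only the valid readings; return None if every reading is missing."""
    valid = [value for value in mask_missing(values, sentinel) if value is not None]
    return sum(valid) / len(valid) if valid else None

# Made-up oxygen readings for one cast, with one flagged value.
oxygen = [6.0, -9.0, 5.0]
masked = mask_missing(oxygen)
average = mean_ignoring_missing(oxygen)
```

Without the mask, the flagged reading would pull the average of this sample down to about 0.67 instead of 5.5.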
https://www.statsndata.org/how-to-order
The Dry Car Cleaning Business market is an innovative and rapidly evolving sector that offers a sustainable alternative to traditional car washing methods. This approach utilizes specialized products and techniques to clean vehicles without the use of excessive water, making it an environmentally friendly choice tha
https://whoisdatacenter.com/terms-of-use/
Explore the historical Whois records related to data-clean-room.tech (Domain). Get insights into ownership history and changes over time.
https://www.statsndata.org/how-to-order
The Gutter Cleaning Services market has seen remarkable growth in recent years, driven by an increasing awareness of home maintenance and the critical role that clean gutters play in preventing water damage. With the current market size estimated at several billion dollars globally, stakeholders are recognizing the
Papageorgiou, Chris, Saam, Marianne, and Schulte, Patrick, (2017) "Substitution between Clean and Dirty Energy Inputs - A Macroeconomic Perspective." Review of Economics and Statistics 99:2, 281-290.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication package for: Bayer, Patrick and Johannes Urpelainen. 2013. “External Sources of Clean Technology: Evidence from the Clean Development Mechanism.” Review of International Organizations 8 (1): 81-109.
https://www.pioneerdatahub.co.uk/data/data-request-process/
This dataset, curated by PIONEER, encompasses a detailed collection of 181,207 asthma admissions from 1st June 2016 to 31st May 2022, offering a comprehensive analysis tool for researchers examining the effects of air quality on respiratory health. It includes extensive patient demographics, serial physiological measurements, assessments, diagnostic codes (ICD-10 and SNOMED-CT), initial presentations, symptoms, and outcomes. Additionally, it integrates DEFRA air pollution data, geographically linked to individual health data, allowing for a nuanced exploration of environmental impacts on asthma incidence and severity. The dataset includes 4 years of data prior to and currently 1 year post introduction of the clean air zone.
The dataset invites longitudinal studies to evaluate the Clean Air Zones' effectiveness. Timelines post-introduction of the clean air zone can be expanded to include data up to 2024. Its granular detail provides invaluable insights into emergency medicine, public health policy, and environmental science, supporting targeted interventions and policy formulations aimed at reducing asthma exacerbations and improving air quality standards.
Geography: The West Midlands (WM) has a population of 6 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & > 120 ITU bed capacity. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.
Data set availability: Data access is available via the PIONEER Hub for projects which will benefit the public or patients. This can be by developing a new understanding of disease, by providing insights into how to improve care, or by developing new models, tools, treatments, or care processes. Data access can be provided to NHS, academic, commercial, policy and third sector organisations. Applications from SMEs are welcome. There is a single data access process, with public oversight provided by our public review committee, the Data Trust Committee. Contact pioneer@uhb.nhs.uk or visit www.pioneerdatahub.co.uk for more details.
Available supplementary data: Matched controls; ambulance and community data. Unstructured data (images). We can provide the dataset in OMOP and other common data models and can build synthetic data to meet bespoke requirements.
Available supplementary support: Analytics, model build, validation & refinement; A.I. support. Data partner support for ETL (extract, transform & load) processes. Bespoke and “off the shelf” Trusted Research Environment (TRE) build and run. Consultancy with clinical, patient & end-user and purchaser access/ support. Support for regulatory requirements. Cohort discovery. Data-driven trials and “fast screen” services to assess population size.
https://www.statsndata.org/how-to-order
The Clean Protein Shake market has increasingly gained traction, driven by a growing consumer focus on health, wellness, and clean eating. This segment involves protein shakes made with high-quality, natural ingredients that cater to the needs of fitness enthusiasts, busy professionals, and individuals looking to ma
This is the Extended Golf Play Dataset, a rich and detailed collection designed to expand upon the classic golf dataset [1]. It incorporates a wide array of features suitable for various data science applications and is especially valuable for teaching purposes [1]. The dataset is organised in a long format, where each row represents a single observation and often includes textual data, such as player reviews or comments [2]. It contains a special set of mini datasets, each tailored to a specific teaching point, for example, demonstrating data cleaning or combining datasets [1]. These are ideal for beginners to practise with real examples and are complemented by notebooks with step-by-step guides [1].
The dataset features a variety of columns, including core, extra, and text-based attributes:
* ID: A unique identifying number for each player [1].
* Date: The specific day the data was recorded or the golf session took place [1, 2].
* Weekday: The day of the week, with numerical representation (e.g., 0 for Sunday, 1 for Monday) [1, 3].
* Holiday: Indicates whether the day was a special holiday (Yes/No), specifically noted for holidays in Japan (1 for yes, 0 for no) [1, 3].
* Month: The month in which golf was played [3].
* Season: The time of year, such as spring, summer, autumn, or winter [1, 3].
* Outlook: Describes the weather conditions during the session (e.g., sunny, cloudy, rainy, snowy) [1, 3].
* Temperature: The ambient temperature during the golf session, recorded in Celsius [1, 3].
* Humidity: The percentage of moisture in the air [1, 3].
* Windy: A boolean indicator (True/False or 1 for yes, 0 for no) if it was windy [1, 3].
* Crowded-ness: A measure of how busy the golf course was, ranging from 0 to 1 [1, 4].
* PlayTime-Hour: The duration for which people played golf, in hours [1].
* Play: Indicates whether golf was played or not (Yes/No) [1].
* Review: Textual feedback from players about their day at golf [1].
* EmailCampaign: Text content of emails sent daily by the golf place [1].
* MaintenanceTasks: Descriptions of work carried out to maintain the golf course [1].
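Because several columns are described with more than one encoding (Yes/No, True/False, or 1/0), a small normalisation step can help before analysis. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the sample row is made up, and the weekday mapping beyond "0 for Sunday, 1 for Monday" assumes the numbering simply continues through to 6 for Saturday.

```python
# Assumed continuation of the documented "0 for Sunday, 1 for Monday" encoding.
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]

def decode_flag(value):
    """Map the Yes/No, True/False, and 1/0 spellings onto a Python bool."""
    text = str(value).strip().lower()
    if text in {"1", "yes", "true"}:
        return True
    if text in {"0", "no", "false"}:
        return False
    raise ValueError(f"unrecognised flag value: {value!r}")

def decode_row(row):
    """Return a copy of the row with Weekday, Holiday, Windy, and Play decoded."""
    decoded = dict(row)
    decoded["Weekday"] = WEEKDAYS[int(row["Weekday"])]
    for column in ("Holiday", "Windy", "Play"):
        decoded[column] = decode_flag(row[column])
    return decoded

# Made-up row using the column names listed above.
sample = {"ID": "1", "Weekday": "1", "Holiday": "0", "Windy": "True", "Play": "Yes"}
decoded = decode_row(sample)
```

Raising on unrecognised values, rather than silently defaulting to False, also doubles as a simple data-cleaning check.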
This dataset is organised in a long format, meaning each row represents a single observation [2]. Data files are typically in CSV format, with sample files updated separately to the platform [5]. Specific numbers for rows or records are not currently available within the provided sources. The dataset also includes a special collection of mini datasets within its structure [1].
This dataset is highly versatile and ideal for learning and applying various data science skills:
* Data Visualisation: Learn to create graphs and identify patterns within the data [1].
* Predictive Modelling: Discover which data points are useful for predicting if golf will be played [1].
* Data Cleaning: Practise spotting and managing data that appears incorrect or inconsistent [1].
* Time Series Analysis: Understand how various factors change over time, such as daily or monthly trends [1, 2].
* Data Grouping: Learn to combine similar days or observations together [1].
* Text Analysis: Extract insights from textual features like player reviews, potentially for sentiment analysis or thematic extraction [1, 2].
* Recommendation Systems: Develop models to suggest optimal times to play golf based on historical data [1].
* Data Management: Gain experience in managing and analysing data structured in a long format, which is common for repeated measures [2].
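The data grouping and predictive modelling uses listed above can be sketched with a simple grouped summary: the share of observations on which golf was played, per weather outlook. The observations below are invented for illustration, and the function name `play_rate_by` is hypothetical.

```python
from collections import defaultdict

# Made-up (outlook, played) pairs; not drawn from the actual dataset.
observations = [
    ("sunny", True), ("sunny", True), ("sunny", False),
    ("rainy", False), ("rainy", False),
    ("cloudy", True), ("cloudy", False),
]


def play_rate_by(pairs):
    """Return, for each group key, the share of observations where golf was played."""
    counts = defaultdict(lambda: [0, 0])  # key -> [times played, total observations]
    for key, played in pairs:
        counts[key][0] += int(played)
        counts[key][1] += 1
    return {key: played / total for key, (played, total) in counts.items()}


rates = play_rate_by(observations)
```

The same function can group by any other key, such as Season or Weekday, by changing the first element of each pair, which gives a quick first look at which columns separate play days from non-play days.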
The dataset's regional coverage is global [6]. While the Date column records the day the data was captured or the session occurred, no specific time range for the collected data is stated beyond the listing date of 11/06/2025 [1, 6]. Demographic scope includes unique player IDs [1], but no specific demographic details or data availability notes for particular groups or years are provided.
CC-BY
This dataset is designed for a broad audience:
* New Learners: It is easy to understand and comes with guides to aid the learning process [1].
* Teachers: An excellent resource for conducting classes on data visualisation and interpretation [1].
* Researchers: Suitable for testing novel data analysis methodologies [1].
* Students: Can acquire a wide range of skills, from making graphs to understanding textual data and building recommendation systems [1].
Original Data Source: ⛳️ Golf Play Dataset Extended
The 2016 Integrated Household Panel Survey (IHPS) was launched in April 2016 as part of the Malawi Fourth Integrated Household Survey fieldwork operation. The IHPS 2016 targeted 1,989 households that were interviewed in the IHPS 2013 and that could be traced back to half of the 204 enumeration areas that were originally sampled as part of the Third Integrated Household Survey (IHS3) 2010/11. The 2019 IHPS was launched in April 2019 as part of the Malawi Fifth Integrated Household Survey fieldwork operations targeting the 2,508 households that were interviewed in 2016. The panel sample expanded each wave through the tracking of split-off individuals and the new households that they formed. Available as part of this project is the IHPS 2019 data, the IHPS 2016 data as well as the rereleased IHPS 2010 & 2013 data including only the subsample of 102 EAs with updated panel weights. Additionally, the IHPS 2016 was the first survey that received complementary financial and technical support from the Living Standards Measurement Study – Plus (LSMS+) initiative, which has been established with grants from the Umbrella Facility for Gender Equality Trust Fund, the World Bank Trust Fund for Statistical Capacity Building, and the International Fund for Agricultural Development, and is implemented by the World Bank Living Standards Measurement Study (LSMS) team, in collaboration with the World Bank Gender Group and partner national statistical offices. 
The LSMS+ aims to improve the availability and quality of individual-disaggregated household survey data, and is, at the outset, a direct response to the World Bank IDA18 commitment to support 6 IDA countries in collecting intra-household, sex-disaggregated household survey data on 1) ownership of and rights to selected physical and financial assets, 2) work and employment, and 3) entrepreneurship – following international best practices in questionnaire design and minimizing the use of proxy respondents while collecting personal information. This dataset is included here.
National coverage
The IHPS 2016 and 2019 attempted to track all IHPS 2013 households stemming from 102 of the original 204 baseline panel enumeration areas as well as individuals that moved away from the 2013 dwellings between 2013 and 2016 as long as they were neither servants nor guests at the time of the IHPS 2013; were projected to be at least 12 years of age and were known to be residing in mainland Malawi but excluding those in Likoma Island and in institutions, including prisons, police compounds, and army barracks.
Sample survey data [ssd]
A sub-sample of IHS3 2010 sample enumeration areas (EAs) (i.e. 204 EAs out of 768 EAs) was selected prior to the start of the IHS3 field work with the intention to (i) track and resurvey these households in 2013 in accordance with the IHS3 fieldwork timeline and as part of the Integrated Household Panel Survey (IHPS 2013) and (ii) visit a total of 3,246 households in these EAs twice to reduce recall associated with different aspects of agricultural data collection. At baseline, the IHPS sample was selected to be representative at the national, regional, and urban/rural levels and for each of the following 6 strata: (i) Northern Region - Rural, (ii) Northern Region - Urban, (iii) Central Region - Rural, (iv) Central Region - Urban, (v) Southern Region - Rural, and (vi) Southern Region - Urban. The IHPS 2013 main fieldwork took place during the period of April-October 2013, with residual tracking operations in November-December 2013.
Given budget and resource constraints, for the IHPS 2016 the number of sample EAs in the panel was reduced to 102 out of the 204 EAs. As a result, the domains of analysis are limited to the national, urban and rural areas. Although the results of the IHPS 2016 cannot be tabulated by region, the stratification of the IHPS by region, urban and rural strata was maintained. The IHPS 2019 tracked all individuals 12 years or older from the 2016 households.
Computer Assisted Personal Interview [capi]
Data Entry Platform: To ensure data quality and timely availability of data, the IHPS 2019 was implemented using the World Bank’s Survey Solutions CAPI software. To carry out IHPS 2019, 1 laptop computer and a wireless internet router were assigned to each team supervisor, and each enumerator had an 8-inch GPS-enabled Lenovo tablet computer that the NSO provided. The use of Survey Solutions allowed for the real-time availability of data, as completed data was approved by the Supervisor and synced to the Headquarters server as frequently as possible. While administering the first module of the questionnaire, the enumerator(s) also used their tablets to record the GPS coordinates of the dwelling units. Geo-referenced household locations from the tablets complemented the GPS measurements taken by the Garmin eTrex 30 handheld devices, and these were linked with publicly available geospatial databases to enable the inclusion of a number of geospatial variables in the analysis: extensive measures of distance (e.g. distance to the nearest market), climatology, soil and terrain, and other environmental factors.
Data Management: The IHPS 2019 Survey Solutions CAPI-based data entry application was designed to streamline the data collection process from the field. IHPS 2019 interviews were mainly collected in “sample” mode (assignments generated from headquarters), with a few in “census” mode (new interviews created by interviewers from a template), for the NSO to have more control over the sample. This hybrid approach was necessary to aid the tracking operations: an enumerator could quickly create a tracking assignment, considering that they were mostly working in areas with poor network connection and hence could not quickly receive tracking cases from Headquarters.
The range and consistency checks built into the application were informed by the LSMS-ISA experience with the IHS3 2010/11, IHPS 2013 and IHPS 2016. Prior programming of the data entry application allowed for a wide variety of range and consistency checks to be conducted and reported, and for potential issues to be investigated and corrected before closing the assigned enumeration area. Headquarters (the NSO management) assigned work to the supervisors based on their regions of coverage. The supervisors then made assignments to the enumerators linked to their supervisor account. The work assignments and syncing of completed interviews took place through a Wi-Fi connection to the IHPS 2019 server. Because the data was available in real time, it was monitored closely throughout the entire data collection period, and upon receipt at headquarters it was exported to Stata for further consistency checks, data cleaning, and analysis.
Data Cleaning: The data cleaning process was done in several stages over the course of fieldwork and through preliminary analysis. The first stage of data cleaning was conducted in the field by the field teams, utilizing error messages generated by the Survey Solutions application when a response did not fit the rules for a particular question. For questions that flagged an error, the enumerators were expected to record a comment within the questionnaire to explain to their supervisor the reason for the error and to confirm that they had double-checked the response with the respondent. The supervisors were expected to sync the enumerator tablets as frequently as possible to avoid having many questionnaires on the tablet, and to enable daily checks of questionnaires. Some supervisors preferred to review completed interviews on the tablets before syncing, but still recorded their notes in the supervisor account and rejected questionnaires accordingly. The second stage of data cleaning was also done in the field, and resulted from the additional error reports generated in Stata, which were in turn sent to the field teams via email or Dropbox. The field supervisors collected reports for their assignments and, in coordination with the enumerators, reviewed, investigated, and corrected errors. Due to the quick turn-around in error reporting, it was possible to conduct call-backs while the team was still operating in the EA when required. Corrections to the data were entered in the rejected questionnaires and sent back to headquarters.
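The kind of range and consistency checking described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not Survey Solutions validation syntax, and the field names, the age thresholds in `MIN_AGE_FOR_EDUCATION`, and the status labels are all hypothetical.

```python
# Hypothetical rule set: field names and age thresholds are illustrative only
# and are not taken from the IHPS questionnaire or from Survey Solutions.
MIN_AGE_FOR_EDUCATION = {
    "none": 0,
    "primary": 5,
    "secondary": 12,
    "postgraduate": 21,
}


def check_record(record):
    """Return a list of error messages for one respondent record."""
    errors = []
    age = record.get("age")
    # Range check: age must be present and plausible.
    if age is None or not 0 <= age <= 120:
        errors.append("age out of range")
        return errors
    # Consistency check: education level must be plausible for the age.
    level = record.get("education")
    min_age = MIN_AGE_FOR_EDUCATION.get(level)
    if min_age is None:
        errors.append(f"unknown education level: {level}")
    elif age < min_age:
        errors.append(f"education '{level}' unlikely at age {age}")
    return errors


def review_status(record):
    """Classify a record for supervisor review.

    Flagged answers may stay in the data when the enumerator has explained
    them in a comment; flagged answers without a comment are rejected.
    """
    if not check_record(record):
        return "accept"
    return "review_comment" if record.get("comment") else "reject"
```

For example, `review_status({"age": 12, "education": "postgraduate"})` yields `"reject"`, while the same record with a non-empty `comment` field is passed on for the supervisor to read the explanation.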
The data cleaning process was done in several stages over the course of the fieldwork and through preliminary analyses. The first stage was during the interview itself. Because CAPI software was used, as enumerators asked the questions and recorded information, error messages were provided immediately when the information recorded did not match previously defined rules for that variable, for example, if the education level for a 12-year-old respondent was given as postgraduate. The second stage occurred during the review of the questionnaire by the Field Supervisor. The Survey Solutions software allows errors to remain in the data if the enumerator does not make a correction. The enumerator can write a comment to explain why the data appears to be incorrect, for example, if the previously mentioned 12-year-old was, in fact, a genius who had completed graduate studies. The next stage occurred when the data were transferred to headquarters where the NSO staff would again review the data for errors and verify the comments from the