Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An Exploratory Data Analysis (EDA) comprises a set of statistical and data mining procedures to describe data. We ran EDA to provide statistical facts and inform conclusions. The mined facts allow us to derive arguments that inform the Systematic Literature Review (SLR) of DL4SE.
The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers to the proposed research questions and to formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships in the Deep Learning literature reported in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.
Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:
1. Selection. This stage was guided by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features, or attributes, found in the repository. In fact, we manually engineered these features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.
2. Preprocessing. The preprocessing consisted of transforming the features into the correct (nominal) type, removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to recover information missed during the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. The same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the papers by the data mining tasks or methods.
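As an illustration of the kind of normalization described above (not the authors' actual tooling), a minimal Python sketch might look as follows; the file and column names are assumptions:

```python
import pandas as pd

# Hypothetical extraction sheet; the file and column names are assumptions.
papers = pd.read_csv("dl4se_papers.csv")

# Keyword-to-class map for collapsing free-text metric mentions into the
# normalized nominal classes listed above.
KEYWORDS = {
    "mrr": "MRR", "roc": "ROC or AUC", "auc": "ROC or AUC",
    "bleu": "BLEU Score", "accuracy": "Accuracy",
    "precision": "Precision", "recall": "Recall", "f1": "F1 Measure",
}

def normalize_metric(raw: str) -> str:
    """Map a raw metric string onto one of the canonical classes."""
    raw_lower = str(raw).lower()
    for keyword, canonical in KEYWORDS.items():
        if keyword in raw_lower:
            return canonical
    return "Other Metrics"  # unconventional metrics found during extraction

papers["metrics"] = papers["metrics"].apply(normalize_metric).astype("category")  # nominal type
```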
3. Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis. We performed a Principal Component Analysis (PCA) to reduce the 35 features to 2 components for visualization purposes. PCA also allowed us to identify the number of clusters exhibiting the maximum reduction in variance; in other words, it helped us identify the number of clusters to use when tuning the explainable models.
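A minimal sketch of this transformation step, assuming the 35 nominal features are one-hot encoded before PCA and that a simple elbow heuristic over k-means inertia is used to pick the cluster count (the encoding choice and variable names are assumptions, not taken from the paper):

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder

# Assumption: `papers` holds the 35 nominal features described above.
X = OneHotEncoder(sparse_output=False).fit_transform(papers)  # use sparse=False on older scikit-learn

# Two principal components for the 2-D visualization of the papers.
components = PCA(n_components=2).fit_transform(X)

# Elbow-style heuristic: track how much within-cluster variance (inertia) is
# reduced as the number of clusters grows, and pick the point of diminishing returns.
inertia = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in range(2, 11)}
```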
4. Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented toward uncovering hidden relationships among the extracted features (Correlations and Association Rules) and categorizing the DL4SE papers for a better segmentation of the state-of-the-art (Clustering). A detailed explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.

5. Interpretation/Evaluation. We used the knowledge discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes, which produces an argument support analysis (see this link).
We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.
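The pipelines themselves were built in RapidMiner; as a rough, hedged analogue only, association rule learning over paper features could be sketched in Python with the mlxtend library. The example transactions below are invented for illustration and are not the actual DL4SE records:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each paper becomes a "transaction" of nominal labels; these rows are invented.
transactions = [
    ["Supervised Learning", "Irreproducible", "Accuracy"],
    ["Supervised Learning", "Irreproducible", "BLEU Score"],
    ["Supervised Learning", "Reproducible", "Accuracy"],
    ["Reinforcement Learning", "Irreproducible", "Other Metrics"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(transactions), columns=encoder.columns_)

# Frequent itemsets first, then rules filtered by minimum support and confidence.
frequent = apriori(onehot, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```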
Overview of the most meaningful Association Rules: rectangles represent both Premises and Conclusions, and an arrow connecting a Premise with a Conclusion indicates that, given the premise, the conclusion is associated with it. E.g., given that an author used Supervised Learning, we can conclude that their approach is irreproducible, with a certain Support and Confidence.
Support = the number of occurrences in which the statement is true, divided by the total number of statements.
Confidence = the support of the statement divided by the number of occurrences of the premise.
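Concretely, for a rule such as “Supervised Learning ⇒ irreproducible”, the two measures could be computed as in the toy sketch below (the boolean columns and counts are illustrative, not the actual DL4SE data):

```python
import pandas as pd

# Toy table: one row per paper, boolean flags for the premise and the conclusion.
df = pd.DataFrame({
    "supervised_learning": [True, True, True, False, True],
    "irreproducible":      [True, True, False, False, True],
})

both = (df["supervised_learning"] & df["irreproducible"]).sum()
support = both / len(df)                             # 3 / 5 = 0.60
confidence = both / df["supervised_learning"].sum()  # 3 / 4 = 0.75
print(support, confidence)
```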
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Thorough knowledge of the structure of the analyzed data allows researchers to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Due to the multitude of available methods, selecting those that will work well together and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case-study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis (rMD) distances, 3) hierarchical clustering with Ward’s algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex-hormone attributes. Further analysis was carried out separately for male and female patients. The optimal partitioning of the male set resulted in five subgroups, two of which were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male dataset; no evidence of pathological patient subgroups was found. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also assess the heterogeneity of a dataset. The case study proved that our procedure is well suited for the identification and visualization of biologically meaningful patient subgroups.
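A hedged Python sketch of the four-step procedure described above; the file name, library choices, and cluster count are assumptions, and the original analysis may have used different tooling:

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.covariance import EmpiricalCovariance, MinCovDet
from sklearn.decomposition import PCA
from sklearn.preprocessing import RobustScaler

# Hypothetical input: a numeric table of the biochemical attributes per patient.
patients = pd.read_csv("polsenior_patients.csv")

# 1) Robust normalization (median / IQR rather than mean / SD).
X = RobustScaler().fit_transform(patients)

# 2) Classical and robust (squared) Mahalanobis distances for outlier screening.
md = EmpiricalCovariance().fit(X).mahalanobis(X)
rmd = MinCovDet(random_state=0).fit(X).mahalanobis(X)

# 3) Hierarchical clustering with Ward's algorithm, cut into e.g. five subgroups.
subgroup = fcluster(linkage(X, method="ward"), t=5, criterion="maxclust")

# 4) PCA for a 2-D view; the component loadings serve as biplot vectors.
pca = PCA(n_components=2).fit(X)
scores, biplot_vectors = pca.transform(X), pca.components_.T
```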
This dataset was created by Monis Ahmad
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Airbnb® is an American company operating an online marketplace for lodging, primarily for vacation rentals. The purpose of this study is to perform an exploratory data analysis of two datasets containing Airbnb® listings across 10 major cities. We aim to use various data visualizations to gain valuable insight into pricing, the effects of COVID-19, and more.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Although exploratory data analysis (EDA) is a powerful approach for uncovering insights from unfamiliar datasets, existing EDA tools face challenges in helping users assess the progress of exploration and synthesize coherent insights from isolated findings. To address these challenges, we present FactExplorer, a novel fact-based EDA system that shifts the analysis focus from raw data to data facts. FactExplorer employs a hybrid logical-visual representation, providing users with a comprehensive overview of all potential facts at the outset of their exploration. Moreover, FactExplorer introduces fact-mining techniques, including topic-based drill-down and transition path search capabilities. These features facilitate in-depth analysis of facts and enhance the understanding of interconnections between specific facts. Finally, we present a usage scenario and conduct a user study to assess the effectiveness of FactExplorer. The results indicate that FactExplorer facilitates the understanding of isolated findings and enables users to steer a thorough and effective EDA.
This dataset was created by Mohammad Osama
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Diabetes Dataset — Exploratory Data Analysis (EDA)
This repository contains a diabetes-related tabular dataset and a complete Exploratory Data Analysis (EDA). The main objective of this project was to learn how to conduct a structured EDA, apply best practices, and extract meaningful insights from real-world health data.
The analysis includes correlations, distributions, group comparisons, class balance exploration, and statistical interpretations that illustrate how different… See the full description on the dataset page: https://huggingface.co/datasets/guyshilo12/diabetes_eda_analysis.
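A minimal sketch of the kinds of checks listed above (correlations, distributions, group comparisons, class balance), assuming the data is available as a CSV with an "Outcome" target column; both the file and column names are assumptions:

```python
import pandas as pd

# Hypothetical file and target column; adjust to the actual dataset layout.
df = pd.read_csv("diabetes.csv")

print(df.describe())                                        # distributions of numeric features
print(df["Outcome"].value_counts(normalize=True))           # class balance
print(df.corr(numeric_only=True)["Outcome"].sort_values())  # correlations with the target
print(df.groupby("Outcome").mean(numeric_only=True))        # group comparisons
```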
The What to do in Paris site is a participatory agenda: Parisian places such as the city libraries and museums, parks and gardens, entertainment centers, swimming pools, theaters, major venues such as the Gaîté Lyrique, the CENTQUATRE, and the Carreau du Temple, concert halls, associations, and even Parisians themselves are invited to add their events to the site.
I did exploratory data analysis using this data.
This dataset offers a window into the world of bank telemarketing, with the goal of understanding how customers respond to campaigns promoting term deposit subscriptions. It provides a rich collection of information, including:
- Customer Demographics: A snapshot of who your customers are (age, job, marital status, etc.).
- Campaign History: Insights into how customers have reacted to past campaigns (contact method, duration).
- Call Metrics: Data on call duration and conversion rates, both on an individual call level and overall.

Originally sourced from a public repository, this dataset offers valuable potential for analysis. It's perfect for exploring:

- Customer Behavior: What are the characteristics of customers who do (and don't) sign up for term deposits?
- Campaign Effectiveness: Which types of campaigns or communication strategies are most successful?
By conducting exploratory data analysis (including univariate, bivariate, and segmented approaches), you can uncover hidden patterns and optimize future marketing efforts. This data is your key to better understanding your customers and driving higher subscription rates.
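As a starting point, the univariate, bivariate, and segmented views could be sketched as follows; the file and column names are modeled on the classic UCI bank marketing schema and are assumptions:

```python
import pandas as pd

# Hypothetical file; column names follow the classic UCI bank marketing schema.
bank = pd.read_csv("bank_telemarketing.csv")

# Univariate: outcome balance and call-duration distribution.
print(bank["y"].value_counts(normalize=True))
print(bank["duration"].describe())

# Bivariate: subscription rate by contact method and by job.
print(bank.groupby("contact")["y"].apply(lambda s: (s == "yes").mean()))
print(bank.groupby("job")["y"].apply(lambda s: (s == "yes").mean()).sort_values())

# Segmented: conversion by age band within each marital status.
bank["age_band"] = pd.cut(bank["age"], bins=[18, 30, 45, 60, 100])
print(bank.groupby(["marital", "age_band"], observed=True)["y"]
          .apply(lambda s: (s == "yes").mean()))
```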
https://creativecommons.org/publicdomain/zero/1.0/
This data is publicly available on GitHub here. It can be utilized for EDA, Statistical Analysis, and Visualizations.
The data set ifood_df.csv consists of 2206 customers of XYZ company with data on:
- Customer profiles
- Product preferences
- Campaign successes/failures
- Channel performance
I do not own this dataset. I am simply making it accessible on this platform via the public GitHub link.
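A minimal loading-and-profiling sketch for this file; the campaign column naming convention is an assumption about the file layout:

```python
import pandas as pd

df = pd.read_csv("ifood_df.csv")  # 2206 customers, as described above

print(df.shape)         # rows x columns
print(df.isna().sum())  # missing values per column
print(df.describe().T)  # numeric summaries of profile and spending columns

# Campaign success rates; the "AcceptedCmp*" naming is an assumption about the layout.
campaign_cols = [c for c in df.columns if c.lower().startswith("acceptedcmp")]
print(df[campaign_cols].mean())
```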
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The Smart Energy Research Lab Exploratory Data, 2019-2020
is an initial study within the SERL project, to be accessed by SERL researchers to conduct exploratory analysis ahead of provisioning SERL data to the wider academic research community.
The goals of the SERL portal are to provide:
Further information about SERL can be found at https://serl.ac.uk/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT
Context: exploratory factor analysis (EFA) is one of the statistical methods most widely used in administration; however, its current practice coexists with rules of thumb and heuristics given half a century ago. Objective: the purpose of this article is to present the best practices and recent recommendations for a typical EFA in administration through a practical solution accessible to researchers. Methods: in addition to discussing current versus recommended practices, we illustrate a tutorial with real data in the Factor software. Factor is still little known in the administration area, but it is freeware, easy to use (point and click), and powerful. The step-by-step tutorial illustrated in the article, together with the discussions raised and an additional example, is also available as tutorial videos. Conclusion: through the proposed didactic methodology (article-tutorial + video-tutorial), we encourage researchers/methodologists who have mastered a particular technique to do the same. Specifically regarding EFA, we hope that the presentation of the Factor software, as a first solution, can transcend the current outdated rules of thumb and heuristics by making best practices accessible to administration researchers.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The technological advancements of the modern era have enabled the collection of huge amounts of data in science and beyond. Extracting useful information from such massive datasets is an ongoing challenge as traditional data visualization tools typically do not scale well in high-dimensional settings. An existing visualization technique that is particularly well suited to visualizing large datasets is the heatmap. Although heatmaps are extremely popular in fields such as bioinformatics, they remain a severely underutilized visualization tool in modern data analysis. This article introduces superheat, a new R package that provides an extremely flexible and customizable platform for visualizing complex datasets. Superheat produces attractive and extendable heatmaps to which the user can add a response variable as a scatterplot, model results as boxplots, correlation information as barplots, and more. The goal of this article is two-fold: (1) to demonstrate the potential of the heatmap as a core visualization method for a range of data types, and (2) to highlight the customizability and ease of implementation of the superheat R package for creating beautiful and extendable heatmaps. The capabilities and fundamental applicability of the superheat package will be explored via three reproducible case studies, each based on publicly available data sources.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results of an exploratory analysis of CMS Open Data from LHC Run 1 (2010-2012) and Run 2 (2015-2018), focusing on the dimuon invariant mass spectrum in the 10-15 GeV range. The analysis investigates potential anomalies at 11.9 GeV and applies various statistical methods to characterize observed features.
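For context, the dimuon invariant mass that such a spectrum is built from can be computed from each muon's transverse momentum, pseudorapidity, and azimuthal angle; the sketch below is illustrative only and is not the analysis code used for this dataset:

```python
import numpy as np

MUON_MASS = 0.1056583745  # GeV

def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of a muon pair from (pt, eta, phi) kinematics."""
    def four_vector(pt, eta, phi):
        px, py, pz = pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)
        e = np.sqrt((pt * np.cosh(eta)) ** 2 + MUON_MASS ** 2)
        return e, px, py, pz

    e1, px1, py1, pz1 = four_vector(pt1, eta1, phi1)
    e2, px2, py2, pz2 = four_vector(pt2, eta2, phi2)
    m2 = (e1 + e2) ** 2 - (px1 + px2) ** 2 - (py1 + py2) ** 2 - (pz1 + pz2) ** 2
    return np.sqrt(np.maximum(m2, 0.0))

# masses = dimuon_mass(...)  # per-event muon kinematics from the open data files
# counts, edges = np.histogram(masses, bins=100, range=(10.0, 15.0))  # the 10-15 GeV window
```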
Methodology:
Key Analysis Components:
Results Summary: The analysis identifies several features in the dimuon mass spectrum requiring further investigation. Preliminary observations suggest potential anomalies around 11.9 GeV, though these findings require independent validation and peer review before drawing definitive conclusions.
Data Products:
Limitations: This work represents preliminary exploratory analysis. Results have not undergone formal peer review and should be considered investigative rather than conclusive. Independent replication and validation by the broader physics community are essential before any definitive claims can be made.
Keywords: CMS experiment, dimuon analysis, mass spectrum, exploratory analysis, LHC data, particle physics, statistical analysis, anomaly investigation
https://choosealicense.com/licenses/other/
Assignment 1: EDA - US Company Bankruptcy Prediction
Student Name: Reef Zehavi
Date: November 10, 2025
📹 Project Presentation Video
https://www.loom.com/share/6920e493e8654ef3bb4f67a10eb9b03d
1. Overview and Project Goal
The goal of this project is to perform Exploratory Data Analysis (EDA) on a fundamental dataset of American companies. The analysis focuses on understanding the financial characteristics that differentiate between companies that survived… See the full description on the dataset page: https://huggingface.co/datasets/reefzehavi/EDA-US-Bankruptcy-Prediction.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Overview: This dataset contains transactional data of customers from an e-commerce platform, collected to analyze and understand their purchasing behavior. The dataset includes customer ID, product purchased, purchase amount, purchase date, and product category.
Purpose of the Dataset: The primary objective of this dataset is to provide an opportunity to perform data exploration and preprocessing, allowing users to practice and enhance their data cleaning and analysis skills. The dataset has been intentionally modified to simulate a "messy" scenario, where some values have been removed, and inconsistencies have been introduced, which provides a real-world challenge for users to handle during data preparation.
Key Features:
- CustomerID: Unique identifier for each customer.
- ProductID: Unique identifier for each product purchased.
- PurchaseAmount: Amount spent by the customer on a particular transaction.
- PurchaseDate: Date when the transaction took place.
- ProductCategory: Category of the purchased product.
Analysis Opportunities:
- Perform data cleaning and preprocessing to handle missing values, duplicates, and outliers (see the sketch after this list).
- Conduct exploratory data analysis (EDA) to uncover trends and patterns in customer behavior.
- Apply machine learning models like clustering and association rule mining for segmenting customers and understanding purchasing patterns.
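A minimal sketch covering the first two opportunities (cleaning/preprocessing and simple EDA); the file name and the outlier rule are assumptions:

```python
import pandas as pd

# Hypothetical file name; columns follow the Key Features listed above.
orders = pd.read_csv("ecommerce_transactions.csv", parse_dates=["PurchaseDate"])

# Handle duplicates and missing values introduced by the "messy" design.
orders = orders.drop_duplicates()
orders["ProductCategory"] = orders["ProductCategory"].fillna("Unknown")
orders = orders.dropna(subset=["CustomerID", "PurchaseAmount"])

# Flag purchase-amount outliers with the IQR rule (the 1.5 factor is conventional).
q1, q3 = orders["PurchaseAmount"].quantile([0.25, 0.75])
iqr = q3 - q1
orders = orders[orders["PurchaseAmount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Simple EDA: spend per category and per month.
print(orders.groupby("ProductCategory")["PurchaseAmount"].agg(["count", "mean", "sum"]))
print(orders.set_index("PurchaseDate")["PurchaseAmount"].resample("M").sum())
```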
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides data from an exploratory research project that analyzed the Privacy and Security Policies and the Instruction Manuals of 59 home automation devices for the Smart Home, in order to verify which personal data was handled and how these documents provided information about the processes performed on personal data. The analysis was conducted with a quantitative approach followed by a qualitative analysis, using content analysis.
This dataset consists of two categories of data: primary data obtained through survey responses, and secondary data obtained through websites and APIs.
This dataset contains several categories of evaluation, including: 1) Employment Status, 2) Job Satisfaction, and 3) Retention of Key Employees.
https://www.usa.gov/government-works/
I was reading Every Nose Counts: Using Metrics in Animal Shelters when I got inspired to conduct an EDA on animal shelter data. I looked online for data and found this dataset which is curated by Austin Animal Center. The data can be found on https://data.austintexas.gov.
This data can be utilized for EDA practice. So go ahead and help animal shelters with your EDA powers by completing this task!
The data set contains three CSVs (a minimal loading sketch is shown after the list):
1. Austin_Animal_Center_Intakes.csv
2. Austin_Animal_Center_Outcomes.csv
3. Austin_Animal_Center_Stray_Map.csv
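A minimal sketch for loading and joining the intake and outcome files above; the column names are assumptions based on the published Austin Animal Center schema:

```python
import pandas as pd

intakes = pd.read_csv("Austin_Animal_Center_Intakes.csv")
outcomes = pd.read_csv("Austin_Animal_Center_Outcomes.csv")

# Join intakes to outcomes on the shared animal identifier; the column names
# below are assumptions, not guaranteed to match the files exactly.
animals = intakes.merge(outcomes, on="Animal ID", suffixes=("_intake", "_outcome"))

# Starter EDA: intake types overall, and outcome types by animal type.
print(intakes["Intake Type"].value_counts())
print(outcomes.groupby("Animal Type")["Outcome Type"].value_counts(normalize=True))
```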
More TBD!
Thank you Austin Animal Center for all the animal protection you provide to stray & owned animals. Also, thank you for making your data accessible to the public.
https://creativecommons.org/publicdomain/zero/1.0/
BACKGROUND
DQLab Telco is a telecommunications company with numerous locations all over the world. In order to ensure that customers are not left behind, DQLab Telco has consistently paid attention to the customer experience since its establishment in 2019.
Even though DQLab Telco is only a little over a year old, many of its customers have already changed their subscriptions to rival companies. By using machine learning, management hopes to lower the number of customers who leave.
After cleaning the data yesterday, it is now time for us to build the best model to forecast customer churn.
TASKS & STEPS
Yesterday, we completed "Cleansing Data" as part of project part 1. You are now expected to develop the appropriate model as a data scientist.
You will perform "Machine Learning Modeling" in this assignment using data from the previous month, specifically June 2020.
The actions that must be taken are as follows (a minimal end-to-end sketch is shown after the list):
1. Perform exploratory data analysis first.
2. Carry out pre-processing of the data.
3. Apply machine learning modeling.
4. Pick the ideal model.
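A minimal end-to-end sketch of steps 1-4 using scikit-learn; the cleansed file name and column names are assumptions, and logistic regression stands in for whatever "ideal model" the assignment ultimately selects:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical cleansed June 2020 file and column names from the earlier step.
telco = pd.read_csv("dqlab_telco_clean_june2020.csv")
y = telco["Churn"]
X = telco.drop(columns=["Churn", "customerID"], errors="ignore")

num_cols = X.select_dtypes(include="number").columns
cat_cols = X.select_dtypes(exclude="number").columns

# Step 2 (pre-processing) and step 3 (modeling) wrapped in one pipeline.
model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))  # step 4: compare candidates on held-out data
```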