75 datasets found
  1. Hacker News Sentiment Analysis Dataset

    • cubig.ai
    zip
    Updated Jul 14, 2025
    Cite
    CUBIG (2025). Hacker News Sentiment Analysis Dataset [Dataset]. https://cubig.ai/store/products/586/hacker-news-sentiment-analysis-dataset
    Available download formats: zip
    Dataset updated
    Jul 14, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The Hacker News Sentiment Analysis Dataset is public-opinion analysis data from a technology community. For each of the top 141 Hacker News posts, it provides a sentiment analysis (polarity, subjectivity, and sentiment category) along with the title, URL, points, and comment count.

    2) Data Utilization (1) Characteristics of the Hacker News Sentiment Analysis Dataset: • It includes polarity (-1 to 1), subjectivity (0 to 1), and category (positive/neutral/negative) columns that quantify the sentiment of comments using TextBlob, based on the top posts as of June 24, 2025. • It was generated through web scraping and NLP preprocessing, and allows quantitative comparison of community responses to technology news. (2) The Hacker News Sentiment Analysis Dataset can be used to: • Visualize sentiment around technology trends: Connect sentiment scores with post topics to visually analyze community response patterns to specific technology news, such as AI and policy. • Train NLP models: Sentiment classification models can be trained on comment data from real-world technical discussions, or the data can be applied to research on predicting the subjectivity of comments.
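The relationship between the polarity column (-1 to 1) and the category column (positive/neutral/negative) can be sketched as a simple thresholding step. The listing does not state which cut-offs were used, so the `neutral_band` parameter and its default below are assumptions for illustration, as are the sample scores.

```python
def categorize(polarity: float, neutral_band: float = 0.05) -> str:
    """Map a polarity score in [-1, 1] to a sentiment category.

    neutral_band is a hypothetical cut-off: scores within
    [-neutral_band, +neutral_band] are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity must be in [-1, 1], got {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Made-up polarity scores, not values from the dataset.
sample_scores = [0.42, 0.0, -0.30]
labels = [categorize(p) for p in sample_scores]
print(labels)
```

A wider neutral band moves weakly worded comments into the neutral class, which may matter when comparing category counts across posts.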

  2. ckanext-data-comparision

    • catalog.civicdataecosystem.org
    Updated Jun 4, 2025
    Cite
    (2025). ckanext-data-comparision [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-data-comparision
    Dataset updated
    Jun 4, 2025
    Description

    The Data Comparision extension for CKAN allows users to compare data from CSV and XLSX files through visualizations. It aims to enhance data analysis within CKAN by providing a direct visual comparison of datasets, which makes their content easier to understand. The extension is compatible with CKAN 2.9 and provides an extended feature set for data comparison.

    Key Features:

    • CSV/XLSX Data Support: Compares data stored in common tabular formats such as CSV and XLSX files.
    • Visual Data Comparison: Visualizes data for side-by-side comparison, so users can easily identify differences and similarities between datasets.
    • Chart.js Integration: Relies on the Chart.js library to generate the visualizations for the comparison feature, which provides a wide selection of chart formats.

    Technical Integration: The extension needs to be added to the ckan.plugins setting in the CKAN configuration file (/etc/ckan/default/ckan.ini by default). It also requires installing Chart.js via npm. After these configuration steps and a CKAN restart, the plugin extends CKAN's user interface with data comparison features.

    Benefits & Impact: By providing a comparative view of data stored in CSV and XLSX formats, the extension reduces the effort needed to analyze datasets and supports decision-making based on clear, visually represented comparisons. Because the data is visualized, differences and similarities can be spotted more quickly than in non-visualized data.

  3. Books Dataset

    • universe.roboflow.com
    zip
    Updated Dec 8, 2022
    Cite
    npubooks (2022). Books Dataset [Dataset]. https://universe.roboflow.com/npubooks/books-ul2pd/model/54
    Available download formats: zip
    Dataset updated
    Dec 8, 2022
    Dataset authored and provided by
    npubooks
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    10000 Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Library Management: Use the "books" model to help librarians identify and categorize books based on their covers, making it easier to manage book inventory, re-shelve returned books, and locate misplaced books.

    2. Bookstore Assistance: Implement the "books" model in bookstores' mobile apps or in-store kiosks, allowing customers to quickly find books they are looking for and discover related titles by snapping a picture of a book cover.

    3. Book Club Recommendation Engine: Use the "books" model as the basis for a book club app that recommends new titles based on the book covers from previous selections, helping users explore new genres and authors.

    4. Accessibility for Visually Impaired: Integrate the "books" model into software or devices designed for visually impaired individuals, helping them identify and select books without needing to rely on others for assistance.

    5. Academic Research: Utilize the "books" model in academic research projects to identify books and their authors for literary, historical, or sociological studies, making it easier to analyze large datasets of visual book cover material.

  4. Uber Customer Reviews Dataset (2024)

    • kaggle.com
    zip
    Updated Dec 19, 2024
    Cite
    Kanchana1990 (2024). Uber Customer Reviews Dataset (2024) [Dataset]. https://www.kaggle.com/datasets/kanchana1990/uber-customer-reviews-dataset-2024/code
    Available download formats: zip (469693 bytes)
    Dataset updated
    Dec 19, 2024
    Authors
    Kanchana1990
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Title: Uber Customer Reviews Dataset (2024)
    Subtitle: Sentiment Analysis and Insights from 12,000+ Google Play Store Reviews

    Dataset Overview

    This dataset contains over 12,000 customer reviews of the Uber app collected from the Google Play Store. The reviews provide insights into user experiences, including ratings, feedback on services, and developer responses. The data is cleaned and anonymized to ensure privacy compliance and ethical usage. It serves as a valuable resource for sentiment analysis, natural language processing (NLP), and machine learning applications.

    Data Science Applications

    1. Sentiment Analysis:

      • Classify reviews as positive, neutral, or negative based on score and content.
      • Identify trends in customer satisfaction over time.
    2. Natural Language Processing (NLP):

      • Perform topic modeling to uncover themes in customer feedback.
      • Generate word clouds to visualize frequently used words in reviews.
    3. Machine Learning:

      • Train a sentiment classification model using review text as input.
      • Predict customer satisfaction based on textual feedback.
    4. Business Insights:

      • Analyze user feedback to identify strengths (e.g., "friendly drivers") and weaknesses (e.g., "long wait times").
      • Compare app performance across different versions (appVersion).
    5. Time Series Analysis:

      • Track changes in average scores over time using the at column.
      • Identify seasonal patterns or anomalies in customer feedback.
    6. Customer Behavior Analysis:

      • Study the correlation between score and thumbsUpCount to understand review impact.
      • Analyze the effect of developer replies (replyContent) on customer satisfaction.
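Two of the applications above, labeling reviews by score and tracking average scores over time with the at column, can be sketched with the standard library alone. The score-to-label mapping (4 and 5 positive, 3 neutral, 1 and 2 negative) and the timestamp format are assumptions, and the sample rows are invented rather than taken from the dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def label_from_score(score: int) -> str:
    """Assumed mapping from a 1-5 rating to a sentiment label."""
    if score >= 4:
        return "positive"
    if score == 3:
        return "neutral"
    return "negative"


def monthly_average(reviews):
    """Average score per calendar month, keyed by 'YYYY-MM'."""
    buckets = defaultdict(list)
    for review in reviews:
        # Assumes 'at' is an ISO-like timestamp string.
        month = datetime.fromisoformat(review["at"]).strftime("%Y-%m")
        buckets[month].append(review["score"])
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Invented sample rows using the dataset's column names.
reviews = [
    {"score": 5, "at": "2024-11-03 10:15:00"},
    {"score": 1, "at": "2024-11-20 18:40:00"},
    {"score": 3, "at": "2024-12-01 09:00:00"},
]
labels = [label_from_score(r["score"]) for r in reviews]
averages = monthly_average(reviews)
print(labels, averages)
```

Labels derived from the score can also serve as weak targets when training a classifier on the content column.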

    Column Descriptors

    | Column Name | Description |
    |---|---|
    | userName | Anonymized username of the reviewer. |
    | userImage | URL of the reviewer's profile image (if available). |
    | content | Text content of the review describing the user's experience. |
    | score | Numerical rating given by the user (1–5). |
    | thumbsUpCount | Number of likes received by the review. |
    | reviewCreatedVersion | App version at the time of review creation (if available). |
    | at | Timestamp indicating when the review was posted. |
    | replyContent | Developer's response to the review (if any). |
    | repliedAt | Timestamp indicating when the developer replied (if any). |
    | appVersion | App version string associated with the review (if available). |

    Ethically Mined Data

    This dataset was collected in compliance with ethical web scraping practices:

    • Data was sourced from publicly available Google Play Store reviews.
    • Personally identifiable information (PII) such as email addresses, images, or phone numbers has been removed.

    Acknowledgements

    • Google Play Store: For providing access to publicly available app reviews.
    • DALL·E 3: For generating a visually appealing dataset image.
  5. Company Product Sales Analysis & BI Report

    • kaggle.com
    zip
    Updated Oct 25, 2023
    Cite
    Oluwabori Abiodun-Johnson (2023). Company Product Sales Analysis & BI Report [Dataset]. https://www.kaggle.com/datasets/oluwaboriaj/pizza-company-sales-bi-report
    Available download formats: zip (15967889 bytes)
    Dataset updated
    Oct 25, 2023
    Authors
    Oluwabori Abiodun-Johnson
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is a self-guided project.

    PROBLEM STATEMENT: What underlying trends in the Pizza Sales data could the company be missing that would aid a gap analysis of its business sales?

    OBJECTIVES: 1. Generate Key Performance Indicators (KPIs) from the Pizza Sales data to gain insight into underlying business performance. 2. Visualize important aspects of the Pizza Sales data to gain insight and understand key trends.

    I dived into the CSV dataset to uncover patterns within the Pizza Sales data, which spanned a calendar year.

    I used Microsoft SQL Server Management Studio (SSMS) to perform exploratory data analysis (EDA), identifying trends and sales patterns.

    Having completed that, I used Microsoft Power BI to create a visualization that presents my analytical findings to technical and non-technical viewers.

    STEPS COMPLETED: Data importation; SQL data analysis query writing; data cleaning; data processing; data visualization; report/dashboard development.
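The kind of KPI generation described above can be sketched in Python instead of the SQL the author used. The listing does not name the CSV's columns, so `order_id`, `quantity`, and `total_price` are hypothetical field names, and the rows are invented.

```python
def sales_kpis(rows):
    """Compute a few common sales KPIs from order-line rows.

    Each row is one line of an order; order_id, quantity and
    total_price are hypothetical column names.
    """
    total_revenue = sum(r["total_price"] for r in rows)
    total_orders = len({r["order_id"] for r in rows})  # distinct orders
    total_items = sum(r["quantity"] for r in rows)
    return {
        "total_revenue": total_revenue,
        "total_orders": total_orders,
        "avg_order_value": total_revenue / total_orders,
        "avg_items_per_order": total_items / total_orders,
    }


# Invented order lines; order 1 has two lines.
order_lines = [
    {"order_id": 1, "quantity": 2, "total_price": 26.5},
    {"order_id": 1, "quantity": 1, "total_price": 12.0},
    {"order_id": 2, "quantity": 3, "total_price": 40.5},
]
kpis = sales_kpis(order_lines)
print(kpis)
```

Counting distinct order IDs rather than rows matters here, because a single order can span several lines.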

  6. Data from: Creating a multi-track classical music performance dataset for...

    • data-staging.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +5 more
    zip
    Updated Mar 20, 2019
    Cite
    Bochen Li; Xinzhao Liu; Karthik Dinesh; Zhiyao Duan; Gaurav Sharma (2019). Creating a multi-track classical music performance dataset for multi-modal music analysis: challenges, insights, and applications [Dataset]. http://doi.org/10.5061/dryad.ng3r749
    Available download formats: zip
    Dataset updated
    Mar 20, 2019
    Authors
    Bochen Li; Xinzhao Liu; Karthik Dinesh; Zhiyao Duan; Gaurav Sharma
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    We introduce a dataset for facilitating audio-visual analysis of musical performances. The dataset comprises 44 simple multi-instrument classical music pieces assembled from coordinated but separately recorded performances of individual tracks. For each piece, we provide the musical score in MIDI format, the audio recordings of the individual tracks, the audio and video recording of the assembled mixture, and ground-truth annotation files including frame-level and note-level transcriptions. We describe our methodology for the creation of the dataset, particularly highlighting our approaches for addressing the challenges involved in maintaining synchronization and expressiveness. We demonstrate the high quality of synchronization achieved with our proposed approach by comparing the dataset against existing widely-used music audio datasets. We anticipate that the dataset will be useful for the development and evaluation of existing music information retrieval (MIR) tasks, as well as for novel multi-modal tasks. We benchmark two existing MIR tasks (multi-pitch analysis and score-informed source separation) on the dataset and compare against other existing music audio datasets. Additionally, we consider two novel multi-modal MIR tasks (visually informed multi-pitch analysis and polyphonic vibrato analysis) enabled by the dataset and provide evaluation measures and baseline systems for future comparisons (from our recent work). Finally, we propose several emerging research directions that the dataset enables.

  7. Classic confusion matrix to visually analyze the classification performance...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 11, 2023
    Cite
    David C. Molik; DeAndre Tomlinson; Shane Davitt; Eric L. Morgan; Matthew Sisk; Benjamin Roche; Natalie Meyers; Michael E. Pfrender (2023). Classic confusion matrix to visually analyze the classification performance of an algorithm. [Dataset]. http://doi.org/10.1371/journal.pntd.0008755.t001
    Available download formats: xls
    Dataset updated
    Jun 11, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    David C. Molik; DeAndre Tomlinson; Shane Davitt; Eric L. Morgan; Matthew Sisk; Benjamin Roche; Natalie Meyers; Michael E. Pfrender
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classic confusion matrix to visually analyze the classification performance of an algorithm.
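For readers unfamiliar with the term, a confusion matrix tabulates how often each actual class was predicted as each class. The sketch below is a generic illustration with invented labels. It is not the table from the cited article, and the function name is hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix where rows are actual classes and columns are
    predicted classes, both in the order given by `labels`."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Invented labels for illustration only.
actual = ["pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg", "pos"]
matrix = confusion_matrix(actual, predicted, ["pos", "neg"])
for row in matrix:
    print(row)
```

The diagonal holds correct predictions, and the off-diagonal cells show which classes are confused with which.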

  8. Dataset used for analysis.

    • plos.figshare.com
    xlsx
    Updated Jul 10, 2025
    + more versions
    Cite
    Chi Thanh Bui; Thi Thuy An Ngo; Huynh Khanh Long Chau; Nguyen Phuc Nguyen Tran (2025). Dataset used for analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0328093.s002
    Available download formats: xlsx
    Dataset updated
    Jul 10, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Chi Thanh Bui; Thi Thuy An Ngo; Huynh Khanh Long Chau; Nguyen Phuc Nguyen Tran
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In today’s digital landscape, visual content plays a crucial role in shaping consumer behavior. This study explores how visual electronic word-of-mouth (eWOM) on social media influences online purchase intention, applying the Stimulus-Organism-Response (SOR) framework. Using Partial Least Squares Structural Equation Modeling (PLS-SEM) to analyze data from 335 social media users, this study examines the effects of visual eWOM’s quality, quantity, and credibility on consumer perceptions, attitudes, and ultimately their purchase intentions. Our findings reveal that the quality and credibility of visual eWOM significantly enhance perceived information usefulness and its adoption by consumers. Information quantity, however, primarily influences attitudes towards the information, but does not directly drive its adoption. Contrary to expectations, information usefulness alone cannot predict purchase intention. Instead, information adoption emerges as a key mediator, indicating that consumers must actively engage with and internalize visual content for it to impact their buying behavior. This underscores that the effectiveness of visual eWOM is not solely based on its characteristics but depends on consumers’ active engagement and processing. These insights highlight the need for content that is not only visually appealing but also credible and engaging to facilitate information adoption and drive purchase intentions. This study enhances the understanding of visual eWOM’s impact on online purchasing and provides valuable insights for marketers aiming to optimize digital engagement strategies.

  9. Uber Trip Analysis with Power BI

    • kaggle.com
    zip
    Updated Jul 23, 2025
    Cite
    Sahil Raj (2025). Uber Trip Analysis with Power BI [Dataset]. https://www.kaggle.com/datasets/ssrai7/uber-trip-analysis-with-power-bi/code
    Available download formats: zip (12995785 bytes)
    Dataset updated
    Jul 23, 2025
    Authors
    Sahil Raj
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🚖 Uber Data Analysis Dashboard (Power BI)

    This dataset is part of a dashboard project that analyzes Uber ride behavior across different time patterns – built using Microsoft Power BI.

    🔍 Project Highlights:

    • Analyze ride volumes across hours, days, and months
    • See peak times and hotspots visually
    • Interactive visuals built in Power BI
    • Cleaned and prepared Excel dataset also provided
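The ride-volume analysis across hours and days is done in Power BI in this project. The same aggregation can be sketched in Python, assuming each trip has a pickup timestamp in an ISO-like string format. The listing does not document the workbook's columns, so the input shape and the sample values are assumptions.

```python
from collections import Counter
from datetime import datetime


def rides_per_hour(pickup_times):
    """Count rides by hour of day (0-23)."""
    return Counter(datetime.fromisoformat(t).hour for t in pickup_times)


def rides_per_weekday(pickup_times):
    """Count rides by weekday index (0 = Monday ... 6 = Sunday)."""
    return Counter(datetime.fromisoformat(t).weekday() for t in pickup_times)


# Invented pickup timestamps, not rows from the dataset.
pickups = [
    "2024-06-03 08:05:00",
    "2024-06-03 08:45:00",
    "2024-06-04 17:30:00",
]
by_hour = rides_per_hour(pickups)
by_weekday = rides_per_weekday(pickups)
peak_hour, peak_count = by_hour.most_common(1)[0]
print(by_hour, by_weekday, peak_hour)
```

Using the numeric weekday index avoids locale-dependent day names. Map it to labels only at display time.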

    📂 Files:

    • Uber Trip Details.xlsx – Cleaned dataset
    • Uber.pbix – Power BI Dashboard file

    🌐 Related Links:

    Feel free to fork, reuse, or share feedback!

  10. Amazon pets category images dataset

    • crawlfeeds.com
    jpg, zip
    Updated Sep 14, 2024
    Cite
    Crawl Feeds (2024). Amazon pets category images dataset [Dataset]. https://crawlfeeds.com/datasets/amazon-pets-category-images-dataset
    Available download formats: jpg, zip
    Dataset updated
    Sep 14, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    The Amazon Pets Category Images Dataset is a curated collection of high-resolution images sourced from the pet products category on Amazon. This dataset contains images across various subcategories, such as pet food, toys, grooming tools, bedding, and accessories. With a wide range of products for pets like dogs, cats, birds, and more, this dataset is perfect for researchers, developers, and businesses interested in studying product visuals, conducting market analysis, or training AI models focused on pet-related imagery.

    The dataset consists solely of product images, without accompanying metadata or descriptions, offering a straightforward resource for visual analysis, product comparison, or training image-based machine learning models.

    Key Features:

    • Product Categories: Includes pet food, toys, grooming products, bedding, and accessories.
    • Image Quality: High-resolution images suitable for detailed visual analysis and machine learning.
    • Pet Types: Covers a variety of pet-related products for dogs, cats, birds, fish, and more.
    • Source: Extracted from Amazon’s pet products section.
    • Image Count: Hundreds to thousands of images based on different product categories.

    Use Cases:

    • Image Classification and AI Training: Use the dataset to train machine learning models for product identification and categorization.
    • Product Comparison: Compare different product designs and packaging in the pet industry using visual data.
    • Visual Marketing Research: Analyze how pet products are visually represented on Amazon for trends and branding strategies.
    • Content Creation: Leverage these images for creating pet-related content, including marketing, advertising, and social media.

  11. Facebook Dataset

    • cubig.ai
    zip
    Updated May 20, 2025
    Cite
    CUBIG (2025). Facebook Dataset [Dataset]. https://cubig.ai/store/products/269/facebook-dataset
    Available download formats: zip
    Dataset updated
    May 20, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction • The Facebook Dataset is social network analysis data covering Facebook users' activity patterns, interactions, likes, friendships, gender, and age. It can be used to identify key user groups that contribute to business growth and to develop recommendation strategies.

    2) Data Utilization (1) Characteristics of the Facebook Dataset: • It consists of numerical and categorical variables such as user ID, gender, age, number of friends, number of likes (mobile/web), number of friend requests, number of likes received/sent, and frequency of activities, allowing you to analyze user-specific behavioral characteristics and interaction patterns from multiple angles. (2) The Facebook Dataset can be used for: • Core User Group Targeting and Recommendation Strategies: Use key characteristics such as gender, age, frequency of activity, friends, and likes to identify user groups that have a significant impact on business growth and to develop customized content and advertising recommendation strategies. • Analysis of Usage Behavior and Platform Trends: By analyzing data such as the distribution of mobile and web-based likes, activity patterns by age and gender, and friend relationship formation, you can visually explore changes in user behavior and major trends within the platform.

  12. Titanic Dataset

    • cubig.ai
    zip
    Updated May 29, 2025
    Cite
    CUBIG (2025). Titanic Dataset [Dataset]. https://cubig.ai/store/products/393/titanic-dataset
    Available download formats: zip
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction • Based on passenger information from the Titanic, which sank in 1912, the Titanic Dataset is a representative binary classification dataset that includes demographic and boarding information such as Survived, Passenger Class, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, and Embarked.

    2) Data Utilization (1) Characteristics of the Titanic Dataset: • It consists of 891 training samples and 12 to 15 columns (a mix of numerical and categorical). Variables such as Age, Cabin, and Embarked have some missing values, making the dataset suitable for practicing preprocessing and feature engineering. (2) The Titanic Dataset can be used for: • Development of survival prediction models: Key characteristics such as passenger class, gender, age, and fare can be used to predict survival with machine learning classification models such as logistic regression, random forest, and SVM. • Analysis of factors influencing survival: By analyzing the correlation between survival rates and variables such as gender, age, and socioeconomic status, you can statistically and visually explore which groups had a higher survival probability.
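Exploring which groups have a higher survival probability comes down to a grouped mean of the Survived column. The sketch below skips rows where the grouping value is missing, as can happen with Age, Cabin, or Embarked. The rows are toy examples rather than records from the dataset, and the helper name is hypothetical.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Share of survivors per group, ignoring rows where `key` is missing."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        group = row.get(key)
        if group is None:  # missing value: leave the row out of this grouping
            continue
        totals[group] += 1
        survived[group] += row["Survived"]
    return {group: survived[group] / totals[group] for group in totals}


# Toy rows using the dataset's column names; values are invented.
passengers = [
    {"Survived": 1, "Sex": "female", "Embarked": "S"},
    {"Survived": 0, "Sex": "male", "Embarked": None},
    {"Survived": 1, "Sex": "male", "Embarked": "C"},
    {"Survived": 0, "Sex": "male", "Embarked": "S"},
]
rates_by_sex = survival_rate_by(passengers, "Sex")
rates_by_port = survival_rate_by(passengers, "Embarked")
print(rates_by_sex, rates_by_port)
```

Dropping rows with a missing key is only one option. Imputation is the usual alternative when the column feeds a model.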

  13. Biological Data Visualization Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Biological Data Visualization Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-biological-data-visualization-market
    Available download formats: csv, pptx, pdf
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Biological Data Visualization Market Outlook



    The global biological data visualization market size was valued at approximately USD 800 million in 2023 and is expected to reach USD 2.2 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 12%. The rising volume of biological data generated through various research activities and the increasing need for advanced analytical tools are key factors driving this market's growth. The integration of artificial intelligence and machine learning in data visualization tools, combined with the growing application of biological data visualization in personalized medicine, are also significant growth drivers.
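The quoted figures can be sanity-checked with the standard CAGR formula, (end / start)^(1 / years) - 1. The sketch assumes the growth period runs from the 2023 valuation to the 2032 forecast, which is nine years.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


# USD 800 million (2023) to USD 2.2 billion (2032), per the description.
rate = cagr(800e6, 2.2e9, 2032 - 2023)
print(f"{rate:.1%}")
```

Under that assumption the rate comes out at roughly 11.9%, which is consistent with the stated 12%.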



    One of the primary growth factors of the biological data visualization market is the exponential increase in biological data generation due to advancements in high-throughput technologies such as next-generation sequencing (NGS), mass spectrometry, and microarray technology. These technologies produce vast amounts of data that require sophisticated visualization tools for proper analysis and interpretation. Without effective visualization, the potential insights and discoveries within this data may remain untapped, underscoring the market's critical role in modern biological research.



    Additionally, the increasing prevalence of complex diseases and the subsequent demand for personalized medicine are fueling the demand for advanced data visualization tools. Personalized medicine relies heavily on the analysis of genetic, proteomic, and other biological data to tailor treatments to individual patients. Effective visualization tools facilitate the interpretation of this complex data, enabling healthcare providers to make informed clinical decisions. This trend is expected to drive substantial growth in the biological data visualization market over the forecast period.



    Moreover, there is a growing adoption of cloud-based visualization solutions. Cloud deployment offers significant advantages, including scalability, cost-effectiveness, and accessibility from various locations. This is particularly beneficial for academic and research institutions and smaller biotech companies with limited resources. The integration of cloud computing with advanced visualization tools is expected to further propel market growth, as it allows for more efficient handling and analysis of large datasets.



    From a regional perspective, North America currently holds the largest market share, driven by significant investments in research and development, advanced healthcare infrastructure, and high adoption rates of advanced technologies. Europe follows closely, with substantial growth attributed to government support for research initiatives and a strong presence of pharmaceutical and biotech companies. The Asia Pacific region is anticipated to witness the highest CAGR, owing to increasing investments in biotech research, growing healthcare infrastructure, and expanding adoption of advanced technologies in countries like China and India.



    In the realm of Life Sciences Analytics, the role of data visualization is becoming increasingly pivotal. Life Sciences Analytics involves the use of data-driven insights to enhance research and development, clinical trials, and patient care. By leveraging advanced visualization tools, researchers and healthcare professionals can gain a deeper understanding of complex biological data, leading to more informed decisions and innovative solutions. The integration of Life Sciences Analytics with data visualization not only facilitates the interpretation of vast datasets but also accelerates the discovery of new patterns and correlations, ultimately advancing the field of personalized medicine.



    Component Analysis



    The biological data visualization market by component is segmented into software and services. Software solutions constitute the bulk of the market, providing tools that are essential for processing and visually representing complex biological data. These software tools range from basic data plotting programs to advanced systems incorporating machine learning algorithms for predictive modeling. The demand for these tools is driven by their ability to handle large datasets, provide user-friendly interfaces, and offer real-time data visualization capabilities, which are crucial for both research and clinical applications.



    In contrast, the services segment, although smaller, plays a crucial role in the market. Services include co

  14. IRIS FLOWER-plot images dataset

    • kaggle.com
    zip
    Updated Jun 6, 2024
    Cite
    Rijab Butt (2024). IRIS FLOWER-plot images dataset [Dataset]. https://www.kaggle.com/datasets/irijabbutt/iris-flower-plot-image-dataset
    Available download formats: zip (191988944 bytes)
    Dataset updated
    Jun 6, 2024
    Authors
    Rijab Butt
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    IRIS FLOWER SCATTER PLOT IMAGES DATASET

    Overview

    This dataset is derived from the well-known Iris flower dataset and contains 5000 images in PNG format. These images represent scatter plots that visually capture the relationships between different pairs of features in the Iris dataset. The original Iris dataset consists of 150 samples from three species of Iris flowers (Iris setosa, Iris versicolor, and Iris virginica), with each sample having four features: sepal length, sepal width, petal length, and petal width. The scatter plot images in this dataset provide visual insights into how these features correlate and differentiate the three species.

    Dataset Description

    • Total Images: 5000 PNG images
    • Image Format: PNG
    • Resolution: High-resolution scatter plots (resolution details can be specified)
    • Source: Derived from the Iris dataset available in Scikit-learn
    • Feature Pairs: Scatter plots are generated for all possible pairs of features (sepal length vs. sepal width, petal length vs. petal width, etc.)

    Features of the Dataset

    • Diverse Visual Representations: The dataset includes scatter plots with various feature pairings, providing comprehensive visual analysis of feature relationships.
    • Species Differentiation: Each scatter plot clearly distinguishes between the three species of Iris flowers using different colors or markers.
    • High Quality: The images are generated with high-quality plotting techniques to ensure clarity and precision in the representation of data points.
    • Annotations: Scatter plots are annotated with axes labels and legends to facilitate easy interpretation.
    • Randomized Samples: The dataset contains 5000 images, which implies multiple scatter plots for each pair of features, with randomized sample selections to cover different aspects and variations within the dataset.

    Use Cases

    • Data Visualization: Ideal for educational purposes to demonstrate data visualization techniques and the importance of scatter plots in exploratory data analysis.
    • Machine Learning: Useful for training machine learning models on image recognition tasks, particularly in distinguishing between different species based on visual patterns.
    • Research and Analysis: Can be used in research studies that require a large number of scatter plot images for testing new algorithms in image processing or pattern recognition.

    Conclusion

    The Iris Flower Scatter Plot Images Dataset provides a rich resource for visual data analysis, machine learning training, and educational purposes. By leveraging the classic Iris dataset, it offers a unique way to explore feature relationships through high-quality scatter plot images.
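"All possible pairs of features" for the four Iris features means six unordered pairs. The sketch below only enumerates them. How the dataset's author actually generated the plots is not stated, so the plotting step is left out, and the order in which the features are listed is an assumption.

```python
from itertools import combinations

# The four Iris features named in the overview.
FEATURES = ["sepal length", "sepal width", "petal length", "petal width"]

# Unordered pairs: each would be one (x-axis, y-axis) scatter plot layout.
feature_pairs = list(combinations(FEATURES, 2))
for x_name, y_name in feature_pairs:
    print(f"{x_name} vs. {y_name}")
```

With only six distinct layouts, the 5000 images must repeat each pairing many times, which fits the description's note about randomized sample selections.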
  15. R

    Data from: Camera Movements Dataset

    • universe.roboflow.com
    zip
    Updated Dec 9, 2023
    + more versions
    Cite
    Diploma thesis (2023). Camera Movements Dataset [Dataset]. https://universe.roboflow.com/diploma-thesis/camera-movements
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 9, 2023
    Dataset authored and provided by
    Diploma thesis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Veins EoFK Polygons
    Description

    Here are a few use cases for this project:

    1. Manga Content Curation: Utilize the "elements in manga" computer vision model to categorize manga based on specific visual features (cry, vein, chibi, kimono, focus, movement), making it easier for readers to discover content according to their interests and preferences.

    2. Automatic Manga Translation Assistance: Assist translators working on manga localization by identifying key visual elements (cry, vein, chibi, kimono, focus, movement) to inform an accurate and culturally sensitive translation of the material, preserving the original artistic intent.

    3. Artistic Style Analysis: Compare and contrast different manga artists' styles and techniques by analyzing the use of key visual elements (cry, vein, chibi, kimono, focus, movement) in their creations, providing valuable insights for aspiring manga creators and enthusiasts.

    4. Manga Storytelling Aid: Enhance storytelling in manga creation by using the "elements in manga" computer vision model to analyze the impact of specific visual elements (cry, vein, chibi, kimono, focus, movement) on storytelling, pacing, and emotional impact.

    5. Manga Comics Accessibility: Improve accessibility for visually impaired readers by combining the "elements in manga" computer vision model with natural language processing to generate detailed and accurate image descriptions based on the presence of key visual elements (cry, vein, chibi, kimono, focus, movement), making manga content more accessible through screen readers or braille displays.

  16. R

    Imagedetection Dataset

    • universe.roboflow.com
    zip
    Updated Apr 1, 2023
    Cite
    custom yolov5 (2023). Imagedetection Dataset [Dataset]. https://universe.roboflow.com/custom-yolov5-fwa2b/imagedetection-kf1ww/model/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 1, 2023
    Dataset authored and provided by
    custom yolov5
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Letters Bounding Boxes
    Description

    Here are a few use cases for this project:

    1. Educational Application: This model could be used in educational applications or games designed for children learning to recognize letters or digits. It could help in providing immediate feedback to learners by identifying whether the written letter or digit is correct.

    2. Document Analysis: The model could be applied for document analysis and capturing data from written or printed material, including books, bills, notes, letters, and more. The numbers and special characters capability could be used for capturing amounts, expressions, or nuances in the text.

    3. Accessibility Software: This model could be integrated into accessibility software applications aimed at assisting visually impaired individuals. It can analyze images or real-time video to read out the identified letters, figures, and special characters.

    4. License Plate Recognition: Given its ability to recognize a wide array of symbols, the model could be useful for extracting information from license plates, aiding in security and law enforcement settings.

    5. Handwritten Forms Processing: This computer vision model could be utilized to extract and categorize data from handwritten forms or applications, aiding in the automation of data entry tasks in various organizations.

  17. w

    National Panel Survey 2008-2015, Uniform Panel Dataset - Tanzania

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1more
    Updated Mar 17, 2021
    + more versions
    Cite
    National Bureau of Statistics (2021). National Panel Survey 2008-2015, Uniform Panel Dataset - Tanzania [Dataset]. https://microdata.worldbank.org/index.php/catalog/3814
    Explore at:
    Dataset updated
    Mar 17, 2021
    Dataset authored and provided by
    National Bureau of Statistics
    Time period covered
    2008 - 2015
    Area covered
    Tanzania
    Description

    Abstract

    Panel data possess several advantages over conventional cross-sectional and time-series data, including their power to isolate the effects of specific actions, treatments, and general policies often at the core of large-scale econometric development studies. While the concept of panel data alone provides the capacity for modeling the complexities of human behavior, the notion of universal panel data – in which time- and situation-driven variances leading to variations in tools, and thus results, are mitigated – can further enhance exploitation of the richness of panel information.

    This Basic Information Document (BID) provides a brief overview of the Tanzania National Panel Survey (NPS), but focuses primarily on the theoretical development and application of panel data, as well as key elements of the universal panel survey instrument and datasets generated by the four rounds of the NPS. As this Basic Information Document (BID) for the UPD does not describe in detail the background, development, or use of the NPS itself, the round-specific NPS BIDs should supplement the information provided here.

    The NPS Uniform Panel Dataset (UPD) consists of both survey instruments and datasets, meticulously aligned and engineered with the aim of facilitating the use of and improving access to the wealth of panel data offered by the NPS. The NPS-UPD provides a consistent and straightforward means of conducting not only user-driven analyses using convenient, standardized tools, but also for monitoring MKUKUTA, FYDP II, and other national level development indicators reported by the NPS.

    The design of the NPS-UPD combines the four completed rounds of the NPS – NPS 2008/09 (R1), NPS 2010/11 (R2), NPS 2012/13 (R3), and NPS 2014/15 (R4) – into pooled, module-specific survey instruments and datasets. The panel survey instruments offer the ease of comparability over time, with modifications and variances easily identifiable as well as those aspects of the questionnaire which have remained identical and offer consistent information. By providing all module-specific data over time within compact, pooled datasets, panel datasets eliminate the need for user-generated merges between rounds and present data in a clear, logical format, increasing both the usability and comprehension of complex data.

    Geographic coverage

    Designed for analysis of key indicators at four primary domains of inference, namely: Dar es Salaam, other urban, rural, Zanzibar.

    Analysis unit

    • Households
    • Individuals

    Universe

    The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    While the same sample of respondents was maintained over the first three rounds of the NPS, longitudinal surveys tend to suffer from bias introduced by households leaving the survey over time; i.e. attrition. Although the NPS maintains a highly successful recapture rate (roughly 96% retention at the household level), minimizing the escalation of this selection bias, a refresh of longitudinal cohorts was done for the NPS 2014/15 to ensure proper representativeness of estimates while maintaining a sufficient primary sample to maintain cohesion within panel analysis. A newly completed Population and Housing Census (PHC) in 2012, providing updated population figures along with changes in administrative boundaries, emboldened the opportunity to realign the NPS sample and abate collective bias potentially introduced through attrition.

    To maintain the panel concept of the NPS, the sample design for NPS 2014/2015 consisted of a combination of the original NPS sample and a new NPS sample. A nationally representative sub-sample was selected to continue as part of the “Extended Panel” while an entirely new sample, “Refresh Panel”, was selected to represent national and sub-national domains. Similar to the sample in NPS 2008/2009, the sample design for the “Refresh Panel” allows analysis at four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar. This new cohort in NPS 2014/2015 will be maintained and tracked in all future rounds between national censuses.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The format of the NPS-UPD survey instrument is similar to previously disseminated NPS survey instruments. Each module has a questionnaire and clearly identifies whether the module collects information at the individual or household level. Within each module-specific questionnaire of the NPS-UPD survey instrument, there are five distinct sections, arranged vertically: (1) the UPD, marked “U” on the survey instrument, (2) R4, (3) R3, (4) R2, and (5) R1. The latter four sections present each questionnaire in its original form at the time of its respective dissemination.

    The uppermost section of each module’s questionnaire (“U”) represents the model universal panel questionnaire, with questions generated from the comprehensive listing of questions across all four rounds of the NPS and codes generated from the comprehensive collection of codes. The following sections are arranged vertically by round, considering R4 as most recent. While not all rounds will have data reported for each question in the UPD and not each question will have reports for each of the UPD codes listed, the NPS-UPD survey instrument represents the visual, all-inclusive set of information collected by the NPS over time.

    The four round-specific sections (R4, R3, R2, R1) are aligned with their UPD-equivalent question, visually presenting their contribution to compatibility with the UPD. Each round-specific section includes the original round-specific variable names, response codes and skip patterns (corresponding to their respective round-specific NPS data sets, and despite their variance from other rounds or from the comprehensive UPD code listing).

  18. d

    Market Value Analysis 2015

    • catalog.data.gov
    • data.nola.gov
    • +1more
    Updated Jul 12, 2025
    + more versions
    Cite
    data.nola.gov (2025). Market Value Analysis 2015 [Dataset]. https://catalog.data.gov/dataset/market-value-analysis-2015
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    data.nola.gov
    Description

    Our Normative Assumptions when Analyzing Markets:

    • Public subsidy is scarce and it alone cannot create a market;
    • Public subsidy must be used to leverage, or clear the path for, private investment;
    • In distressed markets, invest into strength (e.g., major institutions, transportation hubs, environmental amenities) – “Build from Strength”;
    • All parts of a city are customers of the services and resources that it has to offer;
    • Decisions to invest and/or deploy governmental programs must be based on objectively gathered data and sound quantitative and qualitative analysis.

    Preparing the MVA:

    1. Take all of the data layers and geocode to Census block groups.
    2. Inspect and validate those data layers.
    3. Using a statistical cluster analysis, identify areas that share a common constellation of characteristics.
    4. Map the result.
    5. Visually inspect areas of the City for conformity with the statistical/spatial representation.
    6. Re-solve and re-inspect until we achieve an accurate representation.
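    The clustering step in the MVA preparation can be illustrated with a small sketch. The description names only "a statistical cluster analysis", so the use of k-means here is an assumption, as are the `kmeans` helper, the two standardized indicators per block group, and the sample values, which are invented purely for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (labels, centroids) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical block groups, each described by two standardized indicators
# (for example a price measure and a vacancy measure). Not real MVA data.
block_groups = [
    (0.9, 0.8), (1.0, 0.9), (0.8, 1.0),
    (-0.9, -1.0), (-1.0, -0.8), (-0.8, -0.9),
]

labels, centroids = kmeans(block_groups, k=2)
print(labels)  # block groups sharing a label share a market type
```

    In practice the resulting labels would be mapped and visually inspected, and the clustering re-run with different settings, as steps 4 to 6 describe.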

  19. Visualization of networks – analyzing and visualizing connections between...

    • meta4ds.fokus.fraunhofer.de
    unknown, zip
    Updated Oct 4, 2021
    Cite
    Zenodo (2021). Visualization of networks – analyzing and visualizing connections between (planned) NFDI consortia [Dataset]. https://meta4ds.fokus.fraunhofer.de/datasets/oai-zenodo-org-5548239?locale=en
    Explore at:
    Available download formats: zip (17160587), unknown
    Dataset updated
    Oct 4, 2021
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    Description

    This is a fix-release for some broken links in the README. Thanks to @HenningTimm for the community-driven support.

    Dorothea Strecker, Sama Majidian, Lukas C. Bossert, Évariste Demandt

    This repository contains materials used during the workshop "Visualization of networks – analyzing and visualizing connections between (planned) NFDI consortia" at the NFDI4Ing Community Meeting (NFDI4Ing Konferenz) 2021 on September 28. During the workshop, the network of (planned) NFDI consortia was visualized and analyzed using the statistical software R and the library igraph.

    Abstract: Currently, Germany's National Research Data Infrastructure spans a network of nine funded consortia from the first round and ten from the second round. This workshop enables you to visually display and analyze the network of consortia in your internet browser via a remote Jupyter Notebook. The workshop follows the tradition of literate programming. No prior experience in programming and no locally installed software needed – let's weave and tangle!

    Slides

    The presentation slides for the workshop are stored in the file "NFDI4Ing_Community_Meeting_2021.pdf".

    JupyterNotebook for visualization of networks with R

    In the interactive part of the workshop we worked with JupyterNotebooks. The documented sample solution is stored in various formats in the folder Notebook. Direct exports from JupyterNotebook are provided in the following formats:

    • JupyterNotebook (R)
    • PDF (via LuaLaTeX)
    • org-mode
    • Markdown
    • Rscript
    • Webpage
    • WebSlides

    This repository is licensed under the MIT License.

  20. c

    Women National Basketball Association Shots Dataset

    • cubig.ai
    zip
    Updated Jun 12, 2025
    Cite
    CUBIG (2025). Women National Basketball Association Shots Dataset [Dataset]. https://cubig.ai/store/products/469/women-national-basketball-association-shots-dataset
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The Women National Basketball Association Shots Dataset compiles a total of 41,497 shots attempted in the WNBA during the 2021–2022 season, including the game ID, shot type, success or failure, scoring value, spatial coordinates, team and score status, and remaining time per quarter.

    2) Data Utilization
    (1) Characteristics of the Women National Basketball Association Shots Dataset: • The dataset mixes categorical and numerical variables and contains both spatial and contextual information for each shot.
    (2) The Women National Basketball Association Shots Dataset can be used for: • Predicting Shot Success: By training a machine learning classification model on the x and y coordinates and game-situation information, one can predict the success or failure of each shot attempt. • Shot Position Pattern Analysis: Create a heat map or contour plot based on coordinate_x and coordinate_y to visually analyze frequently attempted shot positions and success patterns.
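    A minimal sketch of the binning that underlies such a heat map, assuming each shot is available as a (coordinate_x, coordinate_y, made) triple. The `bin_shots` helper, the `cell_size` value, and the sample shots are hypothetical; the description does not state the coordinate units or the name of the success column.

```python
from collections import defaultdict


def bin_shots(shots, cell_size):
    """Group shots into square grid cells.

    Returns {cell: (attempts, success_rate)}, where cell is the pair of
    integer grid indices obtained by floor-dividing each coordinate.
    """
    grid = defaultdict(lambda: [0, 0])  # cell -> [attempts, made]
    for x, y, made in shots:
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell][0] += 1
        grid[cell][1] += int(made)
    return {cell: (a, m / a) for cell, (a, m) in grid.items()}


# Hypothetical shots: (coordinate_x, coordinate_y, made). Not real WNBA data.
shots = [
    (1.0, 2.0, True),
    (1.5, 2.5, False),
    (6.0, 2.0, True),
    (7.5, 4.9, True),
]

heat = bin_shots(shots, cell_size=5.0)
for cell, (attempts, rate) in sorted(heat.items()):
    print(cell, attempts, rate)
```

    The per-cell attempt counts and success rates can then be passed to any plotting library to draw the heat map or contour plot.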
