94 datasets found
  1. Google Certificate BellaBeats Capstone Project

    • kaggle.com
    zip
    Updated Jan 5, 2023
    Cite
    Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project
    Explore at:
    Available download formats: zip (169161 bytes)
    Dataset updated
    Jan 5, 2023
    Authors
    Jason Porzelius
    Description

    Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted program, Excel, for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as RStudio, SQL Workbench, or Google Sheets was not feasible. My choice of project was further limited because the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist Bellabeat's marketing strategies for future growth. My task is to provide data-driven insights on the business tasks set by the Bellabeats, Inc. executive and data analysis team. To accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

    Section 1 - Ask:

    A. Guiding Questions:
    1. Who are the key stakeholders and what are their goals for the data analysis project? 2. What is the business task that this data analysis project is attempting to solve?

    B. Key Tasks:
    1. Identify key stakeholders and their goals for the data analysis project.
      *The key stakeholders for this project are as follows:
        -Urška Sršen and Sando Mur, co-founders of Bellabeats, Inc.
        -Bellabeats marketing analytics team. I am a member of this team.

    2. Identify the business task. *The business task is: -As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to one BellaBeats product and presenting those insights to BellaBeats stakeholders.

    Section 2 - Prepare:

    A. Guiding Questions: 1. Where is the data stored and organized? 2. Are there any problems with the data? 3. How does the data help answer the business question?

    B. Key Tasks:

    1. Research and communicate the source of the data, and how it is stored/organized, to stakeholders. *The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored on Kaggle and was made available by user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were reportedly (see credibility section directly below) generated by respondents to a survey distributed via Amazon Mechanical Turk between 03/12/2016 and 05/12/2016.
      *Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents onto my laptop and decided to use 2 of them for this project, as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were: -sleepDay_merged.csv -dailyActivity_merged.csv

    2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias. *As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported. *As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
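    The author ran these two credibility checks (number of distinct users, date range covered) in Excel. As a rough illustration only, the same checks could be sketched in Python. The column names Id and ActivityDate and the sample rows below are assumptions made for this sketch, not the confirmed contents of dailyActivity_merged.csv.

```python
import csv
import io
from datetime import datetime

# Hypothetical rows standing in for dailyActivity_merged.csv.
# The column names (Id, ActivityDate, TotalSteps) are assumed for illustration.
SAMPLE = """Id,ActivityDate,TotalSteps
1001,4/12/2016,13162
1001,4/13/2016,10735
1002,4/12/2016,8163
1003,5/12/2016,4500
"""


def summarize(csv_text):
    """Return (number of distinct users, first date, last date)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    users = {row["Id"] for row in rows}
    dates = [datetime.strptime(row["ActivityDate"], "%m/%d/%Y").date() for row in rows]
    return len(users), min(dates), max(dates)


user_count, first_day, last_day = summarize(SAMPLE)
print("distinct users:", user_count)
print("covered range:", first_day, "to", last_day)
print("calendar days covered:", (last_day - first_day).days + 1)
```

    Run against the real file, a distinct-user count above 30 or a range much shorter than two months would reproduce the discrepancies the author reports.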

  2. Example ScRNAseq Dataset 2 for Learning Web-based Tools

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 29, 2023
    + more versions
    Cite
    Yarlagadda, Sagnik; Giorgio, Todd D (2023). Example ScRNAseq Dataset 2 for Learning Web-based Tools [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8084705
    Explore at:
    Dataset updated
    Jun 29, 2023
    Dataset provided by
    Vanderbilt University
    Authors
    Yarlagadda, Sagnik; Giorgio, Todd D
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is one of the three example ScRNAseq datasets used to follow the guided example analyses within "A Guide to Single-Cell RNA Sequencing Analysis Using Web-based Tools for Non-Bioinformaticians" in the FEBS Journal. This dataset can be downloaded and imported into a variety of web-based tools and used as a learning device to gain more familiarity with the tools. As described in the paper, this dataset represents the negative control (carrier only).

  3. Predictive Modeling of E-Commerce Purchase Intent

    • kaggle.com
    zip
    Updated May 3, 2025
    Cite
    Adil Shamim (2025). Predictive Modeling of E-Commerce Purchase Intent [Dataset]. https://www.kaggle.com/datasets/adilshamim8/online
    Explore at:
    Available download formats: zip (273010 bytes)
    Dataset updated
    May 3, 2025
    Authors
    Adil Shamim
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    The Online Shoppers Purchasing Intention dataset captures 12,330 distinct web‐session records collected over a one‐year span from an e-commerce site, with each session belonging to a different visitor to prevent user‐ or campaign-specific biases. Originally published in 2017 and licensed under CC BY 4.0, it was curated by Sakar et al. for benchmarking classifiers on independent and identically distributed tabular data.

    Features

    • Numerical (10):

      • Administrative, Informational, ProductRelated (counts of pages visited) and their corresponding _Duration fields (total time in seconds spent on those pages).
      • BounceRates, ExitRates (average session‐level bounce and exit rates) and PageValues (average monetary value of pages preceding a purchase).
      • SpecialDay (normalized [0 – 1] indicator of how close the visit was to major shopping holidays, e.g. Valentine’s Day).
    • Categorical (7):

      • Month (Aug – Sep), OperatingSystems (8 codes), Browser (13 codes), Region (9 codes), TrafficType (20 codes), VisitorType (“New_Visitor,” “Returning_Visitor,” “Other”), and Weekend (True/False).

    Target and Class Distribution

    • Revenue (False/True) denotes whether the session ended in a purchase.
    • Of the 12,330 sessions, 84.5 % (10,422) did not result in revenue, while 15.5 % (1,908) did.
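    The class split quoted above can be verified with a few lines of arithmetic. This is only a sketch using the counts given in the description; the variable names are illustrative.

```python
# Counts taken from the description above.
total_sessions = 12_330
no_purchase = 10_422
purchase = 1_908

# The two classes should account for every session.
assert no_purchase + purchase == total_sessions

share_no_purchase = round(100 * no_purchase / total_sessions, 1)
share_purchase = round(100 * purchase / total_sessions, 1)
print(share_no_purchase, share_purchase)

# A model that always predicts "no purchase" already reaches this accuracy,
# so accuracy alone says little on data this imbalanced.
majority_baseline_accuracy = no_purchase / total_sessions
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")
```

    Because of this imbalance, metrics such as precision, recall, or F1 on the Revenue = True class are usually more informative than raw accuracy.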

    Intended Use

    This dataset is ideal for developing and comparing binary classification models—ranging from multilayer perceptrons and LSTM networks to tree-based methods—to predict online purchasing intention in a controlled, time-invariant setting.

  4. Healthcare Cloud Based Analytics Market Report | Global Forecast From 2025...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Healthcare Cloud Based Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-healthcare-cloud-based-analytics-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Healthcare Cloud Based Analytics Market Outlook



    The global healthcare cloud based analytics market size was valued at approximately USD 14.8 billion in 2023, and it is anticipated to reach around USD 54.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 15.7% from 2024 to 2032. One of the primary growth factors influencing this market is the increasing demand for data-driven decision-making processes in healthcare settings to enhance patient outcomes and operational efficiency.
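    The quoted figures can be cross-checked with the standard compound annual growth rate formula. The sketch below assumes nine compounding periods (2023 base value to 2032 end value); the report does not state its exact convention, so small deviations from the quoted numbers are to be expected.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding start_value at the given rate."""
    return start_value * (1 + rate) ** years


# Figures from the paragraph above; the nine-year span is an assumption.
years = 2032 - 2023
implied_rate = cagr(14.8, 54.3, years)
projected_2032 = project(14.8, 0.157, years)

print(f"implied CAGR from the two endpoints: {implied_rate:.1%}")
print(f"2032 value implied by a 15.7% CAGR: USD {projected_2032:.1f} billion")
```

    Both results land close to the quoted 15.7% and USD 54.3 billion, which suggests the numbers are internally consistent up to rounding.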



    One significant growth factor for the healthcare cloud based analytics market is the rapid digital transformation within the healthcare sector. The transition from paper-based systems to electronic health records (EHRs) and the adoption of telehealth services are driving the need for sophisticated analytics solutions that can process vast amounts of healthcare data. The accessibility and scalability offered by cloud-based solutions make them particularly attractive for healthcare providers looking to leverage patient data for better diagnostic and treatment outcomes.



    Moreover, the rising focus on personalized medicine and the need for population health management are propelling the demand for healthcare cloud based analytics. Personalized medicine requires the analysis of large datasets to understand individual patient profiles and predict responses to treatments. Similarly, population health management aims to improve health outcomes by analyzing data to identify trends and intervene proactively. Cloud-based analytics platforms provide the necessary computational power and flexibility to handle these complex data requirements efficiently.



    The cost-efficiency of cloud based solutions compared to traditional on-premises systems is another crucial growth driver. Healthcare organizations are under constant pressure to reduce operational costs while improving patient care quality. Cloud-based analytics solutions eliminate the need for significant upfront investments in hardware and software while offering the benefits of scalable resources and reduced IT maintenance costs. This financial advantage is particularly appealing to small and medium-sized healthcare providers who may have limited budgets for technology investments.



    The integration of Business Intelligence in Healthcare is transforming the way data is utilized to improve patient care and streamline operations. By employing BI tools, healthcare organizations can analyze vast datasets to uncover insights that drive better decision-making. These tools enable healthcare providers to track patient outcomes, optimize resource allocation, and enhance overall operational efficiency. The ability to visualize data through dashboards and reports allows for a deeper understanding of patient trends and organizational performance, ultimately leading to improved healthcare delivery and patient satisfaction.



    From a regional perspective, North America currently holds the largest market share in the healthcare cloud based analytics market, driven by advanced healthcare infrastructure and high adoption rates of digital healthcare technologies. However, regions like Asia Pacific are expected to witness the highest growth rates during the forecast period. Factors such as increasing healthcare expenditures, growing awareness about the benefits of healthcare analytics, and supportive government initiatives are contributing to the market expansion in these regions.



    Component Analysis



    The healthcare cloud based analytics market can be segmented by component into software and services. The software segment includes various analytics platforms and tools designed to process and analyze healthcare data. These software solutions are essential for enabling healthcare providers to harness the power of big data and derive actionable insights. As the volume of healthcare data continues to grow exponentially, the demand for robust and scalable analytics software solutions is expected to increase significantly. Innovations in artificial intelligence and machine learning are also enhancing the capabilities of these software solutions, making them more effective in predictive analytics and decision support.



    Cloud Computing in Healthcare is revolutionizing the way healthcare data is stored, accessed, and analyzed. By leveraging cloud technology, healthcar

  5. Cloud-based User Entity Behavior Analytics Log Data Set

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Max Landauer; Florian Skopik; Georg Höld; Markus Wurzenberger (2023). Cloud-based User Entity Behavior Analytics Log Data Set [Dataset]. http://doi.org/10.5281/zenodo.7119953
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 30, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Max Landauer; Florian Skopik; Georg Höld; Markus Wurzenberger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the CLUE-LDS (CLoud-based User Entity behavior analytics Log Data Set). The data set contains log events from real users of a cloud storage service and is suitable for User Entity Behavior Analytics (UEBA). Events include logins, file accesses, link shares, config changes, etc. The data set contains around 50 million events generated by more than 5000 distinct users in more than five years (2017-07-07 to 2022-09-29 or 1910 days). The data set is complete except for 109 events missing on 2021-04-22, 2021-08-20, and 2021-09-05 due to database failure. The unpacked file size is around 14.5 GB. A detailed analysis of the data set is provided in [1].
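    The stated time span is easy to confirm with date arithmetic. The sketch below uses only figures from the paragraph above; the per-day average is a rough estimate because the event count is given as "around 50 million".

```python
from datetime import date

# Collection period as stated in the description.
start = date(2017, 7, 7)
end = date(2022, 9, 29)

span_days = (end - start).days
print("days between first and last date:", span_days)

# Rough average only: the description gives the total as "around 50 million".
approx_events = 50_000_000
events_per_day = approx_events / span_days
print(f"approximate events per day: {events_per_day:,.0f}")
```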

    The logs are provided in JSON format with the following attributes in the first level:

    • id: Unique log line identifier that starts at 1 and increases incrementally, e.g., 1.
    • time: Time stamp of the event in ISO format, e.g., 2021-01-01T00:00:02Z.
    • uid: Unique anonymized identifier for the user generating the event, e.g., old-pink-crane-sharedealer.
    • uidType: Specifier for uid, which is either the user name or IP address for logged out users.
    • type: The action carried out by the user, e.g., file_accessed.
    • params: Additional event parameters (e.g., paths, groups) stored in a nested dictionary.
    • isLocalIP: Optional flag for event origin, which is either internal (true) or external (false).
    • role: Optional user role: consulting, administration, management, sales, technical, or external.
    • location: Optional IP-based geolocation of event origin, including city, country, longitude, latitude, etc.

    In the following data sample, the first object depicts a successful user login (see type: login_successful) and the second object depicts a file access (see type: file_accessed) from a remote location:

    {"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", "time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", "id": 21567530, "uidType": "name"}

    {"isLocalIP": false, "params": {"path": "/proud-copper-orangutan-artexer/doubtful-plum-ptarmigan-merchant/insufficient-amaranth-earthworm-qualitycontroller/curious-silver-galliform-tradingstandards/incredible-indigo-octopus-printfinisher/wicked-bronze-sloth-claimsmanager/frantic-aquamarine-horse-cleric"}, "type": "file_accessed", "time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, "location": {"countryCode": "AT", "countryName": "Austria", "region": "4", "city": "Gmunden", "latitude": 47.915, "longitude": 13.7959, "timezone": "Europe/Vienna", "postalCode": "4810", "metroCode": null, "regionName": "Upper Austria", "isInEuropeanUnion": true, "continent": "Europe", "accuracyRadius": 50}, "uidType": "ipaddress"}
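    A minimal sketch of reading such events in Python follows. It assumes one JSON object per line, as the sample suggests, and treats isLocalIP, role, and location as optional. The second event is shortened here: its path value is a placeholder, not the real one.

```python
import json
from datetime import datetime

# Two events modeled on the sample above. The path in the second event is a
# shortened placeholder, and only two location fields are kept.
LINES = [
    '{"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", '
    '"time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", '
    '"id": 21567530, "uidType": "name"}',
    '{"isLocalIP": false, "params": {"path": "/example/path"}, "type": "file_accessed", '
    '"time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", '
    '"id": 21567531, "location": {"countryCode": "AT", "city": "Gmunden"}, '
    '"uidType": "ipaddress"}',
]


def parse_event(line):
    """Decode one log line and convert its time stamp to a datetime."""
    event = json.loads(line)
    # Parse the trailing "Z" explicitly; datetime.fromisoformat only accepts
    # it from Python 3.11 on.
    event["time"] = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
    return event


events = [parse_event(line) for line in LINES]
for event in events:
    # Optional attributes may be absent, so use .get with a fallback.
    location = event.get("location") or {}
    print(event["id"], event["type"], event.get("isLocalIP"), location.get("city"))
```

    For the full 14.5 GB file, iterating over the file object line by line instead of loading everything into a list keeps memory use bounded.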

    The data set was generated at the premises of Huemer Group, a midsize IT service provider located in Vienna, Austria. Huemer Group offers a range of Infrastructure-as-a-Service solutions for enterprises, including cloud computing and storage. In particular, their cloud storage solution called hBOX enables customers to upload their data, synchronize them with multiple devices, share files with others, create versions and backups of their documents, collaborate with team members in shared data spaces, and query the stored documents using search terms. The hBOX extends the open-source project Nextcloud with interfaces and functionalities tailored to the requirements of customers.

    The data set comprises only normal user behavior, but it can be used to evaluate anomaly detection approaches by simulating account hijacking. We provide an implementation for identifying similar users, switching pairs of users to simulate changes of behavior patterns, and a sample detection approach in our GitHub repository.
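    The authors' implementation is in their repository and is not reproduced here. Purely as an illustration of the general idea, the sketch below swaps the identifiers of two users from a chosen point in time, so that each account suddenly shows the other's behavior. The function name swap_users, the switch_time parameter, and the sample events are hypothetical.

```python
def swap_users(events, user_a, user_b, switch_time):
    """Return a copy of events in which user_a and user_b trade uids
    for every event at or after switch_time."""
    swapped = []
    for event in events:
        event = dict(event)  # shallow copy so the input list stays untouched
        # ISO 8601 strings in the same format compare correctly as text.
        if event["time"] >= switch_time:
            if event["uid"] == user_a:
                event["uid"] = user_b
            elif event["uid"] == user_b:
                event["uid"] = user_a
        swapped.append(event)
    return swapped


# Hypothetical events using the attribute names from the description.
events = [
    {"id": 1, "time": "2021-01-01T00:00:02Z", "uid": "alice", "type": "login_successful"},
    {"id": 2, "time": "2021-01-02T00:00:00Z", "uid": "bob", "type": "file_accessed"},
    {"id": 3, "time": "2021-01-03T00:00:00Z", "uid": "alice", "type": "file_accessed"},
]

result = swap_users(events, "alice", "bob", "2021-01-02T00:00:00Z")
print([event["uid"] for event in result])
```

    A detector trained on the original per-user behavior could then be scored on whether it flags the accounts after the switch point.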

    Acknowledgements: Partially funded by the FFG project DECEPT (873980). The authors thank Walter Huemer, Oskar Kruschitz, Kevin Truckenthanner, and Christian Aigner from Huemer Group for supporting the collection of the data set.

    If you use the dataset, please cite the following publication:

    [1] M. Landauer, F. Skopik, G. Höld, and M. Wurzenberger. "A User and Entity Behavior Analytics Log Data Set for Anomaly Detection in Cloud Computing". 2022 IEEE International Conference on Big Data - 6th International Workshop on Big Data Analytics for Cyber Intelligence and Defense (BDA4CID 2022), December 17-20, 2022, Osaka, Japan. IEEE.

  6. Data Sheet 2_Visual analysis of multi-omics data.csv

    • frontiersin.figshare.com
    csv
    Updated Sep 10, 2024
    + more versions
    Cite
    Austin Swart; Ron Caspi; Suzanne Paley; Peter D. Karp (2024). Data Sheet 2_Visual analysis of multi-omics data.csv [Dataset]. http://doi.org/10.3389/fbinf.2024.1395981.s002
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 10, 2024
    Dataset provided by
    Frontiers
    Authors
    Austin Swart; Ron Caspi; Suzanne Paley; Peter D. Karp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a tool for multi-omics data analysis that enables simultaneous visualization of up to four types of omics data on organism-scale metabolic network diagrams. The tool’s interactive web-based metabolic charts depict the metabolic reactions, pathways, and metabolites of a single organism as described in a metabolic pathway database for that organism; the charts are constructed using automated graphical layout algorithms. The multi-omics visualization facility paints each individual omics dataset onto a different “visual channel” of the metabolic-network diagram. For example, a transcriptomics dataset might be displayed by coloring the reaction arrows within the metabolic chart, while a companion proteomics dataset is displayed as reaction arrow thicknesses, and a complementary metabolomics dataset is displayed as metabolite node colors. Once the network diagrams are painted with omics data, semantic zooming provides more details within the diagram as the user zooms in. Datasets containing multiple time points can be displayed in an animated fashion. The tool will also graph data values for individual reactions or metabolites designated by the user. The user can interactively adjust the mapping from data value ranges to the displayed colors and thicknesses to provide more informative diagrams.

  7. Dataset Versioning For Analytics Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 1, 2025
    Cite
    Dataintelo (2025). Dataset Versioning For Analytics Market Research Report 2033 [Dataset]. https://dataintelo.com/report/dataset-versioning-for-analytics-market
    Explore at:
    Available download formats: pdf, csv, pptx
    Dataset updated
    Oct 1, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Dataset Versioning for Analytics Market Outlook



    According to our latest research, the global dataset versioning for analytics market size reached USD 527.4 million in 2024. The market is experiencing robust expansion with a remarkable CAGR of 18.2% during the forecast period. By 2033, the market is projected to achieve a value of USD 2,330.6 million. This growth is primarily driven by the escalating demand for efficient data management, regulatory compliance, and the proliferation of AI and machine learning applications across diverse industries.




    The primary growth driver in the dataset versioning for analytics market is the exponential increase in data volume and complexity across organizations of all sizes. As enterprises continue to generate and utilize vast amounts of structured and unstructured data, the need for robust dataset versioning solutions has become imperative. These solutions enable organizations to track, manage, and analyze different versions of datasets, ensuring data integrity, reproducibility, and transparency throughout the analytics lifecycle. The surge in adoption of advanced analytics, machine learning, and artificial intelligence further amplifies the necessity for dataset versioning, as it facilitates the training, validation, and deployment of models with consistent and reliable data sources. In addition, the integration of dataset versioning tools with popular analytics platforms and cloud services has made these solutions more accessible and scalable, catering to the evolving needs of modern data-driven enterprises.




    Another significant factor fueling market growth is the rising emphasis on data governance and regulatory compliance across industries such as BFSI, healthcare, and government. Stringent regulations like GDPR, HIPAA, and CCPA mandate organizations to maintain accurate records of data usage, lineage, and modifications. Dataset versioning solutions play a pivotal role in helping organizations meet these compliance requirements by providing comprehensive audit trails, access controls, and data lineage tracking. This not only mitigates the risk of non-compliance penalties but also enhances organizational trust and credibility. Furthermore, the growing awareness about the strategic importance of data governance in driving business value and mitigating operational risks has prompted enterprises to invest in sophisticated dataset versioning tools, thereby propelling market expansion.




    The proliferation of cloud computing and the increasing adoption of hybrid and multi-cloud architectures are also contributing to the growth of the dataset versioning for analytics market. Cloud-based dataset versioning solutions offer unparalleled scalability, flexibility, and cost-efficiency, enabling organizations to manage and version datasets seamlessly across distributed environments. The shift towards cloud-native analytics and the integration of dataset versioning with cloud data lakes, warehouses, and analytics platforms have further accelerated market adoption. Additionally, advancements in automation, AI-driven data cataloging, and self-service analytics are enhancing the capabilities of dataset versioning tools, making them indispensable for organizations seeking to maximize the value of their data assets while minimizing operational complexities.




    From a regional perspective, North America continues to dominate the dataset versioning for analytics market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of major technology vendors, high adoption rates of advanced analytics, and a mature regulatory landscape. However, the Asia Pacific region is witnessing the fastest growth, driven by rapid digital transformation, increasing investments in AI and analytics, and the emergence of data-centric industries. Europe also holds a significant market share, supported by stringent data protection regulations and growing awareness about data governance. The Middle East & Africa and Latin America are gradually catching up, with increasing adoption of cloud-based analytics and regulatory initiatives promoting data management best practices.



    Component Analysis



    The dataset versioning for analytics market is segmented by component into software and services. The software segment holds the dominant share, driven by the widespread adoption of standalone and integrated dataset versioning platforms that cater to various data management and analytics requirements. These s

  8. Cloud Analytics Market Analysis North America, Europe, APAC, Middle East and...

    • technavio.com
    pdf
    Updated Jul 22, 2024
    Cite
    Technavio (2024). Cloud Analytics Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, UK, Germany, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/cloud-analytics-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Description

    Cloud Analytics Market Size 2024-2028

    The cloud analytics market size is forecast to increase by USD 74.08 billion at a CAGR of 24.4% between 2023 and 2028.
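    The report quotes only the absolute increase and the growth rate. Under the assumption of five annual compounding periods from a 2023 base (the report does not spell out its convention), the starting market size implied by those two numbers can be derived as sketched below. The resulting values are derived estimates, not figures stated in the report.

```python
def implied_base(increase, rate, years):
    """Starting value that grows by `increase` when compounded at `rate`."""
    return increase / ((1 + rate) ** years - 1)


# Figures from the sentence above; the five-year span is an assumption.
increase_usd_bn = 74.08
rate = 0.244
years = 2028 - 2023

base_2023 = implied_base(increase_usd_bn, rate, years)
size_2028 = base_2023 + increase_usd_bn

print(f"implied 2023 market size: USD {base_2023:.1f} billion")
print(f"implied 2028 market size: USD {size_2028:.1f} billion")
```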

    The market is experiencing significant growth due to several key trends. The adoption of hybrid and multi-cloud setups is on the rise, as these configurations enhance data connectivity and flexibility. Another trend driving market growth is the increasing use of cloud security applications to safeguard sensitive data.
    However, concerns regarding confidential data security and privacy remain a challenge for market growth. Organizations must ensure robust security measures are in place to mitigate risks and maintain trust with their customers. Overall, the market is poised for continued expansion as businesses seek to leverage the benefits of cloud technologies for data processing and data analytics.
    

    What will be the Size of the Cloud Analytics Market During the Forecast Period?

    The market is experiencing significant growth due to the increasing volume of data generated by businesses and the demand for advanced analytics solutions. Cloud-based analytics enables organizations to process and analyze large datasets from various data sources, including unstructured data, in real-time. This is crucial for businesses looking to make data-driven decisions and gain valuable insights to optimize their operations and meet customer requirements. Key industries such as sales and marketing, customer service, and finance are adopting cloud analytics to improve key performance indicators and gain a competitive edge. Both Small and Medium-sized Enterprises (SMEs) and large enterprises are embracing cloud analytics, with solutions available on private, public, and multi-cloud platforms.
    Big data technologies, such as machine learning and artificial intelligence, are integral to cloud analytics, enabling advanced data analytics and business intelligence. Cloud analytics provides businesses with the flexibility to store and process data in the cloud, reducing the need for expensive on-premises data storage and computation. Hybrid environments are also gaining popularity, allowing businesses to leverage the benefits of both private and public clouds. Overall, the market is poised for continued growth as businesses increasingly rely on data-driven insights to inform their decision-making processes.
    

    How is this Cloud Analytics Industry segmented and which is the largest segment?

    The cloud analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2017-2022 for the following segments.

    • Solution
      • Hosted data warehouse solutions
      • Cloud BI tools
      • Complex event processing
      • Others
    • Deployment
      • Public cloud
      • Hybrid cloud
      • Private cloud
    • Geography
      • North America: US
      • Europe: Germany, UK
      • APAC: China, Japan
      • Middle East and Africa
      • South America

    By Solution Insights

    The hosted data warehouse solutions segment is estimated to witness significant growth during the forecast period.
    

    Hosted data warehouses enable organizations to centralize and analyze large datasets from multiple sources, facilitating advanced analytics solutions and real-time insights. By utilizing cloud-based infrastructure, businesses can reduce operational costs through eliminating licensing expenses, hardware investments, and maintenance fees. Additionally, cloud solutions offer network security measures, such as Software Defined Networking and Network integration, ensuring data protection. Cloud analytics caters to diverse industries, including SMEs and large enterprises, addressing requirements for sales and marketing, customer service, and key performance indicators. Advanced analytics capabilities, including predictive analytics, automated decision making, and fraud prevention, are essential for data-driven decision making and business optimization.

    Furthermore, cloud platforms provide access to specialized talent, big data technology, and AI, enhancing customer experiences and digital business opportunities. Data connectivity and data processing in real-time are crucial for network agility and application performance. Hosted data warehouses offer computational power and storage capabilities, ensuring efficient data utilization and enterprise information management. Cloud service providers offer various cloud environments, including private, public, multi-cloud, and hybrid, catering to diverse business needs. Compliance and security concerns are addressed through cybersecurity frameworks and data security measures, ensuring data breaches and thefts are minimized.


    The Hosted data warehouse solutions s

  9. Data from: Cyber Attack Evaluation Dataset for Deep Packet Inspection and...

    • data.mendeley.com
    Updated Oct 18, 2022
    + more versions
    Cite
    Shishir Kumar Shandilya (2022). Cyber Attack Evaluation Dataset for Deep Packet Inspection and Analysis [Dataset]. http://doi.org/10.17632/3szjvt3w78.1
    Explore at:
    Dataset updated
    Oct 18, 2022
    Authors
    Shishir Kumar Shandilya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To determine the effectiveness of any defense mechanism, there is a need for comprehensive real-time network data that solely references various attack scenarios based on older software versions, unprotected ports, and so on. This dataset contains the entire network traffic captured during several cyber attacks, to enable experimentation on the challenges of implementing defense mechanisms on a larger scale. To collect the data, we captured the network traffic of configured virtual machines using Wireshark and tcpdump. To analyze the impact of several cyber attack scenarios, the dataset presents a set of ten computers connected to Router1 on VLAN1 in a Docker Bridge network that try to exploit each other. Their activity includes browsing the web and downloading foreign packages, including malicious ones. Services such as FTP and SSH were also exploited using several attack mechanisms. The dataset shows the importance of updating and patching systems by contrasting attack tactics against older versions of packages with those against newer, updated ones. It also includes an Apache Server hosted on a different subset on VLAN2, which is connected to VLAN1 to demonstrate isolation and cross-VLAN communication. The services on this web server were also exploited by the ten computers described above. The attack types include: Distributed Denial of Service, SQL Injection, Account Takeover, Service Exploitation (SSH, FTP), DNS and ARP Spoofing, Scanning and Firewall Searching and Indexing (using Nmap), Hammering the services to brute-force passwords and usernames, Malware attack, Spoofing and Man-in-the-Middle Attack. The attack scenarios also show various scanning mechanisms and the impact of Insider Threats on the entire network.

  10. Global Real-Time Index Database Market Research Report: By End Use...

    • wiseguyreports.com
    Updated Sep 15, 2025
    + more versions
    Cite
    (2025). Global Real-Time Index Database Market Research Report: By End Use (Financial Services, Healthcare, Telecommunications, Retail, Government), By Deployment Type (On-Premises, Cloud-Based, Hybrid), By Database Type (Relational Database, NoSQL Database, Time-Series Database), By Application (Data Analytics, Real-Time Monitoring, Predictive Analysis, Reporting) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/real-time-index-database-market
    Explore at:
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 2.29 (USD Billion)
    MARKET SIZE 2025: 2.49 (USD Billion)
    MARKET SIZE 2035: 5.8 (USD Billion)
    SEGMENTS COVERED: End Use, Deployment Type, Database Type, Application, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: growing demand for real-time analytics, increasing data volume and variety, rising cloud adoption trends, need for enhanced decision-making, regulatory compliance and data governance
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Nasdaq, Fitch Ratings, Tickdata, Thomson Reuters, MSCI, St. Louis Federal Reserve, FTSE Russell, Bloomberg, Morningstar, IHS Markit, S&P Dow Jones Indices, FactSet, S&P Global, Refinitiv
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Cloud-based solutions integration, Enhanced data analytics capabilities, Adoption in fintech applications, Real-time data accessibility demands, Rising importance of accurate indexing.
    COMPOUND ANNUAL GROWTH RATE (CAGR): 8.8% (2025 - 2035)
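
    The growth rate listed for this report can be cross-checked against its two market-size figures. The sketch below assumes the standard compound-annual-growth-rate formula and treats 2025 to 2035 as ten compounding periods; the report does not state its exact method, and the function name `cagr` is illustrative only.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Return the compound annual growth rate as a fraction (0.05 == 5%)."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Market sizes from the listing (USD billion): 2.49 in 2025, 5.8 in 2035.
# Assumption: ten compounding periods between the two figures.
rate = cagr(start_value=2.49, end_value=5.8, periods=10)
print(f"Implied CAGR 2025-2035: {rate:.1%}")
```

    Under these assumptions the implied rate lands close to the 8.8% the listing reports, which suggests the published figures are internally consistent.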
  11. Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends,...

    • datarade.ai
    .json, .csv
    Updated Aug 12, 2024
    + more versions
    Cite
    Dataplex (2024). Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://datarade.ai/data-products/dataplex-reddit-data-global-social-media-data-1-1m-mill-dataplex
    Explore at:
    .json, .csv (available download formats)
    Dataset updated
    Aug 12, 2024
    Dataset authored and provided by
    Dataplex
    Area covered
    Mexico, Jersey, Holy See, Botswana, Gambia, Macao, Christmas Island, Chile, Côte d'Ivoire, Martinique
    Description

    The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.

    Dataset Overview:

    This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.

    2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:

    • Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
    • Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
    • Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.

    Sourced Directly from Reddit:

    All social media data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.

    Key Features:

    • Subreddit Metrics: Detailed data on subreddit activity, including the number of posts, comments, votes, and user participation.
    • User Engagement: Insights into how users interact with content, including comment threads, upvotes/downvotes, and participation rates.
    • Trending Topics: Track emerging trends and viral content across the platform, helping you stay ahead of the curve in understanding social media dynamics.
    • AI-Enhanced Analysis: Utilize AI-generated columns for sentiment analysis, topic categorization, and predictive insights, providing a deeper understanding of the data.

    Use Cases:

    • Social Media Analysis: Researchers and analysts can use this dataset to study online behavior, track the spread of information, and understand how content resonates with different audiences.
    • Market Research: Marketers can leverage the dataset to identify target audiences, understand consumer preferences, and tailor campaigns to specific communities.
    • Content Strategy: Content creators and strategists can use insights from the dataset to craft content that aligns with trending topics and user interests, maximizing engagement.
    • Academic Research: Academics can explore the dynamics of online communities, studying everything from the spread of misinformation to the formation of online subcultures.

    Data Quality and Reliability:

    The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.

    Integration and Usability:

    The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.

    User-Friendly Structure and Metadata:

    The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.

    Ideal For:

    • Data Analysts: Conduct in-depth analyses of subreddit trends, user engagement, and content virality. The dataset’s extensive coverage and AI-enhanced insights make it an invaluable tool for data-driven research.
    • Marketers: Use the dataset to better understand your target audience, tailor campaigns to specific interests, and track the effectiveness of marketing efforts across Reddit.
    • Researchers: Explore the social dynamics of online communities, analyze the spread of ideas and information, and study the impact of digital media on public discourse, all while leveraging AI-generated insights.

    This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conduc...

  12. E-Commerce Customer Behavior & Sales Analysis -TR

    • kaggle.com
    zip
    Updated Oct 29, 2025
    Cite
    UmutUygurr (2025). E-Commerce Customer Behavior & Sales Analysis -TR [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/e-commerce-customer-behavior-and-sales-analysis-tr
    Explore at:
    zip (138245 bytes) (available download formats)
    Dataset updated
    Oct 29, 2025
    Authors
    UmutUygurr
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🛒 E-Commerce Customer Behavior and Sales Dataset

    📊 Dataset Overview

    This comprehensive dataset contains 5,000 e-commerce transactions from a Turkish online retail platform, spanning from January 2023 to March 2024. The dataset provides detailed insights into customer demographics, purchasing behavior, product preferences, and engagement metrics.

    🎯 Use Cases

    This dataset is perfect for:

    • Customer Segmentation Analysis: Identify distinct customer groups based on behavior
    • Sales Forecasting: Predict future sales trends and patterns
    • Recommendation Systems: Build product recommendation engines
    • Customer Lifetime Value (CLV) Prediction: Estimate customer value
    • Churn Analysis: Identify customers at risk of leaving
    • Marketing Campaign Optimization: Target customers effectively
    • Price Optimization: Analyze price sensitivity across categories
    • Delivery Performance Analysis: Optimize logistics and shipping

    📁 Dataset Structure

    The dataset contains 18 columns with the following features:

    Order Information
    • Order_ID: Unique identifier for each order (ORD_XXXXXX format)
    • Date: Transaction date (2023-01-01 to 2024-03-26)

    Customer Demographics
    • Customer_ID: Unique customer identifier (CUST_XXXXX format)
    • Age: Customer age (18-75 years)
    • Gender: Customer gender (Male, Female, Other)
    • City: Customer city (10 major Turkish cities)

    Product Information
    • Product_Category: 8 categories (Electronics, Fashion, Home & Garden, Sports, Books, Beauty, Toys, Food)
    • Unit_Price: Price per unit (in TRY/Turkish Lira)
    • Quantity: Number of units purchased (1-5)

    Transaction Details
    • Discount_Amount: Discount applied (if any)
    • Total_Amount: Final transaction amount after discount
    • Payment_Method: Payment method used (5 types)

    Customer Behavior Metrics
    • Device_Type: Device used for purchase (Mobile, Desktop, Tablet)
    • Session_Duration_Minutes: Time spent on website (1-120 minutes)
    • Pages_Viewed: Number of pages viewed during session (1-50)
    • Is_Returning_Customer: Whether customer has purchased before (True/False)

    Post-Purchase Metrics
    • Delivery_Time_Days: Delivery duration (1-30 days)
    • Customer_Rating: Customer satisfaction rating (1-5 stars)

    📈 Key Statistics

    • Total Records: 5,000 transactions
    • Date Range: January 2023 - March 2024 (15 months)
    • Average Transaction Value: ~450 TRY
    • Customer Satisfaction: 3.9/5.0 average rating
    • Returning Customer Rate: 60%
    • Mobile Usage: 55% of transactions

    🔍 Data Quality

    ✅ No missing values
    ✅ Consistent formatting across all fields
    ✅ Realistic data distributions
    ✅ Proper data types for all columns
    ✅ Logical relationships between features

    💡 Sample Analysis Ideas

    • Customer Segmentation with K-Means Clustering: Segment customers based on spending, frequency, and recency
    • Sales Trend Analysis: Identify seasonal patterns and peak shopping periods
    • Product Category Performance: Compare revenue, ratings, and return rates across categories
    • Device-Based Behavior Analysis: Understand how device choice affects purchasing patterns
    • Predictive Modeling: Build models to predict customer ratings or purchase amounts
    • City-Level Market Analysis: Compare market performance across different cities

    🛠️ Technical Details

    • File Format: CSV (Comma-Separated Values)
    • Encoding: UTF-8
    • File Size: ~500 KB
    • Delimiter: Comma (,)

    📚 Column Descriptions

    Column Name               Data Type  Description                 Example
    Order_ID                  String     Unique order identifier     ORD_001337
    Customer_ID               String     Unique customer identifier  CUST_01337
    Date                      DateTime   Transaction date            2023-06-15
    Age                       Integer    Customer age                35
    Gender                    String     Customer gender             Female
    City                      String     Customer city               Istanbul
    Product_Category          String     Product category            Electronics
    Unit_Price                Float      Price per unit              1299.99
    Quantity                  Integer    Units purchased             2
    Discount_Amount           Float      Discount applied            129.99
    Total_Amount              Float      Final amount paid           2469.99
    Payment_Method            String     Payment method              Credit Card
    Device_Type               String     Device used                 Mobile
    Session_Duration_Minutes  Integer    Session time                15
    Pages_Viewed              Integer    Pages viewed                8
    Is_Returning_Customer     Boolean    Returning customer          True
    Delivery_Time_Days        Integer    Delivery duration           3
    Customer_Rating           Integer    Satisfaction rating         5

    🎓 Learning Outcomes

    By working with this dataset, you can learn:

    • Data cleaning and preprocessing techniques
    • Exploratory Data Analysis (EDA) with Python/R
    • Statistical analysis and hypothesis testing
    • Machine learning model development
    • Data visualization best practices
    • Business intelligence and reporting

    📝 Citation

    If you use this dataset in your research or project, please cite:

    E-Commerce Customer Behavior and Sales Dataset (2024) Turkish Online Retail Platform Data (2023-2024) Available on Kaggle

    ⚖️ License

    This dataset is released under the CC0: Public Domain license. You are free to use it for any purpose.

    🤝 Contribution

    Found any issues or have suggestions? Feel free to provide feedback!

    📞 Contact

    For questions or collaborations, please reach out through Kaggle.

    Happy Analyzing! 🚀
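
    The example values in the column descriptions suggest how Total_Amount relates to the other transaction fields. The sketch below assumes Total_Amount = Unit_Price × Quantity − Discount_Amount; this relation is inferred from the single example row and is not stated by the dataset author, so it is worth verifying against the actual CSV. The helper name `expected_total` is hypothetical.

```python
from decimal import Decimal


def expected_total(unit_price: str, quantity: int, discount: str) -> Decimal:
    """Compute the order total under the ASSUMED relation
    Total_Amount = Unit_Price * Quantity - Discount_Amount.

    Decimal is used instead of float so currency values compare exactly.
    """
    return Decimal(unit_price) * quantity - Decimal(discount)


# Example values from the column descriptions:
# Unit_Price 1299.99, Quantity 2, Discount_Amount 129.99, Total_Amount 2469.99
total = expected_total("1299.99", 2, "129.99")
matches_listing = total == Decimal("2469.99")
print(total, matches_listing)
```

    A check like this, applied row by row, is one way to test the "logical relationships between features" claim before building models on the data.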

    Keywords: e-c...

  13. EnviroAtlas - NatureServe Analysis of Imperiled or Federally Listed Species...

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Jul 26, 2025
    + more versions
    Cite
    U.S. Environmental Protection Agency, Office of Research and Development-Sustainable and Healthy Communities Research Program, EnviroAtlas (Point of Contact) (2025). EnviroAtlas - NatureServe Analysis of Imperiled or Federally Listed Species by HUC-12 for the Conterminous United States [Dataset]. https://catalog.data.gov/dataset/enviroatlas-natureserve-analysis-of-imperiled-or-federally-listed-species-by-huc-12-for-the-con4
    Explore at:
    Dataset updated
    Jul 26, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Contiguous United States, United States
    Description

    This EnviroAtlas dataset includes analysis by NatureServe of species that are Imperiled (G1/G2) or Listed under the U.S. Endangered Species Act (ESA) by 12-digit Hydrologic Units (HUCs). The analysis results are for use and publication by both the LandScope America website and by the EnviroAtlas. Results are provided for the total number of Aquatic Associated G1-G2/ESA species, the total number of Wetland Associated G1-G2/ESA species, the total number of Terrestrial Associated G1-G2/ESA species, and the total number of Unknown Habitat Association G1-G2/ESA species in each HUC12. NatureServe is a non-profit organization dedicated to developing and providing information about the world's plants, animals, and ecological communities. NatureServe works in partnership with 82 independent Natural Heritage programs and Conservation Data Centers that gather scientific information on rare species and ecosystems in the United States, Latin America, and Canada (the Natural Heritage Network). NatureServe is a leading source for biodiversity information that is essential for effective conservation action. This dataset was produced by NatureServe to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).

  14. Data from: An emotion analysis dataset of course comment texts in massive...

    • dataverse.harvard.edu
    Updated Sep 26, 2022
    Cite
    Xiang Feng; Keyi Yuan; Xiu Guan; Longhui Qiu (2022). An emotion analysis dataset of course comment texts in massive online learning course platforms [Dataset]. http://doi.org/10.7910/DVN/LC6GHO
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 26, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Xiang Feng; Keyi Yuan; Xiu Guan; Longhui Qiu
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Datasets are critical for emotion analysis in the machine learning field. This study aims to explore emotion analysis datasets and related benchmarks in online learning, since very few studies currently address them. We labeled the topic and nine-category emotion of 4715 comment texts from online learning platforms using the “three-person voting label method”, based on “sentence-level” and multi-category labeling dimensions, with our self-developed system. After testing the consistency of the labeling results using the Fleiss Kappa method, we found that the consistency of the dataset was about 0.51, representing a moderate strength of agreement. Based on the dataset, the prediction accuracy of the Long Short-Term Memory (LSTM) method is about 0.68. This dataset provides a benchmark for multi-category emotion datasets in the Chinese online learning field. It can provide a basis for subsequent work on emotion analysis, monitoring, and intervention in the education field. It can also provide a reference for constructing subsequent datasets in the education field. Please note that this is a Chinese-language dataset. If you want to use it, please contact the author and request the dataset below.
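
    The agreement statistic mentioned in this description can be illustrated with a small worked example. The sketch below implements the standard Fleiss' kappa formula for a fixed number of raters per item; the toy rating counts are invented for illustration and are not taken from the dataset, and the authors' own tooling is not described in the listing.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (hypothetical): 4 comments, 3 raters, 2 emotion categories.
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))
```

    For the real dataset the table would have one row per comment text and nine category columns, with each row summing to three.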

  15. Self-citation analysis data based on PubMed Central subset (2002-2005)

    • databank.illinois.edu
    • aws-databank-alb.library.illinois.edu
    Cite
    Shubhanshu Mishra; Brent D Fegley; Jana Diesner; Vetle I. Torvik, Self-citation analysis data based on PubMed Central subset (2002-2005) [Dataset]. http://doi.org/10.13012/B2IDB-9665377_V1
    Explore at:
    Authors
    Shubhanshu Mishra; Brent D Fegley; Jana Diesner; Vetle I. Torvik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Institutes of Health (NIH)
    U.S. National Science Foundation (NSF)
    Description

    Self-citation analysis data based on PubMed Central subset (2002-2005)
    ----------------------------------------------------------------------
    Created by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik on April 5th, 2018

    ## Introduction

    This is a dataset created as part of the publication titled: Mishra S, Fegley BD, Diesner J, Torvik VI (2018) Self-Citation is the Hallmark of Productive Authors, of Any Gender. PLOS ONE. It contains files for running the self-citation analysis on articles published in PubMed Central between 2002 and 2005, collected in 2015. The dataset is distributed in the form of the following tab-separated text files:

    * Training_data_2002_2005_pmc_pair_First.txt (1.2G) - Data for first authors
    * Training_data_2002_2005_pmc_pair_Last.txt (1.2G) - Data for last authors
    * Training_data_2002_2005_pmc_pair_Middle_2nd.txt (964M) - Data for middle 2nd authors
    * Training_data_2002_2005_pmc_pair_txt.header.txt - Header for the data
    * COLUMNS_DESC.txt file - Descriptions of all columns
    * model_text_files.tar.gz - Text files containing model coefficients and scores for model selection.
    * results_all_model.tar.gz - Model coefficient and result files in numpy format used for plotting purposes. v4.reviewer contains models for analysis done after reviewer comments.
    * README.txt file

    ## Dataset creation

    Our experiments relied on data from multiple sources, including proprietary data from Thomson Reuters' (now Clarivate Analytics) Web of Science collection of MEDLINE citations. Authors interested in reproducing our experiments should personally request this data from Clarivate Analytics. However, we do make available a similar but open dataset based on citations from PubMed Central, which can be utilized to get similar results to those reported in our analysis. Furthermore, we have also freely shared our datasets, which can be used along with the citation datasets from Clarivate Analytics to re-create the dataset used in our experiments. These datasets are listed below. If you wish to use any of those datasets, please make sure you cite both the dataset and the paper introducing the dataset.

    * MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
    * Citation data from PubMed Central (original paper includes additional citations from Web of Science)
    * Author-ity 2009 dataset:
      - Dataset citation: Torvik, Vetle I.; Smalheiser, Neil R. (2018): Author-ity 2009 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4222651_V1
      - Paper citation: Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3), 1–29. https://doi.org/10.1145/1552303.1552304
      - Paper citation: Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2004). A probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158. https://doi.org/10.1002/asi.20105
    * Genni 2.0 + Ethnea for identifying author gender and ethnicity:
      - Dataset citation: Torvik, Vetle (2018): Genni + Ethnea for the Author-ity 2009 dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9087546_V1
      - Paper citation: Smith, B. N., Singh, M., & Torvik, V. I. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries - JCDL ’13. ACM Press. https://doi.org/10.1145/2467696.2467720
      - Paper citation: Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington DC, USA. http://hdl.handle.net/2142/88927
    * MapAffil for identifying article country of affiliation:
      - Dataset citation: Torvik, Vetle I. (2018): MapAffil 2016 dataset -- PubMed author affiliations mapped to cities and their geocodes worldwide. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4354331_V1
      - Paper citation: Torvik VI. MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide. D-Lib magazine : the magazine of the Digital Library Forum. 2015;21(11-12):10.1045/november2015-torvik
    * IMPLICIT journal similarity:
      - Dataset citation: Torvik, Vetle (2018): Author-implicit journal, MeSH, title-word, and affiliation-word pairs based on Author-ity 2009. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4742014_V1
    * Novelty dataset for identifying article-level novelty:
      - Dataset citation: Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
      - Paper citation: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : The Magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra
      - Code: https://github.com/napsternxg/Novelty
    * Expertise dataset for identifying author expertise on articles:
    * Source code provided at: https://github.com/napsternxg/PubMed_SelfCitationAnalysis

    Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check here for information to get PubMed/MEDLINE, and NLMs data Terms and Conditions. Additional data related updates can be found at Torvik Research Group.

    ## Acknowledgments

    This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    ## License

    Self-citation analysis data based on PubMed Central subset (2002-2005) by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/PubMed_SelfCitationAnalysis.
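
    Because the data files ship without column names and the header is distributed as a separate file, a reader has to join the two. The sketch below is one way to do that with the Python standard library. It assumes the header file holds a single tab-separated line of column names, which the description does not confirm (COLUMNS_DESC.txt and README.txt are the authoritative sources); the function name `read_with_external_header` and the `limit` parameter are illustrative.

```python
import csv
from pathlib import Path


def read_with_external_header(header_path: str, data_path: str, limit: int = 5):
    """Return the first `limit` rows of a tab-separated data file as dicts,
    taking column names from a separate header file.

    ASSUMPTION: the header file's first line is tab-separated column names.
    """
    header_line = Path(header_path).read_text(encoding="utf-8").splitlines()[0]
    columns = header_line.split("\t")

    rows = []
    with open(data_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters as ordinary data.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for index, fields in enumerate(reader):
            if index >= limit:
                break
            rows.append(dict(zip(columns, fields)))
    return rows
```

    Reading only a few rows first is a deliberate choice here: the data files are listed at roughly 1 GB each, so a small preview helps confirm the column layout before loading anything in full. A call would pass the header file and one of the Training_data files named in the list above.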

  16. Data from: Detecting and quantifying social transmission using network-based...

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +1more
    zip
    Updated Aug 21, 2020
    Cite
    Matthew Hasenjager; Ellouise Leadbeater; William Hoppitt (2020). Detecting and quantifying social transmission using network-based diffusion analysis [Dataset]. http://doi.org/10.5061/dryad.280gb5mnj
    Explore at:
    zip (available download formats)
    Dataset updated
    Aug 21, 2020
    Dataset provided by
    Dryad
    Authors
    Matthew Hasenjager; Ellouise Leadbeater; William Hoppitt
    Time period covered
    Jul 7, 2020
    Description

    Annotated tutorials and example code are provided describing the use of these data.

  17. Global NoSQL Database Market Research Report: By Database Type (Document...

    • wiseguyreports.com
    Updated Sep 27, 2025
    + more versions
    Cite
    (2025). Global NoSQL Database Market Research Report: By Database Type (Document Store, Key-Value Store, Column Store, Graph Database), By Deployment Type (On-Premises, Cloud-Based, Hybrid), By End User Industry (IT and Telecommunications, Retail, Healthcare, Banking and Financial Services), By Application (Real-Time Big Data Analytics, Content Management, Mobile Applications, Internet of Things) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/nosql-database-market
    Explore at:
    Dataset updated
    Sep 27, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 7.18 (USD Billion)
    MARKET SIZE 2025: 7.89 (USD Billion)
    MARKET SIZE 2035: 20.0 (USD Billion)
    SEGMENTS COVERED: Database Type, Deployment Type, End User Industry, Application, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: Scalability and Flexibility, Real-time Data Processing, Increased Cloud Adoption, Big Data Integration, Cost-effective Solutions
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: DataStax, Microsoft, Amazon Web Services, Teradata, Aerospike, MongoDB, Berkeley DB, Google, MarkLogic, IBM, Redis Labs, Couchbase, Cassandra, CouchDB, Oracle
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Cloud-based database solutions, Increasing demand for big data analytics, Integration with AI and machine learning, Growing adoption in IoT applications, Enhanced scalability for multi-cloud environments
    COMPOUND ANNUAL GROWTH RATE (CAGR): 9.8% (2025 - 2035)
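
The reported growth rate can be sanity-checked against the two market-size figures listed for this report. The following is a minimal sketch, assuming the CAGR is measured from the 2025 value (7.89) to the 2035 value (20.0) over 10 years; the function name `cagr` is illustrative and does not come from the report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in USD billion, as listed for 2025 and 2035.
# Assumption: the forecast period spans exactly 10 compounding years.
rate = cagr(7.89, 20.0, 10)
print(f"Implied CAGR 2025-2035: {rate:.2%}")
```

Under these assumptions the implied rate comes out a little under 10%, in line with the reported figure; a small difference can be expected from rounding in the published market sizes.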
  18. IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    Updated Aug 30, 2024
    Cite
    José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. http://doi.org/10.5281/zenodo.8116338
    Dataset updated
    Aug 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Article Information

    The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

    Please do cite the aforementioned article when using this dataset.

    Abstract

    The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a Machine Learning Model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and provided it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

    ZIP Folder Content

    The ZIP folder comprises two main components: Captures and Datasets. Within the captures folder, we have included all the captures used in this project. These captures are organized into separate folders corresponding to the type of network analysis: BLE or IP-Based. Similarly, the datasets folder follows a similar organizational approach. It contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

    To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.

    This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
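
The practical difference between the two formats can be illustrated with a short round trip. This is a minimal sketch using only the Python standard library; the rows are made-up values, and the field names are borrowed from the IP-Based Flows feature list in this description. It does not reproduce the actual file layout of the ZIP folder.

```python
import csv
import io
import pickle

# Hypothetical rows: field names follow the flows feature list,
# the values are invented for illustration only.
rows = [
    {"proto": "tcp", "orig_bytes": 512, "flow_duration": 0.25},
    {"proto": "udp", "orig_bytes": 64, "flow_duration": 0.01},
]

# CSV round trip: every value is read back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
from_csv = list(csv.DictReader(buffer))

# Pickle round trip: the original Python types are preserved.
from_pickle = pickle.loads(pickle.dumps(rows))

print(type(from_csv[0]["orig_bytes"]).__name__)
print(type(from_pickle[0]["orig_bytes"]).__name__)
```

CSV is the more portable choice, but numeric columns need to be converted after loading. Pickle keeps types intact, although pickle files should only be loaded from sources you trust.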

    Datasets' Content

    Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.

    Identified Key Features Within Bluetooth Dataset

    Feature | Meaning
    btle.advertising_header | BLE Advertising Packet Header
    btle.advertising_header.ch_sel | BLE Advertising Channel Selection Algorithm
    btle.advertising_header.length | BLE Advertising Length
    btle.advertising_header.pdu_type | BLE Advertising PDU Type
    btle.advertising_header.randomized_rx | BLE Advertising Rx Address
    btle.advertising_header.randomized_tx | BLE Advertising Tx Address
    btle.advertising_header.rfu.1 | Reserved For Future 1
    btle.advertising_header.rfu.2 | Reserved For Future 2
    btle.advertising_header.rfu.3 | Reserved For Future 3
    btle.advertising_header.rfu.4 | Reserved For Future 4
    btle.control.instant | Instant Value Within a BLE Control Packet
    btle.crc.incorrect | Incorrect CRC
    btle.extended_advertising | Advertiser Data Information
    btle.extended_advertising.did | Advertiser Data Identifier
    btle.extended_advertising.sid | Advertiser Set Identifier
    btle.length | BLE Length
    frame.cap_len | Frame Length Stored Into the Capture File
    frame.interface_id | Interface ID
    frame.len | Frame Length Wire
    nordic_ble.board_id | Board ID
    nordic_ble.channel | Channel Index
    nordic_ble.crcok | Indicates if CRC is Correct
    nordic_ble.flags | Flags
    nordic_ble.packet_counter | Packet Counter
    nordic_ble.packet_time | Packet time (start to end)
    nordic_ble.phy | PHY
    nordic_ble.protover | Protocol Version

    Identified Key Features Within IP-Based Packets Dataset

    Feature | Meaning
    http.content_length | Length of content in an HTTP response
    http.request | HTTP request being made
    http.response.code | Status code of an HTTP response
    http.response_number | Sequential number of an HTTP response
    http.time | Time taken for an HTTP transaction
    tcp.analysis.initial_rtt | Initial round-trip time for TCP connection
    tcp.connection.fin | TCP connection termination with a FIN flag
    tcp.connection.syn | TCP connection initiation with SYN flag
    tcp.connection.synack | TCP connection establishment with SYN-ACK flags
    tcp.flags.cwr | Congestion Window Reduced flag in TCP
    tcp.flags.ecn | Explicit Congestion Notification flag in TCP
    tcp.flags.fin | FIN flag in TCP
    tcp.flags.ns | Nonce Sum flag in TCP
    tcp.flags.res | Reserved flags in TCP
    tcp.flags.syn | SYN flag in TCP
    tcp.flags.urg | Urgent flag in TCP
    tcp.urgent_pointer | Pointer to urgent data in TCP
    ip.frag_offset | Fragment offset in IP packets
    eth.dst.ig | Ethernet destination is in the internal network group
    eth.src.ig | Ethernet source is in the internal network group
    eth.src.lg | Ethernet source is in the local network group
    eth.src_not_group | Ethernet source is not in any network group
    arp.isannouncement | Indicates if an ARP message is an announcement

    Identified Key Features Within IP-Based Flows Dataset

    Feature | Meaning
    proto | Transport layer protocol of the connection
    service | Identification of an application protocol
    orig_bytes | Originator payload bytes
    resp_bytes | Responder payload bytes
    history | Connection state history
    orig_pkts | Originator sent packets
    resp_pkts | Responder sent packets
    flow_duration | Length of the flow in seconds
    fwd_pkts_tot | Forward packets total
    bwd_pkts_tot | Backward packets total
    fwd_data_pkts_tot | Forward data packets total
    bwd_data_pkts_tot | Backward data packets total
    fwd_pkts_per_sec | Forward packets per second
    bwd_pkts_per_sec | Backward packets per second
    flow_pkts_per_sec | Flow packets per second
    fwd_header_size | Forward header bytes
    bwd_header_size | Backward header bytes
    fwd_pkts_payload | Forward payload bytes
    bwd_pkts_payload | Backward payload bytes
    flow_pkts_payload | Flow payload bytes
    fwd_iat | Forward inter-arrival time
    bwd_iat | Backward inter-arrival time
    flow_iat | Flow inter-arrival time
    active | Flow active duration
  19. Data from: CloMet: A Novel Open-Source and Modular Software Platform That...

    • acs.figshare.com
    xlsx
    Updated Feb 9, 2024
    Cite
    Jordi Rodeiro; Ester Vidaña-Vila; Joan Navarro; Roger Mallol (2024). CloMet: A Novel Open-Source and Modular Software Platform That Connects Established Metabolomics Repositories and Data Analysis Resources [Dataset]. http://doi.org/10.1021/acs.jproteome.2c00602.s002
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    ACS Publications
    Authors
    Jordi Rodeiro; Ester Vidaña-Vila; Joan Navarro; Roger Mallol
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The field of metabolomics has witnessed the development of hundreds of computational tools, but only a few have become cornerstones of this field. While MetaboLights and Metabolomics Workbench are two well-established data repositories for metabolomics data sets, Workflows4Metabolomics and MetaboAnalyst are two well-established web-based data analysis platforms for metabolomics. Yet, the raw data stored in the aforementioned repositories lack standardization in terms of the file system format used to store the associated acquisition files. Consequently, it is not straightforward to reuse available data sets as input data in the above-mentioned data analysis resources, especially for non-expert users. This paper presents CloMet, a novel open-source modular software platform that contributes to standardization, reusability, and reproducibility in the metabolomics field. CloMet, which is available through a Docker file, converts raw and NMR-based metabolomics data from MetaboLights and Metabolomics Workbench to a file format that can be used directly either in MetaboAnalyst or in Workflows4Metabolomics. We validated both CloMet and the output data using data sets from these repositories. Overall, CloMet fills the gap between well-established data repositories and web-based statistical platforms and contributes to the consolidation of a data-driven perspective of the metabolomics field by leveraging and connecting existing data and resources.

  20. Cloud-based Database Market Industry Size, Share & Growth Analysis 2033

    • marketresearchintellect.com
    Updated Jul 6, 2025
    Cite
    Market Research Intellect (2025). Cloud-based Database Market Industry Size, Share & Growth Analysis 2033 [Dataset]. https://www.marketresearchintellect.com/product/global-cloud-based-database-market-size-and-forecast/
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Market Research Intellect
    License

    https://www.marketresearchintellect.com/privacy-policy

    Area covered
    Global
    Description

    Learn more about the Cloud-based Database Market Report by Market Research Intellect. The market stood at USD 10.5 billion in 2024 and is forecast to expand to USD 25.0 billion by 2033, growing at a CAGR of 10.5%. Discover how new strategies, rising investments, and top players are shaping the future.

Cite
Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project

Google Certificate BellaBeats Capstone Project

Explore at:
zip(169161 bytes)Available download formats
Dataset updated
Jan 5, 2023
Authors
Jason Porzelius
Description

Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeat's marketing strategies for future growth. My task is to provide data driven insights to business tasks provided by the Bellabeats, Inc.'s executive and data analysis team. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

Section 1 - Ask:

A. Guiding Questions:
1. Who are the key stakeholders and what are their goals for the data analysis project? 2. What is the business task that this data analysis project is attempting to solve?

B. Key Tasks:

  1. Identify key stakeholders and their goals for the data analysis project.
    *The key stakeholders for this project are as follows:
    -Urška Sršen and Sando Mur, co-founders of Bellabeats, Inc.
    -The Bellabeats marketing analytics team, of which I am a member.

  2. Identify the business task.
    *As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices, in order to guide upcoming marketing strategies that will help drive the company's future growth. Specifically, the researcher was tasked with applying insights from the data analysis process to 1 BellaBeats product and presenting those insights to BellaBeats stakeholders.

Section 2 - Prepare:

A. Guiding Questions: 1. Where is the data stored and organized? 2. Are there any problems with the data? 3. How does the data help answer the business question?

B. Key Tasks:

  1. Research and communicate the source of the data, and how it is stored/organized, to stakeholders. *The data source used for our case study is FitBit Fitness Tracker Data. This dataset is hosted on Kaggle and was made available by user Mobius in an open-source format. Therefore, the data is public and may be copied, modified, and distributed without asking the user for permission. These datasets were reportedly generated by respondents to a survey distributed via Amazon Mechanical Turk between 03/12/2016 and 05/12/2016 (see the credibility section directly below).
    *Reportedly (see the credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute-, hour-, and day-level totals and is stored in 18 CSV documents. I downloaded all 18 documents to my laptop and decided to use 2 of them for this project, as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were: -sleepDay_merged.csv -dailyActivity_merged.csv

  2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias.
    *As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported.
    *As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
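
The two credibility checks described in this task (counting distinct users and confirming the date range actually covered) were carried out by the author in Excel. The following is a rough Python equivalent, offered only as a sketch: the inline sample is made up, and the column names `Id` and `ActivityDate`, as well as the month/day/year date format, are assumptions about the layout of dailyActivity_merged.csv.

```python
import csv
import io
from datetime import datetime

# Tiny invented sample standing in for dailyActivity_merged.csv.
# Assumption: the file has "Id" and "ActivityDate" columns with M/D/YYYY dates.
sample = """Id,ActivityDate,TotalSteps
1001,4/12/2016,13162
1001,4/13/2016,10735
1002,4/12/2016,8163
1003,5/12/2016,4562
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Check 1: how many distinct users actually appear in the data?
unique_users = {row["Id"] for row in rows}

# Check 2: what is the first and last day with a record?
dates = [datetime.strptime(row["ActivityDate"], "%m/%d/%Y") for row in rows]

print("distinct users:", len(unique_users))
print("first day:", min(dates).date())
print("last day:", max(dates).date())
```

To run the same checks on the real file, replace `io.StringIO(sample)` with an opened file handle and compare the results against the user count and time frame stated in the metadata.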
