94 datasets found

Google Certificate BellaBeats Capstone Project
kaggle.com
zip
Updated Jan 5, 2023
Cite
Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project
Explore at:
Available download formats: zip (169161 bytes)
Dataset updated
Jan 5, 2023
Authors
Jason Porzelius
Description
Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel, for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose, as the datasets for the ride-share project option were larger than my version of Excel would accept.

In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeats' marketing strategies for future growth. My task is to provide data-driven insights to business tasks provided by the Bellabeats, Inc. executive and data analysis team.

In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

Section 1 - Ask:

A. Guiding Questions:
1. Who are the key stakeholders and what are their goals for the data analysis project? 2. What is the business task that this data analysis project is attempting to solve?

B. Key Tasks:
1. Identify key stakeholders and their goals for the data analysis project.
*The key stakeholders for this project are as follows:
-Urška Sršen and Sando Mur - co-founders of Bellabeats, Inc.
-Bellabeats marketing analytics team. I am a member of this team.

2. Identify the business task.
*The business task is:
-As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to 1 BellaBeats product and presenting those insights to BellaBeats stakeholders.

Section 2 - Prepare:

A. Guiding Questions: 1. Where is the data stored and organized? 2. Are there any problems with the data? 3. How does the data help answer the business question?

B. Key Tasks:

1. Research and communicate to stakeholders the source of the data and how it is stored/organized.
*The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored on Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk, reportedly (see credibility section directly below) between 03/12/2016 and 05/12/2016.
*Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents to my local laptop and decided to use 2 documents for the purposes of this project, as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were:
-sleepDay_merged.csv
-dailyActivity_merged.csv

2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias.
*As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported.
*As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
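The two credibility checks described above (number of distinct users and the time span actually covered) can be sketched with Python's standard library. This is only an illustration, not the author's Excel workflow: the column names `Id` and `ActivityDate`, the `M/D/YYYY` date format, and every row in `SAMPLE_CSV` are assumptions standing in for dailyActivity_merged.csv.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows standing in for dailyActivity_merged.csv.
# Column names and date format are assumptions, not taken from the listing.
SAMPLE_CSV = """Id,ActivityDate,TotalSteps
1001,4/12/2016,13162
1001,4/13/2016,10735
1002,4/12/2016,8163
1003,5/12/2016,2436
"""


def summarize_coverage(csv_text):
    """Return (distinct user count, first date, last date) found in the CSV text."""
    user_ids = set()
    dates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        user_ids.add(row["Id"])
        dates.append(datetime.strptime(row["ActivityDate"], "%m/%d/%Y").date())
    return len(user_ids), min(dates), max(dates)


n_users, first_day, last_day = summarize_coverage(SAMPLE_CSV)
print(f"{n_users} users, {first_day} to {last_day} ({(last_day - first_day).days} days)")
```

Run against the real file, the same two numbers can be compared with what the metadata claims (30 users, roughly two months).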
Example ScRNAseq Dataset 2 for Learning Web-based Tools
data.niaid.nih.gov
zenodo.org
Updated Jun 29, 2023
+ more versions
Cite
Yarlagadda, Sagnik; Giorgio, Todd D (2023). Example ScRNAseq Dataset 2 for Learning Web-based Tools [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8084705
Explore at:
Dataset updated
Jun 29, 2023
Dataset provided by
Vanderbilt University
Authors
Yarlagadda, Sagnik; Giorgio, Todd D
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is one of the three example ScRNAseq datasets used to follow the guided example analyses within "A Guide to Single-Cell RNA Sequencing Analysis Using Web-based Tools for Non-Bioinformaticians" in the FEBS Journal. This dataset can be downloaded and imported into a variety of web-based tools and used as a learning device to gain more familiarity with the tools. As described in the paper, this dataset represents the negative control (carrier only).
Predictive Modeling of E-Commerce Purchase Intent
kaggle.com
zip
Updated May 3, 2025
Cite
Adil Shamim (2025). Predictive Modeling of E-Commerce Purchase Intent [Dataset]. https://www.kaggle.com/datasets/adilshamim8/online
Explore at:
Available download formats: zip (273010 bytes)
Dataset updated
May 3, 2025
Authors
Adil Shamim
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

The Online Shoppers Purchasing Intention dataset captures 12,330 distinct web-session records collected over a one-year span from an e-commerce site, with each session belonging to a different visitor to prevent user- or campaign-specific biases. Originally published in 2017 and licensed under CC BY 4.0, it was curated by Sakar et al. for benchmarking classifiers on independent and identically distributed tabular data.

Features

Numerical (10):

Administrative, Informational, ProductRelated (counts of pages visited) and their corresponding _Duration fields (total time in seconds spent on those pages).

BounceRates, ExitRates (average session‐level bounce and exit rates) and PageValues (average monetary value of pages preceding a purchase).

SpecialDay (normalized [0 – 1] indicator of how close the visit was to major shopping holidays, e.g. Valentine’s Day).

Categorical (7):

Month (10 values, Feb – Dec; the data contains no January or April sessions), OperatingSystems (8 codes), Browser (13 codes), Region (9 codes), TrafficType (20 codes), VisitorType (“New_Visitor,” “Returning_Visitor,” “Other”), and Weekend (True/False).

Target and Class Distribution

Revenue (False/True) denotes whether the session ended in a purchase.

Of the 12,330 sessions, 84.5 % (10,422) did not result in revenue, while 15.5 % (1,908) did.
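The class shares quoted above can be reproduced from the counts, and the same arithmetic gives the accuracy that a trivial "always predict no purchase" baseline would reach. The counts come from the description; the variable names are illustrative only.

```python
# Session counts as stated in the description.
no_purchase = 10_422
purchase = 1_908
total = no_purchase + purchase

share_no_purchase = no_purchase / total
share_purchase = purchase / total

# A classifier that always predicts "no purchase" is correct on every
# negative session, so its accuracy equals the majority-class share.
majority_baseline_accuracy = share_no_purchase

print(f"total sessions: {total}")
print(f"no purchase: {share_no_purchase:.1%}, purchase: {share_purchase:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Because of this imbalance, plain accuracy is a weak yardstick here; any model worth comparing should clear the majority-class baseline.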

Intended Use

This dataset is suited to developing and comparing binary classification models (ranging from multilayer perceptrons and LSTM networks to tree-based methods) that predict online purchasing intention in a controlled, time-invariant setting.
Healthcare Cloud Based Analytics Market Report | Global Forecast From 2025...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Healthcare Cloud Based Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-healthcare-cloud-based-analytics-market
Explore at:
Available download formats: csv, pptx, pdf
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Healthcare Cloud Based Analytics Market Outlook

The global healthcare cloud based analytics market size was valued at approximately USD 14.8 billion in 2023, and it is anticipated to reach around USD 54.3 billion by 2032, growing at a compound annual growth rate (CAGR) of 15.7% from 2024 to 2032. One of the primary growth factors influencing this market is the increasing demand for data-driven decision-making processes in healthcare settings to enhance patient outcomes and operational efficiency.
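The relationship between the three quoted figures can be checked with the standard compound-growth formula, end = start × (1 + CAGR)^years. The sketch below assumes nine compounding years (2023 to 2032); that reading of the forecast window, and the function names, are assumptions rather than anything stated by the report.

```python
def project(start_value, cagr, years):
    """Compound start_value forward at a constant annual growth rate."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the report (USD billion); the 9-year span is an assumption.
projected_2032 = project(14.8, 0.157, 9)
rate_from_endpoints = implied_cagr(14.8, 54.3, 9)

print(f"projected 2032 value: {projected_2032:.1f}")
print(f"CAGR implied by the endpoints: {rate_from_endpoints:.1%}")
```

Under that assumption the projection lands close to the quoted 2032 figure, which is what "approximately" and "around" in the text would suggest.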

One significant growth factor for the healthcare cloud based analytics market is the rapid digital transformation within the healthcare sector. The transition from paper-based systems to electronic health records (EHRs) and the adoption of telehealth services are driving the need for sophisticated analytics solutions that can process vast amounts of healthcare data. The accessibility and scalability offered by cloud-based solutions make them particularly attractive for healthcare providers looking to leverage patient data for better diagnostic and treatment outcomes.

Moreover, the rising focus on personalized medicine and the need for population health management are propelling the demand for healthcare cloud based analytics. Personalized medicine requires the analysis of large datasets to understand individual patient profiles and predict responses to treatments. Similarly, population health management aims to improve health outcomes by analyzing data to identify trends and intervene proactively. Cloud-based analytics platforms provide the necessary computational power and flexibility to handle these complex data requirements efficiently.

The cost-efficiency of cloud based solutions compared to traditional on-premises systems is another crucial growth driver. Healthcare organizations are under constant pressure to reduce operational costs while improving patient care quality. Cloud-based analytics solutions eliminate the need for significant upfront investments in hardware and software while offering the benefits of scalable resources and reduced IT maintenance costs. This financial advantage is particularly appealing to small and medium-sized healthcare providers who may have limited budgets for technology investments.

The integration of Business Intelligence in Healthcare is transforming the way data is utilized to improve patient care and streamline operations. By employing BI tools, healthcare organizations can analyze vast datasets to uncover insights that drive better decision-making. These tools enable healthcare providers to track patient outcomes, optimize resource allocation, and enhance overall operational efficiency. The ability to visualize data through dashboards and reports allows for a deeper understanding of patient trends and organizational performance, ultimately leading to improved healthcare delivery and patient satisfaction.

From a regional perspective, North America currently holds the largest market share in the healthcare cloud based analytics market, driven by advanced healthcare infrastructure and high adoption rates of digital healthcare technologies. However, regions like Asia Pacific are expected to witness the highest growth rates during the forecast period. Factors such as increasing healthcare expenditures, growing awareness about the benefits of healthcare analytics, and supportive government initiatives are contributing to the market expansion in these regions.

Component Analysis

The healthcare cloud based analytics market can be segmented by component into software and services. The software segment includes various analytics platforms and tools designed to process and analyze healthcare data. These software solutions are essential for enabling healthcare providers to harness the power of big data and derive actionable insights. As the volume of healthcare data continues to grow exponentially, the demand for robust and scalable analytics software solutions is expected to increase significantly. Innovations in artificial intelligence and machine learning are also enhancing the capabilities of these software solutions, making them more effective in predictive analytics and decision support.

Cloud Computing in Healthcare is revolutionizing the way healthcare data is stored, accessed, and analyzed. By leveraging cloud technology, healthcar
Cloud-based User Entity Behavior Analytics Log Data Set
zenodo.org
data.niaid.nih.gov
zip
Updated Oct 30, 2023
Cite
Max Landauer; Florian Skopik; Georg Höld; Markus Wurzenberger (2023). Cloud-based User Entity Behavior Analytics Log Data Set [Dataset]. http://doi.org/10.5281/zenodo.7119953
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7119953
Dataset updated
Oct 30, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
Max Landauer; Florian Skopik; Georg Höld; Markus Wurzenberger
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the CLUE-LDS (CLoud-based User Entity behavior analytics Log Data Set). The data set contains log events from real users utilizing a cloud storage suitable for User Entity Behavior Analytics (UEBA). Events include logins, file accesses, link shares, config changes, etc. The data set contains around 50 million events generated by more than 5000 distinct users in more than five years (2017-07-07 to 2022-09-29 or 1910 days). The data set is complete except for 109 events missing on 2021-04-22, 2021-08-20, and 2021-09-05 due to database failure. The unpacked file size is around 14.5 GB. A detailed analysis of the data set is provided in [1].
The logs are provided in JSON format with the following attributes in the first level:
id: Unique log line identifier that starts at 1 and increases incrementally, e.g., 1.
time: Time stamp of the event in ISO format, e.g., 2021-01-01T00:00:02Z.
uid: Unique anonymized identifier for the user generating the event, e.g., old-pink-crane-sharedealer.
uidType: Specifier for uid, which is either the user name or IP address for logged out users.
type: The action carried out by the user, e.g., file_accessed.
params: Additional event parameters (e.g., paths, groups) stored in a nested dictionary.
isLocalIP: Optional flag for event origin, which is either internal (true) or external (false).
role: Optional user role: consulting, administration, management, sales, technical, or external.
location: Optional IP-based geolocation of event origin, including city, country, longitude, latitude, etc.
In the following data sample, the first object depicts a successful user login (see type: login_successful) and the second object depicts a file access (see type: file_accessed) from a remote location:
{"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", "time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", "id": 21567530, "uidType": "name"}
{"isLocalIP": false, "params": {"path": "/proud-copper-orangutan-artexer/doubtful-plum-ptarmigan-merchant/insufficient-amaranth-earthworm-qualitycontroller/curious-silver-galliform-tradingstandards/incredible-indigo-octopus-printfinisher/wicked-bronze-sloth-claimsmanager/frantic-aquamarine-horse-cleric"}, "type": "file_accessed", "time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", "id": 21567531, "location": {"countryCode": "AT", "countryName": "Austria", "region": "4", "city": "Gmunden", "latitude": 47.915, "longitude": 13.7959, "timezone": "Europe/Vienna", "postalCode": "4810", "metroCode": null, "regionName": "Upper Austria", "isInEuropeanUnion": true, "continent": "Europe", "accuracyRadius": 50}, "uidType": "ipaddress"}
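A minimal sketch of reading such log lines with Python's standard library follows. The first line is the login sample from above; the second is an abridged stand-in for the file-access sample (its path and location are shortened for illustration). The helper name `parse_event` and the flattened output fields are assumptions, not part of the data set's tooling.

```python
import json
from datetime import datetime

LOG_LINES = [
    # First sample object from the description.
    '{"params": {"user": "intact-gray-marlin-trademarkagent"}, "type": "login_successful", '
    '"time": "2019-11-14T11:26:43Z", "uid": "intact-gray-marlin-trademarkagent", '
    '"id": 21567530, "uidType": "name"}',
    # Abridged stand-in for the second sample (path and location shortened).
    '{"isLocalIP": false, "params": {"path": "/example/path"}, "type": "file_accessed", '
    '"time": "2019-11-14T11:26:51Z", "uid": "graceful-olive-spoonbill-careersofficer", '
    '"id": 21567531, "location": {"countryCode": "AT", "city": "Gmunden"}, '
    '"uidType": "ipaddress"}',
]


def parse_event(line):
    """Parse one JSON log line; optional attributes become None when absent."""
    event = json.loads(line)
    # Swap the trailing "Z" for an explicit offset so fromisoformat also
    # works on Python versions older than 3.11.
    timestamp = datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
    return {
        "id": event["id"],
        "time": timestamp,
        "uid": event["uid"],
        "uid_type": event["uidType"],
        "type": event["type"],
        "is_local_ip": event.get("isLocalIP"),
        "country": event.get("location", {}).get("countryCode"),
    }


events = [parse_event(line) for line in LOG_LINES]
for e in events:
    print(e["id"], e["time"].isoformat(), e["type"], e["country"])
```

Since `isLocalIP`, `role`, and `location` are optional, the sketch uses `dict.get` for them rather than direct indexing.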
The data set was generated at the premises of Huemer Group, a midsize IT service provider located in Vienna, Austria. Huemer Group offers a range of Infrastructure-as-a-Service solutions for enterprises, including cloud computing and storage. In particular, their cloud storage solution called hBOX enables customers to upload their data, synchronize them with multiple devices, share files with others, create versions and backups of their documents, collaborate with team members in shared data spaces, and query the stored documents using search terms. The hBOX extends the open-source project Nextcloud with interfaces and functionalities tailored to the requirements of customers.
The data set comprises only normal user behavior, but can be used to evaluate anomaly detection approaches by simulating account hijacking. We provide an implementation for identifying similar users, switching pairs of users to simulate changes of behavior patterns, and a sample detection approach in our GitHub repo.
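The idea of switching a pair of users can be illustrated as follows. This is not the authors' implementation, only a sketch under simple assumptions: events are dictionaries with `uid` and ISO-format `time` fields, and from a chosen switch time onward the two users' events are relabeled as each other. The function name `swap_users` and the sample uids are hypothetical.

```python
def swap_users(events, user_a, user_b, switch_time):
    """Return a copy of events in which, from switch_time on, the events of
    user_a and user_b are relabeled as each other (a simulated hijack)."""
    swapped = []
    for event in events:
        new_event = dict(event)  # leave the input untouched
        # ISO timestamps in the same format compare correctly as strings.
        if event["time"] >= switch_time:
            if event["uid"] == user_a:
                new_event["uid"] = user_b
            elif event["uid"] == user_b:
                new_event["uid"] = user_a
        swapped.append(new_event)
    return swapped


# Hypothetical events for two users.
sample_events = [
    {"uid": "user-a", "time": "2021-01-01T00:00:00Z", "type": "login_successful"},
    {"uid": "user-b", "time": "2021-01-02T00:00:00Z", "type": "file_accessed"},
    {"uid": "user-a", "time": "2021-01-03T00:00:00Z", "type": "file_accessed"},
    {"uid": "user-b", "time": "2021-01-04T00:00:00Z", "type": "login_successful"},
]

hijacked = swap_users(sample_events, "user-a", "user-b", "2021-01-03T00:00:00Z")
print([e["uid"] for e in hijacked])
```

A detector trained on each account's earlier behavior should then flag the period after the switch, which gives labeled anomalies without any real attack in the data.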
Acknowledgements: Partially funded by the FFG project DECEPT (873980). The authors thank Walter Huemer, Oskar Kruschitz, Kevin Truckenthanner, and Christian Aigner from Huemer Group for supporting the collection of the data set.
If you use the dataset, please cite the following publication:
[1] M. Landauer, F. Skopik, G. Höld, and M. Wurzenberger. "A User and Entity Behavior Analytics Log Data Set for Anomaly Detection in Cloud Computing". 2022 IEEE International Conference on Big Data - 6th International Workshop on Big Data Analytics for Cyber Intelligence and Defense (BDA4CID 2022), December 17-20, 2022, Osaka, Japan. IEEE.
Data Sheet 2_Visual analysis of multi-omics data.csv
frontiersin.figshare.com
csv
Updated Sep 10, 2024
+ more versions
Cite
Austin Swart; Ron Caspi; Suzanne Paley; Peter D. Karp (2024). Data Sheet 2_Visual analysis of multi-omics data.csv [Dataset]. http://doi.org/10.3389/fbinf.2024.1395981.s002
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.3389/fbinf.2024.1395981.s002
Dataset updated
Sep 10, 2024
Dataset provided by
Frontiers
Authors
Austin Swart; Ron Caspi; Suzanne Paley; Peter D. Karp
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a tool for multi-omics data analysis that enables simultaneous visualization of up to four types of omics data on organism-scale metabolic network diagrams. The tool’s interactive web-based metabolic charts depict the metabolic reactions, pathways, and metabolites of a single organism as described in a metabolic pathway database for that organism; the charts are constructed using automated graphical layout algorithms. The multi-omics visualization facility paints each individual omics dataset onto a different “visual channel” of the metabolic-network diagram. For example, a transcriptomics dataset might be displayed by coloring the reaction arrows within the metabolic chart, while a companion proteomics dataset is displayed as reaction arrow thicknesses, and a complementary metabolomics dataset is displayed as metabolite node colors. Once the network diagrams are painted with omics data, semantic zooming provides more details within the diagram as the user zooms in. Datasets containing multiple time points can be displayed in an animated fashion. The tool will also graph data values for individual reactions or metabolites designated by the user. The user can interactively adjust the mapping from data value ranges to the displayed colors and thicknesses to provide more informative diagrams.
Dataset Versioning For Analytics Market Research Report 2033
dataintelo.com
csv, pdf, pptx
Updated Oct 1, 2025
Cite
Dataintelo (2025). Dataset Versioning For Analytics Market Research Report 2033 [Dataset]. https://dataintelo.com/report/dataset-versioning-for-analytics-market
Explore at:
Available download formats: pdf, csv, pptx
Dataset updated
Oct 1, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Dataset Versioning for Analytics Market Outlook

According to our latest research, the global dataset versioning for analytics market size reached USD 527.4 million in 2024. The market is experiencing robust expansion with a remarkable CAGR of 18.2% during the forecast period. By 2033, the market is projected to achieve a value of USD 2,330.6 million. This growth is primarily driven by the escalating demand for efficient data management, regulatory compliance, and the proliferation of AI and machine learning applications across diverse industries.

The primary growth driver in the dataset versioning for analytics market is the exponential increase in data volume and complexity across organizations of all sizes. As enterprises continue to generate and utilize vast amounts of structured and unstructured data, the need for robust dataset versioning solutions has become imperative. These solutions enable organizations to track, manage, and analyze different versions of datasets, ensuring data integrity, reproducibility, and transparency throughout the analytics lifecycle. The surge in adoption of advanced analytics, machine learning, and artificial intelligence further amplifies the necessity for dataset versioning, as it facilitates the training, validation, and deployment of models with consistent and reliable data sources. In addition, the integration of dataset versioning tools with popular analytics platforms and cloud services has made these solutions more accessible and scalable, catering to the evolving needs of modern data-driven enterprises.

Another significant factor fueling market growth is the rising emphasis on data governance and regulatory compliance across industries such as BFSI, healthcare, and government. Stringent regulations like GDPR, HIPAA, and CCPA mandate organizations to maintain accurate records of data usage, lineage, and modifications. Dataset versioning solutions play a pivotal role in helping organizations meet these compliance requirements by providing comprehensive audit trails, access controls, and data lineage tracking. This not only mitigates the risk of non-compliance penalties but also enhances organizational trust and credibility. Furthermore, the growing awareness about the strategic importance of data governance in driving business value and mitigating operational risks has prompted enterprises to invest in sophisticated dataset versioning tools, thereby propelling market expansion.

The proliferation of cloud computing and the increasing adoption of hybrid and multi-cloud architectures are also contributing to the growth of the dataset versioning for analytics market. Cloud-based dataset versioning solutions offer unparalleled scalability, flexibility, and cost-efficiency, enabling organizations to manage and version datasets seamlessly across distributed environments. The shift towards cloud-native analytics and the integration of dataset versioning with cloud data lakes, warehouses, and analytics platforms have further accelerated market adoption. Additionally, advancements in automation, AI-driven data cataloging, and self-service analytics are enhancing the capabilities of dataset versioning tools, making them indispensable for organizations seeking to maximize the value of their data assets while minimizing operational complexities.

From a regional perspective, North America continues to dominate the dataset versioning for analytics market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of major technology vendors, high adoption rates of advanced analytics, and a mature regulatory landscape. However, the Asia Pacific region is witnessing the fastest growth, driven by rapid digital transformation, increasing investments in AI and analytics, and the emergence of data-centric industries. Europe also holds a significant market share, supported by stringent data protection regulations and growing awareness about data governance. The Middle East & Africa and Latin America are gradually catching up, with increasing adoption of cloud-based analytics and regulatory initiatives promoting data management best practices.

Component Analysis

The dataset versioning for analytics market is segmented by component into software and services. The software segment holds the dominant share, driven by the widespread adoption of standalone and integrated dataset versioning platforms that cater to various data management and analytics requirements. These s
Cloud Analytics Market Analysis North America, Europe, APAC, Middle East and...
technavio.com
pdf
Updated Jul 22, 2024
Cite
Technavio (2024). Cloud Analytics Market Analysis North America, Europe, APAC, Middle East and Africa, South America - US, China, UK, Germany, Japan - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/cloud-analytics-market-industry-analysis
Explore at:
Available download formats: pdf
Dataset updated
Jul 22, 2024
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-notice
Time period covered
2024 - 2028
Description
Cloud Analytics Market Size 2024-2028

The cloud analytics market size is forecast to increase by USD 74.08 billion at a CAGR of 24.4% between 2023 and 2028.
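The statement gives an absolute increase and a growth rate but no base value. If the increase is read as growth from a 2023 base over five compounding years (an assumption; the report does not spell this out here), the base follows from increase = base × ((1 + CAGR)^years − 1). The function name below is illustrative.

```python
def implied_base(increase, cagr, years):
    """Starting value that grows by `increase` over `years` at rate `cagr`."""
    growth_factor = (1 + cagr) ** years
    return increase / (growth_factor - 1)


# USD 74.08 billion increase at 24.4% CAGR; the 5-year span is an assumption.
base_2023 = implied_base(74.08, 0.244, 5)
size_2028 = base_2023 + 74.08

print(f"implied 2023 market size: {base_2023:.2f}")
print(f"implied 2028 market size: {size_2028:.2f}")
```

These derived values are only as good as the assumed window; a different base year or compounding period would shift them.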

The market is experiencing significant growth due to several key trends. The adoption of hybrid and multi-cloud setups is on the rise, as these configurations enhance data connectivity and flexibility. Another trend driving market growth is the increasing use of cloud security applications to safeguard sensitive data. However, concerns regarding confidential data security and privacy remain a challenge for market growth. Organizations must ensure robust security measures are in place to mitigate risks and maintain trust with their customers. Overall, the market is poised for continued expansion as businesses seek to leverage the benefits of cloud technologies for data processing and data analytics.

What will be the Size of the Cloud Analytics Market During the Forecast Period?

The market is experiencing significant growth due to the increasing volume of data generated by businesses and the demand for advanced analytics solutions. Cloud-based analytics enables organizations to process and analyze large datasets from various data sources, including unstructured data, in real time. This is crucial for businesses looking to make data-driven decisions and gain valuable insights to optimize their operations and meet customer requirements. Functions such as sales and marketing, customer service, and finance are adopting cloud analytics to improve key performance indicators and gain a competitive edge. Both Small and Medium-sized Enterprises (SMEs) and large enterprises are embracing cloud analytics, with solutions available on private, public, and multi-cloud platforms. Technologies such as big data, machine learning, and artificial intelligence are integral to cloud analytics, enabling advanced data analytics and business intelligence. Cloud analytics provides businesses with the flexibility to store and process data in the cloud, reducing the need for expensive on-premises data storage and computation. Hybrid environments are also gaining popularity, allowing businesses to leverage the benefits of both private and public clouds. Overall, the market is poised for continued growth as businesses increasingly rely on data-driven insights to inform their decision-making processes.

How is this Cloud Analytics Industry segmented and which is the largest segment?

The cloud analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2017-2022 for the following segments.

Solution
    Hosted data warehouse solutions
    Cloud BI tools
    Complex event processing
    Others
Deployment
    Public cloud
    Hybrid cloud
    Private cloud
Geography
    North America
        US
    Europe
        Germany
        UK
    APAC
        China
        Japan
    Middle East and Africa
    South America

By Solution Insights

The hosted data warehouse solutions segment is estimated to witness significant growth during the forecast period.

Hosted data warehouses enable organizations to centralize and analyze large datasets from multiple sources, facilitating advanced analytics solutions and real-time insights. By utilizing cloud-based infrastructure, businesses can reduce operational costs through eliminating licensing expenses, hardware investments, and maintenance fees. Additionally, cloud solutions offer network security measures, such as Software Defined Networking and Network integration, ensuring data protection. Cloud analytics caters to diverse industries, including SMEs and large enterprises, addressing requirements for sales and marketing, customer service, and key performance indicators. Advanced analytics capabilities, including predictive analytics, automated decision making, and fraud prevention, are essential for data-driven decision making and business optimization.

Furthermore, cloud platforms provide access to specialized talent, big data technology, and AI, enhancing customer experiences and digital business opportunities. Data connectivity and data processing in real-time are crucial for network agility and application performance. Hosted data warehouses offer computational power and storage capabilities, ensuring efficient data utilization and enterprise information management. Cloud service providers offer various cloud environments, including private, public, multi-cloud, and hybrid, catering to diverse business needs. Compliance and security concerns are addressed through cybersecurity frameworks and data security measures, ensuring data breaches and thefts are minimized.

The Hosted data warehouse solutions s
Data from: Cyber Attack Evaluation Dataset for Deep Packet Inspection and...
data.mendeley.com
Updated Oct 18, 2022
Shishir Kumar Shandilya (2022). Cyber Attack Evaluation Dataset for Deep Packet Inspection and Analysis [Dataset]. http://doi.org/10.17632/3szjvt3w78.1
Unique identifier
https://doi.org/10.17632/3szjvt3w78.1
Dataset updated
Oct 18, 2022
Authors
Shishir Kumar Shandilya
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
To determine the effectiveness of any defense mechanism, there is a need for comprehensive real-time network data that solely references various attack scenarios based on older software versions or unprotected ports, and so on. This presented dataset has entire network data at the time of several cyber attacks to enable experimentation on challenges based on implementing defense mechanisms on a larger scale. For collecting the data, we captured the network traffic of configured virtual machines using Wireshark and tcpdump. To analyze the impact of several cyber attack scenarios, this dataset presents a set of ten computers connected to Router1 on VLAN1 in a Docker Bridge network, that try and exploit each other. It includes browsing the web and downloading foreign packages including malicious ones. Also, services like FTP and SSH were exploited using several attack mechanisms. The presented dataset shows the importance of updating and patching systems to protect themselves to a greater extent, by following attack tactics on older versions of packages as compared to the newer and updated ones. This dataset also includes an Apache Server hosted on the different subset on VLAN2 which is connected to the VLAN1 to demonstrate isolation and cross-VLAN communication. The services on this web server were also exploited by the previously stated ten computers. The attack types include: Distributed Denial of Service, SQL Injection, Account Takeover, Service Exploitation (SSH, FTP), DNS and ARP Spoofing, Scanning and Firewall Searching and Indexing (using Nmap), Hammering the services to brute-force passwords and usernames, Malware attack, Spoofing and Man-in-the-Middle Attack. The attack scenarios also show various scanning mechanisms and the impact of Insider Threats on the entire network.

Global Real-Time Index Database Market Research Report: By End Use...

wiseguyreports.com

Updated Sep 15, 2025

(2025). Global Real-Time Index Database Market Research Report: By End Use (Financial Services, Healthcare, Telecommunications, Retail, Government), By Deployment Type (On-Premises, Cloud-Based, Hybrid), By Database Type (Relational Database, NoSQL Database, Time-Series Database), By Application (Data Analytics, Real-Time Monitoring, Predictive Analysis, Reporting) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/real-time-index-database-market

Dataset updated

Sep 15, 2025

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Sep 25, 2025

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2023
REGIONS COVERED	North America, Europe, APAC, South America, MEA
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024	2.29(USD Billion)
MARKET SIZE 2025	2.49(USD Billion)
MARKET SIZE 2035	5.8(USD Billion)
SEGMENTS COVERED	End Use, Deployment Type, Database Type, Application, Regional
COUNTRIES COVERED	US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS	growing demand for real-time analytics, increasing data volume and variety, rising cloud adoption trends, need for enhanced decision-making, regulatory compliance and data governance
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	Nasdaq, Fitch Ratings, Tickdata, Thomson Reuters, MSCI, St. Louis Federal Reserve, FTSE Russell, Bloomberg, Morningstar, IHS Markit, S&P Dow Jones Indices, FactSet, S&P Global, Refinitiv
MARKET FORECAST PERIOD	2025 - 2035
KEY MARKET OPPORTUNITIES	Cloud-based solutions integration, Enhanced data analytics capabilities, Adoption in fintech applications, Real-time data accessibility demands, Rising importance of accurate indexing.
COMPOUND ANNUAL GROWTH RATE (CAGR)	8.8% (2025 - 2035)
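
The CAGR row can be cross-checked against the market-size rows. The following is a minimal sketch, assuming the standard compound-growth formula and treating 2025 to 2035 as ten compounding periods. The function name `cagr` is illustrative and does not come from the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures taken from the table: 2.49 (2025) and 5.8 (2035), in USD billion.
rate = cagr(2.49, 5.8, 2035 - 2025)
print(f"Implied CAGR 2025-2035: {rate:.1%}")
```

With the listed market sizes, the implied rate rounds to the 8.8% stated in the table.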

Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends,...
datarade.ai
.json, .csv
Updated Aug 12, 2024
Dataplex (2024). Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://datarade.ai/data-products/dataplex-reddit-data-global-social-media-data-1-1m-mill-dataplex
Available download formats: .json, .csv
Dataset updated
Aug 12, 2024
Dataset authored and provided by
Dataplex
Area covered
Mexico, Jersey, Holy See, Botswana, Gambia, Macao, Christmas Island, Chile, Côte d'Ivoire, Martinique
Description
The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.

Dataset Overview:

This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.

2.1 Million Subreddits with Enhanced AI Insights:

The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:

Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.

Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.

Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.

Sourced Directly from Reddit:

All social media data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.

Key Features:

Subreddit Metrics: Detailed data on subreddit activity, including the number of posts, comments, votes, and user participation.

User Engagement: Insights into how users interact with content, including comment threads, upvotes/downvotes, and participation rates.

Trending Topics: Track emerging trends and viral content across the platform, helping you stay ahead of the curve in understanding social media dynamics.

AI-Enhanced Analysis: Utilize AI-generated columns for sentiment analysis, topic categorization, and predictive insights, providing a deeper understanding of the data.

Use Cases:

Social Media Analysis: Researchers and analysts can use this dataset to study online behavior, track the spread of information, and understand how content resonates with different audiences.

Market Research: Marketers can leverage the dataset to identify target audiences, understand consumer preferences, and tailor campaigns to specific communities.

Content Strategy: Content creators and strategists can use insights from the dataset to craft content that aligns with trending topics and user interests, maximizing engagement.

Academic Research: Academics can explore the dynamics of online communities, studying everything from the spread of misinformation to the formation of online subcultures.

Data Quality and Reliability:

The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.

Integration and Usability:

The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.

User-Friendly Structure and Metadata:

The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.

Ideal For:

Data Analysts: Conduct in-depth analyses of subreddit trends, user engagement, and content virality. The dataset’s extensive coverage and AI-enhanced insights make it an invaluable tool for data-driven research.

Marketers: Use the dataset to better understand your target audience, tailor campaigns to specific interests, and track the effectiveness of marketing efforts across Reddit.

Researchers: Explore the social dynamics of online communities, analyze the spread of ideas and information, and study the impact of digital media on public discourse, all while leveraging AI-generated insights.

This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conduc...
E-Commerce Customer Behavior & Sales Analysis -TR
kaggle.com
zip
Updated Oct 29, 2025
UmutUygurr (2025). E-Commerce Customer Behavior & Sales Analysis -TR [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/e-commerce-customer-behavior-and-sales-analysis-tr
Available download formats: zip (138245 bytes)
Dataset updated
Oct 29, 2025
Authors
UmutUygurr
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
🛒 E-Commerce Customer Behavior and Sales Dataset

📊 Dataset Overview

This comprehensive dataset contains 5,000 e-commerce transactions from a Turkish online retail platform, spanning from January 2023 to March 2024. The dataset provides detailed insights into customer demographics, purchasing behavior, product preferences, and engagement metrics.

🎯 Use Cases

This dataset is perfect for:

Customer Segmentation Analysis: Identify distinct customer groups based on behavior
Sales Forecasting: Predict future sales trends and patterns
Recommendation Systems: Build product recommendation engines
Customer Lifetime Value (CLV) Prediction: Estimate customer value
Churn Analysis: Identify customers at risk of leaving
Marketing Campaign Optimization: Target customers effectively
Price Optimization: Analyze price sensitivity across categories
Delivery Performance Analysis: Optimize logistics and shipping

📁 Dataset Structure

The dataset contains 18 columns with the following features:

Order Information
Order_ID: Unique identifier for each order (ORD_XXXXXX format)
Date: Transaction date (2023-01-01 to 2024-03-26)

Customer Demographics
Customer_ID: Unique customer identifier (CUST_XXXXX format)
Age: Customer age (18-75 years)
Gender: Customer gender (Male, Female, Other)
City: Customer city (10 major Turkish cities)

Product Information
Product_Category: 8 categories (Electronics, Fashion, Home & Garden, Sports, Books, Beauty, Toys, Food)
Unit_Price: Price per unit (in TRY/Turkish Lira)
Quantity: Number of units purchased (1-5)

Transaction Details
Discount_Amount: Discount applied (if any)
Total_Amount: Final transaction amount after discount
Payment_Method: Payment method used (5 types)

Customer Behavior Metrics
Device_Type: Device used for purchase (Mobile, Desktop, Tablet)
Session_Duration_Minutes: Time spent on website (1-120 minutes)
Pages_Viewed: Number of pages viewed during session (1-50)
Is_Returning_Customer: Whether customer has purchased before (True/False)

Post-Purchase Metrics
Delivery_Time_Days: Delivery duration (1-30 days)
Customer_Rating: Customer satisfaction rating (1-5 stars)

📈 Key Statistics

Total Records: 5,000 transactions
Date Range: January 2023 - March 2024 (15 months)
Average Transaction Value: ~450 TRY
Customer Satisfaction: 3.9/5.0 average rating
Returning Customer Rate: 60%
Mobile Usage: 55% of transactions

🔍 Data Quality

✅ No missing values
✅ Consistent formatting across all fields
✅ Realistic data distributions
✅ Proper data types for all columns
✅ Logical relationships between features

💡 Sample Analysis Ideas

Customer Segmentation with K-Means Clustering: Segment customers based on spending, frequency, and recency
Sales Trend Analysis: Identify seasonal patterns and peak shopping periods
Product Category Performance: Compare revenue, ratings, and return rates across categories
Device-Based Behavior Analysis: Understand how device choice affects purchasing patterns
Predictive Modeling: Build models to predict customer ratings or purchase amounts
City-Level Market Analysis: Compare market performance across different cities

🛠️ Technical Details

File Format: CSV (Comma-Separated Values)
Encoding: UTF-8
File Size: ~500 KB
Delimiter: Comma (,)

📚 Column Descriptions

Column Name	Data Type	Description	Example
Order_ID	String	Unique order identifier	ORD_001337
Customer_ID	String	Unique customer identifier	CUST_01337
Date	DateTime	Transaction date	2023-06-15
Age	Integer	Customer age	35
Gender	String	Customer gender	Female
City	String	Customer city	Istanbul
Product_Category	String	Product category	Electronics
Unit_Price	Float	Price per unit	1299.99
Quantity	Integer	Units purchased	2
Discount_Amount	Float	Discount applied	129.99
Total_Amount	Float	Final amount paid	2469.99
Payment_Method	String	Payment method	Credit Card
Device_Type	String	Device used	Mobile
Session_Duration_Minutes	Integer	Session time	15
Pages_Viewed	Integer	Pages viewed	8
Is_Returning_Customer	Boolean	Returning customer	True
Delivery_Time_Days	Integer	Delivery duration	3
Customer_Rating	Integer	Satisfaction rating	5

🎓 Learning Outcomes

By working with this dataset, you can learn:

Data cleaning and preprocessing techniques
Exploratory Data Analysis (EDA) with Python/R
Statistical analysis and hypothesis testing
Machine learning model development
Data visualization best practices
Business intelligence and reporting

📝 Citation

If you use this dataset in your research or project, please cite:

E-Commerce Customer Behavior and Sales Dataset (2024)
Turkish Online Retail Platform Data (2023-2024)
Available on Kaggle

⚖️ License

This dataset is released under the CC0: Public Domain license. You are free to use it for any purpose.

🤝 Contribution Found any issues or have suggestions? Feel free to provide feedback!

📞 Contact For questions or collaborations, please reach out through Kaggle.

Happy Analyzing! 🚀
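
The example values in the column descriptions (Unit_Price 1299.99, Quantity 2, Discount_Amount 129.99, Total_Amount 2469.99) are consistent with Total_Amount = Unit_Price × Quantity − Discount_Amount. The sketch below checks that relationship row by row. It assumes the rule holds for the whole file, which the description does not state explicitly. The one-row CSV text is a hypothetical stand-in built from those example values, not an excerpt of the real file.

```python
import csv
import io
import math

# Hypothetical sample built from the example values in the column descriptions.
SAMPLE_CSV = """Order_ID,Unit_Price,Quantity,Discount_Amount,Total_Amount
ORD_001337,1299.99,2,129.99,2469.99
"""


def total_is_consistent(row, tolerance=0.01):
    """Return True if Total_Amount matches price * quantity - discount."""
    expected = (
        float(row["Unit_Price"]) * int(row["Quantity"])
        - float(row["Discount_Amount"])
    )
    return math.isclose(expected, float(row["Total_Amount"]), abs_tol=tolerance)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
inconsistent = [r["Order_ID"] for r in rows if not total_is_consistent(r)]
print("Orders with unexpected totals:", inconsistent)
```

To run the same check on the downloaded file, replace the `io.StringIO` object with an open file handle.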

Keywords: e-c...
EnviroAtlas - NatureServe Analysis of Imperiled or Federally Listed Species...
catalog.data.gov
s.cnmilf.com
+1more
Updated Jul 26, 2025
U.S. Environmental Protection Agency, Office of Research and Development-Sustainable and Healthy Communities Research Program, EnviroAtlas (Point of Contact) (2025). EnviroAtlas - NatureServe Analysis of Imperiled or Federally Listed Species by HUC-12 for the Conterminous United States [Dataset]. https://catalog.data.gov/dataset/enviroatlas-natureserve-analysis-of-imperiled-or-federally-listed-species-by-huc-12-for-the-con4
Dataset updated
Jul 26, 2025
Dataset provided by
United States Environmental Protection Agency: http://www.epa.gov/
Area covered
Contiguous United States, United States
Description
This EnviroAtlas dataset includes analysis by NatureServe of species that are Imperiled (G1/G2) or Listed under the U.S. Endangered Species Act (ESA) by 12-digit Hydrologic Units (HUCs). The analysis results are for use and publication by both the LandScope America website and by the EnviroAtlas. Results are provided for the total number of Aquatic Associated G1-G2/ESA species, the total number of Wetland Associated G1-G2/ESA species, the total number of Terrestrial Associated G1-G2/ESA species, and the total number of Unknown Habitat Association G1-G2/ESA species in each HUC12. NatureServe is a non-profit organization dedicated to developing and providing information about the world's plants, animals, and ecological communities. NatureServe works in partnership with 82 independent Natural Heritage programs and Conservation Data Centers that gather scientific information on rare species and ecosystems in the United States, Latin America, and Canada (the Natural Heritage Network). NatureServe is a leading source for biodiversity information that is essential for effective conservation action. This dataset was produced by NatureServe to support research and online mapping activities related to EnviroAtlas. EnviroAtlas (https://www.epa.gov/enviroatlas) allows the user to interact with a web-based, easy-to-use, mapping application to view and analyze multiple ecosystem services for the contiguous United States. The dataset is available as downloadable data (https://edg.epa.gov/data/Public/ORD/EnviroAtlas) or as an EnviroAtlas map service. Additional descriptive information about each attribute in this dataset can be found in its associated EnviroAtlas Fact Sheet (https://www.epa.gov/enviroatlas/enviroatlas-fact-sheets).
Data from: An emotion analysis dataset of course comment texts in massive...
dataverse.harvard.edu
Updated Sep 26, 2022
Xiang Feng; Keyi Yuan; Xiu Guan; Longhui Qiu (2022). An emotion analysis dataset of course comment texts in massive online learning course platforms [Dataset]. http://doi.org/10.7910/DVN/LC6GHO
Unique identifier
https://doi.org/10.7910/DVN/LC6GHO
Dataset updated
Sep 26, 2022
Dataset provided by
Harvard Dataverse
Authors
Xiang Feng; Keyi Yuan; Xiu Guan; Longhui Qiu
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Datasets are critical for emotion analysis in the machine learning field. This study explores emotion analysis datasets and related benchmarks in online learning, an area that very few studies currently address. We labeled the topic and nine-category emotion of 4715 comment texts from online learning platforms using the “three-person voting label method”, working at the sentence level along multi-category labeling dimensions with our self-developed system. Testing the consistency of the labeling results with the Fleiss' kappa method gave a value of about 0.51, representing a moderate strength of agreement. Based on the dataset, the prediction accuracy of the Long Short-Term Memory (LSTM) method is about 0.68. This dataset provides a benchmark for multi-category emotion datasets in the Chinese online learning field. It can provide a basis for subsequent work on emotion analysis, monitoring, and intervention in the education field, and a reference for constructing subsequent datasets in that field. Please note that this is a Chinese-language dataset. If you want to use it, please contact the author and request access to the dataset below.
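
For readers unfamiliar with the agreement statistic mentioned above, here is a minimal sketch of Fleiss' kappa for a fixed number of raters per item. It uses the textbook formula and is not the authors' code. The toy `counts` matrix (three raters, three categories, four items) is invented for illustration and is unrelated to the 4715 labeled comments.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_observed - p_expected) / (1 - p_expected)


# Invented toy data: 4 items, 3 raters each, 3 categories.
counts = [[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 1, 2]]
kappa = fleiss_kappa(counts)
print(f"Fleiss' kappa: {kappa:.3f}")
```

On the commonly used Landis and Koch scale, values between 0.41 and 0.60 are read as moderate agreement, which matches how the description interprets 0.51.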
Self-citation analysis data based on PubMed Central subset (2002-2005)
databank.illinois.edu
aws-databank-alb.library.illinois.edu
Shubhanshu Mishra; Brent D Fegley; Jana Diesner; Vetle I. Torvik, Self-citation analysis data based on PubMed Central subset (2002-2005) [Dataset]. http://doi.org/10.13012/B2IDB-9665377_V1
Unique identifier
https://doi.org/10.13012/B2IDB-9665377_V1
Authors
Shubhanshu Mishra; Brent D Fegley; Jana Diesner; Vetle I. Torvik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
U.S. National Institutes of Health (NIH)
U.S. National Science Foundation (NSF)
Description
Self-citation analysis data based on PubMed Central subset (2002-2005)
----------------------------------------------------------------------
Created by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik on April 5th, 2018

## Introduction

This is a dataset created as part of the publication titled: Mishra S, Fegley BD, Diesner J, Torvik VI (2018) Self-Citation is the Hallmark of Productive Authors, of Any Gender. PLOS ONE. It contains files for running the self-citation analysis on articles published in PubMed Central between 2002 and 2005, collected in 2015. The dataset is distributed in the form of the following tab-separated text files:

* Training_data_2002_2005_pmc_pair_First.txt (1.2G) - Data for first authors
* Training_data_2002_2005_pmc_pair_Last.txt (1.2G) - Data for last authors
* Training_data_2002_2005_pmc_pair_Middle_2nd.txt (964M) - Data for middle 2nd authors
* Training_data_2002_2005_pmc_pair_txt.header.txt - Header for the data
* COLUMNS_DESC.txt file - Descriptions of all columns
* model_text_files.tar.gz - Text files containing model coefficients and scores for model selection.
* results_all_model.tar.gz - Model coefficient and result files in numpy format used for plotting purposes. v4.reviewer contains models for analysis done after reviewer comments.
* README.txt file

## Dataset creation

Our experiments relied on data from multiple sources, including proprietary data from Thomson Reuters' (now Clarivate Analytics) Web of Science collection of MEDLINE citations. Authors interested in reproducing our experiments should personally request this data from Clarivate Analytics. However, we do make available a similar but open dataset based on citations from PubMed Central, which can be used to obtain results similar to those reported in our analysis. Furthermore, we have freely shared our datasets, which can be used along with the citation datasets from Clarivate Analytics to re-create the dataset used in our experiments. These datasets are listed below. If you wish to use any of those datasets, please make sure you cite both the dataset and the paper introducing the dataset.

* MEDLINE 2015 baseline: https://www.nlm.nih.gov/bsd/licensee/2015_stats/baseline_doc.html
* Citation data from PubMed Central (original paper includes additional citations from Web of Science)
* Author-ity 2009 dataset:
  - Dataset citation: Torvik, Vetle I.; Smalheiser, Neil R. (2018): Author-ity 2009 - PubMed author name disambiguated dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4222651_V1
  - Paper citation: Torvik, V. I., & Smalheiser, N. R. (2009). Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3), 1–29. https://doi.org/10.1145/1552303.1552304
  - Paper citation: Torvik, V. I., Weeber, M., Swanson, D. R., & Smalheiser, N. R. (2004). A probabilistic similarity metric for Medline records: A model for author name disambiguation. Journal of the American Society for Information Science and Technology, 56(2), 140–158. https://doi.org/10.1002/asi.20105
* Genni 2.0 + Ethnea for identifying author gender and ethnicity:
  - Dataset citation: Torvik, Vetle (2018): Genni + Ethnea for the Author-ity 2009 dataset. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-9087546_V1
  - Paper citation: Smith, B. N., Singh, M., & Torvik, V. I. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. In Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries - JCDL ’13. ACM Press. https://doi.org/10.1145/2467696.2467720
  - Paper citation: Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington DC, USA. http://hdl.handle.net/2142/88927
* MapAffil for identifying article country of affiliation:
  - Dataset citation: Torvik, Vetle I. (2018): MapAffil 2016 dataset -- PubMed author affiliations mapped to cities and their geocodes worldwide. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4354331_V1
  - Paper citation: Torvik VI. MapAffil: A Bibliographic Tool for Mapping Author Affiliation Strings to Cities and Their Geocodes Worldwide. D-Lib magazine : the magazine of the Digital Library Forum. 2015;21(11-12):10.1045/november2015-torvik
* IMPLICIT journal similarity:
  - Dataset citation: Torvik, Vetle (2018): Author-implicit journal, MeSH, title-word, and affiliation-word pairs based on Author-ity 2009. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-4742014_V1
* Novelty dataset for identifying article-level novelty:
  - Dataset citation: Mishra, Shubhanshu; Torvik, Vetle I. (2018): Conceptual novelty scores for PubMed articles. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-5060298_V1
  - Paper citation: Mishra S, Torvik VI. Quantifying Conceptual Novelty in the Biomedical Literature. D-Lib magazine : The Magazine of the Digital Library Forum. 2016;22(9-10):10.1045/september2016-mishra
  - Code: https://github.com/napsternxg/Novelty
* Expertise dataset for identifying author expertise on articles:
* Source code provided at: https://github.com/napsternxg/PubMed_SelfCitationAnalysis

Note: The dataset is based on a snapshot of PubMed (which includes Medline and PubMed-not-Medline records) taken in the first week of October, 2016. Check here for information to get PubMed/MEDLINE, and NLM's data Terms and Conditions. Additional data-related updates can be found at Torvik Research Group.

## Acknowledgments

This work was made possible in part with funding to VIT from NIH grant P01AG039347 and NSF grant 1348742. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## License

Self-citation analysis data based on PubMed Central subset (2002-2005) by Shubhanshu Mishra, Brent D. Fegley, Jana Diesner, and Vetle Torvik is licensed under a Creative Commons Attribution 4.0 International License. Permissions beyond the scope of this license may be available at https://github.com/napsternxg/PubMed_SelfCitationAnalysis.
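
Because the description lists the column header as a separate file from the tab-separated data files, loading the data means pairing the two. The sketch below shows one way to do that with the standard library, streaming rows so that gigabyte-sized files need not fit in memory. It assumes the header file holds a single tab-separated line of column names, which the description does not confirm. The in-memory text and the column names `col_a`, `col_b`, `col_c` are hypothetical stand-ins for the real files.

```python
import csv
import io

# Hypothetical stand-ins for the header file and one of the data files.
header_file = io.StringIO("col_a\tcol_b\tcol_c\n")
data_file = io.StringIO("1\tx\t0.5\n2\ty\t0.7\n")


def read_with_external_header(header_handle, data_handle):
    """Yield one dict per data row, using column names from a separate file."""
    fieldnames = next(csv.reader(header_handle, delimiter="\t"))
    reader = csv.DictReader(data_handle, fieldnames=fieldnames, delimiter="\t")
    yield from reader


rows = list(read_with_external_header(header_file, data_file))
print(rows[0])
```

For the real files, pass handles from `open(path, newline="")` and iterate over the generator instead of calling `list` on it.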
Data from: Detecting and quantifying social transmission using network-based...
datadryad.org
datasetcatalog.nlm.nih.gov
+1more
zip
Updated Aug 21, 2020
Matthew Hasenjager; Ellouise Leadbeater; William Hoppitt (2020). Detecting and quantifying social transmission using network-based diffusion analysis [Dataset]. http://doi.org/10.5061/dryad.280gb5mnj
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.280gb5mnj
Dataset updated
Aug 21, 2020
Dataset provided by
Dryad
Authors
Matthew Hasenjager; Ellouise Leadbeater; William Hoppitt
Time period covered
Jul 7, 2020
Description
Annotated tutorials and example code are provided describing the use of these data.

Global NoSQL Database Market Research Report: By Database Type (Document...

wiseguyreports.com

Updated Sep 27, 2025

(2025). Global NoSQL Database Market Research Report: By Database Type (Document Store, Key-Value Store, Column Store, Graph Database), By Deployment Type (On-Premises, Cloud-Based, Hybrid), By End User Industry (IT and Telecommunications, Retail, Healthcare, Banking and Financial Services), By Application (Real-Time Big Data Analytics, Content Management, Mobile Applications, Internet of Things) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/nosql-database-market

Dataset updated

Sep 27, 2025

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Sep 25, 2025

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2023
REGIONS COVERED	North America, Europe, APAC, South America, MEA
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024	7.18(USD Billion)
MARKET SIZE 2025	7.89(USD Billion)
MARKET SIZE 2035	20.0(USD Billion)
SEGMENTS COVERED	Database Type, Deployment Type, End User Industry, Application, Regional
COUNTRIES COVERED	US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS	Scalability and Flexibility, Real-time Data Processing, Increased Cloud Adoption, Big Data Integration, Cost-effective Solutions
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	DataStax, Microsoft, Amazon Web Services, Teradata, Aerospike, MongoDB, Berkeley DB, Google, MarkLogic, IBM, Redis Labs, Couchbase, Cassandra, CouchDB, Oracle
MARKET FORECAST PERIOD	2025 - 2035
KEY MARKET OPPORTUNITIES	Cloud-based database solutions, Increasing demand for big data analytics, Integration with AI and machine learning, Growing adoption in IoT applications, Enhanced scalability for multi-cloud environments
COMPOUND ANNUAL GROWTH RATE (CAGR)	9.8% (2025 - 2035)
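
As a rough consistency check on the table above, the 2025 market size can be compounded forward at the stated growth rate. This sketch assumes annual compounding at a constant 9.8%, which simplifies whatever model the report actually uses. The helper `project` is hypothetical.

```python
def project(start_year, start_value, annual_rate, end_year):
    """Return {year: value} with the value compounded once per year."""
    values = {start_year: start_value}
    value = start_value
    for year in range(start_year + 1, end_year + 1):
        value *= 1 + annual_rate
        values[year] = value
    return values


# Figures taken from the table: 7.89 USD billion in 2025, CAGR 9.8%.
projection = project(2025, 7.89, 0.098, 2035)
print(f"Projected 2035 market size: {projection[2035]:.2f} USD billion")
```

The projected 2035 value lands close to the 20.0 USD billion listed in the table. The small gap is consistent with rounding in the published rate.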

IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT

zenodo.org
data.niaid.nih.gov
+1more

Updated Aug 30, 2024

José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa (2024). IoMT-TrafficData: A Dataset for Benchmarking Intrusion Detection in IoMT [Dataset]. http://doi.org/10.5281/zenodo.8116338

Unique identifier

https://doi.org/10.5281/zenodo.8116338

Dataset updated

Aug 30, 2024

Dataset provided by

Zenodo: http://zenodo.org/

Authors

José Areia; Ivo Afonso Bispo; Leonel Santos; Rogério Luís Costa

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Article Information

The work involved in developing the dataset and benchmarking its use of machine learning is set out in the article ‘IoMT-TrafficData: Dataset and Tools for Benchmarking Intrusion Detection in Internet of Medical Things’. DOI: 10.1109/ACCESS.2024.3437214.

Please do cite the aforementioned article when using this dataset.

Abstract

The increasing importance of securing the Internet of Medical Things (IoMT) due to its vulnerabilities to cyber-attacks highlights the need for an effective intrusion detection system (IDS). In this study, our main objective was to develop a machine learning model for the IoMT to enhance the security of medical devices and protect patients’ private data. To address this issue, we built a scenario that utilised the Internet of Things (IoT) and IoMT devices to simulate real-world attacks. We collected and cleaned data, pre-processed it, and fed it into our machine-learning model to detect intrusions in the network. Our results revealed significant improvements in all performance metrics, indicating robustness and reproducibility in real-world scenarios. This research has implications in the context of IoMT and cybersecurity, as it helps mitigate vulnerabilities and lowers the number of breaches occurring with the rapid growth of IoMT devices. The use of machine learning algorithms for intrusion detection systems is essential, and our study provides valuable insights and a road map for future research and the deployment of such systems in live environments. By implementing our findings, we can contribute to a safer and more secure IoMT ecosystem, safeguarding patient privacy and ensuring the integrity of medical data.

ZIP Folder Content

The ZIP folder comprises two main components: Captures and Datasets. The captures folder includes all the captures used in this project, organized into separate folders by type of network analysis: BLE or IP-Based. The datasets folder is organized the same way and contains datasets categorized by type: BLE, IP-Based Packet, and IP-Based Flows.

To cater to diverse analytical needs, the datasets are provided in two formats: CSV (Comma-Separated Values) and pickle. The CSV format facilitates seamless integration with various data analysis tools, while the pickle format preserves the intricate structures and relationships within the dataset.

This organization enables researchers to easily locate and utilize the specific captures and datasets they require, based on their preferred network analysis type or dataset type. The availability of different formats further enhances the flexibility and usability of the provided data.
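
The practical difference between the two formats can be illustrated with a minimal sketch that uses only the Python standard library. The two sample rows are hypothetical: the field names are borrowed from the IP-Based Flows feature list, but the values are invented and do not come from the dataset.

```python
import csv
import io
import pickle

# Hypothetical two-row sample; values are invented for illustration only.
rows = [
    {"proto": "tcp", "orig_pkts": 12, "flow_duration": 0.42},
    {"proto": "udp", "orig_pkts": 3, "flow_duration": 1.5},
]

# CSV round trip: every value comes back as a plain string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
from_csv = list(csv.DictReader(buffer))

# Pickle round trip: the original Python types are preserved.
from_pickle = pickle.loads(pickle.dumps(rows))

print(type(from_csv[0]["orig_pkts"]).__name__)     # str
print(type(from_pickle[0]["orig_pkts"]).__name__)  # int
```

CSV is readable by almost any tool but needs type conversion after loading, while pickle restores the objects as they were saved. Pickle files should only be loaded from a trusted source, because unpickling can execute arbitrary code.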

Datasets' Content

Within this dataset, three sub-datasets are available, namely BLE, IP-Based Packet, and IP-Based Flows. Below is a table of the features selected for each dataset and consequently used in the evaluation model within the provided work.

Identified Key Features Within Bluetooth Dataset

Feature	Meaning
btle.advertising_header	BLE Advertising Packet Header
btle.advertising_header.ch_sel	BLE Advertising Channel Selection Algorithm
btle.advertising_header.length	BLE Advertising Length
btle.advertising_header.pdu_type	BLE Advertising PDU Type
btle.advertising_header.randomized_rx	BLE Advertising Rx Address
btle.advertising_header.randomized_tx	BLE Advertising Tx Address
btle.advertising_header.rfu.1	Reserved For Future 1
btle.advertising_header.rfu.2	Reserved For Future 2
btle.advertising_header.rfu.3	Reserved For Future 3
btle.advertising_header.rfu.4	Reserved For Future 4
btle.control.instant	Instant Value Within a BLE Control Packet
btle.crc.incorrect	Incorrect CRC
btle.extended_advertising	Advertiser Data Information
btle.extended_advertising.did	Advertiser Data Identifier
btle.extended_advertising.sid	Advertiser Set Identifier
btle.length	BLE Length
frame.cap_len	Frame Length Stored Into the Capture File
frame.interface_id	Interface ID
frame.len	Frame Length on the Wire
nordic_ble.board_id	Board ID
nordic_ble.channel	Channel Index
nordic_ble.crcok	Indicates if CRC is Correct
nordic_ble.flags	Flags
nordic_ble.packet_counter	Packet Counter
nordic_ble.packet_time	Packet time (start to end)
nordic_ble.phy	PHY
nordic_ble.protover	Protocol Version

Identified Key Features Within IP-Based Packets Dataset

Feature	Meaning
http.content_length	Length of content in an HTTP response
http.request	HTTP request being made
http.response.code	Status code of an HTTP response
http.response_number	Sequential number of an HTTP response
http.time	Time taken for an HTTP transaction
tcp.analysis.initial_rtt	Initial round-trip time for TCP connection
tcp.connection.fin	TCP connection termination with a FIN flag
tcp.connection.syn	TCP connection initiation with SYN flag
tcp.connection.synack	TCP connection establishment with SYN-ACK flags
tcp.flags.cwr	Congestion Window Reduced flag in TCP
tcp.flags.ecn	Explicit Congestion Notification flag in TCP
tcp.flags.fin	FIN flag in TCP
tcp.flags.ns	Nonce Sum flag in TCP
tcp.flags.res	Reserved flags in TCP
tcp.flags.syn	SYN flag in TCP
tcp.flags.urg	Urgent flag in TCP
tcp.urgent_pointer	Pointer to urgent data in TCP
ip.frag_offset	Fragment offset in IP packets
eth.dst.ig	IG bit of the Ethernet destination address (individual or group address)
eth.src.ig	IG bit of the Ethernet source address (individual or group address)
eth.src.lg	LG bit of the Ethernet source address (locally administered or globally unique)
eth.src_not_group	Ethernet source is not in any network group
arp.isannouncement	Indicates if an ARP message is an announcement
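
The per-flag TCP features listed above are single bits of the TCP header's flags field. The following is a minimal sketch of how such columns could be derived from a raw flags value; the function name `decode_tcp_flags` and the output key naming are assumptions made for illustration, and the dataset authors' own extraction pipeline is not described here.

```python
# Bit masks of the TCP header flags (the NS bit is the historical
# nonce-sum flag that sits just above CWR).
TCP_FLAG_BITS = {
    "fin": 0x001,
    "syn": 0x002,
    "rst": 0x004,
    "psh": 0x008,
    "ack": 0x010,
    "urg": 0x020,
    "ecn": 0x040,  # ECN-Echo bit
    "cwr": 0x080,
    "ns": 0x100,
}


def decode_tcp_flags(flags: int) -> dict:
    """Map a raw TCP flags value to one 0/1 column per flag (hypothetical helper)."""
    return {
        f"tcp.flags.{name}": int(bool(flags & mask))
        for name, mask in TCP_FLAG_BITS.items()
    }


# 0x012 has the SYN and ACK bits set, as in the second step of a handshake.
syn_ack = decode_tcp_flags(0x012)
print(syn_ack["tcp.flags.syn"], syn_ack["tcp.flags.ack"], syn_ack["tcp.flags.fin"])
```

Keeping one column per bit, rather than the combined flags value, lets a model weigh each flag independently.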

Identified Key Features Within IP-Based Flows Dataset

Feature	Meaning
proto	Transport layer protocol of the connection
service	Identification of an application protocol
orig_bytes	Originator payload bytes
resp_bytes	Responder payload bytes
history	Connection state history
orig_pkts	Originator sent packets
resp_pkts	Responder sent packets
flow_duration	Length of the flow in seconds
fwd_pkts_tot	Forward packets total
bwd_pkts_tot	Backward packets total
fwd_data_pkts_tot	Forward data packets total
bwd_data_pkts_tot	Backward data packets total
fwd_pkts_per_sec	Forward packets per second
bwd_pkts_per_sec	Backward packets per second
flow_pkts_per_sec	Flow packets per second
fwd_header_size	Forward header bytes
bwd_header_size	Backward header bytes
fwd_pkts_payload	Forward payload bytes
bwd_pkts_payload	Backward payload bytes
flow_pkts_payload	Flow payload bytes
fwd_iat	Forward inter-arrival time
bwd_iat	Backward inter-arrival time
flow_iat	Flow inter-arrival time
active	Flow active duration
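
Several of the flow features above are rates. As a rough sketch, and assuming each rate is simply a packet count divided by flow_duration (the dataset's extraction tool may define them differently), they could be recomputed as follows. The sample values and the zero-duration handling are assumptions made for illustration.

```python
def packets_per_second(packet_count: int, duration_s: float) -> float:
    """Return packet_count / duration_s, or 0.0 for a zero-length flow (assumed convention)."""
    if duration_s <= 0:
        return 0.0
    return packet_count / duration_s


# Hypothetical flow record; the numbers are invented.
flow = {"fwd_pkts_tot": 30, "bwd_pkts_tot": 20, "flow_duration": 4.0}

fwd_rate = packets_per_second(flow["fwd_pkts_tot"], flow["flow_duration"])
bwd_rate = packets_per_second(flow["bwd_pkts_tot"], flow["flow_duration"])
flow_rate = packets_per_second(
    flow["fwd_pkts_tot"] + flow["bwd_pkts_tot"], flow["flow_duration"]
)

print(fwd_rate, bwd_rate, flow_rate)  # 7.5 5.0 12.5
```

A check of this kind is a quick way to see whether the rate columns are redundant with the count and duration columns before feeding all of them to a model.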

Data from: CloMet: A Novel Open-Source and Modular Software Platform That...
acs.figshare.com
xlsx
Updated Feb 9, 2024
Cite
Jordi Rodeiro; Ester Vidaña-Vila; Joan Navarro; Roger Mallol (2024). CloMet: A Novel Open-Source and Modular Software Platform That Connects Established Metabolomics Repositories and Data Analysis Resources [Dataset]. http://doi.org/10.1021/acs.jproteome.2c00602.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.jproteome.2c00602.s002
Dataset updated
Feb 9, 2024
Dataset provided by
ACS Publications
Authors
Jordi Rodeiro; Ester Vidaña-Vila; Joan Navarro; Roger Mallol
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The field of metabolomics has witnessed the development of hundreds of computational tools, but only a few have become cornerstones of this field. While MetaboLights and Metabolomics Workbench are two well-established data repositories for metabolomics data sets, Workflows4Metabolomics and MetaboAnalyst are two well-established web-based data analysis platforms for metabolomics. Yet, the raw data stored in the aforementioned repositories lack standardization in terms of the file system format used to store the associated acquisition files. Consequently, it is not straightforward to reuse available data sets as input data in the above-mentioned data analysis resources, especially for non-expert users. This paper presents CloMet, a novel open-source modular software platform that contributes to standardization, reusability, and reproducibility in the metabolomics field. CloMet, which is available through a Docker file, converts raw and NMR-based metabolomics data from MetaboLights and Metabolomics Workbench to a file format that can be used directly either in MetaboAnalyst or in Workflows4Metabolomics. We validated both CloMet and the output data using data sets from these repositories. Overall, CloMet fills the gap between well-established data repositories and web-based statistical platforms and contributes to the consolidation of a data-driven perspective of the metabolomics field by leveraging and connecting existing data and resources.
Cloud-based Database Market Industry Size, Share & Growth Analysis 2033
marketresearchintellect.com
Updated Jul 6, 2025
Cite
Market Research Intellect (2025). Cloud-based Database Market Industry Size, Share & Growth Analysis 2033 [Dataset]. https://www.marketresearchintellect.com/product/global-cloud-based-database-market-size-and-forecast/
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Market Research Intellect
License
https://www.marketresearchintellect.com/privacy-policy
Area covered
Global
Description
Learn more about the Cloud-based Database Market Report by Market Research Intellect. The market stood at USD 10.5 billion in 2024 and is forecast to expand to USD 25.0 billion by 2033, growing at a CAGR of 10.5%. Discover how new strategies, rising investments, and top players are shaping the future.

Cite

Jason Porzelius (2023). Google Certificate BellaBeats Capstone Project [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-certificate-bellabeats-capstone-project

Google Certificate BellaBeats Capstone Project

Available download formats: zip (169161 bytes)

Dataset updated

Jan 5, 2023

Authors

Jason Porzelius

Description

Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel, for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose, as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeats' marketing strategies for future growth. My task is to provide data-driven insights to business tasks provided by the Bellabeats, Inc. executive and data analysis teams. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

Section 1 - Ask:

A. Guiding Questions:
1. Who are the key stakeholders and what are their goals for the data analysis project?
2. What is the business task that this data analysis project is attempting to solve?

B. Key Tasks:
1. Identify key stakeholders and their goals for the data analysis project.
*The key stakeholders for this project are as follows:
-Urška Sršen and Sando Mur, co-founders of Bellabeats, Inc.
-Bellabeats marketing analytics team. I am a member of this team.
2. Identify the business task.
*The business task is:
-As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to one BellaBeats product and presenting those insights to BellaBeats stakeholders.

Section 2 - Prepare:

A. Guiding Questions:
1. Where is the data stored and organized?
2. Are there any problems with the data?
3. How does the data help answer the business question?

B. Key Tasks:
1. Research and communicate the source of the data, and how it is stored/organized, to stakeholders.
*The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored on Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk, reportedly (see credibility section directly below) between 03/12/2016 and 05/12/2016.
*Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents onto my local laptop and decided to use 2 documents for the purposes of this project, as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were:
-sleepDay_merged.csv
-dailyActivity_merged.csv
2. Identify and communicate to stakeholders any problems found with the data related to credibility and bias.
*As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported.
*As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual ...
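
The two credibility checks described above (how many distinct users reported data, and what date range the records cover) were carried out by the author in Excel. The following is a minimal sketch of the same checks in Python, swapped in purely for illustration. The column names `Id` and `ActivityDate`, the month/day/year date format, and the sample rows are assumptions and are not taken from the description; the real file would be opened with `open("dailyActivity_merged.csv", newline="")` in place of the in-memory sample.

```python
import csv
import io
from datetime import datetime

# Hypothetical stand-in for dailyActivity_merged.csv; all values are invented.
SAMPLE = """Id,ActivityDate,TotalSteps
1001,4/12/2016,13162
1001,4/13/2016,10735
1002,4/12/2016,8000
1003,5/12/2016,4500
"""


def summarize(csv_file):
    """Return (distinct user count, first date, last date) for an activity file."""
    user_ids = set()
    dates = []
    for row in csv.DictReader(csv_file):
        user_ids.add(row["Id"])
        # Assumed date format: month/day/year without zero padding.
        dates.append(datetime.strptime(row["ActivityDate"], "%m/%d/%Y").date())
    return len(user_ids), min(dates), max(dates)


users, first_day, last_day = summarize(io.StringIO(SAMPLE))
print(users, first_day, last_day)
```

Comparing the returned user count and date range against the figures stated in the metadata makes both discrepancies easy to document for stakeholders.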
