100+ datasets found

Data from: Data Nuggets: A Method for Reducing Big Data While Preserving...
tandf.figshare.com
tar
Updated Jun 11, 2024
Traymon E. Beavers; Ge Cheng; Yajie Duan; Javier Cabrera; Mariusz Lubomirski; Dhammika Amaratunga; Jeffrey E. Teigler (2024). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure [Dataset]. http://doi.org/10.6084/m9.figshare.25594361.v1
Available download formats: tar
Unique identifier
https://doi.org/10.6084/m9.figshare.25594361.v1
Dataset updated
Jun 11, 2024
Dataset provided by
Taylor & Francis
Authors
Traymon E. Beavers; Ge Cheng; Yajie Duan; Javier Cabrera; Mariusz Lubomirski; Dhammika Amaratunga; Jeffrey E. Teigler
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Big data, with N × P dimension where N is extremely large, has created new challenges for data analysis, particularly in the realm of creating meaningful clusters of data. Clustering techniques, such as K-means or hierarchical clustering, are popular methods for performing exploratory analysis on large datasets. Unfortunately, these methods are not always possible to apply to big data due to memory or time constraints generated by calculations of order P*N(N−1)/2. To circumvent this problem, typically the clustering technique is applied to a random sample drawn from the dataset; however, a weakness is that the structure of the dataset, particularly at the edges, is not necessarily maintained. We propose a new solution through the concept of “data nuggets”, which reduces a large dataset into a small collection of nuggets of data, each containing a center, weight, and scale parameter. The data nuggets are then input into algorithms that compute methods such as principal components analysis and clustering in a more computationally efficient manner. We show the consistency of the data-nuggets-based covariance estimator and apply the methodology of data nuggets to perform exploratory analysis of a flow cytometry dataset containing over one million observations using PCA and K-means clustering for weighted observations. Supplementary materials for this article are available online.
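To make the P*N(N−1)/2 cost mentioned in the description concrete, the following minimal sketch counts the coordinate-wise operations needed for all pairwise distances. The function name `pairwise_cost`, the dimension P = 10, and the nugget count of 2,000 are assumptions chosen for illustration; only the figure of roughly one million observations comes from the description.

```python
def pairwise_cost(n: int, p: int) -> int:
    """Number of coordinate-wise operations for all pairwise distances.

    There are n * (n - 1) / 2 unordered pairs, and each pair costs p
    coordinate comparisons, giving p * n * (n - 1) / 2 in total.
    """
    return p * n * (n - 1) // 2


# Assumed sizes for illustration only (not taken from the paper).
N_OBSERVATIONS = 1_000_000  # "over one million observations"
N_NUGGETS = 2_000           # hypothetical number of data nuggets
P = 10                      # hypothetical number of variables

full_cost = pairwise_cost(N_OBSERVATIONS, P)
nugget_cost = pairwise_cost(N_NUGGETS, P)

if __name__ == "__main__":
    print(f"full data:    {full_cost:,} operations")
    print(f"data nuggets: {nugget_cost:,} operations")
    print(f"reduction:    {full_cost / nugget_cost:,.0f}x")
```

Because the cost grows quadratically in N, shrinking N by a factor of 500 cuts the pairwise work by roughly 500 squared, which is the motivation the description gives for clustering nuggets instead of raw observations.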
Big Data Technology and Service Report
archivemarketresearch.com
doc, pdf, ppt
Updated Aug 28, 2025
Archive Market Research (2025). Big Data Technology and Service Report [Dataset]. https://www.archivemarketresearch.com/reports/big-data-technology-and-service-557036
Available download formats: ppt, doc, pdf
Dataset updated
Aug 28, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Big Data Technology and Services market is experiencing robust growth, projected to reach a market size of $38.28 billion in 2025. While the provided CAGR is missing, considering the rapid adoption of big data solutions across various industries and the continuous innovation in areas like AI and machine learning, a conservative estimate of a 15% CAGR for the forecast period (2025-2033) seems plausible. This would translate to significant market expansion over the next decade. Key drivers include the increasing volume of data generated by businesses and individuals, the need for improved data analytics capabilities for better decision-making, and the growing adoption of cloud-based big data solutions. Furthermore, the rising demand for real-time data processing and insights across sectors like finance, healthcare, and retail fuels market growth. While data security and privacy concerns represent a restraint, the development of robust security protocols and regulatory frameworks is mitigating this risk. The market is segmented across various technologies (e.g., Hadoop, NoSQL databases, data warehousing), services (e.g., data integration, data analytics, consulting), and deployment models (cloud, on-premise). Leading players like IBM, Microsoft, and others are constantly innovating and expanding their offerings, fostering competition and driving market evolution. The market's growth is further propelled by trends such as the increasing adoption of advanced analytics techniques, the integration of big data with IoT (Internet of Things) devices, and the rising demand for specialized big data skills. The diverse applications of big data across various sectors ensure sustained growth, creating opportunities for both established players and emerging startups. The competitive landscape is characterized by a mix of large technology vendors and specialized service providers, with ongoing mergers and acquisitions shaping the market structure. 
Continued investment in research and development in areas like data visualization and predictive analytics will be crucial for maintaining the market's momentum. Geographical expansion into developing economies presents further growth opportunities. The predicted CAGR and market size reflect a strong growth trajectory, making it an attractive investment opportunity for stakeholders.
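The growth claim above can be checked with simple compound-growth arithmetic. This is a sketch only: the function name `project_market_size` is hypothetical, and the 15% CAGR is the description's own estimate rather than a reported figure.

```python
def project_market_size(base_size: float, cagr: float, years: int) -> float:
    """Compound a base market size forward: base * (1 + cagr) ** years."""
    return base_size * (1.0 + cagr) ** years


BASE_2025_USD_BN = 38.28  # market size stated in the description
ASSUMED_CAGR = 0.15       # the description's estimated rate, not a reported one

size_2033 = project_market_size(BASE_2025_USD_BN, ASSUMED_CAGR, 2033 - 2025)

if __name__ == "__main__":
    print(f"Projected 2033 market size: ${size_2033:.2f} billion")
```

Under these assumptions the market roughly triples over the eight-year forecast period, since 1.15 raised to the 8th power is a little over 3.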
Introduction to Hadoop and Hadoop Architecture
paper.erudition.co.in
html
Updated Dec 3, 2025
Einetic (2025). Introduction to Hadoop and Hadoop Architecture [Dataset]. https://paper.erudition.co.in/makaut/btech-in-electronics-and-instrumentation-engineering/8/big-data-analysis
Available download formats: html
Dataset updated
Dec 3, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question paper solutions for the chapter "Introduction to Hadoop and Hadoop Architecture" of Big Data Analysis, 8th Semester, Applied Electronics and Instrumentation Engineering
Predicting social capital by mining Facebook data
figshare.com
pdf
Updated Dec 31, 2015
Lucia Chen (2015). Predicting social capital by mining Facebook data [Dataset]. http://doi.org/10.6084/m9.figshare.1448769.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.1448769.v1
Dataset updated
Dec 31, 2015
Dataset provided by
figshare (http://figshare.com/)
Authors
Lucia Chen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Psychology still often relies on questionnaires to gather data. Despite ubiquitous computing and the internet, these are often still conducted with paper and clipboard. Where the internet is used, standalone questionnaires from bulk providers like Survey Monkey are the norm. For our studies related to social networks on online disclosure, we have developed a custom site for online questionnaires, designed to engage participants and allow linking of data from one study to the next: PsyQu.com. PsyQu is a modern website developed around a database and as such has a ‘schema’. This data structure encapsulates the project, researcher(s) and participant(s) in a manner that allows participants to link multiple attempts at multiple studies under their single account. This will allow cross-linked and longitudinal studies to be performed. By moving beyond standalone questionnaires, we hope to discover new correlative and predictive patterns between online behavior and other psychological dimensions. At present, the site is in alpha testing mode with only 1 group of researchers and 3 studies: social capital, online self-disclosure and personality. In the social capital study, we used a standard scale for investigating online social capital and social trust, in an attempt to find differences between various groups. A paper survey was also conducted for comparison with the online survey, since there has been debate on the reliability of online participation. We will present the website and initial results of the social capital study.
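The idea of linking multiple attempts at multiple studies under one participant account can be sketched as a small data structure. This is not the PsyQu schema, which the description does not publish; the class and field names below (`Attempt`, `Participant`, `account_id`, `studies_taken`) are hypothetical and serve only to illustrate the linking described above.

```python
from dataclasses import dataclass, field


@dataclass
class Attempt:
    """One completed questionnaire attempt (hypothetical structure)."""
    study: str     # e.g. "social capital"
    answers: dict  # question id -> response


@dataclass
class Participant:
    """A single account that collects attempts across studies."""
    account_id: str
    attempts: list = field(default_factory=list)

    def record(self, attempt: Attempt) -> None:
        self.attempts.append(attempt)

    def studies_taken(self) -> list:
        """Distinct studies this account has attempted, sorted by name."""
        return sorted({a.study for a in self.attempts})


# Usage: one account, two studies, one of them attempted twice.
participant = Participant(account_id="p-001")
participant.record(Attempt("social capital", {"q1": 4}))
participant.record(Attempt("personality", {"q1": 2}))
participant.record(Attempt("social capital", {"q1": 5}))
```

Keeping every attempt under the same account is what would make the cross-linked and longitudinal comparisons mentioned in the description possible.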
Data Architecture Modernization Report
marketresearchforecast.com
doc, pdf, ppt
Updated Aug 10, 2025
Market Research Forecast (2025). Data Architecture Modernization Report [Dataset]. https://www.marketresearchforecast.com/reports/data-architecture-modernization-537734
Available download formats: doc, pdf, ppt
Dataset updated
Aug 10, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Architecture Modernization market is experiencing robust growth, driven by the increasing need for businesses to adapt to the ever-evolving digital landscape. The market's expansion is fueled by several key factors, including the rising adoption of cloud computing, the exponential growth of data volumes, and the imperative for enhanced data security and compliance. Organizations are increasingly recognizing the strategic value of modernizing their data architectures to improve agility, scalability, and overall efficiency. This modernization involves migrating legacy systems to cloud-based platforms, adopting advanced analytics tools, and implementing robust data governance frameworks. The shift towards real-time data processing and the increasing demand for data-driven decision-making are further accelerating market growth. Competition is fierce, with established players like NTT DATA and Rackspace competing alongside specialized analytics firms and cloud providers. The market is segmented based on deployment type (cloud, on-premise, hybrid), organization size (small, medium, large enterprises), and industry vertical (BFSI, healthcare, retail, etc.). While the precise market size is unavailable, reasonable estimates suggest a substantial and rapidly growing market exceeding $10 billion by 2025, exhibiting a Compound Annual Growth Rate (CAGR) of approximately 15% over the forecast period (2025-2033). Despite the rapid growth, challenges remain. These include the complexities of migrating legacy systems, the need for skilled professionals experienced in modern data architectures, and the high initial investment costs associated with modernization projects. Data security and privacy concerns also pose significant hurdles. However, the long-term benefits of improved data management, enhanced operational efficiency, and the ability to gain valuable insights from data are expected to outweigh these challenges, driving continued market growth. 
The market’s future hinges on ongoing technological innovation, the increasing affordability of cloud-based solutions, and the growing awareness among businesses of the importance of data-driven decision-making.
Data Set Big Data Analytics in Business for Marketing Research (2011-2024)
search.dataone.org
dataverse.harvard.edu
Updated Oct 29, 2025
DJAKARIYA, CHOIRON; Tang Herman, Robertus (2025). Data Set Big Data Analytics in Business for Marketing Research (2011-2024) [Dataset]. http://doi.org/10.7910/DVN/YUIK4L
Unique identifier
https://doi.org/10.7910/DVN/YUIK4L
Dataset updated
Oct 29, 2025
Dataset provided by
Harvard Dataverse
Authors
DJAKARIYA, CHOIRON; Tang Herman, Robertus
Time period covered
Jan 1, 2011 - Jan 1, 2024
Description
This dataset accompanies the publication "Big Data Analytics in Business for Marketing Research: A Retrospective of Domain and Knowledge Structure" and covers documents indexed by Scopus between 2011 and 2024. The data contains records for 448 documents with the following fields: authors, authors ID Sggggg, title, year, source title, volume, issue, article number in Scopus, DOI, link, affiliation, abstract, index keywords, references, correspondence address, editors, publisher, conference name, conference date, conference code, ISSN, language, document type, access type, and EID.
International Journal of Engineering and Advanced Technology Acceptance Rate...
researchhelpdesk.org
Updated May 1, 2022
Research Help Desk (2022). International Journal of Engineering and Advanced Technology Acceptance Rate - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/acceptance-rate/552/international-journal-of-engineering-and-advanced-technology
Dataset updated
May 1, 2022
Dataset authored and provided by
Research Help Desk
Description
International Journal of Engineering and Advanced Technology Acceptance Rate - ResearchHelpDesk - International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It aims to publish original theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. All submitted papers will be reviewed by the board of committee of IJEAT. Aim of IJEAT: the journal aims to disseminate original, scientific, theoretical or applied research in the field of Engineering and allied fields; provide a platform for publishing results and research with a strong empirical component; bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research; seek original and unpublished research papers based on theoretical or experimental works for publication globally; publish original theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences; impart a platform for publishing results and research with a strong empirical component; create a bridge for the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research;
solicit original and unpublished research papers, based on theoretical or experimental works. Scope of IJEAT International Journal of Engineering and Advanced Technology (IJEAT) covers all topics of all engineering branches. Some of them are Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. The main topic includes but not limited to: 1. Smart Computing and Information Processing Signal and Speech Processing Image Processing and Pattern Recognition WSN Artificial Intelligence and machine learning Data mining and warehousing Data Analytics Deep learning Bioinformatics High Performance computing Advanced Computer networking Cloud Computing IoT Parallel Computing on GPU Human Computer Interactions 2. Recent Trends in Microelectronics and VLSI Design Process & Device Technologies Low-power design Nanometer-scale integrated circuits Application specific ICs (ASICs) FPGAs Nanotechnology Nano electronics and Quantum Computing 3. Challenges of Industry and their Solutions, Communications Advanced Manufacturing Technologies Artificial Intelligence Autonomous Robots Augmented Reality Big Data Analytics and Business Intelligence Cyber Physical Systems (CPS) Digital Clone or Simulation Industrial Internet of Things (IIoT) Manufacturing IOT Plant Cyber security Smart Solutions – Wearable Sensors and Smart Glasses System Integration Small Batch Manufacturing Visual Analytics Virtual Reality 3D Printing 4. 
Internet of Things (IoT) Internet of Things (IoT) & IoE & Edge Computing Distributed Mobile Applications Utilizing IoT Security, Privacy and Trust in IoT & IoE Standards for IoT Applications Ubiquitous Computing Block Chain-enabled IoT Device and Data Security and Privacy Application of WSN in IoT Cloud Resources Utilization in IoT Wireless Access Technologies for IoT Mobile Applications and Services for IoT Machine/ Deep Learning with IoT & IoE Smart Sensors and Internet of Things for Smart City Logic, Functional programming and Microcontrollers for IoT Sensor Networks, Actuators for Internet of Things Data Visualization using IoT IoT Application and Communication Protocol Big Data Analytics for Social Networking using IoT IoT Applications for Smart Cities Emulation and Simulation Methodologies for IoT IoT Applied for Digital Contents 5. Microwaves and Photonics Microwave filter Micro Strip antenna Microwave Link design Microwave oscillator Frequency selective surface Microwave Antenna Microwave Photonics Radio over fiber Optical communication Optical oscillator Optical Link design Optical phase lock loop Optical devices 6. Computation Intelligence and Analytics Soft Computing Advance Ubiquitous Computing Parallel Computing Distributed Computing Machine Learning Information Retrieval Expert Systems Data Mining Text Mining Data Warehousing Predictive Analysis Data Management Big Data Analytics Big Data Security 7. Energy Harvesting and Wireless Power Transmission Energy harvesting and transfer for wireless sensor networks Economics of energy harvesting communications Waveform optimization for wireless power transfer RF Energy Harvesting Wireless Power Transmission Microstrip Antenna design and application Wearable Textile Antenna Luminescence Rectenna 8. 
Advance Concept of Networking and Database Computer Network Mobile Adhoc Network Image Security Application Artificial Intelligence and machine learning in the Field of Network and Database Data Analytic High performance computing Pattern Recognition 9. Machine Learning (ML) and Knowledge Mining (KM) Regression and prediction Problem solving and planning Clustering Classification Neural information processing Vision and speech perception Heterogeneous and streaming data Natural language processing Probabilistic Models and Methods Reasoning and inference Marketing and social sciences Data mining Knowledge Discovery Web mining Information retrieval Design and diagnosis Game playing Streaming data Music Modelling and Analysis Robotics and control Multi-agent systems Bioinformatics Social sciences Industrial, financial and scientific applications of all kind 10. Advanced Computer networking Computational Intelligence Data Management, Exploration, and Mining Robotics Artificial Intelligence and Machine Learning Computer Architecture and VLSI Computer Graphics, Simulation, and Modelling Digital System and Logic Design Natural Language Processing and Machine Translation Parallel and Distributed Algorithms Pattern Recognition and Analysis Systems and Software Engineering Nature Inspired Computing Signal and Image Processing Reconfigurable Computing Cloud, Cluster, Grid and P2P Computing Biomedical Computing Advanced Bioinformatics Green Computing Mobile Computing Nano Ubiquitous Computing Context Awareness and Personalization, Autonomic and Trusted Computing Cryptography and Applied Mathematics Security, Trust and Privacy Digital Rights Management Networked-Driven Multicourse Chips Internet Computing Agricultural Informatics and Communication Community Information Systems Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS Disaster Management e-governance, e-Commerce, e-business, e-Learning Forest Genomics and Informatics Healthcare Informatics 
Information Ecology and Knowledge Management Irrigation Informatics Neuro-Informatics Open Source: Challenges and opportunities Web-Based Learning: Innovation and Challenges Soft computing Signal and Speech Processing Natural Language Processing 11. Communications Microstrip Antenna Microwave Radar and Satellite Smart Antenna MIMO Antenna Wireless Communication RFID Network and Applications 5G Communication 6G Communication 12. Algorithms and Complexity Sequential, Parallel And Distributed Algorithms And Data Structures Approximation And Randomized Algorithms Graph Algorithms And Graph Drawing On-Line And Streaming Algorithms Analysis Of Algorithms And Computational Complexity Algorithm Engineering Web Algorithms Exact And Parameterized Computation Algorithmic Game Theory Computational Biology Foundations Of Communication Networks Computational Geometry Discrete Optimization 13. Software Engineering and Knowledge Engineering Software Engineering Methodologies Agent-based software engineering Artificial intelligence approaches to software engineering Component-based software engineering Embedded and ubiquitous software engineering Aspect-based software engineering Empirical software engineering Search-Based Software engineering Automated software design and synthesis Computer-supported cooperative work Automated software specification Reverse engineering Software Engineering Techniques and Production Perspectives Requirements engineering Software analysis, design and modelling Software maintenance and evolution Software engineering tools and environments Software engineering decision support Software design patterns Software product lines Process and workflow management Reflection and metadata approaches Program understanding and system maintenance Software domain modelling and analysis Software economics Multimedia and hypermedia software engineering Software engineering case study and experience reports Enterprise software, middleware, and tools Artificial 
intelligent methods, models, techniques Artificial life and societies Swarm intelligence Smart Spaces Autonomic computing and agent-based systems Autonomic computing Adaptive Systems Agent architectures, ontologies, languages and protocols Multi-agent systems Agent-based learning and knowledge discovery Interface agents Agent-based auctions and marketplaces Secure mobile and multi-agent systems Mobile agents SOA and Service-Oriented Systems Service-centric software engineering Service oriented requirements engineering Service oriented architectures Middleware for service based systems Service discovery and composition Service level
Data from: GeoThermalCloud framework for fusion of big data and...
catalog.data.gov
gdr.openei.org
+2 more
Updated Jan 20, 2025
+ more versions
Los Alamos National Laboratory (2025). GeoThermalCloud framework for fusion of big data and multi-physics models in Nevada and Southwest New Mexico [Dataset]. https://catalog.data.gov/dataset/geothermalcloud-framework-for-fusion-of-big-data-and-multi-physics-models-in-nevada-and-so-31a4e
Dataset updated
Jan 20, 2025
Dataset provided by
Los Alamos National Laboratory
Area covered
New Mexico
Description
Our GeoThermalCloud framework is designed to process geothermal datasets using a novel toolbox for unsupervised and physics-informed machine learning called SmartTensors. More information about GeoThermalCloud can be found at the GeoThermalCloud GitHub Repository. More information about SmartTensors can be found at the SmartTensors GitHub Repository and the SmartTensors page at LANL.gov. Links to these pages are included in this submission.
GeoThermalCloud.jl is a repository containing all the data and codes required to demonstrate applications of machine learning methods for geothermal exploration. GeoThermalCloud.jl includes:
- site data
- simulation scripts
- Jupyter notebooks
- intermediate results
- code outputs
- summary figures
- readme markdown files
GeoThermalCloud.jl showcases the machine learning analyses performed for the following geothermal sites:
- Brady: geothermal exploration of the Brady geothermal site, Nevada
- SWNM: geothermal exploration of the Southwest New Mexico (SWNM) region
- GreatBasin: geothermal exploration of the Great Basin region, Nevada
Reports, research papers, and presentations summarizing these machine learning analyses are also available and will be posted soon.
Data from: From a Monolithic Big Data System to a Microservices Event-Driven...
zenodo.org
Updated Jan 24, 2020
Rodrigo Laigner (2020). From a Monolithic Big Data System to a Microservices Event-Driven Architecture [Dataset]. http://doi.org/10.5281/zenodo.3606316
Unique identifier
https://doi.org/10.5281/zenodo.3606316
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rodrigo Laigner
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
[Context] Data-intensive systems, a.k.a. big data systems (BDS), are software systems that handle a large volume of data in the presence of performance quality attributes, such as scalability and availability. Before the advent of big data management systems (e.g. Cassandra) and frameworks (e.g. Spark), organizations had to cope with large data volumes with custom-tailored solutions. In particular, a decade ago, Tecgraf/PUC-Rio developed a system to monitor a truck fleet in real time and proactively detect events from the positioning data received. Over the years, the system evolved into a large, complex, and obsolescent code base that was hard to maintain. [Goal] We report our experience on replacing a legacy BDS with a microservice-based event-driven system. [Method] We applied action research, investigating the reasons that motivate the adoption of a microservice-based event-driven architecture, intervening to define the new architecture, and documenting the challenges and lessons learned. [Results] We perceived that the resulting architecture enabled easier maintenance and fault-isolation. However, the myriad of technologies and the complex data flow were perceived as drawbacks. Based on the challenges faced, we highlight opportunities to improve the design of big data reactive systems. [Conclusions] We believe that our experience provides helpful takeaways for practitioners modernizing systems with data-intensive requirements.
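The event-driven style described above can be illustrated with a toy in-process publish/subscribe bus. This is a sketch under stated assumptions, not the authors' architecture: the event types (`PositionReceived`, `SpeedingDetected`), the `EventBus` class, and the 90 km/h limit are all hypothetical, and a real deployment would place a message broker between independently deployed services.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionReceived:
    """Raw positioning event from a truck (hypothetical fields)."""
    truck_id: str
    speed_kmh: float


@dataclass(frozen=True)
class SpeedingDetected:
    """Derived event emitted by the detector (hypothetical)."""
    truck_id: str
    speed_kmh: float


class EventBus:
    """Minimal in-process stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        # Deliver the event to every handler registered for its exact type.
        for handler in self._handlers[type(event)]:
            handler(event)


def make_speed_detector(bus, limit_kmh):
    """Build a handler that reacts to positions and emits derived events."""
    def on_position(event):
        if event.speed_kmh > limit_kmh:
            bus.publish(SpeedingDetected(event.truck_id, event.speed_kmh))
    return on_position


bus = EventBus()
alerts = []
bus.subscribe(PositionReceived, make_speed_detector(bus, limit_kmh=90.0))
bus.subscribe(SpeedingDetected, alerts.append)

bus.publish(PositionReceived("truck-1", 72.0))
bus.publish(PositionReceived("truck-2", 104.5))
```

The detector knows nothing about who consumes its alerts, which is the kind of decoupling that would support the fault isolation the abstract reports; the extra indirection also hints at the more complex data flow it lists as a drawback.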

Global Floating Data Center Market Research Report: By Application (Cloud...

wiseguyreports.com

Updated Aug 10, 2025

(2025). Global Floating Data Center Market Research Report: By Application (Cloud Computing, Data Storage, Big Data Analytics, Content Delivery), By Power Source (Renewable Energy, Nuclear Energy, Diesel Generator, Hybrid Systems), By Structure Type (Semisphere, Platform, Modular, Ship-Based), By Cooling Technology (Liquid Cooling, Air Cooling, Immersion Cooling) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/floating-data-center-market

Dataset updated

Aug 10, 2025

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Aug 25, 2025

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2023
REGIONS COVERED	North America, Europe, APAC, South America, MEA
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024	3.5 (USD Billion)
MARKET SIZE 2025	3.99 (USD Billion)
MARKET SIZE 2035	15.0 (USD Billion)
SEGMENTS COVERED	Application, Power Source, Structure Type, Cooling Technology, Regional
COUNTRIES COVERED	US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS	sustainability focus, rising energy costs, increasing data demand, climate adaptation solutions, technological advancements
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	NVIDIA, Equinix, Microsoft, Google, Oracle, Arm Holdings, Apple, Digital Realty, Amazon, Dell Technologies, Huawei, Hewlett Packard Enterprise, Alibaba, Intel, IBM
MARKET FORECAST PERIOD	2025 - 2035
KEY MARKET OPPORTUNITIES	Sustainable cooling solutions, Renewable energy integration, Disaster recovery support, Enhanced scalability options, Coastal data redundancy services
COMPOUND ANNUAL GROWTH RATE (CAGR)	14.2% (2025 - 2035)

Data Preparation Platform Report
datainsightsmarket.com
doc, pdf, ppt
Updated Sep 20, 2025
Data Insights Market (2025). Data Preparation Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/data-preparation-platform-1368457
Available download formats: doc, pdf, ppt
Dataset updated
Sep 20, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Data Preparation Platform market is poised for substantial growth, estimated to reach $15,600 million by the study's end in 2033, up from $6,000 million in the base year of 2025. This trajectory is fueled by a Compound Annual Growth Rate (CAGR) of approximately 12.5% over the forecast period. The proliferation of big data and the increasing need for clean, usable data across all business functions are primary drivers. Organizations are recognizing that effective data preparation is foundational to accurate analytics, informed decision-making, and successful AI/ML initiatives. This has led to a surge in demand for platforms that can automate and streamline the complex, time-consuming process of data cleansing, transformation, and enrichment. The market's expansion is further propelled by the growing adoption of cloud-based solutions, offering scalability, flexibility, and cost-efficiency, particularly for Small & Medium Enterprises (SMEs). Key trends shaping the Data Preparation Platform market include the integration of AI and machine learning for automated data profiling and anomaly detection, enhanced collaboration features to facilitate teamwork among data professionals, and a growing focus on data governance and compliance. While the market exhibits robust growth, certain restraints may temper its pace. These include the complexity of integrating data preparation tools with existing IT infrastructures, the shortage of skilled data professionals capable of leveraging advanced platform features, and concerns around data security and privacy. Despite these challenges, the market is expected to witness continuous innovation and strategic partnerships among leading companies like Microsoft, Tableau, and Alteryx, aiming to provide more comprehensive and user-friendly solutions to meet the evolving demands of a data-driven world. 
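The stated growth rate can be cross-checked against the two endpoint figures in the description. The sketch below derives the compound annual growth rate implied by $6,000 million in 2025 and $15,600 million in 2033; the function name `implied_cagr` is hypothetical.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Solve end = start * (1 + r) ** years for r."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


START_2025_USD_MN = 6_000.0   # base-year figure from the description
END_2033_USD_MN = 15_600.0    # end-of-study figure from the description

rate = implied_cagr(START_2025_USD_MN, END_2033_USD_MN, 2033 - 2025)

if __name__ == "__main__":
    print(f"Implied CAGR: {rate:.2%}")
```

The implied rate comes out slightly above the quoted 12.5%, which is consistent with the description's wording of "approximately".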

Global Data Center Architecture Market Research Report: By Architecture Type...

wiseguyreports.com

Updated Aug 21, 2025

+ more versions

(2025). Global Data Center Architecture Market Research Report: By Architecture Type (Modular Architecture, Traditional Architecture, Open Modular Architecture), By Component (Hardware, Software, Services), By Application (Cloud Computing, Big Data Analytics, Disaster Recovery), By End Use (BFSI, IT and Telecommunications, Healthcare, Retail) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/data-center-architecture-market

Dataset updated

Aug 21, 2025

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Aug 25, 2025

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2023
REGIONS COVERED	North America, Europe, APAC, South America, MEA
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024	40.9 (USD Billion)
MARKET SIZE 2025	43.7 (USD Billion)
MARKET SIZE 2035	85.0 (USD Billion)
SEGMENTS COVERED	Architecture Type, Component, Application, End Use, Regional
COUNTRIES COVERED	US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS	scalability and flexibility, cost efficiency, energy efficiency, advanced technologies adoption, regulatory compliance
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	NVIDIA, Equinix, Arista Networks, Microsoft, Cisco Systems, Google, Alibaba Group, Oracle, Lenovo, SAP, Digital Realty, Dell Technologies, Amazon, Hewlett Packard Enterprise, Intel, IBM
MARKET FORECAST PERIOD	2025 - 2035
KEY MARKET OPPORTUNITIES	Edge computing expansion, Hybrid cloud integration, Increasing demand for automation, Sustainable energy initiatives, AI and machine learning adoption
COMPOUND ANNUAL GROWTH RATE (CAGR)	6.9% (2025 - 2035)
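
The growth rate in the table above can be sanity-checked against its own market-size figures. The sketch below is illustrative only: the helper name `cagr` is an assumption introduced here, and the inputs are the 2025 and 2035 market sizes quoted in the table (USD billion). It applies the standard compound-growth formula, (end / start)^(1 / years) − 1.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.05 means 5%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the table above: 43.7 (2025) and 85.0 (2035), USD billion.
rate = cagr(43.7, 85.0, 2035 - 2025)
print(f"Implied CAGR 2025-2035: {rate:.1%}")
```

With these inputs the implied rate rounds to the 6.9% the listing reports, so the table is internally consistent for the 2025 - 2035 forecast period.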

Maintenance costs, ML and big data
kaggle.com
zip
Updated May 16, 2025
Cite
Mario Jesenia (2025). Maintenance costs, ML and big data [Dataset]. https://www.kaggle.com/datasets/mariojesenia/maintenance-costs-ml-and-big-data
Available download formats: zip (7809 bytes)
Dataset updated
May 16, 2025
Authors
Mario Jesenia
Description
This dataset provides panel data from 82 industrial organizations, each observed consistently over a 5-year period (2019–2023). The dataset is designed to support analysis of how machine learning (ML) and big data technologies are integrated into smart maintenance operations across different industrial sectors. Each organization is uniquely identified and assigned a fixed organizational structure—either centralized, semi-centralized, or decentralized—that remains constant across time.

The dataset includes the following variables:

maintenance_cost_reduction: annual percentage change in maintenance costs, representing efficiency gains or losses.

model_prediction_accuracy: the accuracy (%) of predictive ML models in forecasting failures.

failure_risk_score: a normalized score (0–1) indicating the predicted risk of system failure.

model_training_frequency: the regularity of model retraining (monthly, quarterly, yearly).

data_pipeline_latency_hr: average latency (in hours) in processing and transmitting maintenance data through digital pipelines.

bigdata_storage_utilization_percent: percentage utilization of the organization’s big data storage infrastructure.

The organizations represented in this dataset operate across advanced industrial sectors such as manufacturing, transportation, utilities, energy, and aerospace logistics. Geographically, the entities are based in the United States, Germany, South Korea, Japan, and the Netherlands, countries recognized for their leadership in AI integration, industrial analytics, and data-driven operations.

Data was gathered through structured interviews with IT specialists, plant maintenance managers, and operational analytics teams. The data design reflects realistic organizational behaviors and technological performance patterns, making it well-suited for research on predictive maintenance, digital infrastructure readiness, and performance benchmarking in smart manufacturing.
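
The variable list above can be read as a record schema with a few range constraints. The following is a minimal sketch, not the dataset's actual file layout: the six measurement names come from the description, while `organization_id`, `year`, `org_structure`, the spelling of the category values, and the `problems` helper are assumptions made for illustration.

```python
from dataclasses import dataclass

# Category values as named in the description; exact spellings in the file are assumed.
ORG_STRUCTURES = {"centralized", "semi-centralized", "decentralized"}
TRAINING_FREQUENCIES = {"monthly", "quarterly", "yearly"}
EXPECTED_ROWS = 82 * 5  # 82 organizations observed over 5 years (2019-2023)


@dataclass(frozen=True)
class MaintenanceRecord:
    organization_id: str   # assumed column name
    year: int              # assumed column name
    org_structure: str     # assumed column name
    maintenance_cost_reduction: float           # annual % change in maintenance costs
    model_prediction_accuracy: float            # accuracy in percent
    failure_risk_score: float                   # normalized to 0-1
    model_training_frequency: str               # monthly, quarterly, or yearly
    data_pipeline_latency_hr: float             # average latency in hours
    bigdata_storage_utilization_percent: float  # percent of storage in use

    def problems(self) -> list[str]:
        """Return human-readable violations of the documented ranges."""
        issues = []
        if not 2019 <= self.year <= 2023:
            issues.append("year outside 2019-2023")
        if self.org_structure not in ORG_STRUCTURES:
            issues.append("unknown organizational structure")
        if not 0.0 <= self.model_prediction_accuracy <= 100.0:
            issues.append("accuracy outside 0-100")
        if not 0.0 <= self.failure_risk_score <= 1.0:
            issues.append("risk score outside 0-1")
        if self.model_training_frequency not in TRAINING_FREQUENCIES:
            issues.append("unknown training frequency")
        if self.data_pipeline_latency_hr < 0.0:
            issues.append("negative latency")
        if not 0.0 <= self.bigdata_storage_utilization_percent <= 100.0:
            issues.append("storage utilization outside 0-100")
        return issues


# Made-up example values, used only to exercise the checks.
example = MaintenanceRecord("org-001", 2021, "centralized", -4.2, 91.5,
                            0.18, "quarterly", 2.5, 63.0)
print(example.problems())
```

A balanced panel as described would contain `EXPECTED_ROWS` records, one per organization per year; checking the row count and running `problems()` on each row is a cheap first validation step before any modeling.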
MOESM2 of ExCAPE-DB: an integrated large scale dataset facilitating Big Data...
springernature.figshare.com
xlsx
Updated May 31, 2023
Cite
Jiangming Sun; Nina Jeliazkova; Vladimir Chupakin; Jose-Felipe Golib-Dzib; Ola Engkvist; Lars Carlsson; Jörg Wegner; Hugo Ceulemans; Ivan Georgiev; Vedrin Jeliazkov; Nikolay Kochev; Thomas Ashby; Hongming Chen (2023). MOESM2 of ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics [Dataset]. http://doi.org/10.6084/m9.figshare.c.3711712_D2.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3711712_D2.v1
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jiangming Sun; Nina Jeliazkova; Vladimir Chupakin; Jose-Felipe Golib-Dzib; Ola Engkvist; Lars Carlsson; Jörg Wegner; Hugo Ceulemans; Ivan Georgiev; Vedrin Jeliazkov; Nikolay Kochev; Thomas Ashby; Hongming Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2: Table S1. The list of selected activity types in PubChem.
Big Data Driven Architecture for Real Time Systemwide Safety Assurance,...
data.nasa.gov
application/rdfxml +5
Updated Jun 26, 2018
Cite
(2018). Big Data Driven Architecture for Real Time Systemwide Safety Assurance, Phase I [Dataset]. https://data.nasa.gov/dataset/Big-Data-Driven-Architecture-for-Real-Time-Systemw/muws-ntpg
Available download formats: csv, application/rssxml, json, tsv, xml, application/rdfxml
Dataset updated
Jun 26, 2018
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Description
NASA has the aim of researching aviation Real-time System-wide Safety Assurance (RSSA) with a focus on the development of prognostic decision support tools as one of its new aeronautics research pillars. The vision of RSSA is to accelerate the discovery of previously unknown safety threats in real time and enable rapid mitigation of safety risks through analysis of massive amounts of aviation data. Our innovation supports this vision by designing a hybrid architecture combining traditional database technology and real-time streaming analytics in a Big Data environment. The innovation includes three major components: a Batch Processing framework, Traditional Databases and Streaming Analytics. It addresses at least three major needs within the aviation safety community. First, the innovation supports the creation of future data-driven safety prognostic decision support tools that must pull data from heterogeneous data sources and seamlessly combine them to be effective for NAS stakeholders. Second, our innovation opens up the possibility to provide real-time NAS performance analytics desired by key aviation stakeholders. Third, our proposed architecture provides a mechanism for safety risk accuracy evaluations. To accomplish this innovation, we have three technical objectives and related work plan efforts. The first objective is the determination of the system and functional requirements. We identify the system and functional requirements from aviation safety stakeholders for a set of use cases by investigating how they would use the system and what data processing functions they need to support their decisions. The second objective is to create a Big Data technology-driven architecture. Here we explore and identify the best technologies for the components in the system including Big Data processing and architectural techniques adapted for aviation data applications. Finally, our third objective is the development and demonstration of a proof-of-concept.
Data Lake Market to Surpassing USD 90 billion by 2032
scoop.market.us
Updated Jul 23, 2024
Cite
Market.us Scoop (2024). Data Lake Market to Surpassing USD 90 billion by 2032 [Dataset]. https://scoop.market.us/data-lake-market-news/
Dataset updated
Jul 23, 2024
Dataset authored and provided by
Market.us Scoop
License
https://scoop.market.us/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction

The global Data Lake Market is projected to witness substantial growth, reaching approximately USD 90 billion by 2032, marking a significant increase from its 2022 value of USD 16.6 billion. This growth trajectory is expected to unfold steadily, with a Compound Annual Growth Rate (CAGR) of 21.3% from 2023 to 2032.

A Data Lake is a centralized repository designed to store, process, and secure large volumes of structured and unstructured data from multiple sources. It allows for the storage of data in its natural format, without the need to first structure it, making it a flexible option for big data and real-time analytics. Data Lakes support the analysis of data through various methods, including machine learning, predictive analytics, data discovery, and profiling.

The Data Lake market is experiencing rapid growth, driven by the increasing volume of data generated by businesses, the need for advanced analytics to understand customer behavior, and the adoption of cloud computing. Companies are investing in Data Lake solutions to gain insights that can improve decision-making, enhance operational efficiency, and create personalized customer experiences. The market is also seeing innovation in terms of security, data management, and integration capabilities, enabling more robust and scalable data ecosystems. As organizations continue to recognize the value of data-driven strategies, the demand for Data Lake technologies is expected to rise, marking a significant trend in the data management landscape.
[Image: Global Data Lake Market]
Data from: Structure-from-motion point cloud of Mud Creek, Big Sur,...
catalog.data.gov
data.usgs.gov
Updated Nov 27, 2025
Cite
U.S. Geological Survey (2025). Structure-from-motion point cloud of Mud Creek, Big Sur, California, 1967-10-18 [Dataset]. https://catalog.data.gov/dataset/structure-from-motion-point-cloud-of-mud-creek-big-sur-california-1967-10-18
Dataset updated
Nov 27, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
Big Sur, California
Description
Presented here is a point cloud produced by the U.S. Geological Survey (USGS) from historical U.S. Air Force vertical aerial imagery, covering the area of the Mud Creek landslide on California State Route 1 (SR1), Mud Creek, Big Sur, California. The point cloud is referenced to previously published lidar data and contains RGB information as well as XYZ coordinates. Point cloud coordinates are in NAD83 UTM Zone 10, in meters. Imagery was downloaded from the USGS EROS Data Center and processed using structure-from-motion photogrammetry with Agisoft PhotoScan versions 1.2.8 through 1.3.2. Point clouds were clipped to an area of interest (AOI) using LAStools. The AOI was created from a KMZ in Google Earth and transformed to a shapefile using ArcMap 10.5.
Data Wrangling Market Size, Share, Growth, Forecast, By Component...
verifiedmarketresearch.com
Updated Jun 18, 2025
Cite
VERIFIED MARKET RESEARCH (2025). Data Wrangling Market Size, Share, Growth, Forecast, By Component (Solutions, Services), By Deployment Mode (On-premises, Cloud-based), By End-user Industry (Banking, Financial Services, and Insurance (BFSI), Healthcare & Life Sciences, Retail & E-commerce, IT & Telecom, Government & Public Sector, Manufacturing) [Dataset]. https://www.verifiedmarketresearch.com/product/data-wrangling-market/
Dataset updated
Jun 18, 2025
Dataset provided by
Verified Market Research (https://www.verifiedmarketresearch.com/)
Authors
VERIFIED MARKET RESEARCH
License
https://www.verifiedmarketresearch.com/privacy-policy/
Time period covered
2026 - 2032
Area covered
Global
Description
Data Wrangling Market size was valued at USD 1.99 Billion in 2024 and is projected to reach USD 4.07 Billion by 2032, growing at a CAGR of 9.4% during the forecast period 2026-2032.

• Big Data Analytics Growth: Organizations are generating massive volumes of unstructured and semi-structured data from diverse sources including social media, IoT devices, and digital transactions. Data wrangling tools become essential for cleaning, transforming, and preparing this complex data for meaningful analytics and business intelligence applications.

• Machine Learning and AI Adoption: The rapid expansion of artificial intelligence and machine learning initiatives requires high-quality, properly formatted training datasets. Data wrangling solutions enable data scientists to efficiently prepare, clean, and structure raw data for model training, driving sustained market demand across AI-focused organizations.
fdata-02-00044_Parallel Processing Strategies for Big Geospatial Data.pdf
frontiersin.figshare.com
pdf
Updated Jun 3, 2023
Cite
Martin Werner (2023). fdata-02-00044_Parallel Processing Strategies for Big Geospatial Data.pdf [Dataset]. http://doi.org/10.3389/fdata.2019.00044.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fdata.2019.00044.s001
Dataset updated
Jun 3, 2023
Dataset provided by
Frontiers
Authors
Martin Werner
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This paper provides an abstract analysis of parallel processing strategies for spatial and spatio-temporal data. It isolates aspects such as data locality and computational locality as well as redundancy and locally sequential access as central elements of parallel algorithm design for spatial data. Furthermore, the paper gives some examples from simple and advanced GIS and spatial data analysis highlighting both that big data systems have been around long before the current hype of big data and that they follow some design principles which are inevitable for spatial data including distributed data structures and messaging, which are, however, incompatible with the popular MapReduce paradigm. Throughout this discussion, the need for a replacement or extension of the MapReduce paradigm for spatial data is derived. This paradigm should be able to deal with the imperfect data locality inherent to spatial data hindering full independence of non-trivial computational tasks. We conclude that more research is needed and that spatial big data systems should pick up more concepts like graphs, shortest paths, raster data, events, and streams at the same time instead of solving exactly the set of spatially separable problems such as line simplifications or range queries in many different ways.

Global Object-Oriented Database Software Market Research Report: By...

wiseguyreports.com

Updated Sep 15, 2025

Cite

(2025). Global Object-Oriented Database Software Market Research Report: By Application (Data Management, Software Development, Business Intelligence, Web Development), By Deployment Model (On-Premises, Cloud-Based, Hybrid), By End User (Large Enterprises, Small and Medium Enterprises, Government Organizations, Educational Institutions), By Industry Vertical (Information Technology, Healthcare, Finance, Telecommunications) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/object-oriented-databas-software-market

Dataset updated

Sep 15, 2025

License

https://www.wiseguyreports.com/pages/privacy-policy

Time period covered

Sep 25, 2025

Area covered

Global

Description

BASE YEAR	2024
HISTORICAL DATA	2019 - 2023
REGIONS COVERED	North America, Europe, APAC, South America, MEA
REPORT COVERAGE	Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
MARKET SIZE 2024	4.49 (USD Billion)
MARKET SIZE 2025	4.72 (USD Billion)
MARKET SIZE 2035	7.8 (USD Billion)
SEGMENTS COVERED	Application, Deployment Model, End User, Industry Vertical, Regional
COUNTRIES COVERED	US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
KEY MARKET DYNAMICS	increased data complexity, demand for scalability, integration with IoT, rising big data applications, need for real-time processing
MARKET FORECAST UNITS	USD Billion
KEY COMPANIES PROFILED	IBM, Redis, Objectivity, Oracle, Neo4j, InterSystems, SAP, SQLite, Microsoft, Versant, Cassandra, MongoDB, MarkLogic, BaseX, Couchbase, PostgresXL
MARKET FORECAST PERIOD	2025 - 2035
KEY MARKET OPPORTUNITIES	Rising demand for real-time analytics, Integration with IoT applications, Increased adoption of cloud-based solutions, Growing need for big data management, Enhanced support for complex data structures
COMPOUND ANNUAL GROWTH RATE (CAGR)	5.1% (2025 - 2035)
