100+ datasets found
  1. Data from: Data Nuggets: A Method for Reducing Big Data While Preserving...

    • tandf.figshare.com
    tar
    Updated Jun 11, 2024
    Cite
    Traymon E. Beavers; Ge Cheng; Yajie Duan; Javier Cabrera; Mariusz Lubomirski; Dhammika Amaratunga; Jeffrey E. Teigler (2024). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure [Dataset]. http://doi.org/10.6084/m9.figshare.25594361.v1
    Available download formats: tar
    Dataset updated
    Jun 11, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Traymon E. Beavers; Ge Cheng; Yajie Duan; Javier Cabrera; Mariusz Lubomirski; Dhammika Amaratunga; Jeffrey E. Teigler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Big data, with N × P dimension where N is extremely large, has created new challenges for data analysis, particularly in the realm of creating meaningful clusters of data. Clustering techniques such as K-means or hierarchical clustering are popular methods for performing exploratory analysis on large datasets. Unfortunately, these methods cannot always be applied to big data because of memory or time constraints generated by calculations of order P·N(N−1)/2. To circumvent this problem, the clustering technique is typically applied to a random sample drawn from the dataset; however, a weakness of this approach is that the structure of the dataset, particularly at the edges, is not necessarily maintained. We propose a new solution through the concept of “data nuggets”, which reduces a large dataset into a small collection of nuggets of data, each containing a center, weight, and scale parameter. The data nuggets are then input into algorithms that compute methods such as principal components analysis and clustering in a more computationally efficient manner. We show the consistency of the data-nuggets-based covariance estimator and apply the methodology of data nuggets to perform exploratory analysis of a flow cytometry dataset containing over one million observations using PCA and K-means clustering for weighted observations. Supplementary materials for this article are available online.
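The description above says that each nugget carries a center, a weight, and a scale parameter, and that downstream methods work on weighted observations. As a rough illustration only, and not the authors' estimator, the Python sketch below computes a weighted mean and covariance from nugget centers and weights, ignoring the scale parameter. The helper names `pairwise_cost`, `weighted_mean`, and `weighted_covariance` are hypothetical. It also evaluates the P·N(N−1)/2 count quoted in the description, to show why a few thousand nuggets are cheaper to process than a million raw rows.

```python
from typing import List, Sequence


def pairwise_cost(n: int, p: int) -> int:
    """Coordinate-level work for all pairwise distances: P * N(N-1)/2."""
    return p * n * (n - 1) // 2


def weighted_mean(centers: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted average of nugget centers, one value per dimension."""
    total = sum(weights)
    dims = len(centers[0])
    return [
        sum(w * c[j] for c, w in zip(centers, weights)) / total
        for j in range(dims)
    ]


def weighted_covariance(centers: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[List[float]]:
    """Weighted covariance of nugget centers.

    Simplifying assumption: within-nugget spread (the scale parameter)
    is ignored, so this only captures between-nugget variation.
    """
    mean = weighted_mean(centers, weights)
    total = sum(weights)
    dims = len(mean)
    cov = [[0.0] * dims for _ in range(dims)]
    for c, w in zip(centers, weights):
        dev = [c[j] - mean[j] for j in range(dims)]
        for a in range(dims):
            for b in range(dims):
                cov[a][b] += w * dev[a] * dev[b]
    return [[value / total for value in row] for row in cov]


if __name__ == "__main__":
    # Hypothetical sizes: one million raw rows versus 2,000 nuggets, P = 10.
    print(pairwise_cost(1_000_000, 10), "vs", pairwise_cost(2_000, 10))
    print(weighted_covariance([[0.0, 0.0], [2.0, 2.0]], [1.0, 3.0]))
```

The weights play the role of observation counts, so a nugget that summarizes many raw points pulls the mean and covariance proportionally harder than a sparse one.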

  2. Big Data Technology and Service Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Aug 28, 2025
    Cite
    Archive Market Research (2025). Big Data Technology and Service Report [Dataset]. https://www.archivemarketresearch.com/reports/big-data-technology-and-service-557036
    Available download formats: ppt, doc, pdf
    Dataset updated
    Aug 28, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Big Data Technology and Services market is experiencing robust growth, projected to reach a market size of $38.28 billion in 2025. While the provided CAGR is missing, considering the rapid adoption of big data solutions across various industries and the continuous innovation in areas like AI and machine learning, a conservative estimate of a 15% CAGR for the forecast period (2025-2033) seems plausible. This would translate to significant market expansion over the next decade. Key drivers include the increasing volume of data generated by businesses and individuals, the need for improved data analytics capabilities for better decision-making, and the growing adoption of cloud-based big data solutions. Furthermore, the rising demand for real-time data processing and insights across sectors like finance, healthcare, and retail fuels market growth. While data security and privacy concerns represent a restraint, the development of robust security protocols and regulatory frameworks is mitigating this risk. The market is segmented across various technologies (e.g., Hadoop, NoSQL databases, data warehousing), services (e.g., data integration, data analytics, consulting), and deployment models (cloud, on-premise). Leading players like IBM, Microsoft, and others are constantly innovating and expanding their offerings, fostering competition and driving market evolution. The market's growth is further propelled by trends such as the increasing adoption of advanced analytics techniques, the integration of big data with IoT (Internet of Things) devices, and the rising demand for specialized big data skills. The diverse applications of big data across various sectors ensure sustained growth, creating opportunities for both established players and emerging startups. The competitive landscape is characterized by a mix of large technology vendors and specialized service providers, with ongoing mergers and acquisitions shaping the market structure. 
Continued investment in research and development in areas like data visualization and predictive analytics will be crucial for maintaining the market's momentum. Geographical expansion into developing economies presents further growth opportunities. The predicted CAGR and market size reflect a strong growth trajectory, making it an attractive investment opportunity for stakeholders.

  3. Introduction to Hadoop and Hadoop Architecture

    • paper.erudition.co.in
    html
    Updated Dec 3, 2025
    Cite
    Einetic (2025). Introduction to Hadoop and Hadoop Architecture [Dataset]. https://paper.erudition.co.in/makaut/btech-in-electronics-and-instrumentation-engineering/8/big-data-analysis
    Available download formats: html
    Dataset updated
    Dec 3, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question paper solutions for the chapter "Introduction to Hadoop and Hadoop Architecture" of Big Data Analysis, 8th Semester, Applied Electronics and Instrumentation Engineering.

  4. Predicting social capital by mining Facebook data

    • figshare.com
    pdf
    Updated Dec 31, 2015
    Cite
    Lucia Chen (2015). Predicting social capital by mining Facebook data [Dataset]. http://doi.org/10.6084/m9.figshare.1448769.v1
    Available download formats: pdf
    Dataset updated
    Dec 31, 2015
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lucia Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Psychology still often relies on questionnaires to gather data. Despite ubiquitous computing and the internet, these are often still conducted with paper and clipboard. Where the internet is used, standalone questionnaires from bulk providers like Survey Monkey are the norm. For our studies related to social networks on online disclosure, we have developed a custom site for online questionnaires, designed to engage participants and allow linking of data from one study to the next: PsyQu.com. PsyQu is a modern website developed around a database and as such has a ‘schema’. This data structure encapsulates the project, researcher(s) and participant(s) in a manner that allows participants to link multiple attempts at multiple studies under their single account. This will allow cross-linked and longitudinal studies to be performed. By moving beyond standalone questionnaires, we hope to discover new correlative and predictive patterns between online behavior and other psychological dimensions. At present, the site is in alpha testing mode with only 1 group of researchers and 3 studies: social capital, online self-disclosure and personality. In the social capital study, we used a standard scale for investigating online social capital and social trust, in an attempt to identify differences between various groups. A paper survey was also conducted for comparison with the online survey, since there has been debate on the reliability of online participation. We will present the website and the initial results of the social capital study.

  5. Data Architecture Modernization Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Aug 10, 2025
    Cite
    Market Research Forecast (2025). Data Architecture Modernization Report [Dataset]. https://www.marketresearchforecast.com/reports/data-architecture-modernization-537734
    Available download formats: doc, pdf, ppt
    Dataset updated
    Aug 10, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Architecture Modernization market is experiencing robust growth, driven by the increasing need for businesses to adapt to the ever-evolving digital landscape. The market's expansion is fueled by several key factors, including the rising adoption of cloud computing, the exponential growth of data volumes, and the imperative for enhanced data security and compliance. Organizations are increasingly recognizing the strategic value of modernizing their data architectures to improve agility, scalability, and overall efficiency. This modernization involves migrating legacy systems to cloud-based platforms, adopting advanced analytics tools, and implementing robust data governance frameworks. The shift towards real-time data processing and the increasing demand for data-driven decision-making are further accelerating market growth. Competition is fierce, with established players like NTT DATA and Rackspace competing alongside specialized analytics firms and cloud providers. The market is segmented based on deployment type (cloud, on-premise, hybrid), organization size (small, medium, large enterprises), and industry vertical (BFSI, healthcare, retail, etc.). While the precise market size is unavailable, reasonable estimates suggest a substantial and rapidly growing market exceeding $10 billion by 2025, exhibiting a Compound Annual Growth Rate (CAGR) of approximately 15% over the forecast period (2025-2033). Despite the rapid growth, challenges remain. These include the complexities of migrating legacy systems, the need for skilled professionals experienced in modern data architectures, and the high initial investment costs associated with modernization projects. Data security and privacy concerns also pose significant hurdles. However, the long-term benefits of improved data management, enhanced operational efficiency, and the ability to gain valuable insights from data are expected to outweigh these challenges, driving continued market growth. 
The market’s future hinges on ongoing technological innovation, the increasing affordability of cloud-based solutions, and the growing awareness among businesses of the importance of data-driven decision-making.

  6. Data Set Big Data Analytics in Business for Marketing Research (2011-2024)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Oct 29, 2025
    Cite
    DJAKARIYA, CHOIRON; Tang Herman, Robertus (2025). Data Set Big Data Analytics in Business for Marketing Research (2011-2024) [Dataset]. http://doi.org/10.7910/DVN/YUIK4L
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    DJAKARIYA, CHOIRON; Tang Herman, Robertus
    Time period covered
    Jan 1, 2011 - Jan 1, 2024
    Description

    Data set for the research publication "Big Data Analytics in Business for Marketing Research: A Retrospective of Domain and Knowledge Structure", covering documents indexed by Scopus between 2011 and 2024. The data contain 448 documents with the following fields: authors, author IDs, title, year, source title, volume, issue, article number, DOI, link, affiliation, abstract, index keywords, references, correspondence address, editors, publisher, conference name, conference date, conference code, ISSN, language, document type, access type, and EID.

  7. International Journal of Engineering and Advanced Technology Acceptance Rate...

    • researchhelpdesk.org
    Updated May 1, 2022
    Cite
    Research Help Desk (2022). International Journal of Engineering and Advanced Technology Acceptance Rate - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/acceptance-rate/552/international-journal-of-engineering-and-advanced-technology
    Dataset updated
    May 1, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Engineering and Advanced Technology Acceptance Rate - ResearchHelpDesk - The International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It aims to publish original theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering, and all interdisciplinary streams of Engineering Sciences. All submitted papers are reviewed by the IJEAT board of committee. Aims of IJEAT: disseminate original scientific, theoretical, or applied research in engineering and allied fields; provide a platform for publishing results and research with a strong empirical component; bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research; and
solicit original and unpublished research papers, based on theoretical or experimental works. Scope of IJEAT International Journal of Engineering and Advanced Technology (IJEAT) covers all topics of all engineering branches. Some of them are Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. The main topic includes but not limited to: 1. Smart Computing and Information Processing Signal and Speech Processing Image Processing and Pattern Recognition WSN Artificial Intelligence and machine learning Data mining and warehousing Data Analytics Deep learning Bioinformatics High Performance computing Advanced Computer networking Cloud Computing IoT Parallel Computing on GPU Human Computer Interactions 2. Recent Trends in Microelectronics and VLSI Design Process & Device Technologies Low-power design Nanometer-scale integrated circuits Application specific ICs (ASICs) FPGAs Nanotechnology Nano electronics and Quantum Computing 3. Challenges of Industry and their Solutions, Communications Advanced Manufacturing Technologies Artificial Intelligence Autonomous Robots Augmented Reality Big Data Analytics and Business Intelligence Cyber Physical Systems (CPS) Digital Clone or Simulation Industrial Internet of Things (IIoT) Manufacturing IOT Plant Cyber security Smart Solutions – Wearable Sensors and Smart Glasses System Integration Small Batch Manufacturing Visual Analytics Virtual Reality 3D Printing 4. 
Internet of Things (IoT) Internet of Things (IoT) & IoE & Edge Computing Distributed Mobile Applications Utilizing IoT Security, Privacy and Trust in IoT & IoE Standards for IoT Applications Ubiquitous Computing Block Chain-enabled IoT Device and Data Security and Privacy Application of WSN in IoT Cloud Resources Utilization in IoT Wireless Access Technologies for IoT Mobile Applications and Services for IoT Machine/ Deep Learning with IoT & IoE Smart Sensors and Internet of Things for Smart City Logic, Functional programming and Microcontrollers for IoT Sensor Networks, Actuators for Internet of Things Data Visualization using IoT IoT Application and Communication Protocol Big Data Analytics for Social Networking using IoT IoT Applications for Smart Cities Emulation and Simulation Methodologies for IoT IoT Applied for Digital Contents 5. Microwaves and Photonics Microwave filter Micro Strip antenna Microwave Link design Microwave oscillator Frequency selective surface Microwave Antenna Microwave Photonics Radio over fiber Optical communication Optical oscillator Optical Link design Optical phase lock loop Optical devices 6. Computation Intelligence and Analytics Soft Computing Advance Ubiquitous Computing Parallel Computing Distributed Computing Machine Learning Information Retrieval Expert Systems Data Mining Text Mining Data Warehousing Predictive Analysis Data Management Big Data Analytics Big Data Security 7. Energy Harvesting and Wireless Power Transmission Energy harvesting and transfer for wireless sensor networks Economics of energy harvesting communications Waveform optimization for wireless power transfer RF Energy Harvesting Wireless Power Transmission Microstrip Antenna design and application Wearable Textile Antenna Luminescence Rectenna 8. 
Advance Concept of Networking and Database Computer Network Mobile Adhoc Network Image Security Application Artificial Intelligence and machine learning in the Field of Network and Database Data Analytic High performance computing Pattern Recognition 9. Machine Learning (ML) and Knowledge Mining (KM) Regression and prediction Problem solving and planning Clustering Classification Neural information processing Vision and speech perception Heterogeneous and streaming data Natural language processing Probabilistic Models and Methods Reasoning and inference Marketing and social sciences Data mining Knowledge Discovery Web mining Information retrieval Design and diagnosis Game playing Streaming data Music Modelling and Analysis Robotics and control Multi-agent systems Bioinformatics Social sciences Industrial, financial and scientific applications of all kind 10. Advanced Computer networking Computational Intelligence Data Management, Exploration, and Mining Robotics Artificial Intelligence and Machine Learning Computer Architecture and VLSI Computer Graphics, Simulation, and Modelling Digital System and Logic Design Natural Language Processing and Machine Translation Parallel and Distributed Algorithms Pattern Recognition and Analysis Systems and Software Engineering Nature Inspired Computing Signal and Image Processing Reconfigurable Computing Cloud, Cluster, Grid and P2P Computing Biomedical Computing Advanced Bioinformatics Green Computing Mobile Computing Nano Ubiquitous Computing Context Awareness and Personalization, Autonomic and Trusted Computing Cryptography and Applied Mathematics Security, Trust and Privacy Digital Rights Management Networked-Driven Multicourse Chips Internet Computing Agricultural Informatics and Communication Community Information Systems Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS Disaster Management e-governance, e-Commerce, e-business, e-Learning Forest Genomics and Informatics Healthcare Informatics 
Information Ecology and Knowledge Management Irrigation Informatics Neuro-Informatics Open Source: Challenges and opportunities Web-Based Learning: Innovation and Challenges Soft computing Signal and Speech Processing Natural Language Processing 11. Communications Microstrip Antenna Microwave Radar and Satellite Smart Antenna MIMO Antenna Wireless Communication RFID Network and Applications 5G Communication 6G Communication 12. Algorithms and Complexity Sequential, Parallel And Distributed Algorithms And Data Structures Approximation And Randomized Algorithms Graph Algorithms And Graph Drawing On-Line And Streaming Algorithms Analysis Of Algorithms And Computational Complexity Algorithm Engineering Web Algorithms Exact And Parameterized Computation Algorithmic Game Theory Computational Biology Foundations Of Communication Networks Computational Geometry Discrete Optimization 13. Software Engineering and Knowledge Engineering Software Engineering Methodologies Agent-based software engineering Artificial intelligence approaches to software engineering Component-based software engineering Embedded and ubiquitous software engineering Aspect-based software engineering Empirical software engineering Search-Based Software engineering Automated software design and synthesis Computer-supported cooperative work Automated software specification Reverse engineering Software Engineering Techniques and Production Perspectives Requirements engineering Software analysis, design and modelling Software maintenance and evolution Software engineering tools and environments Software engineering decision support Software design patterns Software product lines Process and workflow management Reflection and metadata approaches Program understanding and system maintenance Software domain modelling and analysis Software economics Multimedia and hypermedia software engineering Software engineering case study and experience reports Enterprise software, middleware, and tools Artificial 
intelligent methods, models, techniques Artificial life and societies Swarm intelligence Smart Spaces Autonomic computing and agent-based systems Autonomic computing Adaptive Systems Agent architectures, ontologies, languages and protocols Multi-agent systems Agent-based learning and knowledge discovery Interface agents Agent-based auctions and marketplaces Secure mobile and multi-agent systems Mobile agents SOA and Service-Oriented Systems Service-centric software engineering Service oriented requirements engineering Service oriented architectures Middleware for service based systems Service discovery and composition Service level

  8. Data from: GeoThermalCloud framework for fusion of big data and...

    • catalog.data.gov
    • gdr.openei.org
    Updated Jan 20, 2025
    Cite
    Los Alamos National Laboratory (2025). GeoThermalCloud framework for fusion of big data and multi-physics models in Nevada and Southwest New Mexico [Dataset]. https://catalog.data.gov/dataset/geothermalcloud-framework-for-fusion-of-big-data-and-multi-physics-models-in-nevada-and-so-31a4e
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Los Alamos National Laboratory
    Area covered
    New Mexico
    Description

    Our GeoThermalCloud framework is designed to process geothermal datasets using a novel toolbox for unsupervised and physics-informed machine learning called SmartTensors. More information about GeoThermalCloud can be found at the GeoThermalCloud GitHub Repository. More information about SmartTensors can be found at the SmartTensors GitHub Repository and the SmartTensors page at LANL.gov. Links to these pages are included in this submission.

    GeoThermalCloud.jl is a repository containing all the data and codes required to demonstrate applications of machine learning methods for geothermal exploration. GeoThermalCloud.jl includes:
    - site data
    - simulation scripts
    - Jupyter notebooks
    - intermediate results
    - code outputs
    - summary figures
    - readme markdown files

    GeoThermalCloud.jl showcases the machine learning analyses performed for the following geothermal sites:
    - Brady: geothermal exploration of the Brady geothermal site, Nevada
    - SWNM: geothermal exploration of the Southwest New Mexico (SWNM) region
    - GreatBasin: geothermal exploration of the Great Basin region, Nevada

    Reports, research papers, and presentations summarizing these machine learning analyses are also available and will be posted soon.

  9. Data from: From a Monolithic Big Data System to a Microservices Event-Driven...

    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Rodrigo Laigner (2020). From a Monolithic Big Data System to a Microservices Event-Driven Architecture [Dataset]. http://doi.org/10.5281/zenodo.3606316
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rodrigo Laigner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    [Context] Data-intensive systems, a.k.a. big data systems (BDS), are software systems that handle a large volume of data in the presence of performance quality attributes, such as scalability and availability. Before the advent of big data management systems (e.g. Cassandra) and frameworks (e.g. Spark), organizations had to cope with large data volumes with custom-tailored solutions. In particular, a decade ago, Tecgraf/PUC-Rio developed a system to monitor truck fleet in real-time and proactively detect events from the positioning data received. Over the years, the system evolved into a complex and large obsolescent code base involving a hard maintenance process. [Goal] We report our experience on replacing a legacy BDS with a microservice-based event-driven system. [Method] We applied action research, investigating the reasons that motivate the adoption of a microservice-based event-driven architecture, intervening to define the new architecture, and documenting the challenges and lessons learned. [Results] We perceived that the resulting architecture enabled easier maintenance and fault-isolation. However, the myriad of technologies and the complex data flow were perceived as drawbacks. Based on the challenges faced, we highlight opportunities to improve the design of big data reactive systems. [Conclusions] We believe that our experience provides helpful takeaways for practitioners modernizing systems with data-intensive requirements.
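The description above mentions detecting events from truck positioning data and moving to an event-driven design. The Python sketch below is only a toy, in-process illustration of that publish/subscribe idea and is not the architecture documented in the dataset. The `EventBus` class, the topic names, the speed threshold, and the payload fields are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Event:
    topic: str
    payload: dict


class EventBus:
    """Minimal synchronous publish/subscribe bus (stand-in for a real broker)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, event: Event) -> None:
        # Copy the handler list so handlers may subscribe while we iterate.
        for handler in list(self._handlers.get(event.topic, [])):
            handler(event)


SPEED_LIMIT_KMH = 90.0  # hypothetical threshold, not taken from the source


def make_speed_detector(bus: EventBus) -> Callable[[Event], None]:
    """Detector 'service': turns raw positions into speeding events."""
    def on_position(event: Event) -> None:
        if event.payload["speed_kmh"] > SPEED_LIMIT_KMH:
            bus.publish(Event("speeding.detected", event.payload))
    return on_position


def run_demo() -> List[str]:
    bus = EventBus()
    alerts: List[str] = []
    bus.subscribe("position.received", make_speed_detector(bus))
    # Alerting 'service': only knows about the derived event, not the detector.
    bus.subscribe("speeding.detected", lambda e: alerts.append(e.payload["truck_id"]))
    bus.publish(Event("position.received", {"truck_id": "T1", "speed_kmh": 72.0}))
    bus.publish(Event("position.received", {"truck_id": "T2", "speed_kmh": 104.5}))
    return alerts


if __name__ == "__main__":
    print(run_demo())
```

The detector and the alert handler share only topic names, which is the kind of decoupling that makes fault isolation easier; it also hints at the drawback noted in the description, since the data flow is spread across handlers rather than readable in one place.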

  10. Global Floating Data Center Market Research Report: By Application (Cloud...

    • wiseguyreports.com
    Updated Aug 10, 2025
    Cite
    (2025). Global Floating Data Center Market Research Report: By Application (Cloud Computing, Data Storage, Big Data Analytics, Content Delivery), By Power Source (Renewable Energy, Nuclear Energy, Diesel Generator, Hybrid Systems), By Structure Type (Semisphere, Platform, Modular, Ship-Based), By Cooling Technology (Liquid Cooling, Air Cooling, Immersion Cooling) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/floating-data-center-market
    Dataset updated
    Aug 10, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 3.5 (USD Billion)
    MARKET SIZE 2025: 3.99 (USD Billion)
    MARKET SIZE 2035: 15.0 (USD Billion)
    SEGMENTS COVERED: Application, Power Source, Structure Type, Cooling Technology, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: sustainability focus, rising energy costs, increasing data demand, climate adaptation solutions, technological advancements
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: NVIDIA, Equinix, Microsoft, Google, Oracle, Arm Holdings, Apple, Digital Realty, Amazon, Dell Technologies, Huawei, Hewlett Packard Enterprise, Alibaba, Intel, IBM
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Sustainable cooling solutions, Renewable energy integration, Disaster recovery support, Enhanced scalability options, Coastal data redundancy services
    COMPOUND ANNUAL GROWTH RATE (CAGR): 14.2% (2025 - 2035)
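The market sizes and the CAGR listed above can be cross-checked with the standard compound-growth formula, CAGR = (end / start)^(1 / years) − 1. The Python sketch below is a small sanity check under that assumption; the function names `implied_cagr` and `project` are hypothetical, and the inputs are the 2025 and 2035 figures from this listing.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two values `years` apart."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # 3.99 (USD Billion) in 2025 growing to 15.0 (USD Billion) in 2035.
    rate = implied_cagr(3.99, 15.0, 10)
    print(f"implied CAGR: {rate:.1%}")
    print(f"2035 value at that rate: {project(3.99, rate, 10):.2f}")
```

With these inputs the implied rate comes out at about 14.2% after rounding, which agrees with the CAGR stated in the listing.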

  11. Data Preparation Platform Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Sep 20, 2025
    Cite
    Data Insights Market (2025). Data Preparation Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/data-preparation-platform-1368457
    Available download formats: doc, pdf, ppt
    Dataset updated
    Sep 20, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Data Preparation Platform market is poised for substantial growth, estimated to reach $15,600 million by the study's end in 2033, up from $6,000 million in the base year of 2025. This trajectory is fueled by a Compound Annual Growth Rate (CAGR) of approximately 12.5% over the forecast period. The proliferation of big data and the increasing need for clean, usable data across all business functions are primary drivers. Organizations are recognizing that effective data preparation is foundational to accurate analytics, informed decision-making, and successful AI/ML initiatives. This has led to a surge in demand for platforms that can automate and streamline the complex, time-consuming process of data cleansing, transformation, and enrichment. The market's expansion is further propelled by the growing adoption of cloud-based solutions, offering scalability, flexibility, and cost-efficiency, particularly for Small & Medium Enterprises (SMEs). Key trends shaping the Data Preparation Platform market include the integration of AI and machine learning for automated data profiling and anomaly detection, enhanced collaboration features to facilitate teamwork among data professionals, and a growing focus on data governance and compliance. While the market exhibits robust growth, certain restraints may temper its pace. These include the complexity of integrating data preparation tools with existing IT infrastructures, the shortage of skilled data professionals capable of leveraging advanced platform features, and concerns around data security and privacy. Despite these challenges, the market is expected to witness continuous innovation and strategic partnerships among leading companies like Microsoft, Tableau, and Alteryx, aiming to provide more comprehensive and user-friendly solutions to meet the evolving demands of a data-driven world. 

  12. Global Data Center Architecture Market Research Report: By Architecture Type...

    • wiseguyreports.com
    Updated Aug 21, 2025
    + more versions
    Cite
    (2025). Global Data Center Architecture Market Research Report: By Architecture Type (Modular Architecture, Traditional Architecture, Open Modular Architecture), By Component (Hardware, Software, Services), By Application (Cloud Computing, Big Data Analytics, Disaster Recovery), By End Use (BFSI, IT and Telecommunications, Healthcare, Retail) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/data-center-architecture-market
    Explore at:
    Dataset updated
    Aug 21, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 40.9 (USD Billion)
    MARKET SIZE 2025: 43.7 (USD Billion)
    MARKET SIZE 2035: 85.0 (USD Billion)
    SEGMENTS COVERED: Architecture Type, Component, Application, End Use, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: scalability and flexibility, cost efficiency, energy efficiency, advanced technologies adoption, regulatory compliance
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: NVIDIA, Equinix, Arista Networks, Microsoft, Cisco Systems, Google, Alibaba Group, Oracle, Lenovo, SAP, Digital Realty, Dell Technologies, Amazon, Hewlett Packard Enterprise, Intel, IBM
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Edge computing expansion, Hybrid cloud integration, Increasing demand for automation, Sustainable energy initiatives, AI and machine learning adoption
    COMPOUND ANNUAL GROWTH RATE (CAGR): 6.9% (2025 - 2035)
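
The listed growth rate can be cross-checked against the listed market sizes (43.7 in 2025, 85.0 in 2035, USD billion). The sketch below assumes the standard compound-growth definition and ten compounding periods between 2025 and 2035; the listing does not say how the report itself derived the figure, and the function name `cagr` is only illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.05 means 5% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market sizes quoted in the listing, in USD billion.
size_2025 = 43.7
size_2035 = 85.0

# Assumption: ten compounding periods between 2025 and 2035.
rate = cagr(size_2025, size_2035, years=2035 - 2025)
print(f"Implied CAGR: {rate * 100:.1f}%")
```

Under these assumptions the implied rate comes out at about 6.9% per year, in line with the listed CAGR.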
  13. Maintenance costs, ML and big data

    • kaggle.com
    zip
    Updated May 16, 2025
    Cite
    Mario Jesenia (2025). Maintenance costs, ML and big data [Dataset]. https://www.kaggle.com/datasets/mariojesenia/maintenance-costs-ml-and-big-data
    Explore at:
    Available download formats: zip (7809 bytes)
    Dataset updated
    May 16, 2025
    Authors
    Mario Jesenia
    Description

    This dataset provides panel data from 82 industrial organizations, each observed consistently over a 5-year period (2019–2023). The dataset is designed to support analysis of how machine learning (ML) and big data technologies are integrated into smart maintenance operations across different industrial sectors. Each organization is uniquely identified and assigned a fixed organizational structure—either centralized, semi-centralized, or decentralized—that remains constant across time.

    The dataset includes the following variables:

    • maintenance_cost_reduction: annual percentage change in maintenance costs, representing efficiency gains or losses.
    • model_prediction_accuracy: the accuracy (%) of predictive ML models in forecasting failures.
    • failure_risk_score: a normalized score (0–1) indicating the predicted risk of system failure.
    • model_training_frequency: the regularity of model retraining (monthly, quarterly, yearly).
    • data_pipeline_latency_hr: average latency (in hours) in processing and transmitting maintenance data through digital pipelines.
    • bigdata_storage_utilization_percent: percentage utilization of the organization’s big data storage infrastructure.

    The organizations represented in this dataset operate across advanced industrial sectors such as manufacturing, transportation, utilities, energy, and aerospace logistics. Geographically, the entities are based in the United States, Germany, South Korea, Japan, and the Netherlands, countries recognized for their leadership in AI integration, industrial analytics, and data-driven operations.

    Data was gathered through structured interviews with IT specialists, plant maintenance managers, and operational analytics teams. The data design reflects realistic organizational behaviors and technological performance patterns, making it well-suited for research on predictive maintenance, digital infrastructure readiness, and performance benchmarking in smart manufacturing.

  14. MOESM2 of ExCAPE-DB: an integrated large scale dataset facilitating Big Data...

    • springernature.figshare.com
    xlsx
    Updated May 31, 2023
    Cite
    Jiangming Sun; Nina Jeliazkova; Vladimir Chupakin; Jose-Felipe Golib-Dzib; Ola Engkvist; Lars Carlsson; Jörg Wegner; Hugo Ceulemans; Ivan Georgiev; Vedrin Jeliazkov; Nikolay Kochev; Thomas Ashby; Hongming Chen (2023). MOESM2 of ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics [Dataset]. http://doi.org/10.6084/m9.figshare.c.3711712_D2.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jiangming Sun; Nina Jeliazkova; Vladimir Chupakin; Jose-Felipe Golib-Dzib; Ola Engkvist; Lars Carlsson; Jörg Wegner; Hugo Ceulemans; Ivan Georgiev; Vedrin Jeliazkov; Nikolay Kochev; Thomas Ashby; Hongming Chen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 2: Table S1. The list of selected activity types in the PubChem.

  15. Big Data Driven Architecture for Real Time Systemwide Safety Assurance,...

    • data.nasa.gov
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Big Data Driven Architecture for Real Time Systemwide Safety Assurance, Phase I [Dataset]. https://data.nasa.gov/dataset/Big-Data-Driven-Architecture-for-Real-Time-Systemw/muws-ntpg
    Explore at:
    Available download formats: csv, application/rssxml, json, tsv, xml, application/rdfxml
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    NASA has the aim of researching aviation Real-time System-wide Safety Assurance (RSSA) with a focus on the development of prognostic decision support tools as one of its new aeronautics research pillars. The vision of RSSA is to accelerate the discovery of previously unknown safety threats in real time and enable rapid mitigation of safety risks through analysis of massive amounts of aviation data. Our innovation supports this vision by designing a hybrid architecture combining traditional database technology and real-time streaming analytics in a Big Data environment. The innovation includes three major components: a Batch Processing framework, Traditional Databases and Streaming Analytics. It addresses at least three major needs within the aviation safety community. First, the innovation supports the creation of future data-driven safety prognostic decision support tools that must pull data from heterogeneous data sources and seamlessly combine them to be effective for NAS stakeholders. Second, our innovation opens up the possibility to provide real-time NAS performance analytics desired by key aviation stakeholders. Third, our proposed architecture provides a mechanism for safety risk accuracy evaluations. To accomplish this innovation, we have three technical objectives and related work plan efforts. The first objective is the determination of the system and functional requirements. We identify the system and functional requirements from aviation safety stakeholders for a set of use cases by investigating how they would use the system and what data processing functions they need to support their decisions. The second objective is to create a Big Data technology-driven architecture. Here we explore and identify the best technologies for the components in the system including Big Data processing and architectural techniques adapted for aviation data applications. Finally, our third objective is the development and demonstration of a proof-of-concept.

  16. Data Lake Market to Surpassing USD 90 billion by 2032

    • scoop.market.us
    Updated Jul 23, 2024
    Cite
    Market.us Scoop (2024). Data Lake Market to Surpassing USD 90 billion by 2032 [Dataset]. https://scoop.market.us/data-lake-market-news/
    Explore at:
    Dataset updated
    Jul 23, 2024
    Dataset authored and provided by
    Market.us Scoop
    License

    https://scoop.market.us/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Global
    Description

    Introduction

    The global Data Lake Market is projected to witness substantial growth, reaching approximately USD 90 billion by 2032, marking a significant increase from its 2022 value of USD 16.6 billion. This growth trajectory is expected to unfold steadily, with a Compound Annual Growth Rate (CAGR) of 21.3% from 2023 to 2032.

    A Data Lake is a centralized repository designed to store, process, and secure large volumes of structured and unstructured data from multiple sources. It allows for the storage of data in its natural format, without the need to first structure it, making it a flexible option for big data and real-time analytics. Data Lakes support the analysis of data through various methods, including machine learning, predictive analytics, data discovery, and profiling.

    The Data Lake market is experiencing rapid growth, driven by the increasing volume of data generated by businesses, the need for advanced analytics to understand customer behavior, and the adoption of cloud computing. Companies are investing in Data Lake solutions to gain insights that can improve decision-making, enhance operational efficiency, and create personalized customer experiences. The market is also seeing innovation in terms of security, data management, and integration capabilities, enabling more robust and scalable data ecosystems. As organizations continue to recognize the value of data-driven strategies, the demand for Data Lake technologies is expected to rise, marking a significant trend in the data management landscape.

    [Image: Global Data Lake Market]
  17. Data from: Structure-from-motion point cloud of Mud Creek, Big Sur,...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 27, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Structure-from-motion point cloud of Mud Creek, Big Sur, California, 1967-10-18 [Dataset]. https://catalog.data.gov/dataset/structure-from-motion-point-cloud-of-mud-creek-big-sur-california-1967-10-18
    Explore at:
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Big Sur, California
    Description

    Presented here is a point cloud produced by the U.S. Geological Survey (USGS) from historical U.S. Air Force vertical aerial imagery, covering the area of the Mud Creek landslide on California State Route 1 (SR1), Mud Creek, Big Sur, California. The point cloud is referenced to previously published lidar data and contains RGB information as well as XYZ. Point cloud coordinates are in NAD83 UTM Zone 10 meters. Imagery was downloaded from USGS Eros Data Center and processed using structure-from-motion photogrammetry with Agisoft PhotoScan version 1.2.8 through 1.3.2. Point clouds were clipped to an AOI using LASTools. The AOI was created from a KMZ in Google Earth and transformed to a shapefile using ArcMap 10.5.

  18. Data Wrangling Market Size, Share, Growth, Forecast, By Component...

    • verifiedmarketresearch.com
    Updated Jun 18, 2025
    Cite
    VERIFIED MARKET RESEARCH (2025). Data Wrangling Market Size, Share, Growth, Forecast, By Component (Solutions, Services), By Deployment Mode (On-premises, Cloud-based), By End-user Industry (Banking, Financial Services, and Insurance (BFSI), Healthcare & Life Sciences, Retail & E-commerce, IT & Telecom, Government & Public Sector, Manufacturing) [Dataset]. https://www.verifiedmarketresearch.com/product/data-wrangling-market/
    Explore at:
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2026 - 2032
    Area covered
    Global
    Description

    Data Wrangling Market size was valued at USD 1.99 Billion in 2024 and is projected to reach USD 4.07 Billion by 2032, growing at a CAGR of 9.4% during the forecast period 2026-2032.

    • Big Data Analytics Growth: Organizations are generating massive volumes of unstructured and semi-structured data from diverse sources including social media, IoT devices, and digital transactions. Data wrangling tools become essential for cleaning, transforming, and preparing this complex data for meaningful analytics and business intelligence applications.
    • Machine Learning and AI Adoption: The rapid expansion of artificial intelligence and machine learning initiatives requires high-quality, properly formatted training datasets. Data wrangling solutions enable data scientists to efficiently prepare, clean, and structure raw data for model training, driving sustained market demand across AI-focused organizations.

  19. fdata-02-00044_Parallel Processing Strategies for Big Geospatial Data.pdf

    • frontiersin.figshare.com
    pdf
    Updated Jun 3, 2023
    + more versions
    Cite
    Martin Werner (2023). fdata-02-00044_Parallel Processing Strategies for Big Geospatial Data.pdf [Dataset]. http://doi.org/10.3389/fdata.2019.00044.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Martin Werner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper provides an abstract analysis of parallel processing strategies for spatial and spatio-temporal data. It isolates aspects such as data locality and computational locality as well as redundancy and locally sequential access as central elements of parallel algorithm design for spatial data. Furthermore, the paper gives some examples from simple and advanced GIS and spatial data analysis highlighting both that big data systems have been around long before the current hype of big data and that they follow some design principles which are inevitable for spatial data including distributed data structures and messaging, which are, however, incompatible with the popular MapReduce paradigm. Throughout this discussion, the need for a replacement or extension of the MapReduce paradigm for spatial data is derived. This paradigm should be able to deal with the imperfect data locality inherent to spatial data hindering full independence of non-trivial computational tasks. We conclude that more research is needed and that spatial big data systems should pick up more concepts like graphs, shortest paths, raster data, events, and streams at the same time instead of solving exactly the set of spatially separable problems such as line simplifications or range queries in many different ways.

  20. Global Object-Oriented Database Software Market Research Report: By...

    • wiseguyreports.com
    Updated Sep 15, 2025
    + more versions
    Cite
    (2025). Global Object-Oriented Database Software Market Research Report: By Application (Data Management, Software Development, Business Intelligence, Web Development), By Deployment Model (On-Premises, Cloud-Based, Hybrid), By End User (Large Enterprises, Small and Medium Enterprises, Government Organizations, Educational Institutions), By Industry Vertical (Information Technology, Healthcare, Finance, Telecommunications) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/object-oriented-databas-software-market
    Explore at:
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 4.49 (USD Billion)
    MARKET SIZE 2025: 4.72 (USD Billion)
    MARKET SIZE 2035: 7.8 (USD Billion)
    SEGMENTS COVERED: Application, Deployment Model, End User, Industry Vertical, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: increased data complexity, demand for scalability, integration with IoT, rising big data applications, need for real-time processing
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: IBM, Redis, Objectivity, Oracle, Neo4j, InterSystems, SAP, SQLite, Microsoft, Versant, Cassandra, MongoDB, MarkLogic, BaseX, Couchbase, PostgresXL
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Rising demand for real-time analytics, Integration with IoT applications, Increased adoption of cloud-based solutions, Growing need for big data management, Enhanced support for complex data structures
    COMPOUND ANNUAL GROWTH RATE (CAGR): 5.1% (2025 - 2035)
