60 datasets found

Top challenges for big data analytics implementation in companies worldwide...
statista.com
Updated Jul 10, 2025
Cite
Statista (2025). Top challenges for big data analytics implementation in companies worldwide 2017 [Dataset]. https://www.statista.com/statistics/933143/worldwide-big-data-implementation-problems/
Explore at:
Dataset updated
Jul 10, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2017
Area covered
Worldwide
Description
The statistic shows the problems that organizations face when using big data technologies worldwide as of 2017. Around ** percent of respondents stated that inadequate analytical know-how was a major problem that their organization faced when using big data technologies as of 2017.
Making Predictions using Large Scale Gaussian Processes
catalog.data.gov
s.cnmilf.com
Updated Aug 22, 2025
Cite
Dashlink (2025). Making Predictions using Large Scale Gaussian Processes [Dataset]. https://catalog.data.gov/dataset/making-predictions-using-large-scale-gaussian-processes
Explore at:
Dataset updated
Aug 22, 2025
Dataset provided by
Dashlink
Description
One of the key problems that arises in many areas is to estimate a potentially nonlinear function $G(x, \theta)$ given samples of inputs $x$ and outputs $y$, so that $y \approx G(x, \theta)$. There are many approaches to this regression problem: neural networks, regression trees, and many other methods have been developed to estimate $G$ from input-output pairs. One method that I have worked with is called Gaussian process regression. There are many good texts and papers on the subject; for more technical information on the method and its applications, see http://www.gaussianprocess.org/. A key problem that arises when developing these models on very large data sets is that training requires an $O(N^3)$ computation, where $N$ is the number of data points in the training sample. Obviously this becomes very problematic when $N$ is large. I discussed this problem with Leslie Foster, a mathematics professor at San Jose State University. He, along with some of his students, developed a method to address this problem based on Cholesky decomposition and pivoting. He also shows that this leads to a numerically stable result. If you're interested in some light reading, I'd suggest you take a look at his recent paper (which was accepted in the Journal of Machine Learning Research) posted on Dashlink. We've also posted code for you to try it out. Let us know how it goes. If you are interested in applications of this method in the area of prognostics, check out our new paper on the subject, which was published in IEEE Transactions on Systems, Man, and Cybernetics.
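The following is a minimal pure-Python sketch of exact Gaussian process regression. It makes the cubic cost concrete. The squared-exponential kernel, the noise level and the length scale are illustrative assumptions. Factorizing the N-by-N covariance matrix with a Cholesky decomposition is the O(N^3) step described above. This sketch is the standard exact method. It is not Foster's pivoted-Cholesky approach, which is given in the paper and code posted on Dashlink.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Return lower-triangular L with matrix == L L^T; this is the O(N^3) step."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][i] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def forward_substitute(lower, b):
    """Solve L x = b for lower-triangular L in O(N^2)."""
    x = []
    for i, row in enumerate(lower):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def back_substitute_transposed(lower, b):
    """Solve L^T x = b, reading L^T directly from the lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / lower[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """Exact GP posterior mean: k(x*, X) (K + noise * I)^{-1} y."""
    n = len(x_train)
    K = [[rbf_kernel(x_train[i], x_train[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)  # cubic in the number of training points
    alpha = back_substitute_transposed(L, forward_substitute(L, y_train))
    return [sum(rbf_kernel(xs, xi, length_scale) * a for xi, a in zip(x_train, alpha))
            for xs in x_test]


if __name__ == "__main__":
    xs = [0.5 * i for i in range(7)]  # 0.0, 0.5, ..., 3.0
    ys = [math.sin(x) for x in xs]
    print(gp_posterior_mean(xs, ys, [1.25]))  # should be close to sin(1.25)
```

Doubling N multiplies the Cholesky work by roughly eight. That growth motivates low-rank, pivoted alternatives for very large training sets.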
Very large dataset
staging-elsevier.digitalcommonsdata.com
Updated Oct 18, 2019
Cite
Johan Michotte (2019). Very large dataset [Dataset]. http://doi.org/10.1234/m9hy8c25y8.3
Explore at:
Unique identifier
https://doi.org/10.1234/m9hy8c25y8.3
Dataset updated
Oct 18, 2019
Authors
Johan Michotte
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset has 10,000 files and uses 4 GB of storage.
Welding Issues Big Dataset
universe.roboflow.com
zip
Updated Jun 4, 2024
Cite
themeppci (2024). Welding Issues Big Dataset [Dataset]. https://universe.roboflow.com/themeppci/welding-issues-big
Explore at:
Available download formats: zip
Dataset updated
Jun 4, 2024
Dataset authored and provided by
themeppci
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Defect Bounding Boxes
Description
Welding Issues Big

## Overview
Welding Issues Big is a dataset for object detection tasks. It contains Defect annotations for 4,442 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Blog | Big Data for a Big Problem: Putting Data To Work To Tackle Obesity
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
odgavaprod.ogopendata.com
Updated Mar 26, 2025
Cite
Edward L. Hunter (2025). Blog | Big Data for a Big Problem: Putting Data To Work To Tackle Obesity [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/blog-big-data-for-a-big-problem-putting-data-to-work-to-tackle-obesity
Explore at:
Dataset updated
Mar 26, 2025
Dataset provided by
Edward L. Hunter
Description
This blog post was published by Edward L. Hunter on July 8, 2015.
Large-scale Complete Graph Instances for Max-Cut Problem
dataverse.tdl.org
bin, txt, xz
Updated Aug 18, 2020
Cite
Haibo Wang (2020). Large-scale Complete Graph Instances for Max-Cut Problem [Dataset]. http://doi.org/10.18738/T8/VLTIVC
Explore at:
Available download formats: bin, txt, xz (individual files range from about 5 MB to 2 GiB)
Unique identifier
https://doi.org/10.18738/T8/VLTIVC
Dataset updated
Aug 18, 2020
Dataset provided by
Texas Data Repository
Authors
Haibo Wang
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
There are 180 large-scale, high-density (97-99%) instances for Max-Cut problems, with Q matrices ranging from 1000 by 1000 to 90000 by 90000.

For the 90000 by 90000 instances, the files are split into multiple 2 GB pieces such as MC90000_*.txt.gz_a, ... MC90000_*.txt.gz_d. To recover the large data file after downloading the pieces, use:
    copy /b file1 + file2 + file3 + file4 filetogether
For example, for the MC90000_1.txt data:
    copy /b MC90000_1.txt.gz_a +....+ MC90000_1.txt.gz_e MC90000_1.txt.gz
    gunzip MC90000_1.txt.gz

There are three different types of weights on the instances:
The MCxx_yy_a.txt.xz instance has weights of 1 and -1.
The MCxx_yy_b.txt.xz instance has random weights between -10 and 10.
The MCxx_yy_c.txt.xz instance has random weights between -1000 and 1000.

All data files are compressed with the XZ tool. For each instance, there is a text file in the following format (rudy output format):
    n m
    h_1 t_1 c_{h_1,t_1}
    h_2 t_2 c_{h_2,t_2}
    ...
    h_m t_m c_{h_m,t_m}
Here n is the number of nodes and m is the number of edges. For each edge, h_i and t_i are the end-nodes and c_{h_i,t_i} is the weight. Nodes are numbered from 1 up to n. All instances are generated as complete graphs.
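The following minimal Python sketch covers the same workflow. The function names are hypothetical helpers, not tools shipped with the dataset. `reassemble` concatenates split pieces byte for byte, which `copy /b` also does. `open_instance` opens a plain file, or decompresses an .xz or .gz file on the fly. `parse_rudy` reads the rudy format described above. `cut_value` scores a cut, taking the set of nodes on one side.

```python
import gzip
import lzma
import shutil
from pathlib import Path


def reassemble(pieces, output_path):
    """Concatenate split pieces byte for byte, like `copy /b a + b + c out`."""
    with open(output_path, "wb") as out:
        for piece in pieces:
            with open(piece, "rb") as src:
                shutil.copyfileobj(src, out)


def open_instance(path):
    """Open an instance as text, decompressing .xz or .gz files on the fly."""
    path = Path(path)
    if path.suffix == ".xz":
        return lzma.open(path, "rt")
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_rudy(lines):
    """Parse rudy format: an 'n m' header, then m lines of 'h t c' (nodes 1..n)."""
    rows = (line.split() for line in lines if line.strip())
    n, m = (int(v) for v in next(rows))
    edges = []
    for _ in range(m):
        h, t, c = next(rows)
        edges.append((int(h), int(t), float(c)))
    return n, edges


def cut_value(edges, side):
    """Total weight of edges with exactly one end-node in the node set `side`."""
    return sum(c for h, t, c in edges if (h in side) != (t in side))
```

The steps for a 90000-node file map onto these helpers as follows:
1. Call `reassemble` on the pieces with suffixes _a to _e, in order, producing MC90000_1.txt.gz.
2. Open the result with `open_instance`.
3. Pass the open file to `parse_rudy`.

This sketch holds every edge in memory. A complete graph on 90000 nodes has about four billion edges, so the largest instances would need a streaming variant.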
Opinions on how big of a problem worldwide food waste is in the U.S. in...
statista.com
Updated Jul 11, 2025
Cite
Statista (2025). Opinions on how big of a problem worldwide food waste is in the U.S. in 2022, by age [Dataset]. https://www.statista.com/statistics/1362436/opinions-on-how-big-of-a-problem-worldwide-food-waste-is-in-the-united-states-by-age/
Explore at:
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 3, 2022 - Oct 6, 2022
Area covered
United States
Description
In 2022, approximately ** percent of survey respondents between the ages of ** and ** in the United States stated that food waste is a very big problem in the world. This was by far the lowest share among all age groups. In all other age groups, the share of respondents who considered this issue a very big problem was at least ** percent.
Making Predictions using Large Scale Gaussian Processes - Dataset - NASA...
data.nasa.gov
Updated Mar 31, 2025
Cite
nasa.gov (2025). Making Predictions using Large Scale Gaussian Processes - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/making-predictions-using-large-scale-gaussian-processes
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Same description as the Dashlink listing of this dataset on catalog.data.gov.
Reproduction materials for: How Big Is the “Lemons” Problem? Historical...
archive.ciser.cornell.edu
Updated Apr 27, 2022
Cite
Pierre Mérel; Ariel Ortiz-Bobea; Emmanuel Paroissien (2022). Reproduction materials for: How Big Is the “Lemons” Problem? Historical Evidence from French Wines [Dataset]. http://doi.org/10.6077/zqcs-2544
Explore at:
Unique identifier
https://doi.org/10.6077/zqcs-2544
Dataset updated
Apr 27, 2022
Authors
Pierre Mérel; Ariel Ortiz-Bobea; Emmanuel Paroissien
Area covered
France
Description
PI-Provided Abstract: This paper provides empirical evidence on the welfare losses associated with asymmetric information about product quality in a competitive market. When consumers cannot observe product characteristics at the time of purchase, atomistic producers have no incentive to supply costly quality. We compare wine prices across administrative districts around the enactment of historic regulations aimed at certifying the quality of more than 250 French appellation wines to identify welfare losses from asymmetric information. We estimate that these losses amount to more than 7% of total market value, suggesting an important role for credible certification schemes.

Additional keywords: asymmetric information, adverse selection, quality uncertainty, welfare, wine appellation
Big-Math-RL-Verified
huggingface.co
Updated Feb 21, 2025
Cite
SynthLabs (2025). Big-Math-RL-Verified [Dataset]. https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified
Explore at:
Dataset updated
Feb 21, 2025
Dataset provided by
Synth Labs
Authors
SynthLabs
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.

Request Early Access to Private… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified.
Dataset of books called Kevin and the big problem
workwithdata.com
Updated Apr 17, 2025
Cite
Work With Data (2025). Dataset of books called Kevin and the big problem [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Kevin+and+the+big+problem
Explore at:
Dataset updated
Apr 17, 2025
Dataset authored and provided by
Work With Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is about books. It has 1 row and is filtered where the book is Kevin and the big problem. It features 7 columns including author, publication date, language, and book publisher.
Hierarchical Hub Facility Location Problem Large Data Set
data.mendeley.com
Updated May 30, 2025
Cite
Arup Kumar Bhattacharjee (2025). Hierarchical Hub Facility Location Problem Large Data Set [Dataset]. http://doi.org/10.17632/sttdkdd2vz.2
Explore at:
Unique identifier
https://doi.org/10.17632/sttdkdd2vz.2
Dataset updated
May 30, 2025
Authors
Arup Kumar Bhattacharjee
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this dataset, we deliver nine different benchmark datasets covering 100, 300, 400, 500, 600, 800, 900, 1000, and 2000 zones in the cities of Kolkata and Mumbai. All benchmarks have costs and weights derived from the buildings in the relevant areas of both cities.
Respondents who think food waste is a big problem in the United States 2022
statista.com
Updated Oct 3, 2022
Cite
Statista (2022). Respondents who think food waste is a big problem in the United States 2022 [Dataset]. https://www.statista.com/statistics/1346264/food-waste-problem-opinions-united-states/
Explore at:
Dataset updated
Oct 3, 2022
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 3, 2022 - Oct 6, 2022
Area covered
United States
Description
How big of a problem is food waste in the United States? 1,000 United States adult citizens responded to this question. Around ** percent of those surveyed said it is "a very big problem." Notably, ** percent of respondents were unsure if food waste is a big problem.
Data from: Advanced Computational Methods for Large-Scale Optimization...
curate.nd.edu
Updated May 12, 2025
Cite
Zhihao Xu (2025). Advanced Computational Methods for Large-Scale Optimization Problems [Dataset]. http://doi.org/10.7274/28786112.v1
Explore at:
Unique identifier
https://doi.org/10.7274/28786112.v1
Dataset updated
May 12, 2025
Dataset provided by
University of Notre Dame
Authors
Zhihao Xu
License
https://www.law.cornell.edu/uscode/text/17/106
Description
With the development of science and technology, large-scale optimization tasks have become integral to cutting-edge engineering. The challenges of solving these problems arise from ever-growing system sizes, intricate physical spaces, and the computational cost required to accurately model and optimize target objectives. Take the design of advanced functional materials as an example: the high-dimensional parameter space and high-fidelity physical simulations can demand immense computational resources for searching and iteration. Emerging machine learning techniques have been combined with conventional experimental and simulation approaches to explore the design space and identify high-performance solutions. Even so, these methods are still limited to a small part of the design space, around materials that have already been well investigated.

Over the past several decades, continuous development of both hardware and algorithms has addressed some of these challenges. High-performance computing (HPC) architectures and heterogeneous systems have greatly expanded the capacity to perform large-scale calculations and optimizations. Machine learning frameworks and algorithms, for their part, have dramatically facilitated the development of advanced models and let AI-driven techniques be integrated more seamlessly into traditional experiments and simulations. In recent years, quantum computing (QC) has received widespread attention for its strong performance in finding global optima, and it is regarded as a promising future solution to large-scale and nonlinear optimization problems. Quantum computing principles also expand the capacity of classical algorithms to explore high-dimensional combinatorial spaces. In this dissertation, we show the power of integrating machine learning algorithms, quantum algorithms, and HPC architectures to tackle the challenges of solving large-scale optimization problems.

In the first part of this dissertation, we introduced an optimization algorithm based on a Quantum-inspired Genetic Algorithm (QGA) to design planar multilayers (PMLs) for transparent radiative cooler (TRC) applications. Numerical experiments showed two results. First, our QGA-facilitated optimization algorithm can converge to solutions comparable to those found by quantum annealing (QA). Second, the QGA outperformed a classical genetic algorithm (CGA) in both convergence speed and global search capacity. Our work shows that quantum heuristic algorithms will become powerful tools for large-scale optimization problems with complex search spaces, where traditional optimization algorithms struggle.
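The following minimal Python sketch illustrates the Q-bit representation used by quantum-inspired genetic algorithms in general, on a toy bit-counting objective. It is not the dissertation's QGA and not its PML design objective. Each gene is an angle theta, and observing it yields 1 with probability sin^2(theta). After each generation, a gene is rotated toward the best solution found so far, but only where the observed bit disagrees with that solution. The step size `delta` and the angle clamp `MIN_ANGLE`/`MAX_ANGLE` are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical clamp so that no Q-bit ever collapses permanently to 0 or 1.
MIN_ANGLE = 0.05
MAX_ANGLE = math.pi / 2 - 0.05


def observe(angles, rng):
    """Collapse each Q-bit: bit j is 1 with probability sin^2(theta_j)."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in angles]


def rotate_toward(angles, observed, target_bits, delta):
    """Where the observed bit differs from the target, rotate one step toward it."""
    rotated = []
    for t, seen, want in zip(angles, observed, target_bits):
        if seen != want:
            t = t + delta if want == 1 else t - delta
        rotated.append(min(max(t, MIN_ANGLE), MAX_ANGLE))
    return rotated


def quantum_inspired_ga(fitness, n_bits, pop_size=10, generations=100,
                        delta=0.02 * math.pi, seed=0):
    """Maximize `fitness` over bit strings of length n_bits."""
    rng = random.Random(seed)
    # Every Q-bit starts at pi/4, an equal superposition of 0 and 1.
    population = [[math.pi / 4] * n_bits for _ in range(pop_size)]
    best_bits, best_fit = None, -math.inf
    for _ in range(generations):
        observed = [observe(q, rng) for q in population]
        for bits in observed:
            score = fitness(bits)
            if score > best_fit:
                best_bits, best_fit = bits, score
        # Pull each Q-individual toward the best solution seen so far.
        population = [rotate_toward(q, obs, best_bits, delta)
                      for q, obs in zip(population, observed)]
    return best_bits, best_fit


if __name__ == "__main__":
    print(quantum_inspired_ga(sum, n_bits=12))  # toy "OneMax": count the 1s
```

The probabilistic observation step is what distinguishes this family from a classical GA. Every individual encodes a distribution over bit strings rather than a single string.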

In the second part of the dissertation, we proposed a quantum annealing-assisted lattice optimization (QALO) algorithm for high-entropy alloy (HEA) systems. The algorithm is built on an active learning framework that integrates a field-aware factorization machine (FFM), quantum annealing (QA), and a machine learning potential (MLP). We applied it to optimize the bulk grain configuration of the NbMoTaW alloy system. The algorithm quickly obtains low-energy microstructures. The results reproduce the Nb segregation and W enrichment in the bulk phase, driven by thermodynamic driving forces, that are usually observed in experiments and MC/MD simulations. This work highlights the potential of quantum computing for exploring the large design space of HEA systems.

In the third part of the dissertation, we employed the Distributed Quantum Approximate Optimization Algorithm (DQAOA) to address large-scale combinatorial optimization problems that exceed the limits of conventional computational resources. This was achieved through a divide-and-conquer strategy, in which the original problem is decomposed into smaller sub-tasks that are solved in parallel on a high-performance computing (HPC) system. To further enhance convergence efficiency, we introduced an Impact Factor Directed (IFD) decomposition method. By calculating impact factors and leveraging a targeted traversal strategy, IFD captures local structural features of the problem, making it effective for both dense and sparse instances. Finally, we explored the integration of DQAOA with the Quantum Framework (QFw) on the Frontier HPC system, demonstrating the potential for efficient management of large-scale circuit execution workloads across CPUs and GPUs.
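The divide-and-conquer idea can be sketched classically on a QUBO (minimize x^T Q x over binary x), the standard input form for QAOA-type solvers. This is a hypothetical illustration, not the DQAOA code, and it departs from the source in three ways:
1. An exhaustive search over each block stands in for the quantum sub-solver.
2. The blocks are contiguous index ranges rather than the Impact Factor Directed (IFD) decomposition.
3. Sub-problems are solved one after another rather than in parallel.

```python
import itertools


def qubo_energy(Q, x):
    """Energy x^T Q x of a binary assignment x (lower is better)."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_subproblem(Q, x, block):
    """Exhaustively optimize the variables in `block`, holding all others fixed.

    Brute force is a classical stand-in for the quantum (QAOA) sub-solver."""
    best, best_energy = list(x), qubo_energy(Q, x)
    for bits in itertools.product((0, 1), repeat=len(block)):
        trial = list(x)
        for idx, bit in zip(block, bits):
            trial[idx] = bit
        energy = qubo_energy(Q, trial)
        if energy < best_energy:
            best, best_energy = trial, energy
    return best


def divide_and_conquer_qubo(Q, block_size=2, max_sweeps=10):
    """Split the variables into blocks and improve one block at a time."""
    n = len(Q)
    x = [0] * n
    # Hypothetical decomposition: contiguous index blocks (not the IFD method).
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    for _ in range(max_sweeps):
        start_energy = qubo_energy(Q, x)
        for block in blocks:  # solved in turn here; the source solves them in parallel
            x = solve_subproblem(Q, x, block)
        if qubo_energy(Q, x) >= start_energy:
            break  # a full sweep brought no improvement
    return x, qubo_energy(Q, x)


if __name__ == "__main__":
    # Toy QUBO: reward every x_i = 1 but penalize x_0 and x_1 both being 1.
    Q = [[-1, 2, 0, 0],
         [0, -1, 0, 0],
         [0, 0, -1, 0],
         [0, 0, 0, -1]]
    print(divide_and_conquer_qubo(Q))
```

Each sub-problem has only `block_size` variables, so its cost is 2^block_size evaluations rather than 2^n. Block-wise improvement can stall in a local optimum, which is one reason a smarter decomposition such as IFD matters.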
Share of adults who think select issues are a big problem by party U.S. 2020...
statista.com
Updated Jul 9, 2025
Cite
Statista (2025). Share of adults who think select issues are a big problem by party U.S. 2020 [Dataset]. https://www.statista.com/statistics/1137235/share-adults-select-issues-big-problem-party-us/
Explore at:
Dataset updated
Jul 9, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Jun 16, 2020 - Jun 22, 2020
Area covered
United States
Description
As of June 2020, ** percent of Democratic respondents thought the way racial and ethnic minorities are treated by the criminal justice system is a very big problem in the United States today. This is compared to ** percent of Republican respondents who thought it was a very big problem.
Data from: A Review of International Large-Scale Assessments in Education...
catalog.data.gov
datasets.ai
Updated Mar 30, 2021
Cite
U.S. Department of State (2021). A Review of International Large-Scale Assessments in Education Assessing Component Skills and Collecting Contextual Data [Dataset]. https://catalog.data.gov/dataset/a-review-of-international-large-scale-assessments-in-education-assessing-component-skills-
Explore at:
Dataset updated
Mar 30, 2021
Dataset provided by
U.S. Department of State
Description
The OECD has initiated PISA for Development (PISA-D) in response to the rising need of developing countries to collect data about their education systems and the capacity of their student bodies. This report aims to compare and contrast approaches regarding the instruments that are used to collect data on (a) component skills and cognitive instruments, (b) contextual frameworks, and (c) the implementation of the different international assessments, as well as approaches to include children who are not at school, and the ways in which data are used. It then seeks to identify assessment practices in these three areas that will be useful for developing countries. This report reviews the major international and regional large-scale educational assessments: large-scale international surveys, school-based surveys and household-based surveys. For each of the issues discussed, there is a description of the prevailing international situation, followed by a consideration of the issue for developing countries and then a description of the relevance of the issue to PISA for Development.
Online Feature Selection and Its Applications
researchdata.smu.edu.sg
Updated May 31, 2023
Cite
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
Explore at:
Unique identifier
https://doi.org/10.25440/smu.12062733.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
Feature selection is an important data mining technique, applied before a machine learning algorithm is run. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require access to all the attributes/features of training instances. This classical setting does not always suit real-world applications, for example when data instances are high-dimensional or the full set of attributes/features is expensive to acquire. To address this limitation, we investigate the problem of Online Feature Selection (OFS), in which an online learner may only maintain a classifier involving a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate predictions using a small and fixed number of active features. This is in contrast to the classical setup of online learning, where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: (1) learning with full input, where a learner is allowed to access all the features to decide the subset of active features, and (2) learning with partial input, where the learner may access only a limited number of features for each instance. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public datasets. We also demonstrate their applications to real-world problems, including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.

Related Publications:
Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/
Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
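The following minimal Python sketch shows the full-input setting in the spirit of the sparsity and truncation approach described above. It is not the authors' exact algorithm; see the TKDE paper for that. Each example is processed in four steps:
1. If the margin is at most 1, take a hinge-loss gradient step with L2 shrinkage.
2. Project the weights onto an L2 ball of radius 1/sqrt(lam).
3. Keep only the `budget` largest-magnitude weights.
4. If the margin exceeds 1, skip steps 1 to 3 and apply shrinkage alone.

The step size `eta`, the regularizer `lam`, and the toy data stream are assumptions.

```python
import math


def truncate(w, budget):
    """Keep the `budget` largest-magnitude weights and set the rest to zero."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:budget])
    return [wj if j in keep else 0.0 for j, wj in enumerate(w)]


def ofs_full_input(stream, n_features, budget, eta=0.2, lam=0.01):
    """Online linear classifier that never holds more than `budget` non-zero weights."""
    w = [0.0] * n_features
    for x, y in stream:  # labels y are -1 or +1
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        if margin <= 1:
            # Hinge-loss gradient step with L2 shrinkage.
            w = [(1 - lam * eta) * wj + eta * y * xj for wj, xj in zip(w, x)]
            # Project onto the L2 ball of radius 1/sqrt(lam).
            norm = math.sqrt(sum(wj * wj for wj in w))
            if norm > 0:
                w = [min(1.0, 1.0 / (math.sqrt(lam) * norm)) * wj for wj in w]
            # Sparsify: only `budget` features stay active.
            w = truncate(w, budget)
        else:
            w = [(1 - lam * eta) * wj for wj in w]
    return w


def make_toy_stream(n_rounds=200, n_features=6):
    """Hypothetical stream: feature 0 carries the label, the rest are small noise."""
    stream = []
    for t in range(n_rounds):
        y = 1 if t % 2 == 0 else -1
        x = [float(y)] + [0.1 * ((t * j) % 3 - 1) for j in range(1, n_features)]
        stream.append((x, y))
    return stream


if __name__ == "__main__":
    print(ofs_full_input(make_toy_stream(), n_features=6, budget=2))
```

Truncation runs after every update, so the learner touches at most `budget` weights at prediction time. This is the property that distinguishes OFS from ordinary online learning.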
Big Data Technology Market Report
marketresearchforecast.com
doc, pdf, ppt
Updated Aug 6, 2025
Cite
Market Research Forecast (2025). Big Data Technology Market Report [Dataset]. https://www.marketresearchforecast.com/reports/big-data-technology-market-1717
Explore at:
Available download formats: doc, ppt, pdf
Dataset updated
Aug 6, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Big Data Technology Market size was valued at USD 349.40 billion in 2023 and is projected to reach USD 918.16 billion by 2032, exhibiting a CAGR of 14.8% during the forecast period. Big data refers to larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software simply cannot manage them. Yet these massive volumes of data can be used to address business problems that could not have been tackled before. Big data technology is defined as a software utility. It is designed primarily to analyze, process, and extract information from large data sets with extremely complex structures, which traditional data processing software struggles to handle. Big data technologies are closely associated with other prominent technologies such as deep learning, machine learning, artificial intelligence (AI), and the Internet of Things (IoT). In combination with these technologies, big data technologies focus on analyzing and handling large amounts of real-time and batch data.

Recent developments include:
February 2024: SQream, a GPU data analytics platform, partnered with Dataiku, an AI and machine learning platform. Together they deliver a comprehensive solution for generating big data analytics and business insights efficiently from complex data.
October 2023: MultiversX (EGLD), a blockchain infrastructure firm, formed a partnership with Google Cloud to enhance Web3's presence by integrating big data analytics and artificial intelligence tools. The collaboration aims to offer new possibilities for developers and startups.
May 2023: Vpon Big Data Group partnered with VIOOH, a digital out-of-home (DOOH) advertising supply-side platform. The partnership displays advertising content generated by Vpon's AI visual content generator "InVnity" on VIOOH's digital outdoor advertising inventory, and pioneers the future of outdoor advertising by using AI and big data solutions.
May 2023: Salesforce launched the next generation of Tableau, enabling users to automate data analysis and generate actionable insights.
March 2023: SAP SE, a German multinational software company, partnered with AI companies, including Databricks, Collibra NV, and DataRobot, Inc., to introduce the next generation of its data management portfolio.
November 2022: PTT Oil and Retail Business Public Company, a Thai oil and retail corporation, implemented the Cloudera Data Platform to deliver insights and enhance customer engagement. The implementation offered a unified and personalized experience across 1,900 gas stations and 3,000 retail branches.
November 2022: IBM launched new software that helps enterprises break down data and analytics silos so that users can make data-driven decisions. The software streamlines how users access and discover analytics and planning tools from multiple vendors in a single dashboard view.
September 2022: ActionIQ, a global leader in CX solutions, and Teradata, a leading software company, entered a strategic partnership and integrated AIQ's new HybridCompute Technology with the Teradata VantageCloud analytics and data platform.

Key drivers for this market are: Increasing Adoption of AI, ML, and Data Analytics to Boost Market Growth. Potential restraints include: Rising Concerns on Information Security and Privacy to Hinder Market Growth. Notable trends are: Rising Adoption of Big Data and Business Analytics among End-use Industries.
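A compound annual growth rate links a start value, an end value and a number of years through CAGR = (end / start)^(1 / years) - 1. The following small Python sketch, with hypothetical helper names, computes the rate implied by two market-size figures. It also compounds a start value forward at a given rate.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value reached after compounding `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    implied = cagr(349.40, 918.16, 2032 - 2023)
    print(f"implied CAGR 2023-2032: {implied:.1%}")
    print(f"2032 value at 14.8% a year: {project(349.40, 0.148, 9):.1f}")
```

The market report quotes USD 349.40 billion for 2023 and USD 918.16 billion for 2032. Those two values imply a rate of about 11.3 percent per year over the nine years, which differs from the quoted 14.8 percent.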
Global Big Data in the Oil and Gas Sector Market Report 2025 Edition, Market...
cognitivemarketresearch.com
pdf,excel,csv,ppt
Updated Jul 15, 2025
Cite
Cognitive Market Research (2025). Global Big Data in the Oil and Gas Sector Market Report 2025 Edition, Market Size, Share, CAGR, Forecast, Revenue [Dataset]. https://www.cognitivemarketresearch.com/big-data-in-the-oil-and-gas-sector-market-report
Available download formats
pdf, excel, csv, ppt
Dataset updated
Jul 15, 2025
Dataset authored and provided by
Cognitive Market Research
License
https://www.cognitivemarketresearch.com/privacy-policy
Time period covered
2021 - 2033
Area covered
Global
Description
According to Cognitive Market Research, the global Big Data in Oil and Gas Sector market size is projected to reach USD XX million by 2024 and is expected to expand at a compound annual growth rate (CAGR) of XX% from 2024 to 2031.

The global Big Data in Oil and Gas Sector market is anticipated to grow significantly, with a projected CAGR of XX% between 2024 and 2031. North America is expected to hold a major market share of more than XX%, with a market size of USD XX million in 2024, and is forecasted to grow at a CAGR of XX% from 2024 to 2031 due to the advanced technological infrastructure and the high adoption rate of digital technologies in the oil and gas sector. The upstream application segment held the highest Big Data in Oil and Gas Sector market revenue share in 2024, attributed to the critical role of big data in exploration and production activities, optimizing reservoir performance, and minimizing risks.

Market Dynamics - Key Drivers of the Big Data in Oil and Gas Sector

Integration of Advanced Analytics for Enhanced Decision-Making Drives the Big Data in Oil & Gas Market

The Big Data in Oil & Gas market is driven by the adoption of advanced analytics, with cost efficiency as a major benefit. Big data analytics processes complex datasets to improve predictions and optimisation. As big data becomes more deeply integrated, it supports the development of the Oil & Gas Sector by improving decision-making, efficiency, and safety.

For instance, ExxonMobil, in its "2020 Energy & Carbon Summary" report, highlighted the use of advanced seismic imaging and data analytics to improve the accuracy of subsurface exploration, thereby reducing drilling risks and enhancing operational efficiency.

IoT Deployment for Real-Time Monitoring and Efficiency Further Propels the Big Data in Oil & Gas Market

The rising demand for monitoring infographics and data analytics is expected to fuel the Big Data in Oil & Gas market. Deploying IoT devices enables real-time monitoring and improves operational efficiency. This development aligns with the broader shift towards self-sufficiency and positive capital allocation. IoT sensors on equipment and in operations provide critical data for predictive maintenance and decision-making, contributing to the shift from capital expenditure to operational expenditure for businesses that outsource multiple activities.

Schlumberger, in its "Digital Transformation in the Oil and Gas Industry" report, discussed implementing IoT solutions to monitor well operations, which has led to significant improvements in maintenance strategies and operational efficiencies.

Market Dynamics - Key Restraints of the Big Data in Oil and Gas Sector

Data Security and Privacy Concerns Are a Challenge for the Big Data in Oil & Gas Market

As companies store data on every aspect of their business to work more efficiently in the future, they also expose themselves to avoidable threats. Given the sensitive nature of the oil and gas sector, data security and privacy are significant concerns as the use of big data analytics increases. Cyber threats slow the adoption of big data solutions, limiting demand in the Big Data in Oil & Gas market.

The International Energy Agency (IEA), in its "Digitalization & Energy" report, highlighted the cybersecurity challenges facing the energy sector, emphasizing the need for robust security measures in the adoption of digital technologies, including big data analytics.

Integration and Interoperability Challenges Will Restrain the Big Data in Oil & Gas Market

Data access, analysis, and storage are increasingly an issue for businesses. Integrating big data technologies with legacy systems raises compatibility and interoperability issues, and the diversity of data sources and formats makes the integration process more difficult. Most firms find it necessary to evaluate both new technologies and their legacy infrastructure as big data requirements outgrow the capabilities of traditional relational databases.

A study by Deloitte, titled "Digital Transformation: Shaping the Future of the Oil and Gas Industry", identified integration of new technologies with existin...
Data from: Sizing the Problem of Improving Discovery and Access to...
figshare.com
xlsx
Updated Jan 19, 2016
Cite
Kevin Read (2016). Sizing the Problem of Improving Discovery and Access to NIH-funded Data: A preliminary study [Dataset]. http://doi.org/10.6084/m9.figshare.1285515.v1
Available download formats
xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.1285515.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Kevin Read
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study aims to inform efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by National Institutes of Health (NIH)-funded researchers. Of particular interest is characterizing datasets that are not deposited in a known data repository or registry, e.g., those for which a related journal article does not indicate that the underlying data have been deposited in a known repository. Such “invisible” datasets make up the “long tail” of biomedical data and pose significant practical challenges to ongoing efforts to improve the discoverability of and access to biomedical research data.

This study identified datasets used to support the NIH-funded research reported in articles published in 2011, cited in PubMed®, and deposited in PubMed Central® (PMC). After searching for all articles that acknowledged NIH support, we first identified articles that explicitly mentioned datasets being deposited in recognized repositories. Thirty members of the NIH staff then analyzed a random sample of the remaining articles to estimate how many and what types of datasets were used per article. Two reviewers independently examined each paper.

Each dataset is titled Bigdata_randomsample_xxxx_xx, where xxxx refers to the set of articles the annotator looked at and xx identifies the annotator who did the analysis. Within each file, the annotator listed the number of datasets identified in the articles they reviewed. For every dataset found, annotators were asked to insert a new row into the spreadsheet and describe the dataset (e.g., type of data, subject of study). Each row in the spreadsheet begins with the PubMed Identifier (PMID) of the article where the dataset was found.

Finally, the files 2013-08-07_Bigdatastudy_dataanalysis, Dataanalysis_ack_si_datasets, and Datasets additional random sample mention vs deposit 20150313 contain the analysis performed on each annotator's results for the publications they were assigned, along with the data deposits identified from that analysis.
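The naming and row conventions of the annotator spreadsheets are simple enough to handle programmatically. The sketch below is a hypothetical Python helper, not part of the deposited dataset. It assumes that the two placeholders in Bigdata_randomsample_xxxx_xx are underscore-separated fields optionally followed by a file extension (the real field widths and extensions are assumptions), and that each spreadsheet row has already been read into a (PMID, description) pair; the PMIDs in the usage example are made up.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple


class SampleFile(NamedTuple):
    article_set: str  # the "xxxx" field: which set of articles was reviewed
    annotator: str    # the "xx" field: which annotator did the analysis


# Assumed pattern: two underscore-separated fields, optional extension.
_NAME_RE = re.compile(r"^Bigdata_randomsample_([^_]+)_([^_.]+)(?:\.\w+)?$")


def parse_sample_name(filename: str) -> SampleFile:
    """Split a Bigdata_randomsample_xxxx_xx file name into its two fields."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return SampleFile(article_set=match.group(1), annotator=match.group(2))


def datasets_per_article(rows: Iterable[tuple[str, str]]) -> Counter:
    """Count dataset rows per PMID; each row is a (PMID, description) pair."""
    return Counter(pmid for pmid, _description in rows)


if __name__ == "__main__":
    # Hypothetical file name and rows, for illustration only.
    print(parse_sample_name("Bigdata_randomsample_0001_AB.xlsx"))
    rows = [("12345678", "gene expression"), ("12345678", "survey"),
            ("87654321", "imaging")]
    print(datasets_per_article(rows))
```

Because every row starts with the PMID, counting rows per PMID gives the number of datasets an annotator recorded for each article.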


Top challenges for big data analytics implementation in companies worldwide...

Making Predictions using Large Scale Gaussian Processes

Very large dataset

Welding Issues Big Dataset

Welding Issues Big

Blog | Big Data for a Big Problem: Putting Data To Work To Tackle Obesity

Large-scale Complete Graph Instances for Max-Cut Problem

Opinions on how big of a problem worldwide food waste is in the U.S. in...

Making Predictions using Large Scale Gaussian Processes - Dataset - NASA...

Reproduction materials for: How Big Is the “Lemons” Problem? Historical...

Big-Math-RL-Verified

Dataset of books called Kevin and the big problem

Hierarchical Hub Facility Location Problem Large Data Set

Respondents who think food waste is a big problem in the United States 2022

Data from: Advanced Computational Methods for Large-Scale Optimization...

Share of adults who think select issues are a big problem by party U.S. 2020...

Data from: A Review of International Large-Scale Assessments in Education...

Online Feature Selection and Its Applications

Big Data Technology Market Report

Global Big Data in the Oil and Gas Sector Market Report 2025 Edition, Market...

Data from: Sizing the Problem of Improving Discovery and Access to...
