60 datasets found
  1. Top challenges for big data analytics implementation in companies worldwide...

    • statista.com
    Updated Jul 10, 2025
    Cite
    Statista (2025). Top challenges for big data analytics implementation in companies worldwide 2017 [Dataset]. https://www.statista.com/statistics/933143/worldwide-big-data-implementation-problems/
    Explore at:
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    2017
    Area covered
    Worldwide
    Description

    The statistic shows the problems that organizations face when using big data technologies worldwide as of 2017. Around ** percent of respondents stated that inadequate analytical know-how was a major problem that their organization faced when using big data technologies as of 2017.

  2. Making Predictions using Large Scale Gaussian Processes

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated Aug 22, 2025
    Cite
    Dashlink (2025). Making Predictions using Large Scale Gaussian Processes [Dataset]. https://catalog.data.gov/dataset/making-predictions-using-large-scale-gaussian-processes
    Explore at:
    Dataset updated
    Aug 22, 2025
    Dataset provided by
    Dashlink
    Description

    One of the key problems that arises in many areas is to estimate a potentially nonlinear function [tex]G(x, \theta)[/tex] given input and output samples [tex](x, y)[/tex] so that [tex]y \approx G(x, \theta)[/tex]. There are many approaches to addressing this regression problem. Neural networks, regression trees, and many other methods have been developed to estimate [tex]G[/tex] given the input-output pair [tex](x, y)[/tex]. One method that I have worked with is called Gaussian process regression. There are many good texts and papers on the subject. For more technical information on the method and its applications see: http://www.gaussianprocess.org/

    A key problem that arises in developing these models on very large data sets is that it ends up requiring an [tex]O(N^3)[/tex] computation, where N is the number of data points in the training sample. Obviously this becomes very problematic when N is large. I discussed this problem with Leslie Foster, a mathematics professor at San Jose State University. He, along with some of his students, developed a method to address this problem based on Cholesky decomposition and pivoting. He also shows that this leads to a numerically stable result. If you're interested in some light reading, I’d suggest you take a look at his recent paper (which was accepted in the Journal of Machine Learning Research) posted on dashlink. We've also posted code for you to try it out. Let us know how it goes. If you are interested in applications of this method in the area of prognostics, check out our new paper on the subject which was published in IEEE Transactions on Systems, Man, and Cybernetics.
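To make the O(N^3) cost in the description above concrete, here is a minimal sketch of exact Gaussian process regression in plain Python. It is not the pivoted-Cholesky method the description attributes to Leslie Foster and his students; it is the standard dense approach whose cost that method aims to reduce. The function names, the squared-exponential kernel, and the `noise` and `length_scale` parameters are assumptions chosen for illustration.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel on scalars (an assumed choice of kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Dense Cholesky factor L with A = L L^T. The nested loops are the O(N^3) step."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict_mean(x_train, y_train, x_test, noise=1e-8, length_scale=1.0):
    """Posterior mean of a zero-mean GP at the test inputs."""
    n = len(x_train)
    K = [[rbf_kernel(x_train[i], x_train[j], length_scale)
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(K), y_train)
    return [sum(rbf_kernel(xs, x_train[i], length_scale) * alpha[i]
                for i in range(n)) for xs in x_test]
```

Because the factorization loops over all triples of indices, doubling N multiplies the work by roughly eight, which is why dense GP regression becomes impractical on very large training sets.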

  3. Very large dataset

    • staging-elsevier.digitalcommonsdata.com
    Updated Oct 18, 2019
    Cite
    Johan Michotte (2019). Very large dataset [Dataset]. http://doi.org/10.1234/m9hy8c25y8.3
    Explore at:
    Dataset updated
    Oct 18, 2019
    Authors
    Johan Michotte
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset has 10,000 files and uses 4 GB of storage.

  4. Welding Issues Big Dataset

    • universe.roboflow.com
    zip
    Updated Jun 4, 2024
    Cite
    themeppci (2024). Welding Issues Big Dataset [Dataset]. https://universe.roboflow.com/themeppci/welding-issues-big
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 4, 2024
    Dataset authored and provided by
    themeppci
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Defect Bounding Boxes
    Description

    Welding Issues Big

    ## Overview
    
    Welding Issues Big is a dataset for object detection tasks - it contains Defect annotations for 4,442 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  5. Blog | Big Data for a Big Problem: Putting Data To Work To Tackle Obesity

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • odgavaprod.ogopendata.com
    • +1more
    Updated Mar 26, 2025
    Cite
    Edward L. Hunter (2025). Blog | Big Data for a Big Problem: Putting Data To Work To Tackle Obesity [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/blog-big-data-for-a-big-problem-putting-data-to-work-to-tackle-obesity
    Explore at:
    Dataset updated
    Mar 26, 2025
    Dataset provided by
    Edward L. Hunter
    Description

    This blog post was posted by Edward L. Hunter on July 8, 2015

  6. Large-scale Complete Graph Instances for Max-Cut Problem

    • dataverse.tdl.org
    bin, txt, xz
    Updated Aug 18, 2020
    Cite
    Haibo Wang; Haibo Wang (2020). Large-scale Complete Graph Instances for Max-Cut Problem [Dataset]. http://doi.org/10.18738/T8/VLTIVC
    Explore at:
    Available download formats: bin, txt, xz
    Dataset updated
    Aug 18, 2020
    Dataset provided by
    Texas Data Repository
    Authors
    Haibo Wang; Haibo Wang
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    There are 180 large-scale, high-density (97-99%) instances for Max-Cut problems, with Q matrices ranging from 1000 by 1000 to 90000 by 90000. For the 90000 by 90000 instances, the files are split into multiple 2 GB pieces such as MC90000_*.txt.gz_a, ...MC90000_*.txt.gz.d. To recover the large data file after you download the pieces, use

        copy /b file1 + file2 + file3 + file4 filetogether

    For example, for the MC90000_1.txt data:

        copy /b MC90000_1.txt.gz_a +....+ MC90000_1.txt.gz_e MC90000_1.txt.gz
        gunzip MC90000_1.txt.gz

    There are three different types of weights on the instances. The MCxx_yy_a.txt.xz instances have weights of 1 and -1. The MCxx_yy_b.txt.xz instances have random values between -10 and 10. The MCxx_yy_c.txt.xz instances have random values between -1000 and 1000. All data files are compressed with the XZ tool. For each instance, there is a text file in the following format (rudy output format):

        n m
        h_1 t_1 c_{h_1,t_1}
        h_2 t_2 c_{h_2,t_2}
        ...
        h_m t_m c_{h_m,t_m}

    where n is the number of nodes, m is the number of edges, and for each edge i, h_i and t_i are the end nodes and c_{h_i,t_i} is the weight. Nodes are numbered from 1 up to n. All instances are generated as complete graphs.
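The file handling and format described above can be sketched in Python as follows. This is a minimal illustration, not code shipped with the dataset: `join_pieces`, `parse_rudy`, and `cut_value` are hypothetical names, and the parser assumes whitespace-separated tokens with the header `n m` followed by one `h t c` triple per edge.

```python
import shutil
from typing import Iterable, List, Mapping, Sequence, Tuple

Edge = Tuple[int, int, float]

def join_pieces(piece_paths: Sequence[str], out_path: str) -> None:
    """Concatenate split pieces byte for byte, like `copy /b a + b + ... out`."""
    with open(out_path, "wb") as out:
        for path in piece_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

def parse_rudy(lines: Iterable[str]) -> Tuple[int, List[Edge]]:
    """Parse rudy-style text: header 'n m', then m triples 'h t weight'."""
    tokens = (tok for line in lines for tok in line.split())
    n = int(next(tokens))
    m = int(next(tokens))
    edges = []
    for _ in range(m):
        h = int(next(tokens))
        t = int(next(tokens))
        w = float(next(tokens))
        edges.append((h, t, w))
    return n, edges

def cut_value(edges: Iterable[Edge], side: Mapping[int, int]) -> float:
    """Total weight of edges whose end nodes are on different sides (0 or 1)."""
    return sum(w for h, t, w in edges if side[h] != side[t])
```

A compressed instance could be read with the standard library, for example `with lzma.open(path, "rt") as f: n, edges = parse_rudy(f)`. Holding every edge in a list is only practical for the smaller instances, since a complete graph on 90000 nodes has about 4 billion edges.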

  7. Opinions on how big of a problem worldwide food waste is in the U.S. in...

    • statista.com
    Updated Jul 11, 2025
    + more versions
    Cite
    Statista (2025). Opinions on how big of a problem worldwide food waste is in the U.S. in 2022, by age [Dataset]. https://www.statista.com/statistics/1362436/opinions-on-how-big-of-a-problem-worldwide-food-waste-is-in-the-united-states-by-age/
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    Oct 3, 2022 - Oct 6, 2022
    Area covered
    United States
    Description

    In 2022, approximately ** percent of survey respondents between the ages of ** and ** in the United States stated that food waste is a very big problem in the world. This was by far the lowest share among all age groups. In all other age groups, the share of respondents who considered this issue a very big problem was at least ** percent.

  8. Making Predictions using Large Scale Gaussian Processes - Dataset - NASA...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Making Predictions using Large Scale Gaussian Processes - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/making-predictions-using-large-scale-gaussian-processes
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA http://nasa.gov/
    Description

    One of the key problems that arises in many areas is to estimate a potentially nonlinear function [tex]G(x, \theta)[/tex] given input and output samples [tex](x, y)[/tex] so that [tex]y \approx G(x, \theta)[/tex]. There are many approaches to addressing this regression problem. Neural networks, regression trees, and many other methods have been developed to estimate [tex]G[/tex] given the input-output pair [tex](x, y)[/tex]. One method that I have worked with is called Gaussian process regression. There are many good texts and papers on the subject. For more technical information on the method and its applications see: http://www.gaussianprocess.org/

    A key problem that arises in developing these models on very large data sets is that it ends up requiring an [tex]O(N^3)[/tex] computation, where N is the number of data points in the training sample. Obviously this becomes very problematic when N is large. I discussed this problem with Leslie Foster, a mathematics professor at San Jose State University. He, along with some of his students, developed a method to address this problem based on Cholesky decomposition and pivoting. He also shows that this leads to a numerically stable result. If you're interested in some light reading, I’d suggest you take a look at his recent paper (which was accepted in the Journal of Machine Learning Research) posted on dashlink. We've also posted code for you to try it out. Let us know how it goes. If you are interested in applications of this method in the area of prognostics, check out our new paper on the subject which was published in IEEE Transactions on Systems, Man, and Cybernetics.

  9. Reproduction materials for: How Big Is the “Lemons” Problem? Historical...

    • archive.ciser.cornell.edu
    Updated Apr 27, 2022
    Cite
    Pierre Mérel; Ariel Ortiz-Bobea; Emmanuel Paroissien (2022). Reproduction materials for: How Big Is the “Lemons” Problem? Historical Evidence from French Wines [Dataset]. http://doi.org/10.6077/zqcs-2544
    Explore at:
    Dataset updated
    Apr 27, 2022
    Authors
    Pierre Mérel; Ariel Ortiz-Bobea; Emmanuel Paroissien
    Area covered
    French
    Description

    PI-Provided Abstract: This paper provides empirical evidence on the welfare losses associated with asymmetric information about product quality in a competitive market. When consumers cannot observe product characteristics at the time of purchase, atomistic producers have no incentive to supply costly quality. We compare wine prices across administrative districts around the enactment of historic regulations aimed at certifying the quality of more than 250 French appellation wines to identify welfare losses from asymmetric information. We estimate that these losses amount to more than 7% of total market value, suggesting an important role for credible certification schemes.

    Additional keywords: asymmetric information, adverse selection, quality uncertainty, welfare, wine appellation

  10. Big-Math-RL-Verified

    • huggingface.co
    Updated Feb 21, 2025
    Cite
    SynthLabs (2025). Big-Math-RL-Verified [Dataset]. https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified
    Explore at:
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Synth Labs
    Authors
    SynthLabs
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

    Big-Math is the largest open-source dataset of high-quality mathematical problems, curated specifically for reinforcement learning (RL) training in language models. With over 250,000 rigorously filtered and verified problems, Big-Math bridges the gap between quality and quantity, establishing a robust foundation for advancing reasoning in LLMs.

    Request Early Access to Private… See the full description on the dataset page: https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified.
    
  11. Dataset of books called Kevin and the big problem

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called Kevin and the big problem [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=Kevin+and+the+big+problem
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is Kevin and the big problem. It features 7 columns including author, publication date, language, and book publisher.

  12. Hierarchical Hub Facility Location Problem Large Data Set

    • data.mendeley.com
    Updated May 30, 2025
    Cite
    Arup Kumar Bhattacharjee (2025). Hierarchical Hub Facility Location Problem Large Data Set [Dataset]. http://doi.org/10.17632/sttdkdd2vz.2
    Explore at:
    Dataset updated
    May 30, 2025
    Authors
    Arup Kumar Bhattacharjee
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this dataset we deliver nine different benchmark datasets covering 100, 300, 400, 500, 600, 800, 900, 1000 and 2000 zones in the cities of Kolkata and Mumbai. All benchmarks have costs and weights derived from the buildings in the relevant areas of both cities.

  13. Respondents who think food waste is a big problem in the United States 2022

    • statista.com
    Updated Oct 3, 2022
    Cite
    Statista (2022). Respondents who think food waste is a big problem in the United States 2022 [Dataset]. https://www.statista.com/statistics/1346264/food-waste-problem-opinions-united-states/
    Explore at:
    Dataset updated
    Oct 3, 2022
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    Oct 3, 2022 - Oct 6, 2022
    Area covered
    United States
    Description

    How big of a problem is food waste in the United States? 1,000 United States adult citizens responded to this question. Around ** percent of those surveyed said it is "a very big problem." Notably, ** percent of respondents were unsure if food waste is a big problem.

  14. Data from: Advanced Computational Methods for Large-Scale Optimization...

    • curate.nd.edu
    Updated May 12, 2025
    Cite
    Zhihao Xu (2025). Advanced Computational Methods for Large-Scale Optimization Problems [Dataset]. http://doi.org/10.7274/28786112.v1
    Explore at:
    Dataset updated
    May 12, 2025
    Dataset provided by
    University of Notre Dame
    Authors
    Zhihao Xu
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    With the development of science and technology, large-scale optimization tasks have become integral to cutting-edge engineering. The challenges of solving these problems arise from ever-growing system sizes, intricate physical space, and the computational cost required to accurately model and optimize target objectives. Taking the design of advanced functional materials as an example, the high-dimensional parameter space and high-fidelity physical simulations can demand immense computational resources for searching and iterations. Although emerging machine learning techniques have been combined with conventional experimental and simulation approaches to explore the design space and identify high-performance solutions, these methods are still limited to a small part of the design space around materials that have been well investigated.

    Over the past several decades, continuous development of both hardware and algorithms has addressed some of the challenges. High-performance computing (HPC) architectures and heterogeneous systems have greatly expanded the capacity to perform large-scale calculations and optimizations. Meanwhile, the emergence of machine learning frameworks and algorithms has dramatically facilitated the development of advanced models and enabled the integration of AI-driven techniques into traditional experiments and simulations more seamlessly. In recent years, quantum computing (QC) has received widespread attention due to its powerful performance in finding global optima and is regarded as a promising solution to large-scale and non-linear optimization problems in the future. In the meantime, quantum computing principles also expand the capacity of classical algorithms to explore high-dimensional combinatorial spaces. In this dissertation, we show the power of integrating machine learning algorithms, quantum algorithms, and HPC architectures in tackling the challenges of solving large-scale optimization problems.

    In the first part of this dissertation, we introduced an optimization algorithm based on a Quantum-inspired Genetic Algorithm (QGA) to design planar multilayers (PML) for transparent radiative cooler (TRC) applications. Results of numerical experiments showed that our QGA-facilitated optimization algorithm can converge to solutions comparable to those of quantum annealing (QA), and that the QGA outperformed the classical genetic algorithm (CGA) in both convergence speed and global search capacity. Our work shows that quantum heuristic algorithms will become powerful tools for addressing the challenges that traditional optimization algorithms face when solving large-scale optimization problems with complex search spaces.

    In the second part of the dissertation, we proposed a quantum annealing-assisted lattice optimization (QALO) algorithm for high-entropy alloy (HEA) systems. The algorithm is developed based on an active learning framework that integrates the field-aware factorization machine (FFM), quantum annealing (QA), and machine learning potential (MLP). When applied to optimize the bulk grain configuration of the NbMoTaW alloy system, our algorithm can quickly obtain low-energy microstructures, and the results successfully reproduce the Nb segregation and W enrichment in the bulk phase driven by thermodynamic driving forces, which are usually observed in experiments and MC/MD simulations. This work highlights the potential of quantum computing in exploring the large design space for HEA systems.

    In the third part of the dissertation, we employed the Distributed Quantum Approximate Optimization Algorithm (DQAOA) to address large-scale combinatorial optimization problems that exceed the limits of conventional computational resources. This was achieved through a divide-and-conquer strategy, in which the original problem is decomposed into smaller sub-tasks that are solved in parallel on a high-performance computing (HPC) system. To further enhance convergence efficiency, we introduced an Impact Factor Directed (IFD) decomposition method. By calculating impact factors and leveraging a targeted traversal strategy, IFD captures local structural features of the problem, making it effective for both dense and sparse instances. Finally, we explored the integration of DQAOA with the Quantum Framework (QFw) on the Frontier HPC system, demonstrating the potential for efficient management of large-scale circuit execution workloads across CPUs and GPUs.

  15. Share of adults who think select issues are a big problem by party U.S. 2020...

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Share of adults who think select issues are a big problem by party U.S. 2020 [Dataset]. https://www.statista.com/statistics/1137235/share-adults-select-issues-big-problem-party-us/
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    Jun 16, 2020 - Jun 22, 2020
    Area covered
    United States
    Description

    As of June 2020, ** percent of Democratic respondents thought the way racial and ethnic minorities are treated by the criminal justice system is a very big problem in the United States today. This is compared to ** percent of Republican respondents who thought it was a very big problem.

  16. Data from: A Review of International Large-Scale Assessments in Education...

    • catalog.data.gov
    • datasets.ai
    Updated Mar 30, 2021
    Cite
    U.S. Department of State (2021). A Review of International Large-Scale Assessments in Education Assessing Component Skills and Collecting Contextual Data [Dataset]. https://catalog.data.gov/dataset/a-review-of-international-large-scale-assessments-in-education-assessing-component-skills-
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    U.S. Department of State
    Description

    The OECD has initiated PISA for Development (PISA-D) in response to the rising need of developing countries to collect data about their education systems and the capacity of their student bodies. This report aims to compare and contrast approaches regarding the instruments that are used to collect data on (a) component skills and cognitive instruments, (b) contextual frameworks, and (c) the implementation of the different international assessments, as well as approaches to include children who are not at school, and the ways in which data are used. It then seeks to identify assessment practices in these three areas that will be useful for developing countries. This report reviews the major international and regional large-scale educational assessments: large-scale international surveys, school-based surveys and household-based surveys. For each of the issues discussed, there is a description of the prevailing international situation, followed by a consideration of the issue for developing countries and then a description of the relevance of the issue to PISA for Development.

  17. Online Feature Selection and Its Applications

    • researchdata.smu.edu.sg
    Updated May 31, 2023
    Cite
    HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
    Explore at:
    Dataset updated
    May 31, 2023
    Dataset provided by
    SMU Research Data Repository (RDR)
    Authors
    HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Feature selection is an important technique for data mining before a machine learning algorithm is applied. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS), in which an online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate predictions using a small and fixed number of active features. This is in contrast to the classical setup of online learning, where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: (1) learning with full input, where a learner is allowed to access all the features to decide the subset of active features, and (2) learning with partial input, where only a limited number of features is allowed to be accessed for each instance by the learner. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public datasets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.

    Related Publication: Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/

    Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
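The truncation idea mentioned in the description above can be illustrated with a short sketch of the "full input" setting. This is an assumption-laden illustration, not the authors' published algorithm: it pairs a perceptron-style update with a step that keeps only the largest-magnitude weights, and the names `truncate` and `ofs_train` and the parameters `budget` and `lr` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

def truncate(w: Sequence[float], budget: int) -> List[float]:
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    if budget >= len(w):
        return list(w)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:budget])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]

def ofs_train(stream: Iterable[Tuple[Sequence[float], int]],
              dim: int, budget: int, lr: float = 0.2) -> Tuple[List[float], int]:
    """One pass over (x, y) pairs with y in {-1, +1}; returns weights and mistake count."""
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:  # update only when the current classifier errs
            mistakes += 1
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            w = truncate(w, budget)  # enforce the fixed feature budget
    return w, mistakes
```

After every update at most `budget` weights are nonzero, so the learner maintains a classifier over a small, fixed number of active features as the description requires.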

  18. Big Data Technology Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Aug 6, 2025
    Cite
    Market Research Forecast (2025). Big Data Technology Market Report [Dataset]. https://www.marketresearchforecast.com/reports/big-data-technology-market-1717
    Explore at:
    Available download formats: doc, ppt, pdf
    Dataset updated
    Aug 6, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Big Data Technology Market was valued at USD 349.40 billion in 2023 and is projected to reach USD 918.16 billion by 2032, exhibiting a CAGR of 14.8% during the forecast period. Big data refers to larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software cannot manage them, yet they can be used to address business problems that could not be tackled before. Big data technology is software designed primarily to analyze, process, and extract information from large data sets with extremely complex structures, which traditional data processing software finds very difficult to handle. Big data technologies are widely associated with other prominent technologies such as deep learning, machine learning, artificial intelligence (AI), and the Internet of Things (IoT). In combination with these technologies, big data technologies focus on analyzing and handling large amounts of real-time and batch data.

    Recent developments include:

      February 2024: SQream, a GPU data analytics platform, partnered with Dataiku, an AI and machine learning platform, to deliver a comprehensive solution for efficiently generating big data analytics and business insights by handling complex data.
      October 2023: MultiversX (ELGD), a blockchain infrastructure firm, formed a partnership with Google Cloud to enhance Web3’s presence by integrating big data analytics and artificial intelligence tools. The collaboration aims to offer new possibilities for developers and startups.
      May 2023: Vpon Big Data Group partnered with VIOOH, a digital out-of-home advertising (DOOH) supply-side platform, to display the unique advertising content generated by Vpon’s AI visual content generator "InVnity" with VIOOH's digital outdoor advertising inventories. This partnership pioneers the future of outdoor advertising by using AI and big data solutions.
      May 2023: Salesforce launched the next generation of Tableau for users to automate data analysis and generate actionable insights.
      March 2023: SAP SE, a German multinational software company, entered a partnership with AI companies, including Databricks, Collibra NV, and DataRobot, Inc., to introduce the next generation of data management portfolio.
      November 2022: Thai Oil and Retail Corporation PTT Oil and Retail Business Public Company implemented the Cloudera Data Platform to deliver insights and enhance customer engagement. The implementation offered a unified and personalized experience across 1,900 gas stations and 3,000 retail branches.
      November 2022: IBM launched new software for enterprises to break down data and analytics silos that helped users make data-driven decisions. The software helps to streamline how users access and discover analytics and planning tools from multiple vendors in a single dashboard view.
      September 2022: ActionIQ, a global leader in CX solutions, and Teradata, a leading software company, entered a strategic partnership and integrated AIQ’s new HybridCompute Technology with Teradata VantageCloud analytics and data platform.

    Key drivers for this market are: Increasing Adoption of AI, ML, and Data Analytics to Boost Market Growth.
    Potential restraints include: Rising Concerns on Information Security and Privacy to Hinder Market Growth.
    Notable trends are: Rising Adoption of Big Data and Business Analytics among End-use Industries.

  19. Global Big Data in the Oil and Gas Sector Market Report 2025 Edition, Market...

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Jul 15, 2025
    + more versions
    Cite
    Cognitive Market Research (2025). Global Big Data in the Oil and Gas Sector Market Report 2025 Edition, Market Size, Share, CAGR, Forecast, Revenue [Dataset]. https://www.cognitivemarketresearch.com/big-data-in-the-oil-and-gas-sector-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Big Data in Oil and Gas Sector market size is projected to reach USD XX million by 2024 and is expected to expand at a compound annual growth rate (CAGR) of XX% from 2024 to 2031.

    The global Big Data in Oil and Gas Sector market is anticipated to grow significantly, with a projected CAGR of XX% between 2024 and 2031.
    North America is expected to hold a major market share of more than XX%, with a market size of USD XX million in 2024, and is forecasted to grow at a CAGR of XX% from 2024 to 2031 due to the advanced technological infrastructure and the high adoption rate of digital technologies in the oil and gas sector.
    The upstream application segment held the highest Big Data in Oil and Gas Sector market revenue share in 2024, attributed to the critical role of big data in exploration and production activities, optimizing reservoir performance, and minimizing risks.
    

    Market Dynamics - Key Drivers of the Big Data in Oil and Gas Sector

    Integration of Advanced Analytics for Enhanced Decision-Making Drives the Big Data in Oil & Gas Market

    The Big Data in Oil & Gas market is driven by the adoption of advanced analytics, with cost efficiency as a major benefit. Big data analytics processes complex datasets to support better predictions and optimisations. As big data is integrated further, the oil and gas sector benefits from improved decision-making, efficiency, and safety.

    For instance, ExxonMobil, in their "2020 Energy & Carbon Summary" report, highlighted the use of advanced seismic imaging and data analytics to improve the accuracy of subsurface exploration, thereby reducing drilling risks and enhancing operational efficiency.

    IoT Deployment for Real-Time Monitoring and Efficiency Further Propel the Big Data in Oil & Gas Market

    Rising demand for real-time monitoring and data analytics is expected to fuel the Big Data in Oil & Gas market. The deployment of IoT devices facilitates real-time monitoring and operational efficiency. This development aligns with the broader shift towards self-sufficiency and positive capital allocations. IoT sensors on equipment and in operations provide critical data for predictive maintenance and decision-making, contributing to the shift from capital expenditure to operational expenditure across many outsourced business activities.

    Schlumberger, in their "Digital Transformation in the Oil and Gas Industry" report, discussed implementing IoT solutions to monitor well operations, which has led to significant improvements in maintenance strategies and operational efficiencies.

    Market Dynamics - Key Restraints of the Big Data in Oil and Gas Sector

    Data Security and Privacy Concerns are a Challenge for the Big Data in Oil & Gas Market

    With companies storing data on every aspect of their business in order to work more efficiently in the future, avoidable threats remain. Data security and privacy are significant concerns as the use of big data analytics increases, given the sensitive nature of the oil and gas sector. Cyber threats limit the adoption of big data solutions, restraining demand in the Big Data in Oil & Gas market.

    The International Energy Agency (IEA), in its "Digitalization & Energy" report, highlighted the cybersecurity challenges facing the energy sector, emphasizing the need for robust security measures in the adoption of digital technologies, including big data analytics.

    Integration and Interoperability Challenges will Restrain the Big Data in Oil & Gas Market

    Data access, analysis, and storage are becoming more and more of an issue for businesses. Compatibility and interoperability issues arise when big data technologies are integrated with legacy systems. The integration process is made more difficult by the diversity of data sources and formats. Most firms are finding it necessary to evaluate new technologies and legacy infrastructure as the needs of Big Data outpace those of traditional relational databases.

    A study by Deloitte, titled "Digital Transformation: Shaping the Future of the Oil and Gas Industry", identified integration of new technologies with existin...

  20. Data from: Sizing the Problem of Improving Discovery and Access to...

    • figshare.com
    xlsx
    Updated Jan 19, 2016
    Cite
    Kevin Read (2016). Sizing the Problem of Improving Discovery and Access to NIH-funded Data: A preliminary study [Dataset]. http://doi.org/10.6084/m9.figshare.1285515.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Read
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The objective of this study is to inform efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by National Institutes of Health (NIH)-funded researchers. Of particular interest is characterizing those datasets that are not deposited in a known data repository or registry, e.g., those for which a related journal article does not indicate that underlying data have been deposited in a known repository. Such “invisible” datasets comprise the “long tail” of biomedical data and pose significant practical challenges to ongoing efforts to improve discoverability of and access to biomedical research data. This study identified datasets used to support the NIH-funded research reported in articles published in 2011 and cited in PubMed® and deposited in PubMed Central® (PMC). After searching for all articles that acknowledged NIH support, we first identified articles that contained explicit mention of datasets being deposited in recognized repositories. Thirty members of the NIH staff then analyzed a random sample of the remaining articles to estimate how many and what types of datasets were used per article. Two reviewers independently examined each paper. Each dataset is titled Bigdata_randomsample_xxxx_xx. The xxxx refers to the set of articles the annotator looked at, while the xx identifies the annotator who did the analysis. Within each dataset, the annotator has listed the number of datasets they identified within the articles that they looked at. For every dataset that was found, the annotators were asked to insert a new row into the spreadsheet, and then describe the dataset they found (e.g., type of data, subject of study, etc.). Each row in the spreadsheet is prefixed with the PubMed Identifier (PMID) of the article where the dataset was found. 
Finally, the files 2013-08-07_Bigdatastudy_dataanalysis, Dataanalysis_ack_si_datasets, and Datasets additional random sample mention vs deposit 20150313 refer to the analysis that was performed based on each annotator's analysis of the publications they were assigned, and the data deposits identified from the analysis.

