77 datasets found
  1. massive-scenario

    • huggingface.co
    Updated Sep 18, 2024
    Cite
    will brown (2024). massive-scenario [Dataset]. https://huggingface.co/datasets/willcb/massive-scenario
    Dataset updated
    Sep 18, 2024
    Authors
    will brown
    Description

    willcb/massive-scenario dataset hosted on Hugging Face and contributed by the HF Datasets community

  2. Data from: Additive Hazards Regression Analysis of Massive Interval-Censored...

    • tandf.figshare.com
    pdf
    Updated May 12, 2025
    Cite
    Peiyao Huang; Shuwei Li; Xinyuan Song (2025). Additive Hazards Regression Analysis of Massive Interval-Censored Data via Data Splitting [Dataset]. http://doi.org/10.6084/m9.figshare.27103243.v1
    Available download formats: pdf
    Dataset updated
    May 12, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Peiyao Huang; Shuwei Li; Xinyuan Song
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the rapid development of data acquisition and storage, massive datasets with large sample sizes are increasingly common, creating an urgent need for more advanced statistical tools. To accommodate such volumes, a variety of methods have been proposed for complete or right-censored survival data. However, existing big-data methodology has not addressed interval-censored outcomes, which are ubiquitous in cross-sectional or periodic follow-up studies. In this work, we propose an easily implemented divide-and-combine approach for analyzing massive interval-censored survival data under the additive hazards model. We establish the asymptotic properties of the proposed estimator, including consistency and asymptotic normality. In addition, the divide-and-combine estimator is shown to be asymptotically equivalent to the full-data-based estimator obtained by analyzing all data together. Simulation studies suggest that, relative to the full-data-based approach, the proposed divide-and-combine approach offers a clear advantage in computation time, making it more applicable to large-scale data analysis. An application to a set of interval-censored data also demonstrates the practical utility of the proposed method.
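The general divide-and-combine pattern (split the data into blocks, estimate on each block, combine the block estimates) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' additive hazards estimator: it uses an ordinary sample mean as a stand-in per-block estimator and combines block estimates with a sample-size-weighted average. The names `split_blocks`, `block_estimate`, and `divide_and_combine` are hypothetical.

```python
from typing import Callable, List, Sequence


def split_blocks(data: Sequence[float], k: int) -> List[List[float]]:
    """Split data into k contiguous, non-empty blocks of nearly equal size."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(data)")
    bounds = [i * n // k for i in range(k + 1)]
    return [list(data[bounds[i]:bounds[i + 1]]) for i in range(k)]


def block_estimate(block: Sequence[float]) -> float:
    """Stand-in per-block estimator (here simply the sample mean)."""
    return sum(block) / len(block)


def divide_and_combine(
    data: Sequence[float],
    k: int,
    estimator: Callable[[Sequence[float]], float] = block_estimate,
) -> float:
    """Estimate on each block separately, then combine with a size-weighted average."""
    blocks = split_blocks(data, k)
    n = sum(len(b) for b in blocks)
    return sum(len(b) * estimator(b) for b in blocks) / n


if __name__ == "__main__":
    data = [float(x % 7) for x in range(1000)]
    print(block_estimate(data), divide_and_combine(data, k=8))
```

For this toy estimator the combined value matches the full-data estimate exactly; for the estimator in the abstract the equivalence is asymptotic, and the actual combination rule may differ from a simple weighted average.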

  3. Data from: S8 Fig -

    • plos.figshare.com
    zip
    Updated Aug 3, 2023
    Cite
    Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc (2023). S8 Fig - [Dataset]. http://doi.org/10.1371/journal.pone.0289211.s009
    Available download formats: zip
    Dataset updated
    Aug 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Deep learning (DL) techniques have seen tremendous interest in medical imaging, particularly in the use of convolutional neural networks (CNNs) for the development of automated diagnostic tools. The ease of its non-invasive acquisition makes retinal fundus imaging particularly amenable to such automated approaches. Recent work in the analysis of fundus images using CNNs relies on access to massive datasets for training and validation, composed of hundreds of thousands of images. However, data residency and data privacy restrictions stymie the applicability of this approach in medical settings where patient confidentiality is a mandate. Here, we showcase results for the performance of DL on small datasets to classify patient sex from fundus images, a trait thought not to be present or quantifiable in fundus images until recently. Specifically, we fine-tune a ResNet-152 model whose last layer has been modified to a fully-connected layer for binary classification. We carried out several experiments to assess performance in the small dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to a mere 25% decrease in performance despite a nearly 1000-fold decrease in the dataset size compared to prior results in the literature. Our results show that binary classification, even with a hard task such as sex categorization from retinal fundus images, is possible with very small datasets. Our domain adaptation results show that models trained with one distribution of images may generalize well to an independent external source, as in the case of models trained on DOVS and tested on ODIR. Our results also show that eliminating poor-quality images may hamper training of the CNN by reducing the already small dataset size even further. Nevertheless, using high-quality images may be an important factor, as evidenced by the superior generalizability of results in the domain adaptation experiments. Finally, our work shows that ensembling is an important tool in maximizing the performance of deep CNNs in the context of small development datasets.
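The abstract reports results as test AUC and credits ensembling with improving performance. A minimal sketch of both ideas follows, assuming equal-weight averaging of per-image predicted probabilities across models; the paper's actual ensembling scheme may differ. AUC is computed with the Mann-Whitney formulation (fraction of positive/negative pairs ranked correctly). The scores below are hypothetical toy values, not results from the paper.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example scores higher; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight average of each image's probability across models."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]              # hypothetical ground truth
    model_a = [0.9, 0.4, 0.6, 0.1]     # hypothetical model outputs
    model_b = [0.5, 0.8, 0.3, 0.6]
    print(auc(model_a, labels), auc(model_b, labels),
          auc(ensemble([model_a, model_b]), labels))
```

In this toy case each model ranks one positive/negative pair wrongly, but their errors differ, so the averaged scores rank every pair correctly.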

  4. Oct Large Dataset

    • universe.roboflow.com
    zip
    Updated Sep 20, 2021
    Cite
    James Willoughby (2021). Oct Large Dataset [Dataset]. https://universe.roboflow.com/james-willoughby/oct-large
    Available download formats: zip
    Dataset updated
    Sep 20, 2021
    Dataset authored and provided by
    James Willoughby
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    OCT Large

    ## Overview
    
    OCT Large is a dataset for object detection tasks - it contains Objects annotations for 239 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  5. Massive-STEPS-Moscow

    • huggingface.co
    Updated May 19, 2025
    Cite
    CRUISE Research Group (UNSW) (2025). Massive-STEPS-Moscow [Dataset]. https://huggingface.co/datasets/CRUISEResearchGroup/Massive-STEPS-Moscow
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    CRUISE Research Group (UNSW)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Moscow
    Description

    Massive-STEPS-Moscow

    Dataset Summary

    Massive-STEPS is a large-scale dataset of semantic trajectories intended for understanding POI check-ins. The dataset is derived from the Semantic Trails Dataset and Foursquare Open Source Places, and includes check-in data from 12 cities across 10 countries. The dataset is designed to facilitate research in various domains, including trajectory prediction, POI recommendation, and urban modeling. Massive-STEPS emphasizes the… See the full description on the dataset page: https://huggingface.co/datasets/CRUISEResearchGroup/Massive-STEPS-Moscow.

  6. Data from: Supporting data for "Efficient phylogenetic tree inference for...

    • investigacion.usc.gal
    Updated 2024
    Cite
    Piñeiro, César; Pichel, Juan Carlos (2024). Supporting data for "Efficient phylogenetic tree inference for massive taxonomic datasets: harnessing the power of a server to analyze one million taxa" [Dataset]. https://investigacion.usc.gal/documentos/67321c6baea56d4af04833f2
    Dataset updated
    2024
    Authors
    Piñeiro, César; Pichel, Juan Carlos
    Description

    Phylogenies play a crucial role in biological research. Unfortunately, the search for the optimal phylogenetic tree incurs significant computational costs, and most of the existing state-of-the-art tools cannot deal with extremely large datasets in a reasonable time.
    The new VeryFastTree (version 4.0) can construct a tree on a single server, using single-precision arithmetic, from a massive alignment of one million taxa in only 36 hours, which is 3× and 3.2× faster than its previous version and FastTree-2, respectively.
    Experimental results establish VeryFastTree as the fastest state-of-the-art tool for maximum-likelihood phylogeny estimation. It is publicly available at https://github.com/citiususc/veryfasttree. In addition, VeryFastTree is included as a package in Bioconda, MacPorts, and all Debian-based Linux distributions.
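Reading "N× faster" as a ratio of wall-clock times on the same input (an assumption; the description does not define the metric), the reported figures imply runtimes of roughly 108 hours for the previous VeryFastTree version and about 115 hours for FastTree-2. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def implied_runtime(hours: float, speedup: float) -> float:
    """Runtime of the slower tool implied by 'the new tool is `speedup` x faster'."""
    return hours * speedup


if __name__ == "__main__":
    new_hours = 36.0  # VeryFastTree 4.0 on the one-million-taxa alignment
    for name, factor in [("VeryFastTree (previous version)", 3.0), ("FastTree-2", 3.2)]:
        print(f"{name}: ~{implied_runtime(new_hours, factor):.1f} h")
```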

  7. Data Center Processor Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 8, 2025
    Cite
    Data Insights Market (2025). Data Center Processor Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-center-processor-market-20618
    Available download formats: ppt, doc, pdf
    Dataset updated
    Mar 8, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Center Processor market, valued at $11.98 billion in 2025, is projected to experience robust growth, driven by the increasing demand for high-performance computing in data centers globally. A Compound Annual Growth Rate (CAGR) of 7.80% from 2025 to 2033 indicates a significant expansion of this market. This growth is fueled by several key factors. The proliferation of cloud computing and big data analytics necessitates powerful processors capable of handling massive datasets and complex computations. Furthermore, the rise of artificial intelligence (AI) and machine learning (ML) applications, demanding significant processing power, is a major catalyst. The increasing adoption of virtualization and containerization technologies also contributes to the market's expansion, as these technologies require efficient resource management and powerful processors. Competition among major players like Intel, NVIDIA, AMD, and others drives innovation and fuels the development of advanced processor technologies, further enhancing performance and efficiency. Segmentation within the market includes CPUs, GPUs, FPGAs, and ASICs, each catering to specific computing needs. The market is geographically diverse, with North America, Europe, and Asia representing significant regional markets. While precise regional breakdowns are unavailable, it's reasonable to expect North America and Asia to hold the largest market shares given their strong presence in technology innovation and adoption. The market's restraints include the high initial investment costs associated with adopting advanced processor technologies and the potential for supply chain disruptions. However, these challenges are likely to be offset by the long-term benefits of improved performance, efficiency, and scalability that advanced processors offer. The continuous development of energy-efficient processors will also be a critical factor in mitigating the environmental impact and overall cost of operation. 
Looking ahead, the integration of specialized accelerators like SmartNICs and DPUs will continue to gain traction, further optimizing data center performance and efficiency. This signifies a shift towards specialized hardware tailored to specific workloads, increasing the complexity and potential for growth within the market. Data Center Processor Market: A Comprehensive Analysis (2019-2033). This insightful report provides a detailed analysis of the dynamic Data Center Processor market, encompassing the historical period (2019-2024), base year (2025), estimated year (2025), and forecast period (2025-2033). Valued at several billion USD, this market is experiencing significant growth driven by the increasing demand for high-performance computing, artificial intelligence, and big data analytics. The report delves into market segmentation, key players, emerging trends, and growth catalysts, offering a comprehensive understanding of this crucial technology sector. This report is essential for industry stakeholders, investors, and anyone seeking to understand the future of data center infrastructure. Recent developments include: February 2024: Arm Holdings released a new set of blueprints for making chips that it claims could cut the time required to develop data center processors to less than a year. Arm's technology for creating data center processors is already used by Amazon.com, Microsoft, and Ampere Computing, which supplies chips to Oracle. Arm announced a new generation of designs for the computing "cores", the most central part of a data center chip. February 2024: Faraday announced that it collaborated with Arm and Intel to develop 64-core Intel 18A processors for its system-on-chip (SoC) evaluation platform. The chips would be made by Intel Foundry Services (IFS) using its 18A fabrication process. The processor would integrate with Arm’s Neoverse Compute Subsystems and form part of Faraday’s SoC evaluation platform to support the development of data center servers, high-performance computing-related ASICs, and custom SoCs. Key drivers for this market are: Increasing Deployment of AI in HPC Data Centers, Increasing Deployment of Data Center Facilities and Cloud-based Services. Potential restraints include: high initial investment costs associated with adopting advanced processor technologies and the potential for supply chain disruptions. Notable trends are: The Central Processing Unit (CPU) Segment is Expected to Drive the Growth of the Market.
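The projection stated above (a $11.98 billion market in 2025 growing at a 7.80% CAGR through 2033) can be made concrete with the standard compound-growth formula, value × (1 + CAGR)^years. Assuming annual compounding over the eight years from 2025 to 2033, this puts the 2033 value at about $21.8 billion; that endpoint is an arithmetic consequence of the stated inputs, not a figure quoted in the report. The helper name `project` is hypothetical.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return value * (1.0 + cagr) ** years


if __name__ == "__main__":
    base_2025 = 11.98  # USD billions, from the report summary
    for year in range(2025, 2034):
        print(year, round(project(base_2025, 0.078, year - 2025), 2))
```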

  8. Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)

    • redivis.com
    application/jsonl +7
    Updated Jun 28, 2024
    Cite
    Stanford Doerr School of Sustainability (2024). Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [Dataset]. http://doi.org/10.57761/gk3g-wc33
    Available download formats: avro, sas, arrow, csv, application/jsonl, parquet, spss, stata
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Doerr School of Sustainability
    Time period covered
    Jun 27, 2024
    Description

    Abstract

    S3DIS comprises 6 colored 3D point clouds from 6 large-scale indoor areas, along with semantic instance annotations for 12 object categories (wall, floor, ceiling, beam, column, window, door, sofa, table, chair, bookcase, and board).

    Methodology

    The Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset is composed of the colored 3D point clouds of six large-scale indoor areas from three different buildings, covering approximately 935, 965, 450, 1700, 870, and 1100 square meters respectively (6020 square meters in total). These areas show diverse architectural styles and appearances and consist mainly of office areas and educational and exhibition spaces; conference rooms, personal offices, restrooms, open spaces, lobbies, stairways, and hallways are commonly found therein. The point clouds were generated automatically, without any manual intervention, using the Matterport scanner. The dataset also includes semantic instance annotations on the point clouds for 12 semantic elements: structural elements (ceiling, floor, wall, beam, column, window, and door) and commonly found items and furniture (table, chair, sofa, bookcase, and board).
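A quick sanity check of the figures above, plus a class-to-integer label map of the kind typically needed before training a segmentation model. The `Area_k` keys and the class ordering are assumptions chosen for illustration; they are not guaranteed to match the label ids used by the dataset files.

```python
# Per-area floor coverage in square meters, as listed in the description.
# The "Area_k" keys are illustrative names.
AREA_M2 = {
    "Area_1": 935, "Area_2": 965, "Area_3": 450,
    "Area_4": 1700, "Area_5": 870, "Area_6": 1100,
}

# The 12 annotated semantic elements, structural elements first.
# This ordering (and hence the integer ids) is an assumption.
CLASSES = [
    "ceiling", "floor", "wall", "beam", "column", "window", "door",
    "table", "chair", "sofa", "bookcase", "board",
]
LABEL_TO_ID = {name: i for i, name in enumerate(CLASSES)}

total_m2 = sum(AREA_M2.values())
shares = {area: m2 / total_m2 for area, m2 in AREA_M2.items()}

if __name__ == "__main__":
    print("total area (m^2):", total_m2)
    for area, share in shares.items():
        print(f"{area}: {share:.1%}")
```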


  9. Population Optic Radiation Maps created by CONSULT: HCP-M90 ses-1 meyersloop...

    • neurovault.org
    nifti
    Updated Aug 31, 2021
    Cite
    (2021). Population Optic Radiation Maps created by CONSULT: HCP-M90 ses-1 meyersloop hemisphere-L [Dataset]. http://identifiers.org/neurovault.image:539731
    Available download formats: nifti
    Dataset updated
    Aug 31, 2021
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    HCP M90 Session 1 Left Hemisphere

    Collection description

    ***PLEASE READ OUR PUBLISHED PAPER CAREFULLY AND ENSURE YOU UNDERSTAND THESE IMAGES BEFORE USING ANY OF THIS INFORMATION CLINICALLY. WE CAN BE CONTACTED FOR CLARIFICATIONS.***

    This collection contains images of the outer loop, and partial middle loop, of the optic radiation. These are population averages, displayed as percentages of participants, as single subject maps cannot be released for privacy reasons.

    Please read our paper for how these images were generated. In brief, the CONSULT system created binarised tractography for each subject. We take the average of these binary maps *in MNI space* and multiply the result by 100 to create the images appearing here. The MNI template used is attached.
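The averaging step described above can be sketched as follows: each voxel's value is the percentage of subjects whose binarised tract covers it. This assumes the binary maps are already registered to the same template (MNI) space and are represented here as flat lists of 0/1 values of equal length; real maps would be NIfTI volumes read with an imaging library. The function name `population_map` and the toy maps are hypothetical.

```python
from typing import List, Sequence


def population_map(binary_maps: Sequence[Sequence[int]]) -> List[float]:
    """Voxelwise percentage of subjects whose binarised tract includes the voxel.

    Each map is a flat list of 0/1 values for one subject, already in the
    same template space, so index i refers to the same voxel for everyone.
    """
    n_subjects = len(binary_maps)
    if n_subjects == 0:
        raise ValueError("need at least one subject map")
    n_voxels = len(binary_maps[0])
    if any(len(m) != n_voxels for m in binary_maps):
        raise ValueError("all maps must have the same number of voxels")
    # Mean of binary values per voxel, scaled by 100 to give a percentage.
    return [100.0 * sum(m[i] for m in binary_maps) / n_subjects
            for i in range(n_voxels)]


if __name__ == "__main__":
    toy_maps = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1]]
    print(population_map(toy_maps))
```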

    Data are from multiple sources and filed as they appear in the paper. These sources are:
    1) HCP-*: Human Connectome Project data. These data were modified from their originals to test CONSULT using different quality data. Raw HCP data can be downloaded from the HCP website.
    2) Hospital-*: Data acquired by us on two hospital campuses. Some of these data are from neurosurgical patients. Three different scanners and acquisition protocols were used.
    3) MASSIVE-*: Data from the MASSIVE dataset. These data were modified from their originals to test CONSULT using different quality data. The original data can be downloaded from the MASSIVE website.

    Subject species

    homo sapiens

    Modality

    Diffusion MRI

    Analysis level

    group

    Cognitive paradigm (task)

    None / Other

    Map type

    Other

  10. Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...

    • data.wu.ac.at
    • cloud.csiss.gmu.edu
    • +6 more
    application/unknown
    Updated Sep 8, 2014
    Cite
    National Aeronautics and Space Administration (2014). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. https://data.wu.ac.at/schema/data_gov/OGIxMWY3YjgtNmUwZi00MzY5LThjNzEtYTQzYTRkOWY1NWU5
    Available download formats: application/unknown
    Dataset updated
    Sep 8, 2014
    Dataset provided by
    National Aeronautics and Space Administration
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
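To illustrate what searching for a signature in multivariate time series involves, here is a naive brute-force sketch: slide a query window over a recording (rows are time steps, columns are recorded parameters) and rank windows by Euclidean distance. The scalable search algorithm referenced in the abstract is not reproduced here; this is the exhaustive baseline that such algorithms are meant to outperform on massive datasets. The names `window_distance` and `search` and the toy series are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# series[t][p]: value of parameter p at time step t
Series = Sequence[Sequence[float]]


def window_distance(series: Series, start: int, query: Series) -> float:
    """Euclidean distance between the query and the window starting at `start`."""
    total = 0.0
    for t, q_row in enumerate(query):
        for x, q in zip(series[start + t], q_row):
            total += (x - q) ** 2
    return math.sqrt(total)


def search(series: Series, query: Series, k: int = 3) -> List[Tuple[float, int]]:
    """Return the k best (distance, start_index) matches by scanning every window."""
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query must be non-empty and no longer than the series")
    scored = [(window_distance(series, s, query), s)
              for s in range(len(series) - m + 1)]
    return sorted(scored)[:k]


if __name__ == "__main__":
    flight = [[0, 0], [1, 0], [5, 5], [6, 5], [1, 0], [0, 0]]  # toy 2-parameter recording
    signature = [[5, 5], [6, 5]]                               # toy anomaly signature
    print(search(flight, signature, k=2))
```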

  11. DataSheet1_TurboPutative: A web server for data handling and metabolite...

    • figshare.com
    • frontiersin.figshare.com
    docx
    Updated Jun 9, 2023
    Cite
    Rafael Barrero-Rodríguez; Jose Manuel Rodriguez; Rocío Tarifa; Jesús Vázquez; Annalaura Mastrangelo; Alessia Ferrarini (2023). DataSheet1_TurboPutative: A web server for data handling and metabolite classification in untargeted metabolomics.docx [Dataset]. http://doi.org/10.3389/fmolb.2022.952149.s001
    Available download formats: docx
    Dataset updated
    Jun 9, 2023
    Dataset provided by
    Frontiers
    Authors
    Rafael Barrero-Rodríguez; Jose Manuel Rodriguez; Rocío Tarifa; Jesús Vázquez; Annalaura Mastrangelo; Alessia Ferrarini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Untargeted metabolomics aims to measure the entire set of metabolites in a wide range of biological samples. However, due to the high chemical diversity of metabolites, which range from small to large and more complex molecules (e.g., amino acids/carbohydrates vs. phospholipids/gangliosides), the identification and characterization of the metabolome remain a major bottleneck. The first step of this process consists of searching the experimental monoisotopic mass against databases, thus resulting in a highly redundant/complex list of candidates. Despite the progress in this area, researchers are still forced to manually explore the resulting table in order to prioritize the most likely identifications for further biological interpretation or confirmation with standards. Here, we present TurboPutative (https://proteomics.cnic.es/TurboPutative/), a flexible and user-friendly web-based platform composed of four modules (Tagger, REname, RowMerger, and TPMetrics) that streamlines data handling, classification, and interpretability of untargeted LC-MS-based metabolomics data. Tagger classifies the different compounds and provides preliminary insights into the biological system studied. REname improves putative annotation handling and visualization, allowing the recognition of isomers and equivalent compounds and redundant data removal. RowMerger reduces the dataset size, facilitating the manual comparison among annotations. Finally, TPMetrics combines different datasets with feature intensity and relevant information for the researcher and calculates a score based on adduct probability and feature correlations, facilitating further identification, assessment, and interpretation of the results.
The TurboPutative web application allows researchers in the metabolomics field that are dealing with massive datasets containing multiple putative annotations to reduce the number of these entries by 80%–90%, thus facilitating the extrapolation of biological knowledge and improving metabolite prioritization for subsequent pathway analysis. TurboPutative comprises a rapid, automated, and customizable workflow that can also be included in programmed bioinformatics pipelines through its RESTful API services. Users can explore the performance of each module through demo datasets supplied on the website. The platform will help the metabolomics community to speed up the arduous task of manual data curation that is required in the first steps of metabolite identification, improving the generation of biological knowledge.
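The effect described for RowMerger (fewer rows, easier comparison among annotations) can be illustrated with a simple grouping sketch: rows that share the same feature key are collapsed into one row whose annotation field lists all distinct candidates. This is not TurboPutative's implementation; the merging criterion, the field names (`mz`, `rt`, `name`), the separator, and the toy rows are all hypothetical.

```python
from typing import Dict, List, Sequence


def merge_rows(rows: Sequence[dict], key_fields: Sequence[str],
               merge_field: str, sep: str = " // ") -> List[dict]:
    """Collapse rows sharing the same key into one row, joining distinct
    values of `merge_field` in order of first appearance."""
    merged: Dict[tuple, dict] = {}
    names: Dict[tuple, List[str]] = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in merged:
            merged[key] = dict(row)  # keep the first row's other fields
            names[key] = []
        if row[merge_field] not in names[key]:
            names[key].append(row[merge_field])
    for key, row in merged.items():
        row[merge_field] = sep.join(names[key])
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical putative annotations: one feature with three isomeric candidates.
    rows = [
        {"mz": 180.0634, "rt": 2.1, "name": "Glucose"},
        {"mz": 180.0634, "rt": 2.1, "name": "Fructose"},
        {"mz": 180.0634, "rt": 2.1, "name": "Galactose"},
        {"mz": 146.0691, "rt": 1.4, "name": "Glutamine"},
    ]
    for row in merge_rows(rows, ("mz", "rt"), "name"):
        print(row)
```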

  12. San Isabel National Forest and Leadville National Fish Hatchery, Mount...

    • data.amerigeoss.org
    pdf
    Updated Jul 27, 2019
    Cite
    United States[old] (2019). San Isabel National Forest and Leadville National Fish Hatchery, Mount Massive: A Report on Wilderness Character Monitoring [Dataset]. https://data.amerigeoss.org/sk/dataset/7037e5d9-143a-467a-b3d3-6332fbfda55e
    Available download formats: pdf
    Dataset updated
    Jul 27, 2019
    Dataset provided by
    United States[old]
    Area covered
    Mount Massive, Leadville
    Description

    This document is the completed effort of the U.S. Fish and Wildlife Service, Wilderness Fellows program to develop a monitoring strategy and evaluate the status of the Mount Massive Wilderness of the Leadville National Fish Hatchery and San Isabel National Forest. This document gives context to the status of the Mount Massive Wilderness and identifies the major management challenges associated with maintaining wilderness character. This document is intended to be a reference source for readers interested in understanding the wilderness and to detail the natural and anthropogenic impacts that threaten the state of wilderness character. The Mount Massive Wilderness Character Monitoring Plan was developed using 35 distinct measures that assess the following: untrammeled quality, natural quality, undeveloped quality, and solitude or primitive and unconfined recreation quality.

  13. Synthetic Data Platform Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Mar 14, 2025
    Cite
    Market Research Forecast (2025). Synthetic Data Platform Report [Dataset]. https://www.marketresearchforecast.com/reports/synthetic-data-platform-33672
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 14, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Synthetic Data Platform market is experiencing robust growth, driven by the increasing need for data privacy and security, coupled with the rising demand for AI and machine learning model training. The market's expansion is fueled by several key factors. Firstly, stringent data privacy regulations like GDPR and CCPA are limiting the use of real-world data, creating a surge in demand for synthetic data that mimics the characteristics of real data without compromising sensitive information. Secondly, the expanding applications of AI and ML across diverse sectors like healthcare, finance, and transportation require massive datasets for effective model training. Synthetic data provides a scalable and cost-effective solution to this challenge, enabling organizations to build and test models without the limitations imposed by real data scarcity or privacy concerns. Finally, advancements in synthetic data generation techniques, including generative adversarial networks (GANs) and variational autoencoders (VAEs), are continuously improving the quality and realism of synthetic datasets, making them increasingly viable alternatives to real data. The market is segmented by application (Government, Retail & eCommerce, Healthcare & Life Sciences, BFSI, Transportation & Logistics, Telecom & IT, Manufacturing, Others) and type (Cloud-Based, On-Premises). While the cloud-based segment currently dominates due to its scalability and accessibility, the on-premises segment is expected to witness growth driven by organizations prioritizing data security and control. Geographically, North America and Europe are currently leading the market, owing to the presence of mature technological infrastructure and a high adoption rate of AI and ML technologies. However, Asia-Pacific is anticipated to show significant growth potential in the coming years, driven by increasing digitalization and investments in AI across the region. 
While challenges remain in terms of ensuring the quality and fidelity of synthetic data and addressing potential biases in generated datasets, the overall outlook for the Synthetic Data Platform market remains highly positive, with substantial growth projected over the forecast period. We estimate a CAGR of 25% from 2025 to 2033.

  14. Face For Small Large Dataset

    • universe.roboflow.com
    zip
    Updated May 13, 2024
    Cite
    ok (2024). Face For Small Large Dataset [Dataset]. https://universe.roboflow.com/ok-4sjtq/face-for-small-large/dataset/1
    Available download formats: zip
    Dataset updated
    May 13, 2024
    Dataset authored and provided by
    ok
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Faces Bounding Boxes
    Description

    Face For Small Large

    ## Overview
    
    Face For Small Large is a dataset for object detection tasks - it contains Faces annotations for 389 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  15. Details of dataset information.

    • plos.figshare.com
    xls
    Updated May 10, 2024
    Cite
    Fahmi H. Quradaa; Sara Shahzad; Rashad Saeed; Mubarak M. Sufyan (2024). Details of dataset information. [Dataset]. http://doi.org/10.1371/journal.pone.0302333.t005
    Available download formats: xls
    Dataset updated
    May 10, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Fahmi H. Quradaa; Sara Shahzad; Rashad Saeed; Mubarak M. Sufyan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In software development, it’s common to reuse existing source code by copying and pasting, resulting in the proliferation of numerous code clones (similar or identical code fragments) that detrimentally affect software quality and maintainability. Although several techniques for code clone detection exist, many struggle to identify semantic clones because they cannot extract both syntactic and semantic information. Fewer techniques leverage low-level source code representations, such as bytecode or assembly, for clone detection. This work introduces a novel code representation for identifying syntactic and semantic clones in Java source code. It integrates high-level features extracted from the Abstract Syntax Tree with low-level features derived from intermediate representations generated by static analysis tools such as the Soot framework. Using this combined representation, fifteen machine-learning models are trained to detect code clones. Evaluation on a large dataset demonstrates the models’ efficacy in accurately identifying semantic clones. Among these classifiers, ensemble classifiers such as LightGBM exhibit exceptional accuracy. Combining features linearly makes the models more effective than combining them by multiplication or by distance. The experimental findings indicate that the proposed method can outperform current clone detection techniques in detecting semantic clones.
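    The abstract contrasts three ways of turning the feature vectors of two code fragments into a single input for a pair classifier. Below is a minimal Python sketch under stated assumptions: each fragment's vector is assumed to be the concatenation of its AST features and its intermediate-representation features, and "linear", "multiplication", and "distance" combination are assumed to mean element-wise sum, element-wise product, and element-wise absolute difference. These definitions, the feature values, and all function names are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, Dict, List

Vector = List[float]


def fragment_vector(ast_features: Vector, ir_features: Vector) -> Vector:
    """Assumed fusion: concatenate a fragment's high-level (AST) features
    with its low-level (intermediate-representation) features."""
    return ast_features + ir_features


def _check_same_length(a: Vector, b: Vector) -> None:
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def combine_linear(a: Vector, b: Vector) -> Vector:
    """Assumed 'linear' combination: element-wise sum of the two fragments."""
    _check_same_length(a, b)
    return [x + y for x, y in zip(a, b)]


def combine_multiply(a: Vector, b: Vector) -> Vector:
    """Assumed 'multiplication' combination: element-wise product."""
    _check_same_length(a, b)
    return [x * y for x, y in zip(a, b)]


def combine_distance(a: Vector, b: Vector) -> Vector:
    """Assumed 'distance' combination: element-wise absolute difference."""
    _check_same_length(a, b)
    return [abs(x - y) for x, y in zip(a, b)]


# Hypothetical registry so a training loop can try each combination in turn.
COMBINERS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "linear": combine_linear,
    "multiply": combine_multiply,
    "distance": combine_distance,
}

# Usage with made-up feature values for two fragments.
frag_a = fragment_vector([3.0, 1.0], [12.0])
frag_b = fragment_vector([3.0, 2.0], [10.0])
pair = COMBINERS["linear"](frag_a, frag_b)  # [6.0, 3.0, 22.0]
```

    Each combined `pair` vector would then be one training row for whichever classifier is being evaluated.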

  16. Simple download service (Atom) of the dataset: Massive under the so-called...

    • gimi9.com
    • data.europa.eu
    Updated Mar 10, 2022
    Cite
    (2022). Simple download service (Atom) of the dataset: Massive under the so-called “mountain law” law in Midi-Pyrénées [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-3a563705-936d-44a7-9879-ab944dd0e4a1/
    Dataset updated
    Mar 10, 2022
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Midi-Pyrénées
    Description

    The concept of massif is to be distinguished from the concept of mountain. According to the texts in force in France, a mountain area includes municipalities or parts of municipalities characterised by:
    - the existence, because of altitude (minimum 700 m, except 600 m in the Vosges and 800 m in the Mediterranean mountains), of very difficult climatic conditions that significantly shorten the growing season;
    - or the presence, at a lower altitude, of steep slopes (above 20 %) over most of the territory (at least 80 %), such that mechanisation is impossible or requires very expensive equipment;
    - or the combination of these two factors.

    On several occasions, the delimitation of the mountain areas has been extended and refined. Today it distinguishes several geographical units according to the intensity of their mountain character (from the foothills to the high mountains). The massif includes not only mountain areas but also the areas immediately adjacent to them: foothills, or even plains where these ensure the continuity of the massif. This enlargement takes account of the interactions and exchanges between highland areas and the plains, which makes it possible to set up more relevant spatial planning projects. The concept of massif thus provides an administrative entity competent to carry out mountain policy.
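    The mountain-area criteria above amount to a threshold rule. The following is a rough Python sketch, not an official delimitation method: it assumes altitude can stand in as a proxy for the "difficult climatic conditions" criterion and that the slope criterion is checked on a sample of slope measurements across the territory. The zone labels and function names are hypothetical, and the third case ("the combination of these two factors") is not modelled because the text gives no thresholds for it.

```python
from typing import List

# Minimum altitudes in metres, as quoted in the description.
# The zone keys are hypothetical labels chosen for this sketch.
MIN_ALTITUDE_M = {"default": 700, "vosges": 600, "mediterranean": 800}

STEEP_SLOPE_PERCENT = 20.0  # a slope counts as steep when above 20 %
MIN_STEEP_SHARE = 0.80      # at least 80 % of the territory must be steep


def meets_altitude_criterion(altitude_m: float, zone: str = "default") -> bool:
    """Altitude used as a proxy for the climatic-conditions criterion."""
    return altitude_m >= MIN_ALTITUDE_M[zone]


def meets_slope_criterion(slopes_percent: List[float]) -> bool:
    """True if at least 80 % of the sampled points have a slope above 20 %."""
    if not slopes_percent:
        return False
    steep = sum(1 for s in slopes_percent if s > STEEP_SLOPE_PERCENT)
    return steep / len(slopes_percent) >= MIN_STEEP_SHARE


def is_mountain_area(altitude_m: float, slopes_percent: List[float],
                     zone: str = "default") -> bool:
    # The "combination of these two factors" case has no numeric thresholds
    # in the text, so it is deliberately left out of this sketch.
    return (meets_altitude_criterion(altitude_m, zone)
            or meets_slope_criterion(slopes_percent))


# A 650 m municipality qualifies in the Vosges but not elsewhere.
print(is_mountain_area(650, [5.0], "vosges"))  # True
print(is_mountain_area(650, [5.0]))            # False
```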

  17. Massive-STEPS-Jakarta

    • huggingface.co
    Updated May 19, 2025
    Cite
    CRUISE Research Group (UNSW) (2025). Massive-STEPS-Jakarta [Dataset]. https://huggingface.co/datasets/CRUISEResearchGroup/Massive-STEPS-Jakarta
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    CRUISE Research Group (UNSW)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Jakarta
    Description

    Massive-STEPS-Jakarta

    Dataset Summary

    Massive-STEPS is a large-scale dataset of semantic trajectories intended for understanding POI check-ins. The dataset is derived from the Semantic Trails Dataset and Foursquare Open Source Places, and includes check-in data from 12 cities across 10 countries. The dataset is designed to facilitate research in various domains, including trajectory prediction, POI recommendation, and urban modeling. Massive-STEPS emphasizes the… See the full description on the dataset page: https://huggingface.co/datasets/CRUISEResearchGroup/Massive-STEPS-Jakarta.
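    Next-POI (trajectory) prediction is one of the tasks the dataset targets. As an illustration only, here is a sketch of a simple first-order Markov baseline in Python: it counts which POI follows which within each trajectory and predicts the most frequent successor. The toy trajectories and all names are hypothetical; the actual record schema of Massive-STEPS-Jakarta is not described here, so turning its check-ins into ordered lists of POI identifiers is left as an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def fit_transitions(trajectories: List[List[str]]) -> Dict[str, Counter]:
    """Count, for every POI, which POI was checked into immediately afterwards."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for trajectory in trajectories:
        for current_poi, next_poi in zip(trajectory, trajectory[1:]):
            transitions[current_poi][next_poi] += 1
    return transitions


def predict_next(transitions: Dict[str, Counter], current_poi: str) -> Optional[str]:
    """Return the most frequent successor of current_poi, or None if it has none."""
    successors = transitions.get(current_poi)  # .get does not insert missing keys
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Hypothetical toy trajectories: ordered POI identifiers for three users.
toy = [["home", "cafe", "office"], ["home", "cafe", "gym"], ["home", "park"]]
model = fit_transitions(toy)
print(predict_next(model, "home"))  # cafe
```

    A baseline like this ignores time, user identity, and POI categories, which is exactly the kind of semantic information a trajectory dataset makes available to richer models.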

  18. Big Data and Business Analytics Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Dec 3, 2024
    Cite
    Dataintelo (2024). Big Data and Business Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/big-data-and-business-analytics-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Big Data and Business Analytics Market Outlook



    In 2023, the global Big Data and Business Analytics market size is estimated to be valued at approximately $274 billion, and with a projected compound annual growth rate (CAGR) of 12.4%, it is anticipated to reach around $693 billion by 2032. This significant growth is driven by the escalating demand for data-driven decision-making processes across various industries, which leverage insights derived from vast data sets to enhance business efficiency, optimize operations, and drive innovation. The increasing adoption of Internet of Things (IoT) devices, coupled with the exponential growth of data generated daily, further propels the need for advanced analytics solutions to harness and interpret this information effectively.
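    The market figures above follow the standard compound-growth relation, value_end = value_start × (1 + CAGR)^years. A small Python sketch of that arithmetic is shown here; the function names are hypothetical, and the number of compounding years is an assumption, because the text gives a 2023 estimate and a 2024 to 2032 forecast period without stating which base year it compounds from.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at annual rate cagr (0.124 means 12.4 %) for `years` years."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns start_value into end_value over `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted above, in USD billions. Eight compounding years (2024 to 2032)
# is an assumption, not something the report states.
projected_2032 = project_value(274.0, 0.124, 8)
rate = implied_cagr(274.0, 693.0, 8)
print(f"projected 2032: {projected_2032:.0f}, implied CAGR: {rate:.1%}")
```

    With eight compounding years the projection comes out near $698 billion and the implied rate near 12.3 %, close to the quoted figures; compounding nine years from the 2023 estimate would instead give roughly $785 billion.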



    A critical growth factor in the Big Data and Business Analytics market is the increasing reliance on data to gain a competitive edge. Organizations are now more than ever looking to uncover hidden patterns, correlations, and insights from the data they collect to make informed decisions. This trend is especially prominent in industries such as retail, where understanding consumer behavior can lead to personalized marketing strategies, and in healthcare, where data analytics can improve patient outcomes through precision medicine. Moreover, the integration of big data analytics with artificial intelligence and machine learning technologies is enabling more accurate predictions and real-time decision-making, further enhancing the value proposition of these analytics solutions.



    Another key driver of market growth is the continuous technological advancements and innovations in data analytics tools and platforms. Companies are increasingly investing in advanced analytics capabilities, such as predictive analytics, prescriptive analytics, and real-time analytics, to gain deeper insights into their operations and market environments. The development of user-friendly and self-service analytics tools is also democratizing data access within organizations, empowering employees at all levels to leverage data in their daily decision-making processes. This democratization of data analytics is reducing the reliance on specialized data scientists, thereby accelerating the adoption of big data analytics across various business functions.



    The increasing emphasis on regulatory compliance and data privacy is also driving growth in the Big Data and Business Analytics market. Strict regulations, such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States, require organizations to manage and analyze data responsibly. This is prompting businesses to invest in robust analytics solutions that not only help them comply with these regulations but also ensure data integrity and security. Additionally, as data breaches and cybersecurity threats continue to rise, organizations are turning to analytics solutions to identify potential vulnerabilities and mitigate risks effectively.



    Regionally, North America remains a dominant player in the Big Data and Business Analytics market, benefiting from the presence of major technology companies and a high rate of digital adoption. The Asia Pacific region, however, is emerging as a significant growth area, driven by rapid industrialization, urbanization, and increasing investments in digital transformation initiatives. Europe also showcases a robust market, fueled by stringent data protection regulations and a strong focus on innovation. Meanwhile, the markets in Latin America and the Middle East & Africa are gradually gaining momentum as organizations in these regions are increasingly recognizing the value of data analytics in enhancing business outcomes and driving economic growth.



    Component Analysis



    The Big Data and Business Analytics market is segmented by components into software, services, and hardware, each playing a crucial role in the ecosystem. Software components, which include data management and analytics tools, are at the forefront, offering solutions that facilitate the collection, analysis, and visualization of large data sets. The software segment is driven by a demand for scalable solutions that can handle the increasing volume, velocity, and variety of data. As organizations strive to become more data-centric, there is a growing need for advanced analytics software that can provide actionable insights from complex data sets, leading to enhanced decision-making capabilities.



    In the services segment, businesses are increasingly seeking consultation, implementation, and support services to effective

  19. Enterprise High Performance Computing Market Report | Global Forecast From...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Enterprise High Performance Computing Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/enterprise-high-performance-computing-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Enterprise High Performance Computing Market Outlook



    The enterprise high-performance computing (HPC) market size was valued at approximately USD 37 billion in 2023 and is projected to reach USD 64.8 billion by 2032, reflecting a compound annual growth rate (CAGR) of 6.7% during the forecast period. This robust growth trajectory is driven by several factors, including the increasing complexity of applications requiring advanced computational capabilities, the accelerating adoption of artificial intelligence and machine learning technologies across industries, and the growing necessity for detailed analytics to drive business insights. High-performance computing is becoming increasingly critical as businesses seek to leverage data-driven strategies to maintain a competitive edge and foster innovation.



    One of the primary growth drivers in the enterprise HPC market is the rising demand for real-time data analysis and simulation. Industries such as healthcare, finance, and manufacturing are increasingly relying on HPC systems to process vast amounts of data quickly and accurately. For instance, in healthcare, HPC is pivotal in drug discovery and genomics, enabling researchers to analyze complex biological data and accelerate the development of new treatments. Similarly, in finance, these systems facilitate risk management and fraud detection by processing large volumes of transactions and identifying patterns in real-time. The ability of HPC solutions to deliver rapid insights from massive datasets is a critical factor propelling the market forward.



    Another significant factor contributing to market growth is the escalating integration of artificial intelligence (AI) and machine learning (ML) with HPC systems. AI and ML models require considerable computational resources to train and deploy effectively, resources that are well provided by HPC environments. The synergy between AI/ML and HPC is enabling industries to automate processes and innovate at unprecedented scales. In manufacturing, for example, predictive maintenance and quality control processes are significantly enhanced through the application of AI models powered by HPC, leading to reduced downtime and improved product quality. Furthermore, the energy sector is utilizing these capabilities for advanced modeling and simulations to optimize resource extraction and energy distribution.



    High-Performance Computing Software is at the heart of this transformative shift, providing the necessary tools and frameworks to harness the full potential of HPC systems. These software solutions are designed to optimize computational processes, enabling businesses to execute complex simulations and data analyses with unprecedented speed and accuracy. By leveraging high-performance computing software, organizations can streamline their workflows, reduce processing times, and enhance the overall efficiency of their operations. This capability is particularly crucial in industries where time-sensitive data processing is essential, such as financial services and healthcare, where rapid insights can lead to significant competitive advantages.



    Cloud adoption is also playing a crucial role in the expansion of the enterprise HPC market. The availability of cloud-based HPC solutions allows organizations, particularly small and medium enterprises (SMEs), to access powerful computational resources on-demand without the substantial capital investment typically associated with on-premises setups. This democratization of HPC through the cloud is enabling a wider range of businesses to harness the benefits of high-speed data processing and analysis, thus broadening the market. Moreover, cloud providers are continuously enhancing their HPC offerings with advanced technologies and services, further fueling market growth.



    Component Analysis



    The component segment within the enterprise HPC market can be broken down into hardware, software, and services, each playing a pivotal role in the ecosystem. Hardware components, including processors, storage devices, and networking equipment, form the backbone of HPC systems. The demand for advanced hardware solutions is driven by the need to handle complex computational tasks efficiently. With the advent of more sophisticated processors and GPUs, the performance capabilities of HPC systems continue to grow, enabling them to tackle workloads of increasing complexity. Companies are investing heavily in research and development to produce hardware that meets the evolving needs of various industries.


  20. A Large Scale Fish Dataset

    • gts.ai
    json
    Updated Mar 20, 2024
    Cite
    GTS (2024). A Large Scale Fish Dataset [Dataset]. https://gts.ai/dataset-download/a-large-scale-fish-dataset/
    Available download formats: json
    Dataset updated
    Mar 20, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset was collected in order to carry out segmentation, feature extraction, and classification tasks and to compare common segmentation methods.
