77 datasets found

h
massive-scenario
huggingface.co
Updated Sep 18, 2024
Cite
will brown (2024). massive-scenario [Dataset]. https://huggingface.co/datasets/willcb/massive-scenario
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 18, 2024
Authors
will brown
Description
willcb/massive-scenario dataset hosted on Hugging Face and contributed by the HF Datasets community
f
Data from: Additive Hazards Regression Analysis of Massive Interval-Censored...
tandf.figshare.com
pdf
Updated May 12, 2025
Cite
Peiyao Huang; Shuwei Li; Xinyuan Song (2025). Additive Hazards Regression Analysis of Massive Interval-Censored Data via Data Splitting [Dataset]. http://doi.org/10.6084/m9.figshare.27103243.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.27103243.v1
Dataset updated
May 12, 2025
Dataset provided by
Taylor & Francis
Authors
Peiyao Huang; Shuwei Li; Xinyuan Song
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
With the rapid development of data acquisition and storage, massive datasets with large sample sizes are increasingly common, making more advanced statistical tools urgently needed. To accommodate such large volumes in the analysis, a variety of methods have been proposed for complete or right-censored survival data. However, existing big-data methodology has not addressed interval-censored outcomes, which are ubiquitous in cross-sectional or periodic follow-up studies. In this work, we propose an easily implemented divide-and-combine approach for analyzing massive interval-censored survival data under the additive hazards model. We establish the asymptotic properties of the proposed estimator, including consistency and asymptotic normality. In addition, the divide-and-combine estimator is shown to be asymptotically equivalent to the full-data-based estimator obtained from analyzing all data together. Simulation studies suggest that, relative to the full-data-based approach, the proposed divide-and-combine approach has a desirable advantage in computation time, making it more applicable to large-scale data analysis. An application to a set of interval-censored data also demonstrates the practical utility of the proposed method.
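The divide-and-combine idea in the description above can be illustrated with a much simpler estimator. The sketch below is not the authors' additive hazards procedure: it assumes an ordinary least-squares slope as a stand-in, splits simulated data into blocks, fits each block separately, and combines the block estimates with weights proportional to each block's information. The function names, the simulated data, and the weighting scheme are all assumptions made for illustration.

```python
import random


def ols_slope(xs, ys):
    """Return (slope, sxx) for a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxx


def divide_and_combine_slope(xs, ys, n_blocks):
    """Fit each block on its own, then average the slopes weighted by sxx."""
    block_size = len(xs) // n_blocks
    weighted_sum = 0.0
    total_weight = 0.0
    for k in range(n_blocks):
        start = k * block_size
        # The last block absorbs any leftover observations.
        stop = len(xs) if k == n_blocks - 1 else start + block_size
        slope, sxx = ols_slope(xs[start:stop], ys[start:stop])
        weighted_sum += sxx * slope
        total_weight += sxx
    return weighted_sum / total_weight


# Simulated data (hypothetical): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(10_000)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]

full_slope, _ = ols_slope(xs, ys)
combined_slope = divide_and_combine_slope(xs, ys, n_blocks=10)
print(f"full-data slope:          {full_slope:.4f}")
print(f"divide-and-combine slope: {combined_slope:.4f}")
```

In this toy setting the two slopes should be close, which mirrors the asymptotic equivalence described in the abstract. Each block fit only needs its own data in memory, which is where the computational advantage comes from.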
f
Data from: S8 Fig -
plos.figshare.com
zip
Updated Aug 3, 2023
Cite
Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc (2023). S8 Fig - [Dataset]. http://doi.org/10.1371/journal.pone.0289211.s009
Available download formats: zip
Unique identifier
https://doi.org/10.1371/journal.pone.0289211.s009
Dataset updated
Aug 3, 2023
Dataset provided by
PLOS ONE
Authors
Aaron Berk; Gulcenur Ozturan; Parsa Delavari; David Maberley; Özgür Yılmaz; Ipek Oruc
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Deep learning (DL) techniques have seen tremendous interest in medical imaging, particularly in the use of convolutional neural networks (CNNs) for the development of automated diagnostic tools. The facility of its non-invasive acquisition makes retinal fundus imaging particularly amenable to such automated approaches. Recent work in the analysis of fundus images using CNNs relies on access to massive datasets for training and validation, composed of hundreds of thousands of images. However, data residency and data privacy restrictions stymie the applicability of this approach in medical settings where patient confidentiality is a mandate. Here, we showcase results for the performance of DL on small datasets to classify patient sex from fundus images—a trait thought not to be present or quantifiable in fundus images until recently. Specifically, we fine-tune a Resnet-152 model whose last layer has been modified to a fully-connected layer for binary classification. We carried out several experiments to assess performance in the small dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to a mere 25% decrease in performance despite a nearly 1000-fold decrease in the dataset size compared to prior results in the literature. Our results show that binary classification, even with a hard task such as sex categorization from retinal fundus images, is possible with very small datasets. Our domain adaptation results show that models trained with one distribution of images may generalize well to an independent external source, as in the case of models trained on DOVS and tested on ODIR. Our results also show that eliminating poor quality images may hamper training of the CNN due to reducing the already small dataset size even further. 
Nevertheless, using high quality images may be an important factor as evidenced by superior generalizability of results in the domain adaptation experiments. Finally, our work shows that ensembling is an important tool in maximizing performance of deep CNNs in the context of small development datasets.
R
Oct Large Dataset
universe.roboflow.com
zip
Updated Sep 20, 2021
Cite
James Willoughby (2021). Oct Large Dataset [Dataset]. https://universe.roboflow.com/james-willoughby/oct-large
Available download formats: zip
Dataset updated
Sep 20, 2021
Dataset authored and provided by
James Willoughby
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Objects Bounding Boxes
Description
OCT Large

## Overview

OCT Large is a dataset for object detection tasks - it contains Objects annotations for 239 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
h
Massive-STEPS-Moscow
huggingface.co
Updated May 19, 2025
Cite
CRUISE Research Group (UNSW) (2025). Massive-STEPS-Moscow [Dataset]. https://huggingface.co/datasets/CRUISEResearchGroup/Massive-STEPS-Moscow
Dataset updated
May 19, 2025
Dataset authored and provided by
CRUISE Research Group (UNSW)
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Moscow
Description
Massive-STEPS-Moscow

Dataset Summary

Massive-STEPS is a large-scale dataset of semantic trajectories intended for understanding POI check-ins. The dataset is derived from the Semantic Trails Dataset and Foursquare Open Source Places, and includes check-in data from 12 cities across 10 countries. The dataset is designed to facilitate research in various domains, including trajectory prediction, POI recommendation, and urban modeling. Massive-STEPS emphasizes the… See the full description on the dataset page: https://huggingface.co/datasets/CRUISEResearchGroup/Massive-STEPS-Moscow.
u
Data from: Supporting data for "Efficient phylogenetic tree inference for...
investigacion.usc.gal
Updated 2024
Cite
Piñeiro, César; Pichel, Juan Carlos (2024). Supporting data for "Efficient phylogenetic tree inference for massive taxonomic datasets: harnessing the power of a server to analyze one million taxa" [Dataset]. https://investigacion.usc.gal/documentos/67321c6baea56d4af04833f2
Dataset updated
2024
Authors
Piñeiro, César; Pichel, Juan Carlos
Description
Phylogenies play a crucial role in biological research. Unfortunately, the search for the optimal phylogenetic tree incurs significant computational costs, and most of the existing state-of-the-art tools cannot deal with extremely large datasets in a reasonable time.
The new VeryFastTree (version 4.0) can construct a tree on a single server, using single-precision arithmetic, from a massive alignment dataset of one million taxa in only 36 hours, which is 3× and 3.2× faster than its previous version and FastTree-2, respectively.
Experimental results establish VeryFastTree as the fastest tool in the state-of-the-art for maximum-likelihood phylogeny estimation. It is publicly available at https://github.com/citiususc/veryfasttree. In addition, VeryFastTree is included as a package in Bioconda, MacPorts, and all Debian-based Linux distributions.
D
Data Center Processor Market Report
datainsightsmarket.com
doc, pdf, ppt
Updated Mar 8, 2025
Cite
Data Insights Market (2025). Data Center Processor Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-center-processor-market-20618
Available download formats: ppt, doc, pdf
Dataset updated
Mar 8, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Center Processor market, valued at $11.98 billion in 2025, is projected to experience robust growth, driven by the increasing demand for high-performance computing in data centers globally. A Compound Annual Growth Rate (CAGR) of 7.80% from 2025 to 2033 indicates a significant expansion of this market. This growth is fueled by several key factors. The proliferation of cloud computing and big data analytics necessitates powerful processors capable of handling massive datasets and complex computations. Furthermore, the rise of artificial intelligence (AI) and machine learning (ML) applications, demanding significant processing power, is a major catalyst. The increasing adoption of virtualization and containerization technologies also contributes to the market's expansion, as these technologies require efficient resource management and powerful processors. Competition among major players like Intel, NVIDIA, AMD, and others drives innovation and fuels the development of advanced processor technologies, further enhancing performance and efficiency. Segmentation within the market includes CPUs, GPUs, FPGAs, and ASICs, each catering to specific computing needs. The market is geographically diverse, with North America, Europe, and Asia representing significant regional markets. While precise regional breakdowns are unavailable, it's reasonable to expect North America and Asia to hold the largest market shares given their strong presence in technology innovation and adoption. The market's restraints include the high initial investment costs associated with adopting advanced processor technologies and the potential for supply chain disruptions. However, these challenges are likely to be offset by the long-term benefits of improved performance, efficiency, and scalability that advanced processors offer. The continuous development of energy-efficient processors will also be a critical factor in mitigating the environmental impact and overall cost of operation. 
Looking ahead, the integration of specialized accelerators like SmartNICs and DPUs will continue to gain traction, further optimizing data center performance and efficiency. This signifies a shift towards specialized hardware tailored to specific workloads, increasing the complexity and potential for growth within the market.

Data Center Processor Market: A Comprehensive Analysis (2019-2033)

This insightful report provides a detailed analysis of the dynamic Data Center Processor market, encompassing the historical period (2019-2024), base year (2025), estimated year (2025), and forecast period (2025-2033). Valued at several billion USD, this market is experiencing significant growth driven by the increasing demand for high-performance computing, artificial intelligence, and big data analytics. The report delves into market segmentation, key players, emerging trends, and growth catalysts, offering a comprehensive understanding of this crucial technology sector. This report is essential for industry stakeholders, investors, and anyone seeking to understand the future of data center infrastructure.

Recent developments include:

February 2024: Arm Holdings released a new set of blueprints for making chips that it claims could cut the time required to develop data center processors to less than a year. Arm's technology for creating data center processors is already used by Amazon.com, Microsoft, and Ampere Computing, which supplies chips to Oracle. Arm announced a new generation of designs for the computing "cores" - the most central part of a data center chip.

February 2024: Faraday announced that it collaborated with Arm and Intel to develop 64-core Intel 18A processors for its system-on-chip (SoCs) evaluation platform. The chips would be made by Intel Foundry Services (IFS) using its 18A fabrication process. The processor would integrate with Arm’s Neoverse Compute Subsystems and form part of Faraday’s SoC evaluation platform to support the development of data center servers, high-performance computing-related ASICs, and custom SoCs.

Key drivers for this market are: Increasing Deployment of AI in HPC Data Centers, Increasing Deployment of Data Center Facilities and Cloud-based Services.

Potential restraints include: Increasing Deployment of AI in HPC Data Centers, Increasing Deployment of Data Center Facilities and Cloud-based Services.

Notable trends are: The Central Processing Unit (CPU) Segment is Expected to Drive the Growth of the Market.
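The description quotes a 2025 market value of $11.98 billion and a CAGR of 7.80% through 2033, but does not state the implied 2033 figure. A minimal sketch of that arithmetic follows, assuming annual compounding over the eight years from 2025 to 2033; the function name is hypothetical, and the report itself may use a different convention.

```python
def project_value(base_value, cagr, years):
    """Compound base_value at an annual growth rate `cagr` for `years` years."""
    return base_value * (1.0 + cagr) ** years


base_2025 = 11.98   # USD billions, as quoted in the description
cagr = 0.078        # 7.80% per year, as quoted in the description
years = 2033 - 2025  # assumption: eight annual compounding periods

projected_2033 = project_value(base_2025, cagr, years)
print(f"Implied 2033 market size: ${projected_2033:.2f} billion")
```

Under these assumptions the implied 2033 value is a little under $22 billion, roughly 1.8 times the 2025 base.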
Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS)
redivis.com
application/jsonl +7
Updated Jun 28, 2024
Cite
Stanford Doerr School of Sustainability (2024). Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) [Dataset]. http://doi.org/10.57761/gk3g-wc33
Available download formats: avro, sas, arrow, csv, application/jsonl, parquet, spss, stata
Unique identifier
https://doi.org/10.57761/gk3g-wc33
Dataset updated
Jun 28, 2024
Dataset provided by
Redivis Inc.
Authors
Stanford Doerr School of Sustainability
Time period covered
Jun 27, 2024
Description
Abstract

S3DIS comprises 6 colored 3D point clouds from 6 large-scale indoor areas, along with semantic instance annotations for 12 object categories (wall, floor, ceiling, beam, column, window, door, sofa, desk, chair, bookcase, and board).

Methodology

The Stanford Large-Scale 3D Indoor Spaces (S3DIS) dataset is composed of the colored 3D point clouds of six large-scale indoor areas from three different buildings, covering approximately 935, 965, 450, 1700, 870, and 1100 square meters respectively (a total of 6020 square meters). These areas show diverse properties in architectural style and appearance and include mainly office areas and educational and exhibition spaces; conference rooms, personal offices, restrooms, open spaces, lobbies, stairways, and hallways are commonly found therein. The point clouds are generated automatically, without any manual intervention, using the Matterport scanner. The dataset also includes semantic instance annotations on the point clouds for 12 semantic elements, which are structural elements (ceiling, floor, wall, beam, column, window, and door) and commonly found items and furniture (table, chair, sofa, bookcase, and board).

Figure: S3DIS.png (https://redivis.com/fileUploads/5bdaf09c-7d3b-4a91-b192-d98a0f0b0018)

Important Information

This dataset was presented in the paper "3D Semantic Parsing of Large-Scale Indoor Spaces", CVPR 2016.

Project website: http://buildingparser.stanford.edu/

N
Population Optic Radiation Maps created by CONSULT: HCP-M90 ses-1 meyersloop...
neurovault.org
nifti
Updated Aug 31, 2021
Cite
(2021). Population Optic Radiation Maps created by CONSULT: HCP-M90 ses-1 meyersloop hemisphere-L [Dataset]. http://identifiers.org/neurovault.image:539731
Available download formats: nifti
Unique identifier
https://identifiers.org/neurovault.image:539731
Dataset updated
Aug 31, 2021
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
HCP M90 Session 1 Left Hemisphere

Collection description

***PLEASE READ OUR PUBLISHED PAPER CAREFULLY AND ENSURE YOU UNDERSTAND THESE IMAGES BEFORE USING ANY OF THIS INFORMATION CLINICALLY. WE CAN BE CONTACTED FOR CLARIFICATIONS.***

This collection contains images of the outer loop, and partial middle loop, of the optic radiation. These are population averages, displayed as percentages of participants, as single subject maps cannot be released for privacy reasons.

Please read our paper for how these images were generated. In brief, the CONSULT system created binarised tractography for each subject. We take the average of these binary maps *in MNI space* and multiply the result by 100 to create the images appearing here. The MNI template used is attached.
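The averaging step described above can be sketched on toy data. This is not the CONSULT pipeline: it assumes each subject's binary map is already aligned to a common space and represented as a small 2D grid of 0s and 1s, whereas the real maps are 3D NIfTI volumes. The function name and example masks are hypothetical.

```python
def population_percent_map(binary_maps):
    """Average aligned binary maps voxel-wise and scale to percent of subjects."""
    n_subjects = len(binary_maps)
    n_rows = len(binary_maps[0])
    n_cols = len(binary_maps[0][0])
    return [
        [
            100.0 * sum(subject[r][c] for subject in binary_maps) / n_subjects
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


# Three hypothetical subjects, each a 2x2 binary tract mask.
subject_maps = [
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[0, 0], [1, 1]],
]

percent_map = population_percent_map(subject_maps)
for row in percent_map:
    print([round(value, 1) for value in row])
```

Each output value is the percentage of subjects whose binary map includes that location, which matches the "percentages of participants" interpretation given in the collection description.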

Data are from multiple sources and filed as they appear in the paper. These sources are:
1) HCP-*: Human Connectome Project data. These data were modified from their originals to test CONSULT using different quality data. Raw HCP data can be downloaded from the HCP website.
2) Hospital-*: Data acquired by us on two hospital campuses. Some of these data are from neurosurgical patients. Three different scanners and acquisition protocols were used.
3) MASSIVE-*: Data from the MASSIVE dataset. These data were modified from their originals to test CONSULT using different quality data. The original data can be downloaded from the MASSIVE website.

Subject species

homo sapiens

Modality

Diffusion MRI

Analysis level

group

Cognitive paradigm (task)

None / Other

Map type

Other
w
Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...
data.wu.ac.at
cloud.csiss.gmu.edu
+6 more
application/unknown
Updated Sep 8, 2014
Cite
National Aeronautics and Space Administration (2014). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. https://data.wu.ac.at/schema/data_gov/OGIxMWY3YjgtNmUwZi00MzY5LThjNzEtYTQzYTRkOWY1NWU5
Available download formats: application/unknown
Dataset updated
Sep 8, 2014
Dataset provided by
National Aeronautics and Space Administration
License
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Description
The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
f
DataSheet1_TurboPutative: A web server for data handling and metabolite...
figshare.com
frontiersin.figshare.com
docx
Updated Jun 9, 2023
Cite
Rafael Barrero-Rodríguez; Jose Manuel Rodriguez; Rocío Tarifa; Jesús Vázquez; Annalaura Mastrangelo; Alessia Ferrarini (2023). DataSheet1_TurboPutative: A web server for data handling and metabolite classification in untargeted metabolomics.docx [Dataset]. http://doi.org/10.3389/fmolb.2022.952149.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fmolb.2022.952149.s001
Dataset updated
Jun 9, 2023
Dataset provided by
Frontiers
Authors
Rafael Barrero-Rodríguez; Jose Manuel Rodriguez; Rocío Tarifa; Jesús Vázquez; Annalaura Mastrangelo; Alessia Ferrarini
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Untargeted metabolomics aims at measuring the entire set of metabolites in a wide range of biological samples. However, due to the high chemical diversity of metabolites that range from small to large and more complex molecules (i.e., amino acids/carbohydrates vs. phospholipids/gangliosides), the identification and characterization of the metabolome remain a major bottleneck. The first step of this process consists of searching the experimental monoisotopic mass against databases, thus resulting in a highly redundant/complex list of candidates. Despite the progress in this area, researchers are still forced to manually explore the resulting table in order to prioritize the most likely identifications for further biological interpretation or confirmation with standards. Here, we present TurboPutative (https://proteomics.cnic.es/TurboPutative/), a flexible and user-friendly web-based platform composed of four modules (Tagger, REname, RowMerger, and TPMetrics) that streamlines data handling, classification, and interpretability of untargeted LC-MS-based metabolomics data. Tagger classifies the different compounds and provides preliminary insights into the biological system studied. REname improves putative annotation handling and visualization, allowing the recognition of isomers and equivalent compounds and redundant data removal. RowMerger reduces the dataset size, facilitating the manual comparison among annotations. Finally, TPMetrics combines different datasets with feature intensity and relevant information for the researcher and calculates a score based on adduct probability and feature correlations, facilitating further identification, assessment, and interpretation of the results. 
The TurboPutative web application allows researchers in the metabolomics field that are dealing with massive datasets containing multiple putative annotations to reduce the number of these entries by 80%–90%, thus facilitating the extrapolation of biological knowledge and improving metabolite prioritization for subsequent pathway analysis. TurboPutative comprises a rapid, automated, and customizable workflow that can also be included in programmed bioinformatics pipelines through its RESTful API services. Users can explore the performance of each module through demo datasets supplied on the website. The platform will help the metabolomics community to speed up the arduous task of manual data curation that is required in the first steps of metabolite identification, improving the generation of biological knowledge.
A
San Isabel National Forest and Leadville National Fish Hatchery, Mount...
data.amerigeoss.org
pdf
Updated Jul 27, 2019
Cite
United States[old] (2019). San Isabel National Forest and Leadville National Fish Hatchery, Mount Massive: A Report on Wilderness Character Monitoring [Dataset]. https://data.amerigeoss.org/sk/dataset/7037e5d9-143a-467a-b3d3-6332fbfda55e
Available download formats: pdf
Dataset updated
Jul 27, 2019
Dataset provided by
United States[old]
Area covered
Mount Massive, Leadville
Description
This document is the completed effort of the U.S. Fish and Wildlife Service, Wilderness Fellows program to develop a monitoring strategy and evaluate the status of the Mount Massive Wilderness of the Leadville National Fish Hatchery and San Isabel National Forest. This document gives context to the status of the Mount Massive Wilderness and identifies the major management challenges associated with maintaining wilderness character. This document is intended to be a reference source for readers interested in understanding the wilderness and to detail the natural and anthropogenic impacts that threaten the state of wilderness character. The Mount Massive Wilderness Character Monitoring Plan was developed using 35 distinct measures that assess the following: untrammeled quality, natural quality, undeveloped quality, and solitude or primitive and unconfined recreation quality.
S
Synthetic Data Platform Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 14, 2025
Cite
Market Research Forecast (2025). Synthetic Data Platform Report [Dataset]. https://www.marketresearchforecast.com/reports/synthetic-data-platform-33672
Available download formats: doc, ppt, pdf
Dataset updated
Mar 14, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Synthetic Data Platform market is experiencing robust growth, driven by the increasing need for data privacy and security, coupled with the rising demand for AI and machine learning model training. The market's expansion is fueled by several key factors. Firstly, stringent data privacy regulations like GDPR and CCPA are limiting the use of real-world data, creating a surge in demand for synthetic data that mimics the characteristics of real data without compromising sensitive information. Secondly, the expanding applications of AI and ML across diverse sectors like healthcare, finance, and transportation require massive datasets for effective model training. Synthetic data provides a scalable and cost-effective solution to this challenge, enabling organizations to build and test models without the limitations imposed by real data scarcity or privacy concerns. Finally, advancements in synthetic data generation techniques, including generative adversarial networks (GANs) and variational autoencoders (VAEs), are continuously improving the quality and realism of synthetic datasets, making them increasingly viable alternatives to real data. The market is segmented by application (Government, Retail & eCommerce, Healthcare & Life Sciences, BFSI, Transportation & Logistics, Telecom & IT, Manufacturing, Others) and type (Cloud-Based, On-Premises). While the cloud-based segment currently dominates due to its scalability and accessibility, the on-premises segment is expected to witness growth driven by organizations prioritizing data security and control. Geographically, North America and Europe are currently leading the market, owing to the presence of mature technological infrastructure and a high adoption rate of AI and ML technologies. However, Asia-Pacific is anticipated to show significant growth potential in the coming years, driven by increasing digitalization and investments in AI across the region. 
While challenges remain in terms of ensuring the quality and fidelity of synthetic data and addressing potential biases in generated datasets, the overall outlook for the Synthetic Data Platform market remains highly positive, with substantial growth projected over the forecast period. We estimate a CAGR of 25% from 2025 to 2033.
R
Face For Small Large Dataset
universe.roboflow.com
zip
Updated May 13, 2024
Cite
ok (2024). Face For Small Large Dataset [Dataset]. https://universe.roboflow.com/ok-4sjtq/face-for-small-large/dataset/1
Available download formats: zip
Dataset updated
May 13, 2024
Dataset authored and provided by
ok
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Faces Bounding Boxes
Description
Face For Small Large

## Overview

Face For Small Large is a dataset for object detection tasks - it contains Faces annotations for 389 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
f
Details of dataset information.
plos.figshare.com
xls
Updated May 10, 2024
Cite
Fahmi H. Quradaa; Sara Shahzad; Rashad Saeed; Mubarak M. Sufyan (2024). Details of dataset information. [Dataset]. http://doi.org/10.1371/journal.pone.0302333.t005
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0302333.t005
Dataset updated
May 10, 2024
Dataset provided by
PLOS ONE
Authors
Fahmi H. Quradaa; Sara Shahzad; Rashad Saeed; Mubarak M. Sufyan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In software development, it’s common to reuse existing source code by copying and pasting, resulting in the proliferation of numerous code clones—similar or identical code fragments—that detrimentally affect software quality and maintainability. Although several techniques for code clone detection exist, many encounter challenges in effectively identifying semantic clones due to their inability to extract syntax and semantics information. Fewer techniques leverage low-level source code representations like bytecode or assembly for clone detection. This work introduces a novel code representation for identifying syntactic and semantic clones in Java source code. It integrates high-level features extracted from the Abstract Syntax Tree with low-level features derived from intermediate representations generated by static analysis tools, like the Soot framework. Leveraging this combined representation, fifteen machine-learning models are trained to effectively detect code clones. Evaluation on a large dataset demonstrates the models’ efficacy in accurately identifying semantic clones. Among these classifiers, ensemble classifiers, such as the LightGBM classifier, exhibit exceptional accuracy. Linearly combining features enhances the effectiveness of the models compared to multiplication and distance combination techniques. The experimental findings indicate that the proposed method can outperform the current clone detection techniques in detecting semantic clones.
g
Simple download service (Atom) of the dataset: Massive under the so-called...
gimi9.com
data.europa.eu
Updated Mar 10, 2022
Cite
(2022). Simple download service (Atom) of the dataset: Massive under the so-called “mountain law” law in Midi-Pyrénées [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-3a563705-936d-44a7-9879-ab944dd0e4a1/
Explore at:
Dataset updated
Mar 10, 2022
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Midi-Pyrénées
Description
The concept of a massif is to be distinguished from the concept of a mountain area. According to the texts in force in France, a mountain area includes municipalities or parts of municipalities characterised by:
- the existence, because of altitude (minimum 700 m, except 600 m for the Vosges and 800 m for the Mediterranean mountains), of very difficult climatic conditions that significantly shorten the growing season;
- or the presence, at a lower altitude and over most of the territory (at least 80 %), of steep slopes (above 20 %), such that mechanisation is not possible or requires the use of very expensive equipment;
- or the combination of these two factors.
On several occasions, the delimitation of the mountain areas has been enriched and completed. Today, it distinguishes several geographical units according to the intensity of their mountain character (from the foothills to the high mountains). The massif includes not only mountain areas but also areas immediately adjacent to them: foothills, or even plains if the latter ensure the continuity of the massif. This enlargement takes account of interactions and exchanges between highland areas and the plains, which makes it possible to set up more relevant spatial planning projects. The concept of a massif thus provides an administrative entity competent to carry out mountain policy.
Massive-STEPS-Jakarta
huggingface.co
Updated May 19, 2025
Cite
CRUISE Research Group (UNSW) (2025). Massive-STEPS-Jakarta [Dataset]. https://huggingface.co/datasets/CRUISEResearchGroup/Massive-STEPS-Jakarta
Explore at:
Dataset updated
May 19, 2025
Dataset authored and provided by
CRUISE Research Group (UNSW)
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Jakarta
Description
Massive-STEPS-Jakarta

Dataset Summary

Massive-STEPS is a large-scale dataset of semantic trajectories intended for understanding POI check-ins. The dataset is derived from the Semantic Trails Dataset and Foursquare Open Source Places, and includes check-in data from 12 cities across 10 countries. The dataset is designed to facilitate research in various domains, including trajectory prediction, POI recommendation, and urban modeling. Massive-STEPS emphasizes the… See the full description on the dataset page: https://huggingface.co/datasets/CRUISEResearchGroup/Massive-STEPS-Jakarta.
Big Data and Business Analytics Market Report | Global Forecast From 2025 To...
dataintelo.com
csv, pdf, pptx
Updated Dec 3, 2024
Cite
Dataintelo (2024). Big Data and Business Analytics Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/big-data-and-business-analytics-market
Explore at:
Available download formats: csv, pdf, pptx
Dataset updated
Dec 3, 2024
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Big Data and Business Analytics Market Outlook

In 2023, the global Big Data and Business Analytics market size is estimated to be valued at approximately $274 billion, and with a projected compound annual growth rate (CAGR) of 12.4%, it is anticipated to reach around $693 billion by 2032. This significant growth is driven by the escalating demand for data-driven decision-making processes across various industries, which leverage insights derived from vast data sets to enhance business efficiency, optimize operations, and drive innovation. The increasing adoption of Internet of Things (IoT) devices, coupled with the exponential growth of data generated daily, further propels the need for advanced analytics solutions to harness and interpret this information effectively.
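
The growth figures above follow the usual compound-growth relationship, end = start × (1 + rate)^years. The sketch below is only an illustration of that arithmetic, not the report's methodology: the function names are hypothetical, and the nine-year compounding window (2023 to 2032) is an assumption, since the listing does not state which base year the quoted CAGR applies to.

```python
def project_value(start_value: float, annual_rate: float, years: int) -> float:
    """Compound start_value forward at annual_rate for the given number of years."""
    return start_value * (1.0 + annual_rate) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant annual growth rate that links start_value to end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figures quoted in the description, in USD billions.
    start_2023 = 274.0
    end_2032 = 693.0
    quoted_rate = 0.124
    # Assumption: nine compounding periods between 2023 and 2032.
    years = 9

    projected = project_value(start_2023, quoted_rate, years)
    implied = implied_cagr(start_2023, end_2032, years)
    print(f"Projected value at the quoted rate: {projected:.1f}")
    print(f"Rate implied by the two endpoints: {implied:.2%}")
```

Changing `years` shows how sensitive the comparison is to the assumed forecast window, which is worth checking against the full report before relying on either figure.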

A critical growth factor in the Big Data and Business Analytics market is the increasing reliance on data to gain a competitive edge. Organizations are now more than ever looking to uncover hidden patterns, correlations, and insights from the data they collect to make informed decisions. This trend is especially prominent in industries such as retail, where understanding consumer behavior can lead to personalized marketing strategies, and in healthcare, where data analytics can improve patient outcomes through precision medicine. Moreover, the integration of big data analytics with artificial intelligence and machine learning technologies is enabling more accurate predictions and real-time decision-making, further enhancing the value proposition of these analytics solutions.

Another key driver of market growth is the continuous technological advancements and innovations in data analytics tools and platforms. Companies are increasingly investing in advanced analytics capabilities, such as predictive analytics, prescriptive analytics, and real-time analytics, to gain deeper insights into their operations and market environments. The development of user-friendly and self-service analytics tools is also democratizing data access within organizations, empowering employees at all levels to leverage data in their daily decision-making processes. This democratization of data analytics is reducing the reliance on specialized data scientists, thereby accelerating the adoption of big data analytics across various business functions.

The increasing emphasis on regulatory compliance and data privacy is also driving growth in the Big Data and Business Analytics market. Strict regulations, such as the General Data Protection Regulation (GDPR) in Europe and the California Consumer Privacy Act (CCPA) in the United States, require organizations to manage and analyze data responsibly. This is prompting businesses to invest in robust analytics solutions that not only help them comply with these regulations but also ensure data integrity and security. Additionally, as data breaches and cybersecurity threats continue to rise, organizations are turning to analytics solutions to identify potential vulnerabilities and mitigate risks effectively.

Regionally, North America remains a dominant player in the Big Data and Business Analytics market, benefiting from the presence of major technology companies and a high rate of digital adoption. The Asia Pacific region, however, is emerging as a significant growth area, driven by rapid industrialization, urbanization, and increasing investments in digital transformation initiatives. Europe also showcases a robust market, fueled by stringent data protection regulations and a strong focus on innovation. Meanwhile, the markets in Latin America and the Middle East & Africa are gradually gaining momentum as organizations in these regions are increasingly recognizing the value of data analytics in enhancing business outcomes and driving economic growth.

Component Analysis

The Big Data and Business Analytics market is segmented by components into software, services, and hardware, each playing a crucial role in the ecosystem. Software components, which include data management and analytics tools, are at the forefront, offering solutions that facilitate the collection, analysis, and visualization of large data sets. The software segment is driven by a demand for scalable solutions that can handle the increasing volume, velocity, and variety of data. As organizations strive to become more data-centric, there is a growing need for advanced analytics software that can provide actionable insights from complex data sets, leading to enhanced decision-making capabilities.

In the services segment, businesses are increasingly seeking consultation, implementation, and support services to effective
Enterprise High Performance Computing Market Report | Global Forecast From...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Cite
Dataintelo (2025). Enterprise High Performance Computing Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/enterprise-high-performance-computing-market
Explore at:
Available download formats: csv, pdf, pptx
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Enterprise High Performance Computing Market Outlook

The enterprise high-performance computing (HPC) market size was valued at approximately USD 37 billion in 2023 and is projected to reach USD 64.8 billion by 2032, reflecting a compound annual growth rate (CAGR) of 6.7% during the forecast period. This robust growth trajectory is driven by several factors, including the increasing complexity of applications requiring advanced computational capabilities, the accelerating adoption of artificial intelligence and machine learning technologies across industries, and the growing necessity for detailed analytics to drive business insights. High-performance computing is becoming increasingly critical as businesses seek to leverage data-driven strategies to maintain competitive edge and foster innovation.

One of the primary growth drivers in the enterprise HPC market is the rising demand for real-time data analysis and simulation. Industries such as healthcare, finance, and manufacturing are increasingly relying on HPC systems to process vast amounts of data quickly and accurately. For instance, in healthcare, HPC is pivotal in drug discovery and genomics, enabling researchers to analyze complex biological data and accelerate the development of new treatments. Similarly, in finance, these systems facilitate risk management and fraud detection by processing large volumes of transactions and identifying patterns in real-time. The ability of HPC solutions to deliver rapid insights from massive datasets is a critical factor propelling the market forward.

Another significant factor contributing to market growth is the escalating integration of artificial intelligence (AI) and machine learning (ML) with HPC systems. AI and ML models require considerable computational resources to train and deploy effectively, resources that are well provided by HPC environments. The synergy between AI/ML and HPC is enabling industries to automate processes and innovate at unprecedented scales. In manufacturing, for example, predictive maintenance and quality control processes are significantly enhanced through the application of AI models powered by HPC, leading to reduced downtime and improved product quality. Furthermore, the energy sector is utilizing these capabilities for advanced modeling and simulations to optimize resource extraction and energy distribution.

High-Performance Computing Software is at the heart of this transformative shift, providing the necessary tools and frameworks to harness the full potential of HPC systems. These software solutions are designed to optimize computational processes, enabling businesses to execute complex simulations and data analyses with unprecedented speed and accuracy. By leveraging high-performance computing software, organizations can streamline their workflows, reduce processing times, and enhance the overall efficiency of their operations. This capability is particularly crucial in industries where time-sensitive data processing is essential, such as financial services and healthcare, where rapid insights can lead to significant competitive advantages.

Cloud adoption is also playing a crucial role in the expansion of the enterprise HPC market. The availability of cloud-based HPC solutions allows organizations, particularly small and medium enterprises (SMEs), to access powerful computational resources on-demand without the substantial capital investment typically associated with on-premises setups. This democratization of HPC through the cloud is enabling a wider range of businesses to harness the benefits of high-speed data processing and analysis, thus broadening the market. Moreover, cloud providers are continuously enhancing their HPC offerings with advanced technologies and services, further fueling market growth.

Component Analysis

The component segment within the enterprise HPC market can be broken down into hardware, software, and services, each playing a pivotal role in the ecosystem. Hardware components, including processors, storage devices, and networking equipment, form the backbone of HPC systems. The demand for advanced hardware solutions is driven by the need to handle complex computational tasks efficiently. With the advent of more sophisticated processors and GPUs, the performance capabilities of HPC systems continue to grow, enabling them to tackle workloads of increasing complexity. Companies are investing heavily in research and development to produce hardware that meets the evolving needs of various industries.
A Large Scale Fish Dataset
gts.ai
json
Updated Mar 20, 2024
Cite
GTS (2024). A Large Scale Fish Dataset [Dataset]. https://gts.ai/dataset-download/a-large-scale-fish-dataset/
Explore at:
Available download formats: json
Dataset updated
Mar 20, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset was collected in order to carry out segmentation, feature extraction, and classification tasks and compare the common segmentation.
