Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets often incorporate various functional patterns related to different aspects or regimes, which are typically not equally present throughout the dataset. We propose a novel partitioning algorithm that uses competition between models to detect and separate these functional patterns. This competition is induced by multiple models iteratively submitting their predictions for the dataset, with the best prediction for each data point being rewarded with training on that data point. This reward mechanism amplifies each model's strengths and encourages specialization in different patterns. The specializations can then be translated into a partitioning scheme. We validate our concept on datasets with clearly distinct functional patterns, such as mechanical stress and strain data in a porous structure. Our partitioning algorithm produces valuable insights into the datasets' structure, which can serve various further applications. As a demonstration of one exemplary usage, we set up modular models consisting of multiple expert models, each learning a single partition, and compare their performance on more than twenty popular regression problems against single models learning all partitions simultaneously. Our results show significant improvements, with up to 56% loss reduction, confirming our algorithm's utility.
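The abstract describes the competition mechanism procedurally; the following is a minimal sketch of that loop under stated assumptions (simple SGD regressors standing in for the competing models, squared error as the competition criterion), not the authors' implementation:

```python
# Hedged sketch of the competition mechanism described above, assuming
# simple SGD regressors as the competing models and squared error as the
# competition criterion; an illustration, not the authors' code.
import numpy as np
from sklearn.linear_model import SGDRegressor

def competitive_partition(X, y, n_models=3, n_rounds=50, seed=0):
    rng = np.random.default_rng(seed)
    models = [SGDRegressor(max_iter=5, tol=None, random_state=k)
              for k in range(n_models)]
    # Warm-start each model on a random subset so all predictions are defined.
    for m in models:
        idx = rng.choice(len(X), size=max(2, len(X) // n_models), replace=False)
        m.fit(X[idx], y[idx])
    for _ in range(n_rounds):
        # Every model predicts every point; the best prediction wins the point.
        errors = np.stack([(m.predict(X) - y) ** 2 for m in models])
        winners = errors.argmin(axis=0)
        for k, m in enumerate(models):
            won = winners == k
            if won.any():
                m.partial_fit(X[won], y[won])  # reward: train on the points won
    return winners, models  # per-point winner indices define the partition
```

The winner indices play the role of the partitioning scheme; each index set can then be handed to a dedicated expert model, as in the modular models the abstract evaluates.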
This dataset was created by Robbie Manolache.
There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors, and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed for outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task, not only because of the massive volume of data, but also because these datasets are physically stored at different geographical locations, with only a subset of features available at any one location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose centralizes only a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization at only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the Commercial Modular Aero-Propulsion System Simulation (CMAPSS).
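As a rough illustration of the communication-saving idea (small samples centralized, a global model shipped back), here is a hedged sketch; the detector (scikit-learn's LocalOutlierFactor in novelty mode) and the sampling scheme are stand-ins, not the paper's algorithm:

```python
# Sample-then-centralize sketch: each site sends a small random sample, a
# global outlier model is fit centrally, and the model is shipped back so
# each site flags its own outliers locally.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def distributed_outliers(sites, sample_frac=0.01, n_neighbors=20, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: each site contributes a small sample (the only data moved).
    samples = []
    for s in sites:
        size = min(len(s), max(n_neighbors + 1, int(sample_frac * len(s))))
        samples.append(s[rng.choice(len(s), size=size, replace=False)])
    central = np.vstack(samples)
    # Step 2: fit a global model on the centralized sample.
    model = LocalOutlierFactor(n_neighbors=n_neighbors, novelty=True).fit(central)
    # Step 3: score locally at each site; True marks an outlier.
    return [model.predict(s) == -1 for s in sites]
```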
OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS. Bradley W. Jackson, Jeffrey D. Scargle, Chris Cusanza, David Barnes, Dennis Kanygin, Russell Sarmiento, Sowmya Subramaniam, and Tzu-Wang Chuang. Abstract: Consider piece-wise constant approximations to a function of several parameters, and the problem of finding the best such approximation from measurements at a set of points in the parameter space. We find good approximate solutions to this problem in two steps: (1) partition the parameter space into cells, one for each of the N data points, and (2) collect these cells into blocks, such that within each block the function is constant to within measurement uncertainty. We describe a branch-and-bound algorithm for finding the optimal partition into connected blocks, as well as an O(N^2) dynamic programming algorithm that finds the exact global optimum over this exponentially large search space, in a data space of any dimension. This second solution relaxes the connectivity constraint and requires additivity and convexity conditions on the block fitness function, but in practice none of these restrictions cause problems. From the wide variety of intelligent data understanding applications (including cluster analysis, classification, and anomaly detection) we demonstrate two: partitioning of the State of California (2D) and the Universe (3D).
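For intuition, here is a minimal sketch of the O(N^2) dynamic program in the one-dimensional, ordered case; the sum-of-squared-residuals fitness and the per-block penalty are illustrative placeholders for any additive block fitness:

```python
# best[j] is the fitness of the optimal partition of the first j cells;
# fitness(i, j) scores the single block covering cells i..j-1.
import numpy as np

def optimal_partition(values, penalty=1.0):
    values = np.asarray(values, dtype=float)
    n = len(values)
    csum = np.concatenate([[0.0], np.cumsum(values)])
    csum2 = np.concatenate([[0.0], np.cumsum(values ** 2)])

    def fitness(i, j):
        # Negative sum of squared residuals around the block mean, minus a
        # per-block penalty; any additive, convex fitness could be swapped in.
        m = j - i
        sse = csum2[j] - csum2[i] - (csum[j] - csum[i]) ** 2 / m
        return -sse - penalty

    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    last = np.zeros(n + 1, dtype=int)  # start index of the final block
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + fitness(i, j)
            if cand > best[j]:
                best[j], last[j] = cand, i
    blocks, j = [], n  # backtrack the optimal block boundaries
    while j > 0:
        blocks.append((last[j], j))
        j = last[j]
    return blocks[::-1]
```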
Describes the data partitioning of the Berlin dataset.
THIS RESOURCE IS NO LONGER IN SERVICE, documented August 23, 2016. The GO Partition Database was designed to feature ontology partitions with GO terms of similar specificity. The GO partitions comprise varying numbers of nodes and present relevant information-theoretic statistics, so researchers can choose to analyze datasets at arbitrary levels of specificity. The GO Partition Database featured GO partition sets for functional analysis of genes from human and ten other commonly studied organisms, covering a total of 131,972 genes.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Solute descriptors have been widely used to model chemical transfer processes through poly-parameter linear free energy relationships (pp-LFERs); however, there are still substantial difficulties in obtaining these descriptors accurately and quickly for new organic chemicals. In this research, models (PaDEL-DNN) that require only SMILES of chemicals were built to satisfactorily estimate pp-LFER descriptors using deep neural networks (DNN) and the PaDEL chemical representation. The PaDEL-DNN-estimated pp-LFER descriptors demonstrated good performance in modeling storage-lipid/water partitioning coefficient (log Kstorage‑lipid/water), bioconcentration factor (BCF), aqueous solubility (ESOL), and hydration free energy (freesolve). Then, assuming that the accuracy in the estimated values of widely available properties, e.g., logP (octanol–water partition coefficient), can calibrate estimates for less available but related properties, we proposed logP as a surrogate metric for evaluating the overall accuracy of the estimated pp-LFER descriptors. When using the pp-LFER descriptors to model log Kstorage‑lipid/water, BCF, ESOL, and freesolve, we achieved around 0.1 log unit lower errors for chemicals whose estimated pp-LFER descriptors were deemed “accurate” by the surrogate metric. The interpretation of the PaDEL-DNN models revealed that, for a given test chemical, having several (around 5) “similar” chemicals in the training data set was crucial for accurate estimation while the remaining less similar training chemicals provided reasonable baseline estimates. Lastly, pp-LFER descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals were reasonably estimated by combining PaDEL-DNN with the surrogate metric. Overall, the PaDEL-DNN/surrogate metric and newly estimated descriptors will greatly benefit chemical transfer modeling.
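As a hedged sketch of the overall PaDEL-to-DNN idea (not the authors' pipeline): compute a fixed-length PaDEL descriptor vector per SMILES string and regress one pp-LFER descriptor on it. The padelpy wrapper, zero-filling of missing values, and network size below are assumptions for illustration:

```python
import numpy as np
from padelpy import from_smiles  # assumed PaDEL wrapper
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_descriptor_model(smiles_list, targets):
    # Compute PaDEL descriptors for each molecule (dict of name -> value).
    records = [from_smiles(s) for s in smiles_list]
    keys = sorted(records[0])
    # Missing or empty descriptor values are zero-filled (an assumption).
    X = np.array([[float(r.get(k) or 0.0) for k in keys] for r in records])
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(256, 128), max_iter=2000),
    )
    model.fit(X, np.asarray(targets, dtype=float))
    return model, keys
```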
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Land Use partitioned by sub-national region and year (1992-2019)
What is this?
This archive includes land use partitioned by sub-national administrative region and year, i.e., for each year a table reports the count of each land-use class per region. Data is available as one CSV file per year in the folder "out-computedLUseStatsByRegionAndYear" (a minimal loading sketch follows below).
This archive also contains the set of scripts used to compute that partition (including input data download), which can easily be modified to retrieve a partition at a different geographical level.
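A minimal sketch of assembling the per-year CSVs into one table; the folder name comes from the description above, while the column names in the groupby example are assumptions (check the repository README for the authoritative schema):

```python
import glob
import pandas as pd

# Read every per-year CSV and stack them into one long table.
frames = [pd.read_csv(p)
          for p in sorted(glob.glob("out-computedLUseStatsByRegionAndYear/*.csv"))]
landuse = pd.concat(frames, ignore_index=True)
# e.g., total count per land-use class over all regions and years:
# landuse.groupby("luClass")["count"].sum()  # assumed column names
```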
Warnings
See the README at https://github.com/sylvaticus/landUsePartitionByRegionAndYear/ for further information and the citation format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data set accompanying the publication "Systematic partitioning of proteins for quantum-chemical fragmentation methods using graph algorithms"
The data set contains:
Input script for PyADF (v0.97) for calculating (a) all two-body terms to use as graph weights and (b) the fragmentation error for all k and nmax (aspf)
PDB files of proteins and the "regions of interest" (RoI) used in this work.
Raw data: protein graph representations, resulting partitions, data underlying all figures shown in our article.
Jupyter notebook to create all figures shown in the article and in the supporting information from data in the results folder.
Images of protein structures and graph representations of ubiquitin.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Context: The Caltech-256 dataset is a foundational benchmark for object recognition, containing 30,607 images across 257 categories (256 object categories + 1 clutter category).
The original dataset is typically provided as a collection of directories, one for each category. This version streamlines the machine learning workflow by providing:
A clean, pre-defined 80/20 train-test split.
Manifest files (train.csv, test.csv) that map image paths directly to their labels, allowing for easy use with data generators in frameworks like PyTorch and TensorFlow.
A flat directory structure (train/, test/) for simplified file access.
File Content: The dataset is organized into a single top-level folder and two CSV files:
train.csv: A CSV file containing two columns: image_path and label. This file lists all images designated for the training set.
test.csv: A CSV file with the same structure as train.csv, listing all images designated for the testing set.
Caltech-256_Train_Test/: The primary data folder.
train/: This directory contains 80% of the images from all 257 categories, intended for model training.
test/: This directory contains the remaining 20% of the images from all categories, reserved for model evaluation.
Data Split: The dataset has been partitioned into a standard 80% training and 20% testing split. This split is (or should be assumed to be) stratified, meaning that each of the 257 object categories is represented in roughly an 80/20 proportion in the respective sets.
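A minimal sketch of consuming the manifest files with a PyTorch dataset; the image_path and label columns come from the description above, while the root path default is an assumption:

```python
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class Caltech256Manifest(Dataset):
    def __init__(self, csv_file, root="Caltech-256_Train_Test", transform=None):
        # Each manifest row maps an image path to its label.
        self.df = pd.read_csv(csv_file)
        self.root = root
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img = Image.open(f"{self.root}/{row['image_path']}").convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, row["label"]

train_ds = Caltech256Manifest("train.csv")
```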
Acknowledgements & Original Source: This dataset is a derivative work created for convenience. The original data and images belong to the authors of the Caltech-256 dataset.
Original Dataset Link: https://www.kaggle.com/datasets/jessicali9530/caltech256/data
Citation: Griffin, G., Holub, A. D., & Perona, P. (2007). Caltech-256 Object Category Dataset. California Institute of Technology.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Overview: This dataset has 17 classes. The data are divided into three partitions: train, val, and test.
Dataset Characteristics: Image. Feature Type: Categorical. Associated Tasks: Classification, Other.
Class Labels: 0: Beet Armyworm; 1: Black Hairy; 2: Cutworm; 3: Field Cricket; 4: Jute Aphid; 5: Jute Hairy; 6: Jute Red Mite; 7: Jute Semilooper; 8: Jute Stem Girdler; 9: Jute Stem Weevil; 10: Leaf Beetle; 11: Mealybug; 12: Pod Borer; 13: Scopula Emissaria; 14: Termite; 15: Termite odontotermes (Rambur); 16: Yellow Mite.
Has Missing Values?: No
Files included: PartitionAnalysisTimeMC.FigS1 (Figure S1); PartitionAnalysisTimeMC.TableS1S2 (Tables S1 & S2); MakeTree.R; RelClock.R.
The dataset includes 30-minute values of partitioned evaporation (E) and transpiration (T), T:ET ratios, and other ancillary datasets for three ET partitioning methods, viz. the Flux Variance Similarity (FVS) method, the Transpiration Estimation Algorithm (TEA), and the Underlying Water Use Efficiency (uWUE) method, for three wheat sites. The three wheat sites had different grazing treatments: Site 1 was Grain-only and Graze-grain wheat for the 2016-17 and 2017-18 growing seasons, respectively; Site 2 was Grain-only wheat for the 2017-18 growing season; and Site 3 was Graze-grain and Graze-out wheat for the 2016-17 and 2017-18 growing seasons, respectively. The Grain-only wheat system has the single purpose of producing wheat grain. The Graze-grain wheat system has a dual purpose: it serves as a pasture for grazing cattle from November to February and is used to produce wheat grain later. The Graze-out wheat system is also a single-purpose crop; it is grazed by cattle for the entire season and serves solely as pasture. The FVS method performed ET partitioning using the high-frequency (10 Hz) data collected from eddy covariance flux stations located near the middle of each field. The high-frequency data were also processed using the EddyPro software to obtain good-quality estimates of the different fluxes at 30-minute intervals; these processed 30-min data were used by the TEA and uWUE methods for ET partitioning. Ancillary hydro-meteorological variables, including net radiation, air temperature, soil water content, relative humidity, and others, are also included in this dataset. The study sites were located at the United States Department of Agriculture, Agricultural Research Service (USDA-ARS) Grazinglands Research Laboratory, El Reno, Oklahoma. All sites were rainfed.
Resources in this dataset:
Resource Title: FVS output and other met data and site info. File Name: FVS_output_and_other_met_data_and_site_info.xlsx. Resource Description: Output of the FVS model along with corresponding meteorological data and site metadata.
Resource Title: TEA output. File Name: TEA_output.xlsx. Resource Description: Output of the TEA model along with site metadata.
Resource Title: uWUE output. File Name: uWUE_output.xlsx. Resource Description: Output of the uWUE model run along with site metadata.
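A small sketch of working with the 30-minute partitioned fluxes, computing a T:ET ratio from the transpiration (T) and evaporation (E) series; the column names are assumptions for illustration (the files already include T:ET ratios), so check the .xlsx resources for the actual schema:

```python
import pandas as pd

df = pd.read_excel("FVS_output_and_other_met_data_and_site_info.xlsx")
et = df["T"] + df["E"]                         # total evapotranspiration
df["T_to_ET"] = (df["T"] / et).where(et > 0)   # guard against zero ET
print(df["T_to_ET"].describe())
```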
The programs and software required are R, IQ-TREE2, and Seq-Gen-1.3.4.
The trace element selenium is an essential element with a narrow window between the concentrations needed to support life and those that cause toxicity to egg-laying organisms. Selenium bioaccumulation in aquatic organisms is primarily the result of trophic transfer through food webs and is poorly predicted by dissolved concentrations in freshwater bodies. To better understand the hydrologic and biological dynamics that control selenium accumulation in fishes of the Lower Gunnison River Basin (Colorado), ecosystem-scale selenium accumulation models were developed from data collected between June 2015 and October 2016.
Data for all figures, provided as zipped files in NetCDF format. This dataset is associated with the following publication: He, J., and K. Alapaty. Precipitation Partitioning in Multiscale Atmospheric Simulations: Impacts of Stability Restoration Methods. Journal of Geophysical Research: Atmospheres, American Geophysical Union, Washington, DC, USA, 123(18): 10185-10201 (2018).
Partitioning a permutation into a minimum number of monotone subsequences is NP-hard. We extend this complexity result to minimum partitions into unimodal subsequences. In graph-theoretical terms, these problems are cocoloring and what we call split-coloring of permutation graphs. Based on a network flow interpretation of both problems we introduce mixed integer programs; this is the first approach to obtain optimal partitions for these problems in general. We derive an LP rounding algorithm which is a 2-approximation for both coloring problems; it performs much better in practice. In an online setting the permutation becomes known to an algorithm sequentially, and we give a logarithmic lower bound on the competitive ratio and analyze two online algorithms.
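For context on where the hardness begins, a short sketch: if only increasing subsequences are allowed, a greedy, patience-sorting-style pass already yields a minimum partition (by Dilworth's theorem the number of subsequences equals the length of the longest decreasing subsequence); it is mixing increasing with decreasing pieces (cocoloring) or allowing unimodal pieces that makes the problem NP-hard. This is illustrative context, not the paper's MIP or LP rounding algorithm:

```python
# Greedy minimum partition of a permutation into increasing subsequences:
# place each element on the pile whose top is the largest value below it;
# if no pile can take it, open a new pile.
import bisect

def partition_into_increasing(perm):
    tops = []   # last element of each open subsequence, kept sorted
    piles = []  # piles[i] is the subsequence whose last element is tops[i]
    for x in perm:
        i = bisect.bisect_left(tops, x) - 1  # pile with largest top below x
        if i >= 0:
            tops[i] = x
            piles[i].append(x)
        else:                                # no pile can extend: open one
            tops.insert(0, x)
            piles.insert(0, [x])
    return piles

print(partition_into_increasing([3, 1, 4, 2, 5, 9, 7, 6]))
# -> [[6], [1, 2, 7], [3, 4, 5, 9]]: three parts, matching the longest
#    decreasing subsequence 9, 7, 6.
```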
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This dataset describes the correlation of the Ronlow beds to other geological units in the Galilee subregion. The Ronlow beds are stratigraphic equivalents of three formal geological units: the Hutton Sandstone, the Hooray Sandstone, and the Injune Creek Group. For the preparation of potentiometric surface maps and other hydrogeological interpretation of data from the Galilee subregion, the Ronlow beds were partitioned into three sub-units, which were assigned to either the Hutton Sandstone, Hooray Sandstone, or Injune Creek Group. This partitioning was based on potentiometry of bores screened in the Ronlow beds.
Hydraulic head data for bores screened in the Ronlow beds from the dataset 'JkrRonlow_beds_Partitioning.gdb' were compared to hydraulic head values in bores assigned to the Hutton Sandstone, Hooray Sandstone, and Injune Creek Group. Bores screened in the Ronlow beds were then assigned to either the Hutton Sandstone aquifer, Hooray Sandstone aquifer, or Injune Creek Group aquitard based on similarities in hydraulic head. The polygons were created in an ArcMap editing session.
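An illustrative sketch (not the Programme's ArcMap workflow) of the assignment rule described above: each Ronlow beds bore is attributed to the unit whose bores show the most similar hydraulic head. The DataFrame layout and column names are assumptions:

```python
import pandas as pd

def assign_bores(ronlow: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    # reference holds bores already attributed to the Hutton Sandstone,
    # Hooray Sandstone, or Injune Creek Group: columns ["unit", "head_m"].
    unit_means = reference.groupby("unit")["head_m"].mean()
    out = ronlow.copy()
    # Assign each bore to the unit with the closest mean hydraulic head.
    out["assigned_unit"] = out["head_m"].map(
        lambda head: (unit_means - head).abs().idxmin()
    )
    return out
```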
Bioregional Assessment Programme (2015) Ronlow beds partitioning. Bioregional Assessment Derived Dataset. Viewed 07 December 2018, http://data.bioregionalassessments.gov.au/dataset/d2f60560-eda7-417d-86ca-1d29ce994edd.
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements 20131204
Derived From QDEX well completion reports (WCR) - Galilee v01
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores v3 03122014
Derived From Potentiometric head difference v01
Derived From QLD Department of Natural Resources and Mines Groundwater Database Extract 20142808
Derived From Galilee subregion groundwater usage estimates dataset v01
Derived From Galilee Water Accounts Table: volumes and purposes