100+ datasets found
  1. Data Sheet 1_Functional partitioning through competitive learning.pdf

    • figshare.com
    pdf
    Updated Nov 5, 2025
    Cite
    Marius Tacke; Matthias Busch; Kevin Linka; Christian Cyron; Roland Aydin (2025). Data Sheet 1_Functional partitioning through competitive learning.pdf [Dataset]. http://doi.org/10.3389/frai.2025.1661444.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 5, 2025
    Dataset provided by
    Frontiers
    Authors
    Marius Tacke; Matthias Busch; Kevin Linka; Christian Cyron; Roland Aydin
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets often incorporate various functional patterns related to different aspects or regimes, which are typically not equally present throughout the dataset. We propose a novel partitioning algorithm that utilizes competition between models to detect and separate these functional patterns. This competition is induced by multiple models iteratively submitting their predictions for the dataset, with the best prediction for each data point being rewarded with training on that data point. This reward mechanism amplifies each model's strengths and encourages specialization in different patterns. The specializations can then be translated into a partitioning scheme. We validate our concept with datasets with clearly distinct functional patterns, such as mechanical stress and strain data in a porous structure. Our partitioning algorithm produces valuable insights into the datasets' structure, which can serve various further applications. As a demonstration of one exemplary usage, we set up modular models consisting of multiple expert models, each learning a single partition, and compare their performance on more than twenty popular regression problems with single models learning all partitions simultaneously. Our results show significant improvements, with up to 56% loss reduction, confirming our algorithm's utility.
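
    The competition loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the straight-line models, the least-squares refit that stands in for reward training, the hand-picked starting lines, and all names such as `competitive_partition` are assumptions chosen for brevity.

```python
# Illustrative sketch only: each "model" is a straight line y = a*x + b, and the
# reward is a full least-squares refit on the points the model won (an assumption).

def fit_line(points, fallback):
    """Least-squares line through points; keep the old model if the fit is degenerate."""
    n = len(points)
    if n < 2:
        return fallback
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return fallback
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov_xy / var_x
    return (a, mean_y - a * mean_x)


def competitive_partition(data, models, rounds=10):
    """Alternate between awarding points to the best model and refitting each model."""
    labels = []
    for _ in range(rounds):
        # Every model predicts every point; the smallest absolute error wins the point.
        labels = [
            min(range(len(models)), key=lambda k: abs(models[k][0] * x + models[k][1] - y))
            for x, y in data
        ]
        # Winners are rewarded by training on the points they won.
        models = [
            fit_line([p for p, lab in zip(data, labels) if lab == k], models[k])
            for k in range(len(models))
        ]
    return labels, models


# Toy data with two regimes: y = 2x for x < 0 and y = -3x for x >= 0.
data = [(float(x), 2.0 * x if x < 0 else -3.0 * x) for x in range(-10, 11)]
labels, models = competitive_partition(data, models=[(1.0, 0.0), (-1.0, 0.0)])
```

    In this toy setup the labels settle into one group per regime, and that label vector is what would be read off as the partitioning scheme.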

  2. AMEX Training Data - Parquet Partitions

    • kaggle.com
    zip
    Updated Jul 24, 2022
    Cite
    Robbie Manolache (2022). AMEX Training Data - Parquet Partitions [Dataset]. https://www.kaggle.com/datasets/slashie/amex-train-data-pq
    Explore at:
    Available download formats: zip (7813557437 bytes)
    Dataset updated
    Jul 24, 2022
    Authors
    Robbie Manolache
    Description

    Dataset

    This dataset was created by Robbie Manolache

    Contents

  3. Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA http://nasa.gov/
    Description

    There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).

  4. Data from: Distributed Anomaly Detection using 1-class SVM for Vertically...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data [Dataset]. https://catalog.data.gov/dataset/distributed-anomaly-detection-using-1-class-svm-for-vertically-partitioned-data
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed to find outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the ‘Commercial Modular Aero-Propulsion System Simulation’ (CMAPSS).

  5. Data from: OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS

    • catalog.data.gov
    • s.cnmilf.com
    • +1 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS [Dataset]. https://catalog.data.gov/dataset/optimal-partitions-of-data-in-higher-dimensions
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS. BRADLEY W. JACKSON, JEFFREY D. SCARGLE, AND CHRIS CUSANZA, DAVID BARNES, DENNIS KANYGIN, RUSSELL SARMIENTO, SOWMYA SUBRAMANIAM, TZU-WANG CHUANG. Abstract. Consider piece-wise constant approximations to a function of several parameters, and the problem of finding the best such approximation from measurements at a set of points in the parameter space. We find good approximate solutions to this problem in two steps: (1) partition the parameter space into cells, one for each of the N data points, and (2) collect these cells into blocks, such that within each block the function is constant to within measurement uncertainty. We describe a branch-and-bound algorithm for finding the optimal partition into connected blocks, as well as an O(N^2) dynamic programming algorithm that finds the exact global optimum over this exponentially large search space, in a data space of any dimension. This second solution relaxes the connectivity constraint, and requires additivity and convexity conditions on the block fitness function, but in practice none of these items cause problems. From the wide variety of intelligent data understanding applications (including cluster analysis, classification, and anomaly detection) we demonstrate two: partitioning of the State of California (2D) and the Universe (3D).
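
    The quadratic-time dynamic program mentioned in the abstract can be sketched for the simplest case, a one-dimensional sequence of measurements. This is an illustration rather than the authors' algorithm: the block fitness used here (squared error around the block mean plus a fixed penalty per block) and the name `optimal_blocks` are assumptions, picked only because such a fitness is additive over blocks.

```python
# Illustrative 1D sketch: the block fitness (squared error plus a fixed penalty
# per block) is an assumption, chosen because it is additive over blocks.

def optimal_blocks(values, penalty):
    """Return the cost-minimizing list of (start, end) blocks, end exclusive, in O(N^2)."""
    n = len(values)
    # Prefix sums make the squared error of any block an O(1) lookup.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def block_cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i) + penalty

    best = [0.0] + [float("inf")] * n   # best[j]: optimal cost of values[:j]
    last_start = [0] * (n + 1)          # start of the final block in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + block_cost(i, j)
            if cost < best[j]:
                best[j], last_start[j] = cost, i

    # Walk the stored block starts backwards to recover the partition.
    blocks, j = [], n
    while j > 0:
        blocks.append((last_start[j], j))
        j = last_start[j]
    return blocks[::-1]


blocks = optimal_blocks([1, 1, 1, 5, 5, 5], penalty=0.5)
```

    The two nested loops visit every (start, end) pair once, which is where the quadratic cost comes from; additivity of the fitness is what lets the best partition of a prefix be reused.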

  6. Describes the data partitioning of the Berlin dataset.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Feb 19, 2025
    Cite
    Wang, Huiqing; Wang, Huajun; Wu, Linfen (2025). Describes the data partitioning of the Berlin dataset. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001300831
    Explore at:
    Dataset updated
    Feb 19, 2025
    Authors
    Wang, Huiqing; Wang, Huajun; Wu, Linfen
    Description

    Describes the data partitioning of the Berlin dataset.

  7. OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS - Dataset - NASA Open Data...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/optimal-partitions-of-data-in-higher-dimensions
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA http://nasa.gov/
    Description

    OPTIMAL PARTITIONS OF DATA IN HIGHER DIMENSIONS. BRADLEY W. JACKSON, JEFFREY D. SCARGLE, AND CHRIS CUSANZA, DAVID BARNES, DENNIS KANYGIN, RUSSELL SARMIENTO, SOWMYA SUBRAMANIAM, TZU-WANG CHUANG. Abstract. Consider piece-wise constant approximations to a function of several parameters, and the problem of finding the best such approximation from measurements at a set of points in the parameter space. We find good approximate solutions to this problem in two steps: (1) partition the parameter space into cells, one for each of the N data points, and (2) collect these cells into blocks, such that within each block the function is constant to within measurement uncertainty. We describe a branch-and-bound algorithm for finding the optimal partition into connected blocks, as well as an O(N^2) dynamic programming algorithm that finds the exact global optimum over this exponentially large search space, in a data space of any dimension. This second solution relaxes the connectivity constraint, and requires additivity and convexity conditions on the block fitness function, but in practice none of these items cause problems. From the wide variety of intelligent data understanding applications (including cluster analysis, classification, and anomaly detection) we demonstrate two: partitioning of the State of California (2D) and the Universe (3D).

  8. Gene Ontology Partition Database

    • neuinfo.org
    • scicrunch.org
    • +2 more
    Updated Jan 29, 2022
    Cite
    (2022). Gene Ontology Partition Database [Dataset]. http://identifiers.org/RRID:SCR_007693/resolver?q=&i=rrid
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 23, 2016. The GO Partition Database was designed to feature ontology partitions with GO terms of similar specificity. The GO partitions comprise varying numbers of nodes and present relevant information-theoretic statistics, so researchers can choose to analyze datasets at arbitrary levels of specificity. The database features GO partition sets for functional analysis of genes from human and ten other commonly studied organisms, with a total of 131,972 genes.

  9. Data from: Predicting Solute Descriptors for Organic Chemicals by a Deep...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1 more
    xlsx
    Updated Jun 4, 2023
    Cite
    Kai Zhang; Huichun Zhang (2023). Predicting Solute Descriptors for Organic Chemicals by a Deep Neural Network (DNN) Using Basic Chemical Structures and a Surrogate Metric [Dataset]. http://doi.org/10.1021/acs.est.1c05398.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    ACS Publications
    Authors
    Kai Zhang; Huichun Zhang
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Solute descriptors have been widely used to model chemical transfer processes through poly-parameter linear free energy relationships (pp-LFERs); however, there are still substantial difficulties in obtaining these descriptors accurately and quickly for new organic chemicals. In this research, models (PaDEL-DNN) that require only SMILES of chemicals were built to satisfactorily estimate pp-LFER descriptors using deep neural networks (DNN) and the PaDEL chemical representation. The PaDEL-DNN-estimated pp-LFER descriptors demonstrated good performance in modeling storage-lipid/water partitioning coefficient (log Kstorage‑lipid/water), bioconcentration factor (BCF), aqueous solubility (ESOL), and hydration free energy (freesolve). Then, assuming that the accuracy in the estimated values of widely available properties, e.g., logP (octanol–water partition coefficient), can calibrate estimates for less available but related properties, we proposed logP as a surrogate metric for evaluating the overall accuracy of the estimated pp-LFER descriptors. When using the pp-LFER descriptors to model log Kstorage‑lipid/water, BCF, ESOL, and freesolve, we achieved around 0.1 log unit lower errors for chemicals whose estimated pp-LFER descriptors were deemed “accurate” by the surrogate metric. The interpretation of the PaDEL-DNN models revealed that, for a given test chemical, having several (around 5) “similar” chemicals in the training data set was crucial for accurate estimation while the remaining less similar training chemicals provided reasonable baseline estimates. Lastly, pp-LFER descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals were reasonably estimated by combining PaDEL-DNN with the surrogate metric. Overall, the PaDEL-DNN/surrogate metric and newly estimated descriptors will greatly benefit chemical transfer modeling.

  10. Land use partitioned by region (sub-national) and year (1992-2019)

    • zenodo.org
    zip
    Updated Jan 19, 2022
    Cite
    Antonello Lobianco; Antonello Lobianco (2022). Land use partitioned by region (sub-national) and year (1992-2019) [Dataset]. http://doi.org/10.5281/zenodo.4736887
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 19, 2022
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Antonello Lobianco; Antonello Lobianco
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Land Use partitioned by sub-national region and year (1992-2019)

    What is this?

    This archive includes land use partitioned by sub-national administrative region and year, i.e. for each year a table reports the count of each land-use class per region. Data is available as one CSV file per year in the folder "out-computedLUseStatsByRegionAndYear".

    This archive also contains the set of scripts used to compute that partition (including input data download), which can easily be modified to retrieve a partition at a different geographical level.

    Warnings

    • This data should only be used to compute the relative ratio of each land-use class in each region. Due to several issues in projecting the data, the sum of the counts multiplied by the nominal area of each pixel (90000 sq.m) is NOT equal to the area of the region. However, the shares of land uses should remain invariant to the projections and unbiased.
    • By construction, land-use classes are hierarchically organised. For example, to obtain land use in class "Tree cover, broadleaved, deciduous, closed to open (>15%)" (class 60), one has to sum the cells in classes 60+61+62. The same applies to classes 10, 70, 80, 120, 150 and 200.

    See the README on https://github.com/sylvaticus/landUsePartitionByRegionAndYear/ for further information and citation format.
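
    The two warnings above suggest a small post-processing step, sketched here under stated assumptions. The input layout (a plain mapping from class code to pixel count for one region) and the function name `land_use_shares` are hypothetical, and the rule that sub-classes roll up to the nearest lower multiple of ten is confirmed by the description only for 60+61+62; the actual CSV columns should be checked against the README.

```python
# Sketch under assumptions: counts for one region arrive as {class_code: pixel_count},
# and sub-classes roll up to their parent by rounding the code down to a multiple of ten
# (the source confirms this only for 60+61+62).

def land_use_shares(counts):
    """Aggregate sub-classes into parents and return each parent's share of the region."""
    parents = {}
    for code, count in counts.items():
        parent = code // 10 * 10
        parents[parent] = parents.get(parent, 0) + count
    total = sum(parents.values())
    # Only ratios are meaningful; absolute areas are not, per the first warning.
    return {code: count / total for code, count in parents.items()}


shares = land_use_shares({60: 10, 61: 5, 62: 5, 130: 20})
```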

  11. Data Set "Systematic partitioning of proteins for quantum-chemical...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Dec 17, 2020
    Cite
    Mario Wolter; Moritz von Looz; Henning Meyerhenke; Christoph Jacob (2020). Data Set "Systematic partitioning of proteins for quantum-chemical fragmentation methods using graph algorithms" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_4066960
    Explore at:
    Dataset updated
    Dec 17, 2020
    Dataset provided by
    HU Berlin
    TU Braunschweig
    Authors
    Mario Wolter; Moritz von Looz; Henning Meyerhenke; Christoph Jacob
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data set accompanying the publication "Systematic partitioning of proteins for quantum-chemical fragmentation methods using graph algorithms"

    The data set contains:

    • Input script for PyADF (v0.97) for calculating (a) all two body terms to use as graph weights and (b) fragmentation error for all k and nmax (aspf)

    • PDB files of proteins and the "regions of interest" (RoI) used in this work.

    • Raw data: protein graph representations, resulting partitions, data underlying all figures shown in our article.

    • Jupyter notebook to create all figures shown in the article and in the supporting information from data in the results folder.

    • Images of protein structures and graph representations of ubiquitin.

  12. Caltech-256: Pre-Processed 80/20 Train-Test Split

    • kaggle.com
    zip
    Updated Nov 12, 2025
    Cite
    KUSHAGRA MATHUR (2025). Caltech-256: Pre-Processed 80/20 Train-Test Split [Dataset]. https://www.kaggle.com/datasets/kushubhai/caltech-256-train-test
    Explore at:
    Available download formats: zip (1138799273 bytes)
    Dataset updated
    Nov 12, 2025
    Authors
    KUSHAGRA MATHUR
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context: The Caltech-256 dataset is a foundational benchmark for object recognition, containing 30,607 images across 257 categories (256 object categories + 1 clutter category).

    The original dataset is typically provided as a collection of directories, one for each category. This version streamlines the machine learning workflow by providing:

    A clean, pre-defined 80/20 train-test split.

    Manifest files (train.csv, test.csv) that map image paths directly to their labels, allowing for easy use with data generators in frameworks like PyTorch and TensorFlow.

    A flat directory structure (train/, test/) for simplified file access.

    File Content: The dataset is organized into a single top-level folder and two CSV files:

    train.csv: A CSV file containing two columns: image_path and label. This file lists all images designated for the training set.

    test.csv: A CSV file with the same structure as train.csv, listing all images designated for the testing set.

    Caltech-256_Train_Test/: The primary data folder.

    train/: This directory contains 80% of the images from all 257 categories, intended for model training.

    test/: This directory contains the remaining 20% of the images from all categories, reserved for model evaluation.

    Data Split: The dataset has been partitioned to create a standard 80% training and 20% testing split. This split is (or should be assumed to be) stratified, meaning that each of the 257 object categories is represented in roughly an 80/20 proportion in the respective sets.
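
    Because the stratification is only assumed, it may be worth checking from the two manifests. The sketch below is one hedged way to do that: the column names `image_path` and `label` come from the description, while the function names and the tiny in-memory manifests are made up for illustration.

```python
import csv
import io
from collections import Counter

# Sketch only: the column names image_path and label come from the description;
# the tiny in-memory manifests below are made-up stand-ins for train.csv and test.csv.

def label_counts(manifest_file):
    """Count how many rows of a manifest carry each label."""
    return Counter(row["label"] for row in csv.DictReader(manifest_file))


def train_fraction_per_label(train_file, test_file):
    """Return, for every label, the fraction of its images that sit in the training set."""
    train, test = label_counts(train_file), label_counts(test_file)
    return {
        label: train[label] / (train[label] + test[label])
        for label in sorted(set(train) | set(test))
    }


train_csv = io.StringIO(
    "image_path,label\n"
    "train/a1.jpg,ak47\ntrain/a2.jpg,ak47\ntrain/a3.jpg,ak47\ntrain/a4.jpg,ak47\n"
    "train/b1.jpg,zebra\ntrain/b2.jpg,zebra\ntrain/b3.jpg,zebra\ntrain/b4.jpg,zebra\n"
)
test_csv = io.StringIO("image_path,label\ntest/a5.jpg,ak47\ntest/b5.jpg,zebra\n")
fractions = train_fraction_per_label(train_csv, test_csv)
```

    With the real files, the same functions can be given handles from `open("train.csv", newline="")` and `open("test.csv", newline="")`; labels whose fraction sits far from 0.8 would indicate the split is not stratified.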

    Acknowledgements & Original Source: This dataset is a derivative work created for convenience. The original data and images belong to the authors of the Caltech-256 dataset.

    Original Dataset Link: https://www.kaggle.com/datasets/jessicali9530/caltech256/data

    Citation: Griffin, G., Holub, A. D., & Perona, P. (2007). Caltech-256 Object Category Dataset. California Institute of Technology.

  13. Jute Pest

    • kaggle.com
    zip
    Updated May 16, 2024
    Cite
    ABUCHI ONWUEGBUSI (2024). Jute Pest [Dataset]. https://www.kaggle.com/datasets/abuchionwuegbusi/jute-pest
    Explore at:
    Available download formats: zip (163188696 bytes)
    Dataset updated
    May 16, 2024
    Authors
    ABUCHI ONWUEGBUSI
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Overview: This dataset has 17 classes. The data are divided into three partitions: train, val, and test.

    Dataset Characteristics: Image
    Feature Type: Categorical
    Associated Tasks: Classification, Other

    Class Labels:

    • 0: Beet Armyworm
    • 1: Black Hairy
    • 2: Cutworm
    • 3: Field Cricket
    • 4: Jute Aphid
    • 5: Jute Hairy
    • 6: Jute Red Mite
    • 7: Jute Semilooper
    • 8: Jute Stem Girdler
    • 9: Jute Stem Weevil
    • 10: Leaf Beetle
    • 11: Mealybug
    • 12: Pod Borer
    • 13: Scopula Emissaria
    • 14: Termite
    • 15: Termite odontotermes (Rambur)
    • 16: Yellow Mite

    Has Missing Values?: No

  14. Data from: An evaluation of different partitioning strategies for Bayesian...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Jun 29, 2017
    Cite
    Konstantinos Angelis; Sandra Álvarez-Carretero; Mario Dos Reis; Ziheng Yang (2017). An evaluation of different partitioning strategies for Bayesian estimation of species divergence times [Dataset]. http://doi.org/10.5061/dryad.d7839
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 29, 2017
    Dataset provided by
    Dryad
    Authors
    Konstantinos Angelis; Sandra Álvarez-Carretero; Mario Dos Reis; Ziheng Yang
    Time period covered
    Apr 3, 2017
    Description

    • PartitionAnalysisTimeMC.FigS1: Figure S1
    • PartitionAnalysisTimeMC.TableS1S2: Tables S1 & S2
    • MakeTree.R
    • RelClock.R

  15. Data from: Vegetation index-based partitioning of evapotranspiration is...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). Data from: Vegetation index-based partitioning of evapotranspiration is deficient in grazed systems [Dataset]. https://catalog.data.gov/dataset/data-from-vegetation-index-based-partitioning-of-evapotranspiration-is-deficient-in-grazed-98cf1
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service https://www.ars.usda.gov/
    Description

    The dataset includes 30-minute values of partitioned evaporation (E) and transpiration (T), T:ET ratios, and other ancillary data for three ET partitioning methods, namely the Flux Variance Similarity (FVS) method, the Transpiration Estimation Algorithm (TEA), and the Underlying Water Use Efficiency (uWUE) method, for three wheat sites. The three wheat sites had different grazing treatments. Site 1 was Grain-only and Graze-grain wheat for the 2016-17 and 2017-18 growing seasons, respectively. Site 2 was Grain-only wheat for the 2017-18 growing season. Site 3 was Graze-grain and Graze-out wheat for the 2016-17 and 2017-18 growing seasons, respectively. The Grain-only wheat system has the single purpose of producing wheat grain. The Graze-grain wheat system has a dual purpose: it serves as pasture for grazing cattle from November to February and is used to produce wheat grain later. The Graze-out wheat system is also a single-purpose crop, grazed by cattle for the entire season to serve solely as pasture. The FVS method performed ET partitioning using the high-frequency (10 Hz) data collected from Eddy Covariance Flux stations located near the middle of each field. The high-frequency data were also processed using the EddyPro software to get good-quality estimates of different fluxes at 30-minute intervals. The processed 30-min data were used by the TEA and uWUE methods for ET partitioning. Ancillary hydro-meteorological variables, including net radiation, air temperature, soil water content, relative humidity, and others, are also included in this dataset. The study sites were located at the United States Department of Agriculture, Agricultural Research Service (USDA-ARS), Grazinglands Research Laboratory, El Reno, Oklahoma. All sites were rainfed.

    Resources in this dataset:

    • Resource Title: FVS output and other met data and site info. File Name: FVS_output_and_other_met_data_and_site_info.xlsx. Resource Description: Output of FVS model along with corresponding meteorological data and site metadata.
    • Resource Title: TEA output. File Name: TEA_output.xlsx. Resource Description: Output from TEA model along with site metadata.
    • Resource Title: WUE output. File Name: uWUE_output.xlsx. Resource Description: Output of WUE model run along with site metadata.

  16. Data from: Performance of akaike information criterion and bayesian...

    • datadryad.org
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated Jun 6, 2022
    Cite
    Qin Liu; Michael Charleston; Shane Richards; Barbara Holland (2022). Performance of akaike information criterion and bayesian information criterion in selecting partition models and mixture models [Dataset]. http://doi.org/10.5061/dryad.1jwstqjwj
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 6, 2022
    Dataset provided by
    Dryad
    Authors
    Qin Liu; Michael Charleston; Shane Richards; Barbara Holland
    Time period covered
    May 23, 2022
    Description

    The programs and software required are R, IQ-TREE2, and Seq-Gen-1.3.4.

  17. Data from: Dataset for temporal influences on selenium partitioning, trophic...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 21, 2025
    Cite
    U.S. Geological Survey (2025). Dataset for temporal influences on selenium partitioning, trophic transfer, and exposure in a major U.S. river [Dataset]. https://catalog.data.gov/dataset/dataset-for-temporal-influences-on-selenium-partitioning-trophic-transfer-and-exposure-in-
    Explore at:
    Dataset updated
    Nov 21, 2025
    Dataset provided by
    United States Geological Survey http://www.usgs.gov/
    Area covered
    United States
    Description

    The trace element selenium is an essential element with a narrow window between concentrations needed to support life and those that cause toxicity to egg-laying organisms. Selenium bioaccumulation in aquatic organisms is primarily the result of trophic transfer through food webs and is poorly predicted by dissolved concentrations in freshwater bodies. To better understand the hydrologic and biological dynamics that control selenium accumulation in fishes of the Lower Gunnison River Basin (Colorado), ecosystem-scale selenium accumulation models were developed from data collected between June 2015 and October 2016.

  18. Zipped NetCDF data for Precipitation partitioning in Multi-Scale Atmospheric...

    • catalog.data.gov
    • data.amerigeoss.org
    Updated May 2, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Zipped NetCDF data for Precipitation partitioning in Multi-Scale Atmospheric Simulations: Impacts of Stability Restoration Methods [Dataset]. https://catalog.data.gov/dataset/zipped-netcdf-data-for-precipitation-partitioning-in-multi-scale-atmospheric-simulations-i
    Explore at:
    Dataset updated
    May 2, 2021
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Description

    Data for all figures in NetCDF format zipped files. This dataset is associated with the following publication: He, J., and K. Alapaty. Precipitation Partitioning in Multiscale Atmospheric Simulations: Impacts of Stability Restoration Methods. JOURNAL OF GEOPHYSICAL RESEARCH-ATMOSPHERES. American Geophysical Union, Washington, DC, USA, 123(18): 10,185-10,201, (2018).

  19. Data from: On Minimum Monotone and Unimodal Partitions of Permutations

    • resodate.org
    Updated Dec 17, 2021
    Cite
    Gabriele Di Stefano; Stefan Krause; Marco E. Lübbecke; Uwe T. Zimmermann (2021). On Minimum Monotone and Unimodal Partitions of Permutations [Dataset]. http://doi.org/10.14279/depositonce-14343
    Explore at:
    Dataset updated
    Dec 17, 2021
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Gabriele Di Stefano; Stefan Krause; Marco E. Lübbecke; Uwe T. Zimmermann
    Description

    Partitioning a permutation into a minimum number of monotone subsequences is NP-hard. We extend this complexity result to minimum partitions into unimodal subsequences. In graph theoretical terms these problems are cocoloring and what we call split-coloring of permutation graphs. Based on a network flow interpretation of both problems we introduce mixed integer programs; this is the first approach to obtain optimal partitions for these problems in general. We derive an LP rounding algorithm which is a 2-approximation for both coloring problems. It performs much better in practice. In an online situation the permutation becomes known to an algorithm sequentially, and we give a logarithmic lower bound on the competitive ratio and analyze two online algorithms.
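
    For contrast with the NP-hard mixed problem, the one-directional case is easy, and a short sketch may help fix ideas. The greedy below splits a permutation into increasing subsequences only; it uses as many subsequences as the longest decreasing subsequence has elements, which is the minimum possible. It is not one of the algorithms from the paper, and the name `partition_into_increasing` is an assumption.

```python
# Sketch of the easy one-directional case only: the mixed increasing/decreasing
# (cocoloring) problem from the abstract is NP-hard and is not solved here.

def partition_into_increasing(permutation):
    """Greedily split a permutation into the fewest increasing subsequences."""
    piles = []
    for x in permutation:
        # Pile tails stay in decreasing order, so the first tail below x is the
        # largest tail that x can extend.
        for pile in piles:
            if pile[-1] < x:
                pile.append(x)
                break
        else:
            piles.append([x])
    return piles


piles = partition_into_increasing([3, 1, 4, 2, 5])
```

    Allowing each subsequence to be either increasing or decreasing is what turns this into the cocoloring problem, for which no such simple greedy is known to be optimal.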

  20. Ronlow beds partitioning

    • data.gov.au
    • researchdata.edu.au
    zip
    Updated Nov 20, 2019
    Cite
    Bioregional Assessment Program (2019). Ronlow beds partitioning [Dataset]. https://data.gov.au/data/dataset/activity/d2f60560-eda7-417d-86ca-1d29ce994edd
    Explore at:
    zip(41476)Available download formats
    Dataset updated
    Nov 20, 2019
    Dataset provided by
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    This dataset describes the correlation of the Ronlow beds to other geological units in the Galilee subregion. The Ronlow beds are stratigraphic equivalents of three formal geological units: the Hutton Sandstone, the Hooray Sandstone, and the Injune Creek Group. For the preparation of potentiometric surface maps and other hydrogeological interpretation of data from the Galilee subregion, the Ronlow beds were partitioned into three sub-units, each assigned to the Hutton Sandstone, the Hooray Sandstone, or the Injune Creek Group. This partitioning was based on potentiometry of bores screened in the Ronlow beds.

    Dataset History

    Hydraulic head data for bores screened in the Ronlow beds from dataset 'JkrRonlow_beds_Partitioning.gdb' were compared to hydraulic head values in bores assigned to the Hutton Sandstone, Hooray Sandstone, and Injune Creek Group. Bores screened in the Ronlow beds were then assigned to either the Hutton Sandstone aquifer, Hooray Sandstone aquifer, or Injune Creek Group aquitard based on similarities in hydraulic head. The polygons were created in an ArcMap editing session.

    Dataset Citation

    Bioregional Assessment Programme (2015) Ronlow beds partitioning. Bioregional Assessment Derived Dataset. Viewed 07 December 2018, http://data.bioregionalassessments.gov.au/dataset/d2f60560-eda7-417d-86ca-1d29ce994edd.

    Dataset Ancestors
