7 datasets found
  1. PLANE-ood

    • huggingface.co
    Updated Sep 23, 2023
    Cite
    tasksource (2023). PLANE-ood [Dataset]. https://huggingface.co/datasets/tasksource/PLANE-ood
    Explore at:
    14 scholarly articles cite this dataset (Google Scholar)
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 23, 2023
    Dataset authored and provided by
    tasksource
    License

    Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
    License information was derived automatically

    Description

    Preprocessed from https://huggingface.co/datasets/lorenzoscottb/PLANE-ood/:

    df = pd.read_json('https://huggingface.co/datasets/lorenzoscottb/PLANE-ood/resolve/main/PLANE_trntst-OoV_inftype-all.json')
    f = lambda df: pd.DataFrame(list(zip(*[df[c] for c in df.index])), columns=df.index)
    ds = DatasetDict()
    for split in ['train', 'test']:
        dfs = pd.concat([f(df[c]) for c in df.columns if split in c.lower()]).reset_index(drop=True)
        dfs['label'] = dfs['label'].map(lambda x: {1: 'entailment'…

    See the full description on the dataset page: https://huggingface.co/datasets/tasksource/PLANE-ood.

  2. Multimodal Vision-Audio-Language Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 11, 2024
    Cite
    Schaumlöffel, Timothy (2024). Multimodal Vision-Audio-Language Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10060784
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Schaumlöffel, Timothy
    Choksi, Bhavin
    Roig, Gemma
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.

    Annotation

    The annotation files are provided as Parquet files. They can be read using Python with the pandas and pyarrow libraries. The split into train, validation and test sets follows the splits of the original datasets.

    Installation

    pip install pandas pyarrow

    Example

    import pandas as pd
    df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
    print(df.iloc[0])

    dataset                                         AudioSet
    filename                           train/---2_BBVHAA.mp3
    captions_visual      [a man in a black hat and glasses.]
    captions_auditory       [a man speaks and dishes clank.]
    tags                                            [Speech]

    The annotation file consists of the following fields:

    • filename: Name of the corresponding file (video or audio file)
    • dataset: Source dataset associated with the data point
    • captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content
    • captions_auditory: A list of captions related to the auditory content of the video
    • tags: A list of tags classifying the sound of a file. Can be NaN if no tags are provided

    Data files

    The raw data files for most datasets are not released due to licensing issues and must be downloaded from the source. In case of missing files, we can provide them on request; please contact us at schaumloeffel@em.uni-frankfurt.de.
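    As a small follow-on sketch (hypothetical, using only the fields documented above), one can restrict the annotations to a single source dataset and drop rows without visual captions:

    import pandas as pd

    df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
    # captions_visual can be NaN, so filter such rows out before using them.
    audioset = df[(df['dataset'] == 'AudioSet') & df['captions_visual'].notna()]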

  3. SELTO Dataset

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated May 23, 2023
    + more versions
    Cite
    Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch (2023). SELTO Dataset [Dataset]. http://doi.org/10.5281/zenodo.7034899
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 23, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sören Dittmer; David Erzmann; Henrik Harms; Rielson Falck; Marco Gosch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Benchmark Dataset for Deep Learning-based Methods for 3D Topology Optimization.

    One can find a description of the provided dataset partitions in Section 3 of Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.


    Every dataset container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and a corresponding binarized SIMP solution. Every file of the form {i}.csv contains all voxel-wise information about the sample i. Every file of the form {i}_info.csv contains scalar parameters of the topology optimization problem, such as material parameters.


    This dataset represents topology optimization problems and solutions on the basis of voxels. We define all spatially varying quantities via the voxels' centers -- rather than via the vertices or surfaces of the voxels.
    In {i}.csv files, each row corresponds to one voxel in the design space. The columns correspond to ['x', 'y', 'z', 'design_space', 'dirichlet_x', 'dirichlet_y', 'dirichlet_z', 'force_x', 'force_y', 'force_z', 'density'].

    • x, y, z - These are three integer indices stating the index/location of the voxel within the voxel mesh.
    • design_space - This is one ternary variable indicating the type of material density constraint on the voxel within the TO problem formulation. "0" and "1" indicate a material density fixed at 0 or 1, respectively. "-1" indicates the absence of constraints.
    • dirichlet_x, dirichlet_y, dirichlet_z - These are three binary variables defining whether the voxel contains homogeneous Dirichlet constraints in the respective axis direction.
    • force_x, force_y, force_z - These are three floating point variables giving the three spatial components of the forces applied to each voxel. All forces are body forces given in [N/m^3].
    • density - This is a binary variable stating whether the voxel carries material in the solution of the topology optimization problem.

    Any of these files with the index i can be imported using pandas by executing:

    import pandas as pd
    
    directory = ...
    file_path = f'{directory}/{i}.csv'
    column_names = ['x', 'y', 'z', 'design_space','dirichlet_x', 'dirichlet_y', 'dirichlet_z', 'force_x', 'force_y', 'force_z', 'density']
    data = pd.read_csv(file_path, names=column_names)

    From this pandas dataframe one can extract the torch tensors of forces F, Dirichlet conditions ωDirichlet, and design space information ωdesign using the following functions:

    import torch
    
    def get_shape_and_voxels(data):
      shape = data[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1
      vox_x = data['x'].values
      vox_y = data['y'].values
      vox_z = data['z'].values
      voxels = [vox_x, vox_y, vox_z]
      return shape, voxels
    
    
    def get_forces_boundary_conditions_and_design_space(data, shape, voxels):
      F = torch.zeros(3, *shape, dtype=torch.float32)
      F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_x'].values, dtype=torch.float32)
      F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_y'].values, dtype=torch.float32)
      F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_z'].values, dtype=torch.float32)
    
      ω_Dirichlet = torch.zeros(3, *shape, dtype=torch.float32)
      ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_x'].values, dtype=torch.float32)
      ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_y'].values, dtype=torch.float32)
      ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_z'].values, dtype=torch.float32)
    
      ω_design = torch.zeros(1, *shape, dtype=int)
      ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(data['design_space'].values.astype(int))
      return F, ω_Dirichlet, ω_design
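    Putting the two helpers together, a minimal usage sketch (with data loaded as above) is:

    shape, voxels = get_shape_and_voxels(data)
    F, ω_Dirichlet, ω_design = get_forces_boundary_conditions_and_design_space(data, shape, voxels)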

    The corresponding {i}_info.csv files only have one row with column labels ['E', 'ν', 'σ_ys', 'vox_size', 'p_x', 'p_y', 'p_z'].

    • E - Young's modulus [Pa]
    • ν - Poisson's ratio [-]
    • σ_ys - Yield stress [Pa]
    • vox_size - Length of the edge of a (cube-shaped) voxel [m]
    • p_x, p_y, p_z - Location of the root of the design space [m]

    Analogously to above, one can import any {i}_info.csv file by executing:

    file_path = f'{directory}/{i}_info.csv'
    data_info_column_names = ['E', 'ν', 'σ_ys', 'vox_size', 'p_x', 'p_y', 'p_z']
    data_info = pd.read_csv(file_path, names=data_info_column_names)
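    The scalar parameters can then be read off the single row, for example:

    E = data_info['E'].iloc[0]                 # Young's modulus [Pa]
    ν = data_info['ν'].iloc[0]                 # Poisson's ratio [-]
    vox_size = data_info['vox_size'].iloc[0]   # edge length of a voxel [m]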

  4. Covid-19 Czech Republic

    • kaggle.com
    Updated Jul 3, 2020
    Cite
    Michal Brezak (2020). Covid-19 Czech Republic [Dataset]. https://www.kaggle.com/michalbrezk/covid19-czech-republic/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 3, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Michal Brezak
    License

    Public Domain Dedication (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Czechia
    Description

    Context

    This dataset has been collected from multiple sources provided by MVCR on their websites. It contains daily summarized statistics as well as detailed statistics down to the age and sex level.

    Content


    Columns description

    • Date - Calendar date when data were collected
    • Daily tested - Sum of tests performed
    • Daily infected - Sum of confirmed positive cases
    • Daily cured - Sum of people cured, i.e. no longer having Covid-19
    • Daily deaths - Sum of people who died of Covid-19
    • Daily cum tested - Cumulative sum of tests performed
    • Daily cum infected - Cumulative sum of confirmed positive cases
    • Daily cum cured - Cumulative sum of people cured
    • Daily cum deaths - Cumulative sum of people who died of Covid-19
    • Region - Region of the Czech Republic
    • Sub-Region - Sub-region of the Czech Republic
    • Region accessories qty - Quantity of health care accessories delivered to the region over the whole period
    • Age - Age of person
    • Sex - Sex of person
    • Infected - Sum of infected people for a specific date, region, sub-region, age and sex
    • Cured - Sum of cured people for a specific date, region, sub-region, age and sex
    • Death - Sum of people who died of Covid-19 for a specific date, region, sub-region, age and sex
    • Infected abroad - Indicates whether the person was infected in the Czech Republic or abroad
    • Infected in country - Code of the country where the person was infected (origin country of the infection)

    Data granularity

    The dataset contains data at different levels of granularity. Make sure you do not mix granularities. Suppose you have loaded the data into a pandas dataframe called df.

    Day level

    df_daily = df.groupby(['date']).max()[['daily_tested','daily_infected','daily_cured','daily_deaths','daily_cum_tested','daily_cum_infected','daily_cum_cured','daily_cum_deaths']].reset_index()
    

    Region level

    df_region = df[df['region'] != ''].groupby(['region']).agg(
      region_accessories_qty=pd.NamedAgg(column='region_accessories_qty', aggfunc='max'), 
      infected=pd.NamedAgg(column='infected', aggfunc='sum'),
      cured=pd.NamedAgg(column='cured', aggfunc='sum'),
      death=pd.NamedAgg(column='death', aggfunc='sum')
    ).reset_index()
    

    Detail level

    df_detail = df[['date','region','sub_region','age','sex','infected','cured','death','infected_abroad','infected_in_country']].reset_index(drop=True)
    

    Acknowledgements

    Thanks to the MVCR websites for sharing this information.

    Inspiration

    Can you see a relation between the quantity of health care accessories delivered to a region and the number of cured/infected people in that region? Why does the Czech Republic rank among the relatively safe countries with respect to the Covid-19 pandemic? Can you find out how the evolution of the pandemic in the Czech Republic differs from that in neighbouring countries such as Germany or Slovakia?

  5. stress_tests_nli

    • huggingface.co
    Updated Mar 20, 2025
    Cite
    Pietro Lesci (2025). stress_tests_nli [Dataset]. https://huggingface.co/datasets/pietrolesci/stress_tests_nli
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 20, 2025
    Authors
    Pietro Lesci
    Description

    Overview

    Original dataset page here and dataset available here.

      Dataset curation
    

    A new column, label, was added, with the labels encoded using the mapping {"entailment": 0, "neutral": 1, "contradiction": 2}. The columns with parse information were dropped, as they are not well formatted. Also, the name of the file from which each instance comes was added in the column dtype.

      Code to create the dataset
    

    import pandas as pd
    from datasets import Dataset…

    See the full description on the dataset page: https://huggingface.co/datasets/pietrolesci/stress_tests_nli.

  6. Summary statistics for Kishore, Sreelatha, Tenghe et al. GWAS for...

    • zenodo.org
    Updated Jan 24, 2025
    Cite
    Asha Kishore; Ashwin Ashok Kumar Sreelatha; Amabel Tenghe (2025). Summary statistics for Kishore, Sreelatha, Tenghe et al. GWAS for Parkinson's Disease in India [Dataset]. http://doi.org/10.5281/zenodo.14726797
    Explore at:
    Dataset updated
    Jan 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Asha Kishore; Ashwin Ashok Kumar Sreelatha; Amabel Tenghe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    GWAS summary statistics for Kishore, Sreelatha, Tenghe et al., “Deciphering the Genetic Architecture of Parkinson’s Disease in India”

    Genomic Coordinates: GRCh38

    Study Type: Case-Control

    Datasets Included

    1. Indian PD GWAS Summary Statistics
      • File Name: PD_GWAS_India_4806Cases_6364Controls_Summary_Statistics.tsv.gz
      • Description: This file contains the GWAS summary statistics derived from 4,806 Parkinson’s disease cases and 6,364 controls from the Indian population. Contains a total of 8,414,146 biallelic SNPs.

    File Description

    • Columns:
      • CHR: Chromosome number
      • POS: Base pair position
      • SNP: SNP identifier (rsID)
      • A1: Effect allele
      • A2: Alternate allele
      • EAF: Effect allele frequency
      • BETA: Effect size for the effect allele
      • SE: Standard error of the effect size
      • P: P-value for association
    • Methods:
      • GWAS conducted using logistic generalized linear models (GLM) implemented in PLINK
      • Quality control included filtering for MAF > 0.01, HWE p-value > 1e-8, and INFO score > 0.3
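    A minimal loading sketch (hypothetical; assuming the TSV ships with a header row naming the columns above):

    import pandas as pd

    # Load the Indian PD GWAS summary statistics and extract
    # genome-wide significant hits at the conventional P < 5e-8 threshold.
    sumstats = pd.read_csv(
        'PD_GWAS_India_4806Cases_6364Controls_Summary_Statistics.tsv.gz',
        sep='\t', compression='gzip')
    hits = sumstats[sumstats['P'] < 5e-8]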

    2. Meta-Analysis Summary Statistics

      • File Name: PD_Meta_Analysis_India_MultiEthnic_Summary_Statistics.tsv.gz
      • Description: This file contains the results of a meta-analysis combining summary statistics from the Indian GWAS and a multi-ethnic GWAS. Contains a total of 7,097,037 biallelic SNPs.
      • Studies Included:
        • Kim et al.: Multi-ancestry GWAS meta-analysis of Parkinson’s Disease (611,485 individuals)
        • Kishore, Sreelatha, Tenghe et al.: Indian GWAS of Parkinson’s Disease (11,170 individuals)

    File Description

    • Columns:

      • CHR: Chromosome number
      • POS: Base pair position
      • SNP: SNP identifier (rsID)
      • A1: Effect allele
      • A2: Alternate allele
      • META_BETA: Meta-analyzed effect size for the effect allele
      • META_SE: Standard error of the meta-analyzed effect size
      • META_P: P-value for the meta-analysis
      • I2: Heterogeneity index across studies
      • N_STUDIES: Number of studies
      • EFFECTS: Summary of effect directions
    • Methods:

      • Meta-analysis conducted using GWAMA software with a fixed-effects model
      • Heterogeneity statistics (I2) calculated to evaluate population differences

    Populations

    1. Indian PD GWAS: Includes individuals of Indian ancestry
    2. Meta-Analysis: Combines Indian GWAS results with a multi-ethnic GWAS summary statistics dataset

  7. Data from: Pd-Catalyzed Cross-Couplings: On the Importance of the Catalyst...

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    Christopher S. Horbaczewskyj; Ian J. S. Fairlamb (2023). Pd-Catalyzed Cross-Couplings: On the Importance of the Catalyst Quantity Descriptors, mol % and ppm [Dataset]. http://doi.org/10.1021/acs.oprd.2c00051.s003
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Christopher S. Horbaczewskyj; Ian J. S. Fairlamb
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This Review examines parts per million (ppm) palladium concentrations in catalytic cross-coupling reactions and their relationship with mole percentage (mol %). Most studies in catalytic cross-coupling chemistry have historically focused on the concentration ratio between (pre)catalyst and the limiting reagent (substrate), expressed as mol %. Several recent papers have outlined the use of “ppm level” palladium as an alternative means of describing catalytic cross-coupling reaction systems. This led us to delve deeper into the literature to assess whether “ppm level” palladium is a practically useful descriptor of catalyst quantities in palladium-catalyzed cross-coupling reactions. Indeed, we conjectured that many reactions could, unknowingly, have employed low “ppm levels” of palladium (pre)catalyst, and we wondered, more generally, what the spread of ppm palladium would look like across a selection of studies drawn from the vast cross-coupling chemistry literature. In a few selected examples, we have examined other metal catalyst systems for comparison with palladium.
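    For intuition on how the two descriptors relate: on a molar basis, 1 mol % corresponds to 10,000 ppm, so converting between them is a factor of 10^4, as in this minimal Python sketch:

    # Conversion between catalyst-loading descriptors, on a molar basis
    # (mol Pd per mol limiting reagent).
    def mol_percent_to_ppm(mol_percent: float) -> float:
        # 1 mol % = 0.01 mole fraction = 10,000 parts per million
        return mol_percent * 1e4

    print(mol_percent_to_ppm(0.05))  # 0.05 mol % Pd -> 500.0 ppm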
