Attribution 2.0 (CC BY 2.0) https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
Preprocessed from https://huggingface.co/datasets/lorenzoscottb/PLANE-ood/

df = pd.read_json('https://huggingface.co/datasets/lorenzoscottb/PLANE-ood/resolve/main/PLANE_trntst-OoV_inftype-all.json')
f = lambda df: pd.DataFrame(list(zip(*[df[c] for c in df.index])), columns=df.index)
ds = DatasetDict()
for split in ['train', 'test']:
    dfs = pd.concat([f(df[c]) for c in df.columns if split in c.lower()]).reset_index(drop=True)
    dfs['label'] = dfs['label'].map(lambda x: {1: 'entailment'…

See the full description on the dataset page: https://huggingface.co/datasets/tasksource/PLANE-ood.
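If one only needs the result, the preprocessed splits can presumably be loaded directly from the Hub (a minimal sketch; the train/test split layout is assumed from the loop above):

from datasets import load_dataset

# Load the preprocessed dataset straight from the Hugging Face Hub.
ds = load_dataset("tasksource/PLANE-ood")
print(ds["train"][0])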
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.

Annotation

The annotation files are provided as Parquet files. They can be read using Python with the pandas and pyarrow libraries. The split into train, validation, and test sets follows the splits of the original datasets.

Installation
pip install pandas pyarrow

Example
import pandas as pd

df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
print(df.iloc[0])
dataset                                          AudioSet
filename                            train/---2_BBVHAA.mp3
captions_visual      [a man in a black hat and glasses.]
captions_auditory        [a man speaks and dishes clank.]
tags                                             [Speech]

Description

The annotation file consists of the following fields:

filename: Name of the corresponding file (video or audio file)
dataset: Source dataset associated with the data point
captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content.
captions_auditory: A list of captions related to the auditory content of the video.
tags: A list of tags classifying the sound of a file. Can be NaN if no tags are provided.

Data files

The raw data files for most datasets are not released due to licensing issues and must be downloaded from the source. However, in case of missing files, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de
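Building on the example above, one could, for instance, restrict the annotations to a single source dataset and drop clips without visual captions (a sketch, assuming the fields described under Description):

import pandas as pd

df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')

# Keep only AudioSet clips that actually have visual captions.
audioset = df[(df['dataset'] == 'AudioSet') & df['captions_visual'].notna()]
print(len(audioset), 'annotated AudioSet clips')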
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Benchmark Dataset for Deep Learning-based Methods for 3D Topology Optimization.
One can find a description of the provided dataset partitions in Section 3 of Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.
Every dataset container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and a corresponding binarized SIMP solution. Every file of the form {i}.csv contains all voxel-wise information about the sample i. Every file of the form {i}_info.csv contains scalar parameters of the topology optimization problem, such as material parameters.
This dataset represents topology optimization problems and solutions on the basis of voxels. We define all spatially varying quantities via the voxels' centers, rather than via the vertices or surfaces of the voxels.
In {i}.csv files, each row corresponds to one voxel in the design space. The columns correspond to ['x', 'y', 'z', 'design_space', 'dirichlet_x', 'dirichlet_y', 'dirichlet_z', 'force_x', 'force_y', 'force_z', 'density'].
Any of these files with the index i can be imported using pandas by executing:
import pandas as pd

directory = ...  # path to the dataset partition
i = ...  # index of the sample to load
file_path = f'{directory}/{i}.csv'
column_names = ['x', 'y', 'z', 'design_space', 'dirichlet_x', 'dirichlet_y', 'dirichlet_z', 'force_x', 'force_y', 'force_z', 'density']
data = pd.read_csv(file_path, names=column_names)
From this pandas DataFrame one can extract the torch tensors of forces F, Dirichlet conditions ω_Dirichlet, and design space information ω_design using the following functions:
import torch
def get_shape_and_voxels(data):
    # The last row holds the largest (x, y, z) indices, so adding 1 yields the grid shape.
    shape = data[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1
    vox_x = data['x'].values
    vox_y = data['y'].values
    vox_z = data['z'].values
    voxels = [vox_x, vox_y, vox_z]
    return shape, voxels

def get_forces_boundary_conditions_and_design_space(data, shape, voxels):
    # Voxel-wise force components, one channel per spatial direction.
    F = torch.zeros(3, *shape, dtype=torch.float32)
    F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_x'].values, dtype=torch.float32)
    F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_y'].values, dtype=torch.float32)
    F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['force_z'].values, dtype=torch.float32)
    # Voxel-wise Dirichlet boundary condition indicators.
    ω_Dirichlet = torch.zeros(3, *shape, dtype=torch.float32)
    ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_x'].values, dtype=torch.float32)
    ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_y'].values, dtype=torch.float32)
    ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['dirichlet_z'].values, dtype=torch.float32)
    # Design space information: which voxels may hold material.
    ω_design = torch.zeros(1, *shape, dtype=int)
    ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(data['design_space'].values.astype(int))
    return F, ω_Dirichlet, ω_design
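Putting the two helpers together, loading one sample could look like this (a usage sketch; reading the density column as the ground-truth SIMP solution follows the dataset description above):

shape, voxels = get_shape_and_voxels(data)
F, ω_Dirichlet, ω_design = get_forces_boundary_conditions_and_design_space(data, shape, voxels)

# The binarized SIMP density serves as the ground truth.
density = torch.zeros(1, *shape, dtype=torch.float32)
density[:, voxels[0], voxels[1], voxels[2]] = torch.tensor(data['density'].values, dtype=torch.float32)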
The corresponding {i}_info.csv files have only one row, with the column labels ['E', 'ν', 'σ_ys', 'vox_size', 'p_x', 'p_y', 'p_z'].
Analogously to above, one can import any {i}_info.csv file by executing:
file_path = f'{directory}/{i}_info.csv'
data_info_column_names = ['E', 'ν', 'σ_ys', 'vox_size', 'p_x', 'p_y', 'p_z']
data_info = pd.read_csv(file_path, names=data_info_column_names)
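Since each info file holds a single row, its scalar parameters can be turned into a plain dictionary (a small usage sketch):

# One row per file, so iloc[0] collects all scalar parameters at once.
info = data_info.iloc[0].to_dict()
E, ν, σ_ys = info['E'], info['ν'], info['σ_ys']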
Public Domain Dedication (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
This dataset has been collected from multiple sources provided by the MVCR on its websites and contains daily summarized statistics as well as detailed statistics down to the age and sex level.
Date - Calendar date when the data were collected
Daily tested - Sum of tests performed
Daily infected - Sum of confirmed positive cases
Daily cured - Sum of people who recovered from Covid-19
Daily deaths - Sum of people who died of Covid-19
Daily cum tested - Cumulative sum of tests performed
Daily cum infected - Cumulative sum of confirmed positive cases
Daily cum cured - Cumulative sum of people who recovered from Covid-19
Daily cum deaths - Cumulative sum of people who died of Covid-19
Region - Region of the Czech Republic
Sub-Region - Sub-region of the Czech Republic
Region accessories qty - Quantity of health care accessories delivered to the region over all time
Age - Age of person
Sex - Sex of person
Infected - Sum of infected people for a specific date, region, sub-region, age and sex
Cured - Sum of cured people for a specific date, region, sub-region, age and sex
Death - Sum of people who died of Covid-19 for a specific date, region, sub-region, age and sex
Infected abroad - Identifies whether the person was infected in the Czech Republic or abroad
Infected in country - Code of the country where the person was infected (origin country of the infection)
The dataset contains data at different levels of granularity; make sure you do not mix them. Suppose you have loaded the data into a pandas DataFrame called df.
# Country-level daily statistics (the daily totals repeat on every detail row, hence max)
df_daily = df.groupby(['date']).max()[['daily_tested','daily_infected','daily_cured','daily_deaths','daily_cum_tested','daily_cum_infected','daily_cum_cured','daily_cum_deaths']].reset_index()
# Region-level aggregates
df_region = df[df['region'] != ''].groupby(['region']).agg(
    region_accessories_qty=pd.NamedAgg(column='region_accessories_qty', aggfunc='max'),
    infected=pd.NamedAgg(column='infected', aggfunc='sum'),
    cured=pd.NamedAgg(column='cured', aggfunc='sum'),
    death=pd.NamedAgg(column='death', aggfunc='sum')
).reset_index()
# Most detailed granularity: date, region, sub-region, age and sex
df_detail = df[['date','region','sub_region','age','sex','infected','cured','death','infected_abroad','infected_in_country']].reset_index(drop=True)
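As a quick sanity check that the granularities were separated correctly, the cumulative columns should be non-decreasing over time (a minimal sketch, assuming the column names above):

# Cumulative counters can only grow from day to day.
df_daily = df_daily.sort_values('date')
assert df_daily['daily_cum_infected'].is_monotonic_increasing
assert df_daily['daily_cum_deaths'].is_monotonic_increasing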
Thanks to the MVCR websites for sharing this information.
Can you see a relation between the health care accessories delivered to a region and the number of cured/infected people in that region? Why does the Czech Republic rank among the relatively safe countries in the Covid-19 pandemic? Can you find out how the evolution of the pandemic in the Czech Republic differs from that in neighbouring countries, such as Germany or Slovakia?
Overview
Original dataset page here and dataset available here.
Dataset curation
Added a new column label with labels encoded using the following mapping: {"entailment": 0, "neutral": 1, "contradiction": 2}. The columns with parse information are dropped, as they are not well formatted. Also, the name of the file from which each instance comes is recorded in the column dtype.
Code to create the dataset
import pandas as pd
from datasets import Dataset
…

See the full description on the dataset page: https://huggingface.co/datasets/pietrolesci/stress_tests_nli.
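The snippet above is truncated on the dataset page; a hedged sketch of the curation steps it describes might look as follows (the file name and column names are assumptions based on the MultiNLI JSONL layout, not taken from the original script):

import pandas as pd
from datasets import Dataset

label_map = {"entailment": 0, "neutral": 1, "contradiction": 2}

# Hypothetical source file; the original script iterates over several files.
fname = "multinli_0.9_antonym_matched.jsonl"
df = pd.read_json(fname, lines=True)
df["label"] = df["gold_label"].map(label_map)  # encode labels
df["dtype"] = fname  # record the source file
df = df.drop(columns=["sentence1_parse", "sentence2_parse"])  # parse info is malformed
ds = Dataset.from_pandas(df, preserve_index=False)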
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GWAS summary statistics for Kishore, Sreelatha, Tenghe et al. “Deciphering the Genetic Architecture of Parkinson’s Disease in India”
Genomic Coordinates: GRCh38
Study Type: Case-Control
1. GWAS Summary Statistics

PD_GWAS_India_4806Cases_6364Controls_Summary_Statistics.tsv.gz

Columns:
CHR: Chromosome number
POS: Base pair position
SNP: SNP identifier (rsID)
A1: Effect allele
A2: Alternate allele
EAF: Effect allele frequency
BETA: Effect size for the effect allele
SE: Standard error of the effect size
P: P-value for association

2. Meta-Analysis Summary Statistics

PD_Meta_Analysis_India_MultiEthnic_Summary_Statistics.tsv.gz

Columns:
CHR: Chromosome number
POS: Base pair position
SNP: SNP identifier (rsID)
A1: Effect allele
A2: Alternate allele
META_BETA: Meta-analyzed effect size for the effect allele
META_SE: Standard error of the meta-analyzed effect size
META_P: P-value for the meta-analysis
I2: Heterogeneity index across studies
N_STUDIES: Number of studies
EFFECTS: Summary of effect directions
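A minimal sketch for loading these files with pandas and filtering for genome-wide significant hits (tab separation and gzip compression are assumptions based on the .tsv.gz extension):

import pandas as pd

df = pd.read_csv('PD_Meta_Analysis_India_MultiEthnic_Summary_Statistics.tsv.gz',
                 sep='\t', compression='gzip')

# Conventional genome-wide significance threshold.
hits = df[df['META_P'] < 5e-8].sort_values('META_P')
print(hits[['CHR', 'POS', 'SNP', 'META_BETA', 'META_P']].head())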
Methods:
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This Review examines parts per million (ppm) palladium concentrations in catalytic cross-coupling reactions and their relationship with mole percentage (mol %). Most studies in catalytic cross-coupling chemistry have historically reported the concentration ratio between the (pre)catalyst and the limiting reagent (substrate) as mol %. Several recent papers have instead described catalytic cross-coupling reaction systems in terms of "ppm level" palladium. This led us to delve deeper into the literature to assess whether "ppm level" palladium is a practically useful descriptor of catalyst quantities in palladium-catalyzed cross-coupling reactions. Indeed, we conjectured that many reactions could, unknowingly, have employed low "ppm levels" of palladium (pre)catalyst, and we asked what the spread of ppm palladium values would look like across a selection of studies from the vast cross-coupling literature. In a few selected examples, we have examined other metal catalyst systems for comparison with palladium.
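For orientation, on a molar basis 1 mol % equals 10,000 ppm, so the two descriptors interconvert by a factor of 10^4 (an illustrative calculation, not taken from the Review):

# mol % is parts per hundred, ppm parts per million (both on a molar basis here).
def mol_percent_to_ppm(mol_percent: float) -> float:
    return mol_percent * 1e4

print(mol_percent_to_ppm(0.5))    # 0.5 mol % Pd corresponds to 5000.0 ppm
print(mol_percent_to_ppm(0.001))  # 0.001 mol % Pd corresponds to 10.0 ppm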