MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
I found two datasets on Hugging Face for converting text plus context into pandas code, but the challenge lies in the context: it is structured differently in each dataset, which hurts model performance. Let's first introduce the data I found, then look at examples, the solution, and some other problems.
Rahima411/text-to-pandas:
The data is split into a train set with 57.5k examples and a test set with 19.2k.
It has two columns, as you can see in the example:
```txt
Input                                                      | Pandas Query
-----------------------------------------------------------|-------------------------------------------
Table Name: head (age (object), head_id (object))          | result = management['head.age'].unique()
Table Name: management (head_id (object),                  |
  temporary_acting (object))                               |
What are the distinct ages of the heads who are acting?    |
```

hiltch/pandas-create-context:

```txt
question                                | context                                                | answer
----------------------------------------|--------------------------------------------------------|---------------------------------------
What was the lowest # of total votes?   | df = pd.DataFrame(columns=['_number_of_total_votes'])  | df['_number_of_total_votes'].min()
```
As you can see, the problem is that the two datasets do not share the same input format and the structure of the context differs. My solution to this was:
- Convert the context of the first dataset into the second dataset's format. I chose this direction because recovering the column data types for the second dataset would be difficult, whereas converting the context from `Table Name: head (age (object), head_id (object))` to `head = pd.DataFrame(columns=['age','head_id'])` is straightforward with the code I wrote below.
- Then separate the question from the context. This was easy because, if you look at the data, the context always ends with ")", followed by whitespace and then the question. You will find all of this in the code below.
- You will also notice that a context can contain more than one table, so more than one DataFrame creation line may be returned; this has been engineered into the code.
```py
import re

def extract_table_creation(text: str) -> tuple[str, str]:
    """
    Extracts DataFrame creation statements and the question from the given text.

    Args:
        text (str): The input text containing table definitions and a question.

    Returns:
        tuple: A concatenated DataFrame creation string and the question.
    """
    # Define patterns for table definitions and typed columns
    table_pattern = r'Table Name: (\w+) \(([\w\s,()]+)\)'
    column_pattern = r'(\w+)\s*\((object|int64|float64)\)'

    # Find all table names and column definitions
    matches = re.findall(table_pattern, text)

    # Build one DataFrame creation statement per table
    df_creations = []
    for table_name, columns_str in matches:
        # Extract column names (dropping the dtype)
        columns = re.findall(column_pattern, columns_str)
        column_names = [col[0] for col in columns]
        # Format DataFrame creation statement
        df_creation = f"{table_name} = pd.DataFrame(columns={column_names})"
        df_creations.append(df_creation)

    # Concatenate all DataFrame creation statements
    df_creation_concat = '\n'.join(df_creations)

    # The question follows the last closing parenthesis of the table definitions
    question = text[text.rindex(')') + 1:].strip()
    return df_creation_concat, question
```
Once both datasets shared the same structure, they were merged into a single set and split into _72.8K_ train and _18.6K_ test examples (a minimal sketch of this merge-and-split step follows the list below). We analyzed the merged dataset, and you can see the full analysis in the **[`notebook`](https://www.kaggle.com/code/zeyadusf/text-2-pandas-t5#Exploratory-Data-Analysis(EDA))**, but we also found some problems in the dataset, such as:
> - `Answer`: `df['Id'].count()` is repeated many times, but it is a legitimate answer, so we do not need to drop these rows.
> - `Context`: it contains `147` rows with no text at all. We will see through the experiment whether this affects the results negatively or positively.
> - `Question`: it is ...
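For illustration, here is a minimal sketch of the merge-and-split step, assuming both datasets have already been normalized to the same `question` / `context` / `answer` columns. The file names and the 80/20 split ratio are assumptions for the example, not values taken from the notebook.

```py
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file names; both files are assumed to already have
# question / context / answer columns after the conversion step above.
rahima_df = pd.read_csv("text_to_pandas_converted.csv")
hiltch_df = pd.read_csv("pandas_create_context.csv")

merged = pd.concat([rahima_df, hiltch_df], ignore_index=True)

# Split into train / test (the exact ratio is an assumption here)
train_df, test_df = train_test_split(merged, test_size=0.2, random_state=42)
print(len(train_df), len(test_df))
```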
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This project was developed to create a dataframe with the updated list of the best videogames of all time according to the Metacritic website.
To collect the data, I created a Python script that uses Selenium and pandas. You can access it on my GitHub 👇 - Github
The data was collected from the Metacritic website. - Metacritic
quadrat.scale.data: refer to the R script ("Dwyer_&_Laughlin_2017_Trait_covariance_script.r") for information about this dataframe.
species.in.quadrat.scale.data: refer to the R script ("Dwyer_&_Laughlin_2017_Trait_covariance_script.r") for information about this dataframe.
Dwyer_&_Laughlin_2017_Trait_covariance_script: this script reads in the two dataframes of "raw" data, calculates diversity and trait metrics, and runs the major analyses presented in Dwyer & Laughlin 2017.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
PandasPlotBench
PandasPlotBench is a benchmark to assess the capability of models in writing the code for visualizations given the description of a Pandas DataFrame. 🛠️ Task. Given the plotting task and the description of a Pandas DataFrame, write the code to build a plot. The dataset is based on the Matplotlib gallery. The paper can be found on arXiv: https://arxiv.org/abs/2412.02764v1. To score your model on this dataset, you can use our GitHub repository. 📩 If you have… See the full description on the dataset page: https://huggingface.co/datasets/JetBrains-Research/PandasPlotBench.
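As a quick illustration, the dataset can be pulled from the Hub with the `datasets` library. A minimal sketch follows; the split name is an assumption, so check the dataset card for the actual schema.

```py
from datasets import load_dataset

# Load PandasPlotBench from the Hugging Face Hub
# (split name is an assumption; see the dataset card for the actual splits)
ds = load_dataset("JetBrains-Research/PandasPlotBench", split="test")
print(ds[0])  # inspect one task: plotting instruction plus DataFrame description
```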
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset is built from sql-create-context, which itself builds on WikiSQL and Spider. I have used GPT-4 to translate the SQL schema into pandas DataFrame schema initialization statements and to translate the SQL queries into pandas queries. There are 862 examples of natural language queries, pandas DataFrame creation statements, and pandas queries answering the question using the DataFrame creation statement as context. This dataset was built with text-to-pandas LLMs… See the full description on the dataset page: https://huggingface.co/datasets/hiltch/pandas-create-context.
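For reference, a minimal sketch of loading the dataset and inspecting its `question` / `context` / `answer` columns (the split name is an assumption):

```py
from datasets import load_dataset

# Load the 862-example dataset from the Hub (split name is an assumption)
ds = load_dataset("hiltch/pandas-create-context", split="train")
example = ds[0]
print(example["question"], example["context"], example["answer"], sep="\n")
```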
This cross-sectional study aimed to contribute to the definition of Pediatric Autoimmune Neuropsychiatric Disorders Associated with Streptococcal Infections (PANDAS) pathophysiology. An extensive immunological assessment has been conducted to investigate both immune defects, potentially leading to recurrent Group A β-hemolytic Streptococcus (GABHS) infections, and immune dysregulation responsible for a systemic inflammatory state. Twenty-six PANDAS patients with relapsing-remitting course of disease and 11 controls with recurrent pharyngotonsillitis were enrolled. Each subject underwent a detailed phenotypic and immunological assessment including cytokine profile. A possible correlation of immunological parameters with clinical-anamnestic data was analyzed. No inborn errors of immunity were detected in either group, using first level immunological assessments. However, a trend toward higher TNF-alpha and IL-17 levels, and lower C3 levels, was detected in the PANDAS patients compared to the control group. Maternal autoimmune diseases were described in 53.3% of PANDAS patients and neuropsychiatric symptoms other than OCD and tics were detected in 76.9% of patients. ASO titer did not differ significantly between the two groups. A possible correlation between enduring inflammation (elevated serum TNF-α and IL-17) and the persistence of neuropsychiatric symptoms in PANDAS patients beyond infectious episodes needs to be addressed. Further studies with larger cohorts would be pivotal to better define the role of TNF-α and IL-17 in PANDAS pathophysiology.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we present three datasets describing three large European landscapes in France (Bauges Geopark - 89,000 ha), Poland (Milicz forest district - 21,000 ha) and Slovenia (Snežnik forest - 4,700 ha) down to the tree level. Individual trees were generated combining inventory plot data, vegetation maps and Airborne Laser Scanning (ALS) data. Together, these landscapes (hereafter virtual landscapes) cover more than 100,000 ha including about 64,000 ha of forest and consist of more than 42 million trees of 51 different species.

For each virtual landscape we provide a table (in .csv format) with the following columns:
- cellID25: the unique ID of each 25x25 m² cell
- sp: species latin names
- n: number of trees. n is an integer >= 1, meaning that a specific set of species "sp", diameter "dbh" and height "h" can be present multiple times in a cell.
- dbh: tree diameter at breast height (cm)
- h: tree height (m)

We also provide, for each virtual landscape, a raster (in .asc format) with the cell IDs (cellID25) which makes data spatialisation possible. The coordinate reference systems are EPSG: 2154 for the Bauges, EPSG: 2180 for Milicz, and EPSG: 3912 for Sneznik. The v2.0.0 presents the algorithm in its final state. Finally, we provide a proof of how our algorithm makes it possible to reach the total BA and the BA proportion of broadleaf trees provided by the ALS mapping using the alpha correction coefficient and how it maintains the Dg ratios observed on the field plots between the different species (see algorithm presented in the associated Open Research Europe article).

Below is an example of R code that opens the datasets and creates a tree density map.

```r
# load packages
library(terra)
library(dplyr)

setwd() # define path to the I-MAESTRO_data folder

tree <- read.csv2('./sneznik/sneznik_trees.csv', sep = ',')
cellID <- rast('./sneznik/sneznik_cellID25.asc')
cellIDdf <- as.data.frame(cellID)
colnames(cellIDdf) <- 'cellID25'

dens <- tree %>% group_by(cellID25) %>% summarise(n = sum(n))
dens <- left_join(cellIDdf, dens, join_by(cellID25))
cellID$dens <- dens$n
plot(cellID$dens)
```
polyOne Data Set
The data set contains 100 million hypothetical polymers, each with 29 properties predicted using machine learning models. We use PSMILES strings to represent polymer structures, see here and here. The polymers are generated by decomposing previously synthesized polymers into unique chemical fragments. Random and enumerative compositions of these fragments yield 100 million hypothetical PSMILES strings. All PSMILES strings are chemically valid polymers but, mostly, have never been synthesized before. More information can be found in the paper. Please note the license agreement in the LICENSE file.
Full data set including the properties
The data files are in Apache Parquet format. The files start with polyOne_*.parquet.
I recommend using dask (pip install dask) to load and process the data set. Pandas also works but is slower.
Load the sharded data set with dask:

```python
import dask.dataframe as dd

ddf = dd.read_parquet("*.parquet", engine="pyarrow")
```

For example, compute the description of the data set:

```python
df_describe = ddf.describe().compute()
df_describe
```
PSMILES strings only
generated_polymer_smiles_train.txt - 80 million PSMILES strings for training polyBERT. One string per line.
generated_polymer_smiles_dev.txt - 20 million PSMILES strings for testing polyBERT. One string per line.
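For illustration, a minimal sketch of reading a few PSMILES strings from these text files (one string per line, as noted above):

```python
# Read the first few PSMILES strings; the full training file has 80 million lines
with open("generated_polymer_smiles_train.txt") as f:
    psmiles = [next(f).strip() for _ in range(5)]
print(psmiles)
```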
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.

Annotation

The annotation files are provided as Parquet files. They can be read using Python with the pandas and pyarrow libraries. The split into train, validation and test set follows the split of the original datasets.

Installation

pip install pandas pyarrow

Example

```python
import pandas as pd

df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
print(df.iloc[0])
```

```txt
dataset                                        AudioSet
filename                          train/---2_BBVHAA.mp3
captions_visual     [a man in a black hat and glasses.]
captions_auditory      [a man speaks and dishes clank.]
tags                                           [Speech]
```

Description

The annotation file consists of the following fields:
- filename: Name of the corresponding file (video or audio file)
- dataset: Source dataset associated with the data point
- captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content
- captions_auditory: A list of captions related to the auditory content of the video
- tags: A list of tags, classifying the sound of a file. It can be NaN if no tags are provided

Data files

The raw data files for most datasets are not released due to licensing issues. They must be downloaded from the source. However, due to missing files, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de
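As a small usage sketch based on the field names above, rows without visual captions can be filtered out like this (the NaN check is an assumption about how missing captions are stored):

```python
import pandas as pd

df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')

# Keep only rows that have visual captions (captions_visual may be NaN)
with_visual = df[df['captions_visual'].notna()]
print(len(with_visual), "clips with visual captions")
```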
This dataset contains two types of data for a combinatorial library containing approximately 12,000 compounds that we call ZEL031. This library was designed with a specific target in mind, but we are not currently explicitly disclosing the target. Phenotype imaging data was collected from A549 cells and transcriptome data was collected from HEK293T cells. Transcriptome data was collected from 2 different arrays, and phenotype data was collected from 3 different arrays. Imaging data is provided as 1 "plain" HDF5 file per array, and each HDF5 file is accompanied by 2 CSV files which describe the first 2 dimensions of the "images" dataset in the HDF5 file; the last two dimensions are image height and image width. Transcriptome counts data is provided in an AnnData-format HDF5 file, and the metadata is stored in the .obs DataFrame. The source device for the transcriptome data is available in the device_id column of the .obs DataFrame.
In total, there are approximately 64,000 transcriptome observations and 28,000 phenotype observations.
All observations are described by 5 columns of metadata: control_rx_id, bb1_id, bb2_id, residual_linker, and censored. The censored column indicates that the chemical identity information for that well has been hidden, and all the values for the other columns ending in _id should be -1. About 20% of the data has been censored. For the remaining 80% of the data, a combination of the columns ending in _id can be used to look up the associated chemical perturbation in the smiles.csv file. For the imaging data, these columns can be found in the _dim_0_metadata.csv files, along with a physical_well_id column which identifies the source of the imaging data. Some wells are imaged more than once during data collection. For those wells, both sets of images are included and share a value in the physical_well_id column.
Each device's imaging data includes 2 extra files, {device}.segmentations.h5 and {device}.beadlocations.csv which contain some cell segmentation data and the locations of beads within the images in {device}.h5, respectively.
The segmentations are present in their HDF5 as an "images" dataset, but with only 3 dimensions (n_obs x H x W). A positional value of zero indicates that the pixel was not detected as a cell, and all pixels with each unique non-zero positive integer correspond to one cell; integer values are unique only within a single slice along the first dimension of data. Despite some weak spots, these maps have served us well for QC purposes.
Bead detections in the associated CSV have 4 columns: hdf5_dim_0_index, cx, cy, and radius.
- hdf5_dim_0_index maps to the first dimension of the images/segmentations HDF5s
- cy maps to the second-to-last dimension of the images/segmentations HDF5s
- cx maps to the last dimension of the images/segmentations HDF5s
- radius is the radius of a (rough) circle centered at cy, cx
There are certainly artifacts present which we'd prefer to avoid (some we're aware of include images being out-of-focus or perturbation delivery beads moving from their expected locations) and we look forward to sharing improved datasets in the future. Nonetheless, we've found some interesting patterns in these data and we'd be absolutely delighted to learn of any interesting patterns you can find (whether artifacts or biological patterns)!
| filename | MD5 hash |
|---|---|
| 4020_4021_cens.h5ad | fd4e5d843443813cd86f7aa058052ac9 |
| ZS26_dim_0_metadata.csv | 9d2d46e28e8321b4d5478a95e77d5c7a |
| ZS26_dim_1_metadata.csv | f5b35dcd381a40a6a188e6c2aec5b9be |
| ZS26.h5 | 2e97081c4003960bdf5b8ecf882aa3dc |
| ZS26.beadlocations.csv | ec557c2449b1c4f934670ab135876705 |
| ZS26.segmentations.h5 | efc0862557a3d709d59f1cfad713115b |
| ZS27_dim_0_metadata.csv | f9b8475a47e4f741563b474035b02499 |
| ZS27_dim_1_metadata.csv | f5b35dcd381a40a6a188e6c2aec5b9be |
| ZS27.h5 | 8647a7800893f7cdb3013a8449374860 |
| ZS27.beadlocations.csv | 1965c349977d69eca833f8705d229df9 |
| ZS27.segmentations.h5 | dc25b2e859aafe9816c4c3936057795b |
| ZS30_dim_0_metadata.csv | fb8f9c51b8bc6fe1a5b86e4611dac85d |
| ZS30_dim_1_metadata.csv | f5b35dcd381a40a6a188e6c2aec5b9be |
| ZS30.h5 | 20c2bc56ce149198e9c9b60c6b669861 |
| ZS30.beadlocations.csv | 095b63bb7ffa98558fb09fddc4593bf9 |
| ZS30.segmentations.h5 | e12d5d1cac8bbeb9c925d688544b5b26 |
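Using the file names from the table above, here is a minimal sketch of opening one imaging array plus its row metadata, and the AnnData transcriptome file. The HDF5 layout beyond the "images" dataset and the exact .obs columns follow the description above, so treat them as assumptions to verify.

```python
import h5py
import pandas as pd
import anndata as ad

# Imaging data: the first two dimensions of "images" are described by the two
# metadata CSVs; the last two dimensions are image height and width.
with h5py.File("ZS26.h5", "r") as f:
    images = f["images"]
    print(images.shape)

dim0_meta = pd.read_csv("ZS26_dim_0_metadata.csv")  # per-observation metadata, incl. physical_well_id
dim1_meta = pd.read_csv("ZS26_dim_1_metadata.csv")

# Transcriptome counts: AnnData HDF5, well metadata in .obs (incl. device_id)
adata = ad.read_h5ad("4020_4021_cens.h5ad")
print(adata.obs[["control_rx_id", "bb1_id", "bb2_id", "censored"]].head())
```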
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Included files: Each file includes LFP (local field potential) data for both animals (‘h’, ‘y’) during a particular type of task control (‘bmi’ or ‘manual’), time-locked to 500 ms before or after a particular event in the task (‘go_cue’ or ‘target’), for each rewarded trial in each day of the task (‘h’: [1-13], ‘y’: [1-22]).

File description: Each file includes a Pandas DataFrame, saved as a .feather file. Data can be accessed using Python by calling:

```python
import pandas as pd
pd.read_feather([file name])
```

Each DataFrame has the following columns:
- control_type: ‘bmi’, ‘manual’, or ‘baseline’
- event: go cue (‘go_cue’) or target acquisition (‘target’)
- subj: which animal, ‘h’ or ‘y’
- day: which day of the session, ‘h’: [1-13], ‘y’: [1-22]
- roi: region of interest; ‘direct’, ‘dlpfc’, or ‘cd’, where ‘direct’ includes most channels from m1 each day but is specific to channels which had sufficient spiking to be used as input to the BMI decoder
- ch: electrode channel number, only low-noise channels were included (see Methods for details)
- n_rewarded_trial: which trial number the data segment is from; only successfully completed (rewarded) trials are included
- time_from_window_ms: for go_cue, 0-500 ms from go cue; for target, -500-0 ms from target acquisition
- lfp: local field potential value (see Methods for details)
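As a small usage sketch assuming the columns above (the file name here is hypothetical; substitute one of the provided .feather files):

```python
import pandas as pd

# Hypothetical file name; substitute the actual .feather file
df = pd.read_feather("lfp_bmi_go_cue.feather")

# Average LFP trace for one animal and region of interest, per time bin
trace = (df[(df["subj"] == "h") & (df["roi"] == "direct")]
         .groupby("time_from_window_ms")["lfp"].mean())
print(trace.head())
```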
Description
Dataframe containing 2075 French books in txt format (= the ~2600 French books present in Gutenberg from which all books by authors present in the french_books_summuries dataset have been removed to avoid any leaks). More precisely:
- the texte column contains the texts
- the titre column contains the book title
- the auteur column contains the author's name and dates of birth and death (if you want to filter the texts to keep only those from the given century to the present… See the full description on the dataset page: https://huggingface.co/datasets/CATIE-AQ/french_books.
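A minimal sketch of loading the dataset and peeking at the columns described above (the split name is an assumption):

```python
from datasets import load_dataset

# Load the French books dataset from the Hub (split name is an assumption)
ds = load_dataset("CATIE-AQ/french_books", split="train")
print(ds.column_names)              # expected: texte, titre, auteur
print(ds[0]["titre"], ds[0]["auteur"])
```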
The klib library lets us quickly visualize missing data, perform data cleaning, plot data distributions, plot correlations and visualize categorical column values. klib is a Python library for importing, cleaning, analyzing and preprocessing data. Explanations of key functionalities can be found on Medium / TowardsDataScience in the examples section or on YouTube (Data Professor).
Original Github repo
```python
!pip install klib

import klib
import pandas as pd

df = pd.DataFrame(data)

# klib.describe functions for visualizing datasets
klib.cat_plot(df)         # returns a visualization of the number and frequency of categorical features
klib.corr_mat(df)         # returns a color-encoded correlation matrix
klib.corr_plot(df)        # returns a color-encoded heatmap, ideal for correlations
klib.dist_plot(df)        # returns a distribution plot for every numeric feature
klib.missingval_plot(df)  # returns a figure containing information about missing values
```
Take a look at this starter notebook.
Further examples, as well as applications of the functions can be found here.
Pull requests and ideas, especially for further functions are welcome. For major changes or feedback, please open an issue first to discuss what you would like to change. Take a look at this Github repo.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This ANU Data Commons collection contains two master data sets associated with the manuscript accepted for publication in the Functional Ecology journal, detailed below.
Bryant C, Harris RJ, Brothers N, Bone C, Walsh N, Nicotra AB, Ball MC 2024, Cross tolerance: salinity gradients and dehydration increase photosynthetic heat tolerance in mangrove leaves Functional Ecology (Accepted article)
Each xlsx file contains a Metadata sheet describing headers, units of measurement, and data form (numeric, character, factor), followed by a single master dataframe sheet.
For full methods used for experimental data collection, see the methods described in the publication.
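For illustration, a minimal sketch of reading one of these workbooks with pandas; the file name and sheet layout here are hypothetical, so check each file's Metadata sheet for the actual structure:

```python
import pandas as pd

# Hypothetical file name; each workbook has a Metadata sheet plus one master dataframe sheet
meta = pd.read_excel("master_dataset_1.xlsx", sheet_name="Metadata")
data = pd.read_excel("master_dataset_1.xlsx", sheet_name=1)  # second sheet assumed to be the master dataframe
print(meta.head())
print(data.dtypes)
```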
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Fenic 0.4.0 API Documentation Dataset
Dataset Description
This dataset contains comprehensive API documentation for Fenic 0.4.0, a PySpark-inspired DataFrame framework designed for building production AI and agentic applications. The dataset provides structured information about all public and private API elements, including modules, classes, functions, methods, and attributes.
Dataset Summary
Fenic is a DataFrame framework that combines traditional data… See the full description on the dataset page: https://huggingface.co/datasets/typedef-ai/fenic-0.4.0-codebase.
Libraries Import: Importing necessary libraries such as pandas, seaborn, matplotlib, scikit-learn's KMeans, and warnings.

Data Loading and Exploration: Reading a dataset named "Mall_Customers.csv" into a pandas DataFrame (df). Displaying the first few rows of the dataset using df.head(). Conducting univariate analysis by calculating descriptive statistics with df.describe().

Univariate Analysis: Visualizing the distribution of the 'Annual Income (k$)' column using sns.distplot. Looping through selected columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') and plotting individual distribution plots.

Bivariate Analysis: Creating a scatter plot for 'Annual Income (k$)' vs 'Spending Score (1-100)' using sns.scatterplot. Generating a pair plot for selected columns with gender differentiation using sns.pairplot.

Gender-Based Analysis: Grouping the data by 'Gender' and calculating the mean for selected columns. Computing the correlation matrix for the grouped data and visualizing it using a heatmap.

Univariate Clustering: Applying KMeans clustering with 3 clusters based on 'Annual Income (k$)' and adding the 'Income Cluster' column to the DataFrame. Plotting the elbow method to determine the optimal number of clusters.

Bivariate Clustering: Applying KMeans clustering with 5 clusters based on 'Annual Income (k$)' and 'Spending Score (1-100)' and adding the 'Spending and Income Cluster' column. Plotting the elbow method for bivariate clustering and visualizing the cluster centers on a scatter plot. Displaying a normalized cross-tabulation between 'Spending and Income Cluster' and 'Gender' (a condensed sketch of this step is shown after this section).

Multivariate Clustering: Performing multivariate clustering by creating dummy variables, scaling selected columns, and applying KMeans clustering. Plotting the elbow method for multivariate clustering.

Result Saving: Saving the modified DataFrame with cluster information to a CSV file named "Result.csv". Saving the multivariate clustering plot as an image file ("Multivariate_figure.png").
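As an illustration of the bivariate clustering step, here is a condensed sketch assuming the column names from Mall_Customers.csv described above; the choice of k=5 follows the description, while the elbow-loop range and random_state are assumptions:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

df = pd.read_csv("Mall_Customers.csv")
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]

# Elbow method: inertia for k = 1..10 (range is an assumption)
inertia = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
           for k in range(1, 11)]
plt.plot(range(1, 11), inertia, marker='o')
plt.xlabel('number of clusters')
plt.ylabel('inertia')
plt.show()

# Fit the 5-cluster model and attach labels, as described above
km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
df['Spending and Income Cluster'] = km.labels_
print(pd.crosstab(df['Spending and Income Cluster'], df['Gender'], normalize='index'))
df.to_csv("Result.csv", index=False)
```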
Pediatric autoimmune neuropsychiatric disorders associated with streptococcal infections (PANDAS) are clinical conditions characterized by the sudden onset of obsessive–compulsive disorder and/or tics, often accompanied by other behavioral symptoms in a group of children with streptococcal infection. PANDAS-related disorders, including pediatric acute-onset neuropsychiatric syndrome (PANS), childhood acute neuropsychiatric symptoms (CANS), and pediatric infection triggered autoimmune neuropsychiatric disorders (PITANDs), have also been described. Since first defined in 1998, PANDAS has been considered a controversial diagnosis. A comprehensive review of the literature was performed on PubMed and Scopus databases, searching for diagnostic criteria and diagnostic procedures of PANDAS and related disorders. We propose a test panel to support clinicians in the work-up of PANDAS/PANS patients establishing an appropriate treatment. However, further studies are needed to improve our knowledge on these acute-onset neuropsychiatric conditions.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
PlotQA V1
Dataset Description
This dataset was uploaded from a pandas DataFrame.
Dataset Structure
Overview
Total Examples: 5,733,893
Total Features: 9
Dataset Size: ~2805.4 MB
Format: Parquet files
Created: 2025-09-22 20:12:01 UTC
Data Instances
The dataset contains 5,733,893 rows and 9 columns.
Data Fields
image_index (int64): 0 null values (0.0%), Range: [0.00, 157069.00], Mean: 78036.26
qid (object): 0 null values (0.0%)… See the full description on the dataset page: https://huggingface.co/datasets/Abd223653/PlotQA_V1.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sulfolobus acidocaldarius oxygen experiment data. Includes both GDGT and growth information. The experiment described as 0.22 oxygen concentration is the serial-transfer 0.2% O2 experiment.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is generated synthetically to create tables with the following characteristics:

- Empty cell percentage in the range [0, 30]
- There is a clear separator between rows and columns (Structured).
- 4 <= num rows <= 10, 2 <= num columns <= 6 (Small)
Load the dataset
```python
import io
import pandas as pd
from PIL import Image

def bytes_to_image(self, image_bytes: bytes):
    return Image.open(io.BytesIO(image_bytes))

def parse_annotations(self, annotations: str) -> pd.DataFrame:
    …
```

See the full description on the dataset page: https://huggingface.co/datasets/nanonets/small_dense_structured_table.
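A minimal usage sketch, assuming the dataset stores the table image as bytes in an "image" column (the column name and split are assumptions; check the dataset card for the actual schema):

```python
import io
from PIL import Image
from datasets import load_dataset

# Column name "image" and split "train" are assumptions
ds = load_dataset("nanonets/small_dense_structured_table", split="train")
row = ds[0]
img = Image.open(io.BytesIO(row["image"]))  # mirrors the bytes_to_image helper above
print(img.size)
```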