Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Example DataFrame (Teeny-Tiny Castle)
This dataset is part of a tutorial tied to the Teeny-Tiny Castle, an open-source repository containing educational tools for AI Ethics and Safety research.
How to Use
```python
from datasets import load_dataset

dataset = load_dataset("AiresPucrs/example-data-frame", split="train")
```
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
See the package documentation website at dataset.dataobservatory.eu. Report bugs and suggestions on GitHub: https://github.com/dataobservatory-eu/dataset/issues

The primary aim of dataset is to build well-documented data.frames, tibbles or data.tables that follow the W3C Data Cube Vocabulary based on the statistical SDMX data cube model. Such standard R objects (data.frame, data.table, tibble, or well-structured lists like JSON) become highly interoperable and can be placed into relational databases, semantic web applications, archives, and repositories. They follow the FAIR principles: they are findable, accessible, interoperable and reusable.

Our datasets:

- Contain Dublin Core or DataCite (or both) metadata that makes them findable and more easily accessible via online libraries. See the vignette article Datasets With FAIR Metadata.
- Have dimensions that can be easily and unambiguously reduced to triples for RDF applications; they can be easily serialized to, or synchronized with, semantic web applications. See the vignette article From dataset To RDF.
- Contain processing metadata that greatly enhances the reproducibility of the results and the reviewability of the contents of the dataset, including metadata defined by the DDI Alliance, which is particularly helpful for not-yet-processed data.
- Follow the datacube model of the Statistical Data and Metadata eXchange, therefore allowing easy refreshing with new data from the source of the analytical work; this is particularly useful for datasets containing results of statistical operations in R.
- Export correctly with FAIR metadata to the most used file formats, and publish straightforwardly to open science repositories with correct bibliographical and use metadata. See Export And Publish a dataset.
- Are relatively lightweight in dependencies and work easily with data.frame, tibble or data.table R objects.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Dataframe Detection is a dataset for object detection tasks - it contains Student Responses On Exams annotations for 1,052 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The archive contains two datasets that have been used to empirically evaluate MAT-Builder, a system to generate multiple aspect trajectories.
The first one is located in the "rome" folder and contains 26395 trajectories from 3181 individuals. The trajectories move over the city of Rome and were collected from OpenStreetMap. The folder also contains auxiliary datasets: the set of POIs within the province of Rome's boundaries, downloaded from OpenStreetMap (see the "poi" subfolder); historical weather information, downloaded from Meteostat (https://meteostat.net/it/) (see the "weather" subfolder); and a synthetically generated dataset of social media posts from the individuals (see the "tweets" subfolder). All the datasets are pandas DataFrames, except for the POI dataset, which is a geopandas DataFrame. All the datasets have been stored in the parquet format.
The second one is located in the "geolife" folder and contains the GeoLife dataset, with 17621 trajectories from 178 users. The timestamps of the trajectory samples have been adjusted from the GMT to the GMT+8 timezone. As in the former dataset's case, this folder also contains a dataset of POIs, a dataset of historical weather information, and a synthetically generated dataset of social media posts.
For more information on the MAT-Builder project (i.e., published papers, how to use the datasets, how the information within the datasets is structured, and so on), see the MAT-Builder GitHub page: https://github.com/chiarap2/MAT_Builder.
This dataset was created by Anton Kostin
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the two semantically enriched trajectory datasets introduced in the CIKM Resource Paper "A Semantically Enriched Mobility Dataset with Contextual and Social Dimensions", by Chiara Pugliese (CNR-IIT), Francesco Lettich (CNR-ISTI), Guido Rocchietti (CNR-ISTI), Chiara Renso (CNR-ISTI), and Fabio Pinelli (IMT Lucca, CNR-ISTI).
The two datasets were generated with an open source pipeline based on the Jupyter notebooks published in the GitHub repository behind our resource paper, and our MAT-Builder system. Overall, our pipeline first generates the files that we provide in the [paris|nyc]_input_matbuilder.zip archives; the files are then passed as input to the MAT-Builder system, which ultimately generates the two semantically enriched trajectory datasets for Paris and New York City, both in tabular and RDF formats. For more details on the input and output data, please see the sections below.
The [paris|nyc]_input_matbuilder.zip archives contain the data sources we used with the MAT-Builder system to semantically enrich raw preprocessed trajectories. More specifically, the archives contain the following files:
The [paris|nyc]_output_tabular.zip zip archives contain the output files generated by MAT-Builder that express the semantically enriched Paris and New York City datasets in tabular format. More specifically, they contain the following files:
There is then a second set of columns which represents the characteristics of the POI that has been associated with a stop. The relevant ones are:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Paper: The Balance-Scale Task Revisited: A Comparison of Statistical Models for Rule-Based and Information-Integration Theories of Proportional Reasoning. Abe Hofman, Ingmar Visser, Brenda Jansen & Han van der Maas; 15-2-2015.

The "dataBS.Rdata" file includes four dataframes based on two different datasets: a paper-and-pencil dataset collected by Jansen & van der Maas (1997), and an online dataset collected with the Math Garden. Description of the four dataframes:

1) student_info_pp: Student information of the paper-and-pencil dataset
- id = student id
- age = student age

2) student_info_mg: Student information of the Math Garden dataset
- id = student id
- age = student age
- new = student has not played the task before data collection started
- practise = number of items made by students before the data collection started

3) responses_pp: Response information of the paper-and-pencil dataset in long format

4) responses_mg: Response information of the Math Garden dataset in long format
- id = student id
- it = item id
- item_type = item type as defined in paper
- product_difference = difference between the product of weights and distance on each side of the fulcrum
- weight_difference = difference between the weights on each side of the fulcrum
- distance_difference = difference between the distance of the weights on each side of the fulcrum
- resp = response; left, balance, right
- cor = 0 incorrect; 1 correct
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was constructed to compare the performance of various neural network architectures learning the flow maps of Hamiltonian systems. It was created for the paper: A Generalized Framework of Neural Networks for Hamiltonian Systems.
The dataset consists of trajectory data from three different Hamiltonian systems. Namely, the single pendulum, double pendulum and 3-body problem. The data was generated using numerical integrators. For the single pendulum, the symplectic Euler method with a step size of 0.01 was used. The data of the double pendulum was also computed by the symplectic Euler method, however, with an adaptive step size. The trajectories of the 3-body problem were calculated by the arbitrarily high-precision code Brutus.
For each Hamiltonian system, there is one file containing the entire trajectory information (*_all_runs.h5.1). In these files, the states along all trajectories are recorded with a step size of 0.01. These files are composed of several Pandas DataFrames. One DataFrame per trajectory, called "run0", "run1", ... and finally one large DataFrame in which all the trajectories are combined, called "all_runs". Additionally, one Pandas Series called "constants" is contained in these files, in which several parameters of the data are listed.
Also, there is a second file per Hamiltonian system in which the data is prepared as features and labels ready for neural networks to be trained (*_training.h5.1). Similar to the first type of files, they contain a Series called "constants". The features and labels are then separated into 6 DataFrames called "features", "labels", "val_features", "val_labels", "test_features" and "test_labels". The data is split into 80% training data, 10% validation data and 10% test data.
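The 80/10/10 partition described above can be sketched with plain pandas slicing; the column names below are invented stand-ins for the trajectory states, not the actual schema of the files:

```python
import numpy as np
import pandas as pd

# Stand-in data: 1000 rows of generic state variables.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(1000, 4)),
                    columns=["q1", "q2", "p1", "p2"])

# 80% training, 10% validation, 10% test, as in the dataset description.
n = len(data)
n_train = int(0.8 * n)
n_val = int(0.1 * n)

features = data.iloc[:n_train]
val_features = data.iloc[n_train:n_train + n_val]
test_features = data.iloc[n_train + n_val:]

print(len(features), len(val_features), len(test_features))  # 800 100 100
```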
The code used to train various neural network architectures on this data can be found on GitHub at: https://github.com/AELITTEN/GHNN.
Already trained neural networks can be found on GitHub at: https://github.com/AELITTEN/NeuralNets_GHNN.
|  | Single pendulum | Double pendulum | 3-body problem |
| --- | --- | --- | --- |
| Number of trajectories | 500 | 2000 | 5000 |
| Final time in all_runs | T (one period of the pendulum) | 10 | 10 |
| Final time in training data | 0.25*T | 5 | 5 |
| Step size in training data | 0.1 | 0.1 | 0.5 |
CC0 1.0: https://spdx.org/licenses/CC0-1.0.html
We performed CODEX (co-detection by indexing) multiplexed imaging on 24 sections of the human intestine from 3 donors (B004, B005, B006) using a panel of 47 oligonucleotide-barcoded antibodies. We also performed CODEX imaging on both human tonsil and Barrett's esophagus (BE) using a panel of 57 oligonucleotide-barcoded antibodies. Subsequently, images underwent standard CODEX image processing (tile stitching, drift compensation, cycle concatenation, background subtraction, deconvolution, and determination of the best focal plane), single-cell segmentation, and column marker z-normalization by tissue. The output of this process was dataframes of 870,000 cells and 220,000 cells, respectively, with fluorescence values quantified for each marker. Methods: see README file.
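Per-tissue z-normalization of marker columns, as described above, maps naturally onto a pandas groupby-transform. The sketch below is illustrative only; the marker names and values are invented stand-ins:

```python
import pandas as pd

# Tiny stand-in for the single-cell fluorescence table.
cells = pd.DataFrame({
    "tissue": ["B004", "B004", "B005", "B005"],
    "CD3":    [10.0, 14.0, 3.0, 5.0],
    "CD45":   [2.0, 6.0, 8.0, 4.0],
})

markers = ["CD3", "CD45"]

# z-normalize each marker column within each tissue.
z = cells.groupby("tissue")[markers].transform(
    lambda col: (col - col.mean()) / col.std()
)
normalized = pd.concat([cells[["tissue"]], z], axis=1)
print(normalized)
```

With two cells per tissue this yields symmetric z-scores of about ±0.707 per marker (pandas uses the sample standard deviation, ddof=1).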
Libraries Import:
- Importing necessary libraries such as pandas, seaborn, matplotlib, scikit-learn's KMeans, and warnings.

Data Loading and Exploration:
- Reading a dataset named "Mall_Customers.csv" into a pandas DataFrame (df).
- Displaying the first few rows of the dataset using df.head().
- Conducting univariate analysis by calculating descriptive statistics with df.describe().

Univariate Analysis:
- Visualizing the distribution of the 'Annual Income (k$)' column using sns.distplot.
- Looping through selected columns ('Age', 'Annual Income (k$)', 'Spending Score (1-100)') and plotting individual distribution plots.

Bivariate Analysis:
- Creating a scatter plot for 'Annual Income (k$)' vs 'Spending Score (1-100)' using sns.scatterplot.
- Generating a pair plot for selected columns with gender differentiation using sns.pairplot.

Gender-Based Analysis:
- Grouping the data by 'Gender' and calculating the mean for selected columns.
- Computing the correlation matrix for the grouped data and visualizing it using a heatmap.

Univariate Clustering:
- Applying KMeans clustering with 3 clusters based on 'Annual Income (k$)' and adding the 'Income Cluster' column to the DataFrame.
- Plotting the elbow method to determine the optimal number of clusters.

Bivariate Clustering:
- Applying KMeans clustering with 5 clusters based on 'Annual Income (k$)' and 'Spending Score (1-100)' and adding the 'Spending and Income Cluster' column.
- Plotting the elbow method for bivariate clustering and visualizing the cluster centers on a scatter plot.
- Displaying a normalized cross-tabulation between 'Spending and Income Cluster' and 'Gender'.

Multivariate Clustering:
- Performing multivariate clustering by creating dummy variables, scaling selected columns, and applying KMeans clustering.
- Plotting the elbow method for multivariate clustering.

Result Saving:
- Saving the modified DataFrame with cluster information to a CSV file named "Result.csv".
- Saving the multivariate clustering plot as an image file ("Multivariate_figure.png").
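The elbow-method and bivariate-clustering steps above can be sketched as follows. This is a minimal illustration on synthetic stand-in data rather than "Mall_Customers.csv" (so 2 clusters are used instead of the 5 in the original workflow), assuming scikit-learn is installed:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the customer data: two well-separated groups.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Annual Income (k$)": np.concatenate(
        [rng.normal(30, 3, 50), rng.normal(80, 3, 50)]),
    "Spending Score (1-100)": np.concatenate(
        [rng.normal(20, 3, 50), rng.normal(75, 3, 50)]),
})

X = df[["Annual Income (k$)", "Spending Score (1-100)"]]

# Elbow method: inertia for k = 1..6 (normally plotted against k).
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 7)]

# Fit the chosen model and attach the cluster labels as a new column.
df["Spending and Income Cluster"] = (
    KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
)
print(df["Spending and Income Cluster"].value_counts())
```

Inertia drops sharply up to the "elbow" (here at k=2) and flattens afterwards, which is how the plot guides the choice of k.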
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here you can find raw data and information about each of the 34 datasets generated by the mulset algorithm and used for further analysis in SIMON.
Each dataset is stored in a separate folder which contains 4 files:
json_info: This file contains the number of features (with their names) and the number of subjects available for the dataset
data_testing: data frame with the data used to test the trained model
data_training: data frame with the data used to train the models
results: direct, unfiltered data from the database
Files are written in the feather format. Here is an example of the data structure for each file in the repository.
File was compressed using 7-Zip available at https://www.7-zip.org/.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
PandasPlotBench
PandasPlotBench is a benchmark to assess the capability of models in writing the code for visualizations given the description of the Pandas DataFrame. 🛠️ Task. Given the plotting task and the description of a Pandas DataFrame, write the code to build a plot. The dataset is based on the MatPlotLib gallery. The paper can be found on arXiv: https://arxiv.org/abs/2412.02764v1. To score your model on this dataset, you can use our GitHub repository. 📩 If you have… See the full description on the dataset page: https://huggingface.co/datasets/JetBrains-Research/PandasPlotBench.
This dataset was created by KingOfDayDream
It contains the following files:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this upload we share processed crop type datasets from both France and Kenya. These datasets can be helpful for testing and comparing various domain adaptation methods. The datasets are processed, used, and described in this paper: https://doi.org/10.1016/j.rse.2021.112488 (arXiv version: https://arxiv.org/pdf/2109.01246.pdf).
In summary, each point in the uploaded datasets corresponds to a particular location. The label is the crop type grown at that location in 2017. The 70 processed features are based on Sentinel-2 satellite measurements at that location in 2017. The points in the France dataset come from 11 different departments (regions) in Occitanie, France, and the points in the Kenya dataset come from 3 different regions in Western Province, Kenya. Within each dataset there are notable shifts in the distribution of the labels and in the distribution of the features between regions. Therefore, these datasets can be helpful for testing and comparing methods that are designed to address such distributional shifts.
More details on the dataset and processing steps can be found in Kluger et al. (2021). Many of the processing steps were taken to deal with Sentinel-2 measurements that were corrupted by cloud cover. For users interested in the raw multi-spectral time series data and dealing with cloud cover issues on their own (rather than using the 70 processed features provided here), the raw dataset from Kenya can be found in Yeh et al. (2021), and the raw dataset from France can be made available upon request from the authors of this Zenodo upload.
All of the data uploaded here can be found in "CropTypeDatasetProcessed.RData". We also post the dataframes and tables within that .RData file as separate .csv files for users who do not have R. The contents of each R object (or .csv file) are described in the file "Metadata.rtf".
Preferred Citation:
-Kluger, D.M., Wang, S., Lobell, D.B., 2021. Two shifts for crop mapping: Leveraging aggregate crop statistics to improve satellite-based maps in new regions. Remote Sens. Environ. 262, 112488. https://doi.org/10.1016/j.rse.2021.112488.
-URL to this Zenodo post https://zenodo.org/record/6376160
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains pandas DataFrames that represent filtered versions of CMS Open Data (in the form of ROOT files) available on the CERN OpenData Portal. This dataset specifically contains data from a DYToMuMu process (Drell-Yan process resulting in two Muons in the final state), which is a simulated process created during the 2012 LHC run. A total of 121 (99 for real collision data) relevant variables are contained in the filtered pandas DataFrames that can be found here. A list of variables can be found below, for a full explanation of them, please refer to the following paper (PLACEHOLDER, REFERENCE PAPER HERE): nEvent, runNum, lumisection, evtNum; nMuon, vecMuon_PT, vecMuon_Eta, vecMuon_Phi, vecMuon_PTErr, vecMuon_Q, vecMuon_StaPt, vecMuon_StaEta, vecMuon_StaPhi, vecMuon_TrkIso03, vecMuon_EcalIso03, vecMuon_HcalIso03; nVertex, vecVertex_nTracksfit, vecVertex_ndof, vecVertex_Chi2, vecVertex_X, vecVertex_Y, vecVertex_Z; nEle, vecEle_PT, vecEle_Eta, vecEle_Phi, vecEle_Q, vecEle_TrkIso03, vecEle_EcalIso03, vecEle_HcalIso03, vecEle_D0, vecEle_Dz; nTau, vecTau_PT, vecTau_Eta, vecTau_Phi, vecTau_Q, vecTau_RawIso3Hits, vecTau_RawIsoMVA3oldDMwoLT, vecTau_RawIsoMVA3oldDMwLT, vecTau_RawIsoMVA3newDMwoLT, vecTau_RawIsoMVA3newDMwLT; nPhoton, vecPhoton_PT, vecPhoton_Eta, vecPhoton_Phi, vecPhoton_Hovere, vecPhoton_Sthovere, vecPhoton_HasPixelSeed, vecPhoton_IsConv, vecPhoton_PassElectronVeto; nMctruth, vecMctruth_PT, vecMctruth_Eta, vecMctruth_Phi, vecMctruth_Id_1, vecMctruth_Id_2, vecMctruth_X_1, vecMctruth_X_2, vecMctruth_PdgId, vecMctruth_Status, vecMctruth_Y, vecMctruth_Mass, vecMctruth_Mothers.first, vecMctruth_Mothers.second; nJets, vecJet_PT, vecJet_Eta, vecJet_Phi, vecJet_D0, vecJet_Dz, vecJet_nCharged, vecJet_nNeutrals, vecJet_nParticles, vecJet_Beta, vecJet_BetaStar, vecJet_dR2Mean, vecJet_Q, vecJet_Mass, vecJet_Area, vecJet_Energy, vecJet_chEmEnergy, vecJet_neuEmEnergy, vecJet_chHadEnergy, vecJet_neuHadEnergy, vecJet_ID, vecJet_Num, vecJet_mcFlavor, vecJet_GenPT, 
vecJet_GenEta, vecJet_GenPhi, vecJet_GenMass, vecJet_flavorMatchPT, vecJet_JEC, vecJet_MatchIdx; nPF, vecPF_PT, vecPF_Eta, vecPF_Phi, vecPF_Mass, vecPF_E, vecPF_Q, vecPF_PfType, vecPF_EcalE, vecPF_HcalE, vecPF_ndof, vecPF_Chi2, vecPF_pvId, vecPF_X, vecPF_Y, vecPF_Z, vecPF_JetNum; fMET_PT, fMET_Eta, fMET_Phi; HLT_Mu17_Mu8, HLT_Mu24, HLT_MET120_v, HLT_Ele27, HLT_HT350. For the datasets containing data from real collisions at the LHC, the following variables are NOT contained: nMctruth, vecMctruth_PT, vecMctruth_Eta, vecMctruth_Phi, vecMctruth_Id_1, vecMctruth_Id_2, vecMctruth_X_1, vecMctruth_X_2, vecMctruth_PdgId, vecMctruth_Status, vecMctruth_Y, vecMctruth_Mass, vecMctruth_Mothers.first, vecMctruth_Mothers.second; vecJet_mcFlavor, vecJet_GenPT, vecJet_GenEta, vecJet_GenPhi, vecJet_GenMass, vecJet_flavorMatchPT, vecJet_JEC, vecJet_MatchIdx
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Part of the dissertation Pitch of Voiced Speech in the Short-Time Fourier Transform: Algorithms, Ground Truths, and Evaluation Methods. © 2020, Bastian Bechtold. All rights reserved.

Estimating the fundamental frequency of speech remains an active area of research, with varied applications in speech recognition, speaker identification, and speech compression. A vast number of algorithms for estimating this quantity have been proposed over the years, and a number of speech and noise corpora have been developed for evaluating their performance. The present dataset contains estimated fundamental frequency tracks of 25 algorithms, six speech corpora, two noise corpora, at nine signal-to-noise ratios between -20 and 20 dB SNR, as well as an additional evaluation of synthetic harmonic tone complexes in white noise.

The dataset also contains pre-calculated performance measures, both novel and traditional, in reference to each speech corpus's ground truth, the algorithms' own clean-speech estimate, and our own consensus truth. It can thus serve as the basis for a comparison study, to replicate existing studies from a larger dataset, or as a reference for developing new fundamental frequency estimation algorithms. All source code and data are available to download, and entirely reproducible, albeit requiring about one year of processor-time.

Included Code and Data
ground truth data.zip is a JBOF dataset of fundamental frequency estimates and ground truths of all speech files in the following corpora:
- CMU-ARCTIC (consensus truth) [1]
- FDA (corpus truth and consensus truth) [2]
- KEELE (corpus truth and consensus truth) [3]
- MOCHA-TIMIT (consensus truth) [4]
- PTDB-TUG (corpus truth and consensus truth) [5]
- TIMIT (consensus truth) [6]
noisy speech data.zip is a JBOF dataset of fundamental frequency estimates of speech files mixed with noise from the following corpora:
- NOISEX [7]
- QUT-NOISE [8]
synthetic speech data.zip is a JBOF dataset of fundamental frequency estimates of synthetic harmonic tone complexes in white noise.

noisy_speech.pkl and synthetic_speech.pkl are pickled Pandas dataframes of performance metrics derived from the above data for the following list of fundamental frequency estimation algorithms:
- AUTOC [9]
- AMDF [10]
- BANA [11]
- CEP [12]
- CREPE [13]
- DIO [14]
- DNN [15]
- KALDI [16]
- MAPSMBSC [17]
- NLS [18]
- PEFAC [19]
- PRAAT [20]
- RAPT [21]
- SACC [22]
- SAFE [23]
- SHR [24]
- SIFT [25]
- SRH [26]
- STRAIGHT [27]
- SWIPE [28]
- YAAPT [29]
- YIN [30]
noisy speech evaluation.py and synthetic speech evaluation.py are Python programs to calculate the above Pandas dataframes from the above JBOF datasets. They calculate the following performance measures:
- Gross Pitch Error (GPE), the percentage of pitches where the estimated pitch deviates from the true pitch by more than 20%.
- Fine Pitch Error (FPE), the mean error of grossly correct estimates.
- High/Low Octave Pitch Error (OPE), the percentage of pitches that are GPEs and happen to be at an integer multiple of the true pitch.
- Gross Remaining Error (GRE), the percentage of pitches that are GPEs but not OPEs.
- Fine Remaining Bias (FRB), the median error of GREs.
- True Positive Rate (TPR), the percentage of true positive voicing estimates.
- False Positive Rate (FPR), the percentage of false positive voicing estimates.
- False Negative Rate (FNR), the percentage of false negative voicing estimates.
- F₁, the harmonic mean of precision and recall of the voicing decision.
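As an illustration of the first measure, Gross Pitch Error can be computed directly from its definition (the function below is a minimal sketch written for this description, not code from the evaluation scripts; unvoiced frames are marked here with a true pitch of 0):

```python
import numpy as np

def gross_pitch_error(true_f0, est_f0):
    """Percentage of voiced frames where the estimate deviates
    from the true pitch by more than 20%."""
    true_f0 = np.asarray(true_f0, dtype=float)
    est_f0 = np.asarray(est_f0, dtype=float)
    voiced = true_f0 > 0  # only evaluate voiced frames
    gross = np.abs(est_f0[voiced] - true_f0[voiced]) > 0.2 * true_f0[voiced]
    return 100.0 * gross.mean()

# One of the three voiced frames (200 -> 300 Hz) deviates by more than 20%:
print(gross_pitch_error([100, 200, 150, 0], [105, 300, 145, 0]))
```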
Pipfile is a pipenv-compatible pipfile for installing all prerequisites necessary for running the above Python programs.
The Python programs take about an hour to compute on a fast 2019 computer, and require at least 32 GB of memory.

References:
1. John Kominek and Alan W Black. CMU ARCTIC database for speech synthesis, 2003.
2. Paul C Bagshaw, Steven Hiller, and Mervyn A Jack. Enhanced Pitch Tracking and the Processing of F0 Contours for Computer Aided Intonation Teaching. In EUROSPEECH, 1993.
3. F Plante, Georg F Meyer, and William A Ainsworth. A Pitch Extraction Reference Database. In Fourth European Conference on Speech Communication and Technology, pages 837–840, Madrid, Spain, 1995.
4. Alan Wrench. MOCHA MultiCHannel Articulatory database: English, November 1999.
5. Gregor Pirker, Michael Wohlmayr, Stefan Petrik, and Franz Pernkopf. A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario. page 4, 2011.
6. John S. Garofolo, Lori F. Lamel, William M. Fisher, Jonathan G. Fiscus, David S. Pallett, Nancy L. Dahlgren, and Victor Zue. TIMIT Acoustic-Phonetic Continuous Speech Corpus, 1993.
7. Andrew Varga and Herman J.M. Steeneken. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3):247–251, July 1993.
8. David B. Dean, Sridha Sridharan, Robert J. Vogt, and Michael W. Mason. The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms. Proceedings of Interspeech 2010, 2010.
9. Man Mohan Sondhi. New methods of pitch extraction. Audio and Electroacoustics, IEEE Transactions on, 16(2):262–266, 1968.
10. Myron J. Ross, Harry L. Shaffer, Asaf Cohen, Richard Freudberg, and Harold J. Manley. Average magnitude difference function pitch extractor. Acoustics, Speech and Signal Processing, IEEE Transactions on, 22(5):353–362, 1974.
11. Na Yang, He Ba, Weiyang Cai, Ilker Demirkol, and Wendi Heinzelman. BaNa: A Noise Resilient Fundamental Frequency Detection Algorithm for Speech and Music. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):1833–1848, December 2014.
12. Michael Noll. Cepstrum Pitch Determination. The Journal of the Acoustical Society of America, 41(2):293–309, 1967.
13. Jong Wook Kim, Justin Salamon, Peter Li, and Juan Pablo Bello. CREPE: A Convolutional Representation for Pitch Estimation. arXiv:1802.06182 [cs, eess, stat], February 2018.
14. Masanori Morise, Fumiya Yokomori, and Kenji Ozawa. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications. IEICE Transactions on Information and Systems, E99.D(7):1877–1884, 2016.
15. Kun Han and DeLiang Wang. Neural Network Based Pitch Tracking in Very Noisy Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):2158–2168, December 2014.
16. Pegah Ghahremani, Bagher BabaAli, Daniel Povey, Korbinian Riedhammer, Jan Trmal, and Sanjeev Khudanpur. A pitch extraction algorithm tuned for automatic speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pages 2494–2498. IEEE, 2014.
17. Lee Ngee Tan and Abeer Alwan. Multi-band summary correlogram-based pitch detection for noisy speech. Speech Communication, 55(7-8):841–856, September 2013.
18. Jesper Kjær Nielsen, Tobias Lindstrøm Jensen, Jesper Rindom Jensen, Mads Græsbøll Christensen, and Søren Holdt Jensen. Fast fundamental frequency estimation: Making a statistically efficient estimator computationally efficient. Signal Processing, 135:188–197, June 2017.
19. Sira Gonzalez and Mike Brookes. PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(2):518–530, February 2014.
20. Paul Boersma. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. In Proceedings of the institute of phonetic sciences, volume 17, pages 97–110. Amsterdam, 1993.
21. David Talkin. A robust algorithm for pitch tracking (RAPT). Speech coding and synthesis, 495:518, 1995.
22. Byung Suk Lee and Daniel PW Ellis. Noise robust pitch tracking by subband autocorrelation classification. In Interspeech, pages 707–710, 2012.
23. Wei Chu and Abeer Alwan. SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech. In INTERSPEECH, pages 2590–2593, 2010.
24. Xuejing Sun. Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio. In Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, volume 1, page I-333. IEEE, 2002.
25. Markel. The SIFT algorithm for fundamental frequency estimation. IEEE Transactions on Audio and Electroacoustics, 20(5):367–377, December 1972.
26. Thomas Drugman and Abeer Alwan. Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics. In Interspeech, pages 1973–1976, 2011.
27. Hideki Kawahara, Masanori Morise, Toru Takahashi, Ryuichi Nisimura, Toshio Irino, and Hideki Banno. TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 3933–3936. IEEE, 2008.
28. Arturo Camacho. SWIPE: A sawtooth waveform inspired pitch estimator for speech and music. PhD thesis, University of Florida, 2007.
29. Kavita Kasi and Stephen A. Zahorian. Yet Another Algorithm for Pitch Tracking. In IEEE International Conference on Acoustics Speech and Signal Processing, pages I-361–I-364, Orlando, FL, USA, May 2002. IEEE.
30. Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917, 2002.
Dataset Details
This is a Bemba-to-English dataset for the machine translation task. This dataset is a customized version of FLORES-200. It includes parallel sentences between Bemba and English.
Preprocessing Notes
Drop some unused columns like URL, domain, topic, has_image, has_hyperlink. Merge the Bemba and English DataFrames on the ID column. Rename the columns from sentence_bem to text_bem and from sentence_en to text_en. Convert the dataframe into a DatasetDict.… See the full description on the dataset page: https://huggingface.co/datasets/kreasof-ai/flores200-eng-bem.
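The merge-and-rename steps above can be sketched with pandas on tiny stand-in frames (the sentence strings are placeholders, not actual corpus text):

```python
import pandas as pd

# Placeholder stand-ins for the per-language sentence tables.
bem = pd.DataFrame({"ID": [1, 2], "sentence_bem": ["<bemba 1>", "<bemba 2>"]})
eng = pd.DataFrame({"ID": [1, 2], "sentence_en": ["<english 1>", "<english 2>"]})

# Merge the Bemba and English DataFrames on the ID column,
# then rename the sentence columns.
merged = bem.merge(eng, on="ID").rename(
    columns={"sentence_bem": "text_bem", "sentence_en": "text_en"}
)
print(list(merged.columns))  # ['ID', 'text_bem', 'text_en']
```

The final conversion to a DatasetDict would go through `datasets.Dataset.from_pandas(merged)`.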
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the code and datasets used in the data analysis for "Fracture toughness of mixed-mode anticracks in highly porous materials". The analysis is implemented in Python, using Jupyter Notebooks.
The repository contains the following files:

- main.ipynb: Jupyter notebook with the main data analysis workflow.
- energy.py: Methods for the calculation of energy release rates.
- regression.py: Methods for the regression analyses.
- visualization.py: Methods for generating visualizations.
- df_mmft.pkl: Pickled DataFrame with experimental data gathered in the present work.
- df_legacy.pkl: Pickled DataFrame with literature data.

Required packages: pandas, matplotlib, numpy, scipy, tqdm, uncertainties, weac. Install all prerequisites with pip install -r requirements.txt, then open the main.ipynb notebook in Jupyter Notebook or JupyterLab.

The analysis uses the two datasets df_mmft.pkl and df_legacy.pkl, which contain experimental measurements and corresponding parameters. Below are the descriptions for each column in these DataFrames:

df_mmft.pkl

- exp_id: Unique identifier for each experiment.
- datestring: Date of the experiment as a string.
- datetime: Timestamp of the experiment.
- bunker: Field site of the experiment. Bunker IDs 1 and 2 correspond to field sites A and B, respectively.
- slope_incl: Inclination of the slope in degrees.
- h_sledge_top: Distance from sample top surface to the sled in mm.
- h_wl_top: Distance from sample top surface to weak layer in mm.
- h_wl_notch: Distance from the notch root to the weak layer in mm.
- rc_right: Critical cut length in mm, measured on the front side of the sample.
- rc_left: Critical cut length in mm, measured on the back side of the sample.
- rc: Mean of rc_right and rc_left.
- densities: List of density measurements in kg/m^3 for each distinct slab layer of each sample.
- densities_mean: Daily mean of densities.
- layers: 2D array with layer density (kg/m^3) and layer thickness (mm) pairs for each distinct slab layer.
- layers_mean: Daily mean of layers.
- surface_lineload: Surface line load of added surface weights in N/mm.
- wl_thickness: Weak-layer thickness in mm.
- notes: Additional notes regarding the experiment or observations.
- L: Length of the slab–weak-layer assembly in mm.

df_legacy.pkl

- #: Record number.
- rc: Critical cut length in mm.
- slope_incl: Inclination of the slope in degrees.
- h: Slab height in mm.
- density: Mean slab density in kg/m^3.
- L: Length of the slab–weak-layer assembly in mm.
- collapse_height: Weak-layer height reduction through collapse.
- layers_mean: 2D array with layer density (kg/m^3) and layer thickness (mm) pairs for each distinct slab layer.
- wl_thickness: Weak-layer thickness in mm.
- surface_lineload: Surface line load from added weights in N/mm.

For more detailed information on the datasets, refer to the paper or the documentation provided within the Jupyter notebook.
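Since the experiment tables are pickled DataFrames, they load with pandas.read_pickle. The sketch below round-trips a tiny stand-in frame using two of the documented columns (rc_right, rc_left; the values are invented) and computes rc as their mean, as in the column description above:

```python
import os
import tempfile

import pandas as pd

# Invented stand-in for a slice of df_mmft.pkl (cut lengths in mm).
df = pd.DataFrame({"rc_right": [240.0, 310.0], "rc_left": [250.0, 305.0]})
df["rc"] = df[["rc_right", "rc_left"]].mean(axis=1)  # rc = mean of both sides

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df_mmft.pkl")
    df.to_pickle(path)
    df_mmft = pd.read_pickle(path)  # how the real file would be loaded

print(df_mmft["rc"].tolist())  # [245.0, 307.5]
```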