MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Download the dataset
At the moment, to download the dataset you should load it into a pandas DataFrame:
import pandas as pd
df = pd.read_csv("https://huggingface.co/datasets/cybernetic-m/oldIT2modIT/resolve/main/oldIT2modIT_dataset.csv")
You can preview the dataset with:
df.head()
To convert it into a Hugging Face dataset:
from datasets import Dataset
dataset = Dataset.from_pandas(df)
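If you need separate training and evaluation splits, a minimal sketch continuing from the dataset object created above (the 10% test fraction and the seed are arbitrary example values, not part of the dataset):
# split into train/test; test_size and seed are illustrative choices
splits = dataset.train_test_split(test_size=0.1, seed=42)
print(splits["train"].num_rows, splits["test"].num_rows)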
Dataset Description
This is an Italian dataset of 200 old (ancient) Italian sentences and… See the full description on the dataset page: https://huggingface.co/datasets/cybernetic-m/oldIT2modIT.
The klib library lets you quickly visualize missing data, perform data cleaning, plot data distributions, plot correlations, and inspect categorical column values. klib is a Python library for importing, cleaning, analyzing, and preprocessing data. Explanations of key functionalities can be found on Medium / TowardsDataScience in the examples section or on YouTube (Data Professor).
Original GitHub repo
!pip install klib
import klib
import pandas as pd
df = pd.DataFrame(data)  # 'data' stands in for your own records, e.g. a dict of columns or the result of pd.read_csv(...)
# klib.describe - functions for visualizing datasets
- klib.cat_plot(df) # returns a visualization of the number and frequency of categorical features
- klib.corr_mat(df) # returns a color-encoded correlation matrix
- klib.corr_plot(df) # returns a color-encoded heatmap, ideal for correlations
- klib.dist_plot(df) # returns a distribution plot for every numeric feature
- klib.missingval_plot(df) # returns a figure containing information about missing values
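A minimal end-to-end sketch of the calls above (the toy DataFrame is invented for illustration; klib.data_cleaning is an additional function from the same library):
import klib
import pandas as pd

# toy data, purely illustrative
df = pd.DataFrame({
    "category": ["a", "b", "a", None, "b"],
    "value": [1.0, 2.5, None, 4.0, 5.5],
})

klib.missingval_plot(df)   # where the missing values are
klib.dist_plot(df)         # distribution of every numeric feature
klib.cat_plot(df)          # number and frequency of categorical features

df_clean = klib.data_cleaning(df)  # drops empty/duplicate rows and columns, optimizes dtypes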
Take a look at this starter notebook.
Further examples, as well as applications of the functions, can be found here.
Pull requests and ideas, especially for further functions, are welcome. For major changes or feedback, please open an issue first to discuss what you would like to change. Take a look at this GitHub repo.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A collection of 13 brain maps. Each brain map is a 3D array of values representing properties of the brain at different locations.
MKDAChi2 meta-analysis with FDRCorrector
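Maps of this kind are typically produced with the NiMARE package; a minimal sketch, assuming two coordinate-based nimare.dataset.Dataset objects dset1 and dset2 are already loaded (the variable names and the alpha value are illustrative, not taken from this collection):
from nimare.meta.cbma.mkda import MKDAChi2
from nimare.correct import FDRCorrector

meta = MKDAChi2()                 # chi-square contrast between two coordinate datasets
results = meta.fit(dset1, dset2)  # dset1/dset2: pre-loaded NiMARE Dataset objects (assumed)

corrector = FDRCorrector(method="indep", alpha=0.05)  # FDR correction; alpha is an example value
corrected = corrector.transform(results)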
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is generated synthetically to create tables with the following characteristics:
- Empty cell percentage in the range 0–30%
- A clear separator between rows and columns (Structured)
- 15 <= num rows <= 30, 7 <= num columns <= 15 (Long)
Load the dataset
import io
import pandas as pd
from PIL import Image

def bytes_to_image(self, image_bytes: bytes):
    # decode raw image bytes into a PIL image
    return Image.open(io.BytesIO(image_bytes))

def parse_annotations(self, annotations: str) -> pd.DataFrame:…
See the full description on the dataset page: https://huggingface.co/datasets/nanonets/long_dense_structured_table.
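A minimal sketch of pulling the data with the datasets library (the repo id comes from the page above; the split name is an assumption, and the schema should be inspected rather than assumed):
from datasets import load_dataset

ds = load_dataset("nanonets/long_dense_structured_table", split="train")  # split name assumed
print(ds.features)   # inspect the actual schema before relying on column names
print(ds[0].keys())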
CDLA Permissive 2.0: https://choosealicense.com/licenses/cdla-permissive-2.0/
pd-extended
Summary
PD-Extended is a collection of ~34.7 million image/caption pairs derived from the PD12M and Megalith-CC0 datasets. Each pair is accompanied by metadata, such as MIME type and dimensions, as well as CLIP-L14 embeddings. Of note, these images retain their original licensing, and the source_id field pairs any derived image with its source in the original dataset. All images are paired with synthetic captions… See the full description on the dataset page: https://huggingface.co/datasets/Spawning/pd-extended.
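Given the collection's size, streaming is the practical way to sample it; a minimal sketch (the repo id comes from the page above, the split name is an assumption, and the printed keys are whatever the schema actually exposes):
from datasets import load_dataset

# stream instead of downloading ~34.7M pairs; split name "train" is assumed
ds = load_dataset("Spawning/pd-extended", split="train", streaming=True)

for record in ds.take(3):
    print(record.keys())  # inspect available fields (metadata, embeddings, source_id, ...)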
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.
Annotation
The annotation files are provided as Parquet files. They can be read using Python with the pandas and pyarrow libraries. The split into train, validation, and test sets follows the split of the original datasets.
Installation
pip install pandas pyarrow
Example
import pandas as pd
df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
print(df.iloc[0])
dataset                                          AudioSet
filename                            train/---2_BBVHAA.mp3
captions_visual       [a man in a black hat and glasses.]
captions_auditory        [a man speaks and dishes clank.]
tags                                             [Speech]
Description
The annotation file consists of the following fields:
- filename: Name of the corresponding file (video or audio file)
- dataset: Source dataset associated with the data point
- captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content
- captions_auditory: A list of captions related to the auditory content of the video
- tags: A list of tags classifying the sound of a file. Can be NaN if no tags are provided
Data files
The raw data files for most datasets are not released due to licensing issues. They must be downloaded from the source. However, due to missing files, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de
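A short follow-up sketch of slicing the annotations with pandas (the filter criteria are illustrative):
import pandas as pd

df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')

# keep only AudioSet clips that actually have visual captions (illustrative filter)
audioset = df[df['dataset'] == 'AudioSet']
with_visual = audioset[audioset['captions_visual'].notna()]
print(len(with_visual), 'clips with visual captions')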
Descriptor Prediction Dataset
This dataset is part of the Deep Principle Bench collection.
Files
descriptor_prediction.csv: Main dataset file
Usage
import pandas as pd
from datasets import load_dataset
dataset = load_dataset("yhqu/descriptor_prediction")
df = pd.read_csv("hf://datasets/yhqu/descriptor_prediction/descriptor_prediction.csv")
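Either route loads the same CSV; load_dataset wraps it in a DatasetDict while read_csv returns a plain DataFrame. A quick sanity check after loading:
print(dataset)   # split names and row counts
print(df.shape)  # (rows, columns)
df.head()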
Citation
Please cite this work if you use… See the full description on the dataset page: https://huggingface.co/datasets/yhqu/descriptor_prediction.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
MKDAChi2 meta-analysis with FDRCorrector
homo sapiens
Other
meta-analysis
P
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The crystal structures of three incommensurately modulated γ-brass-related composite structures in the Pd–Zn–Pt system have been solved from single-crystal X-ray diffraction data using a (3+1)-dimensional superspace description. The compounds PtxPd15–xZn54 (x ≈ 6, 7, 10) crystallize in the orthorhombic superspace group Fmmm(α00)0s0 (F = [(1/2, 1/2, 0, 0); (1/2, 0, 1/2, 0); (0, 1/2, 1/2, 0)]) with the following fundamental cell dimensions: a = 4.265(1) Å, b = 9.132(1) Å, c = 12.928(2) Å, q ≈ 0.629a*; a = 4.284(1) Å, b = 9.151(2) Å, c = 12.948(4) Å, q ≈ 0.628a*; and a = 4.288(1) Å, b = 9.140(4) Å, c = 12.926(7) Å, q ≈ 0.627a*. Each structure is built from two sub-lattices: pentagonal antiprismatic columns parallel to [100] and a zigzag chain of Zn atoms running along the center of the column.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Herein we describe a concise assembly of the central 1,6-dioxaspiro[4.4]nonane core of cephalosporolides E/F, employing a Pd-mediated alkynediol cycloisomerization, and their total synthesis. On the basis of spectroscopic data and optical rotation values, the absolute configurations of cephalosporolides E/F were proposed.
Not seeing a result you expected?
Learn how you can add new datasets to our index.