Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created using LeRobot.
Dataset Structure
meta/info.json:
{
    "codebase_version": "v2.1",
    "robot_type": "so101_follower_t",
    "total_episodes": 1,
    "total_frames": 416,
    "total_tasks": 1,
    "total_videos": 0,
    "total_chunks": 1,
    "chunks_size": 1000,
    "fps": 100,
    "splits": { "train": "0:1" },
    "data_path": "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet",
    "video_path": …
See the full description on the dataset page: https://huggingface.co/datasets/pepijn223/test-dataset-upload.
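As a minimal sketch (not part of the dataset card), the single episode's frames can be read directly with pandas; the filename below is filled in from the data_path template above with episode_chunk=0 and episode_index=0, the only episode in this dataset.
import pandas as pd
from huggingface_hub import hf_hub_download

# Resolve data_path for episode_chunk=0, episode_index=0 (the only episode)
parquet_path = hf_hub_download(
    repo_id="pepijn223/test-dataset-upload",
    repo_type="dataset",
    filename="data/chunk-000/episode_000000.parquet",
)
frames = pd.read_parquet(parquet_path)
print(frames.shape)  # 416 rows expected, matching total_frames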
https://choosealicense.com/licenses/odbl/
Dataset Card
This dataset contains a single huggingface split, named 'all_samples'. The samples contain a single huggingface feature, named "sample". Samples are instances of plaid.containers.sample.Sample. Mesh objects included in samples follow the CGNS standard and can be converted to Muscat.Containers.Mesh.Mesh. Example of commands:
import pickle
from datasets import load_dataset
from plaid.containers.sample import Sample
dataset =… See the full description on the dataset page: https://huggingface.co/datasets/PLAID-datasets/AirfRANS_original.
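Continuing the imports above, a hedged sketch of how a sample might be retrieved is given below; the pickle.loads deserialization step is an assumption based on the pickle import, so check the dataset page for the exact call.
dataset = load_dataset("PLAID-datasets/AirfRANS_original", split="all_samples")
raw = dataset[0]["sample"]   # the single "sample" feature
sample = pickle.loads(raw)   # assumed: a pickled plaid.containers.sample.Sample instance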
This is a PDF document created by the Department of Information Technology (DoIT) and the Governor's Office of Performance Improvement to assist in training Maryland state employees on the use of the Open Data Portal, https://opendata.maryland.gov. This document covers direct data entry, uploading Excel spreadsheets, connecting source databases, and transposing data. Please note that this tutorial is intended for use by state employees, as non-state users cannot upload datasets to the Open Data Portal.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset has a single CSV file that contains around 32,000 Twitter tweets. 100 CSV files were created from this single CSV file, each containing 320 tweets. Those 100 CSV files are used to validate and test (performance/load testing) the data pipeline components.
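A rough sketch of how such a split could be reproduced with pandas; the file names here are placeholders, not the ones used in the original pipeline.
import pandas as pd

tweets = pd.read_csv("tweets.csv")  # single file with ~32,000 tweets (placeholder name)
for i in range(0, len(tweets), 320):
    chunk = tweets.iloc[i:i + 320]
    chunk.to_csv(f"tweets_part_{i // 320:03d}.csv", index=False)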
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset results from 13 data description sessions conducted at U. Porto. In each session, researchers created metadata in Dendro, a research data management platform. A project for each session was created beforehand in Dendro, and all the sessions were kept under the same account. All projects were kept private. This was explained to the researchers, and they could have changed any information if they wanted to. When scheduling the sessions, researchers were asked to choose a dataset to describe. The sessions started by introducing researchers to Dendro with a brief demonstration of its features. The researchers were then asked to create a folder and upload their datasets. During the session, the selection of descriptors was mostly up to them; exceptionally, they were asked whether a given descriptor was suitable to contextualize their data. Session audio was recorded with the researchers' consent and was deleted after the transcription of relevant events and comments from each session, which complemented the analysis of the metadata produced. The audio was also used to mark the moments when the researchers started and finished the description, in order to ascertain the session duration.
https://choosealicense.com/licenses/unknown/
Dataset Card for DUTS
This is a FiftyOne dataset with 15572 samples.
Installation
If you haven't already, install FiftyOne: pip install -U fiftyone
Usage
import fiftyone as fo
import fiftyone.utils.huggingface as fouh
dataset = fouh.load_from_hub("Voxel51/DUTS")
session = fo.launch_app(dataset)
Dataset Details
Dataset Description… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/DUTS.
https://creativecommons.org/publicdomain/zero/1.0/
This data was collected as a course project for the immersive data science course (by General Assembly and Misk Academy).
This dataset is in CSV format; it consists of 5,717 rows and 15 columns, where each row is a dataset on Kaggle and each column represents a feature of that dataset.

|Feature|Description|
|-------|-----------|
|title| dataset name |
|usability| dataset usability rating by Kaggle |
|num_of_files| number of files associated with the dataset |
|types_of_files| types of files associated with the dataset |
|files_size| size of the dataset files |
|vote_counts| total vote count by dataset viewers |
|medal| reward to popular datasets measured by the number of upvotes (votes by novices are excluded from medal calculation), [Bronze = 5 Votes, Silver = 20 Votes, Gold = 50 Votes] |
|url_reference| reference to the dataset page on Kaggle in the format: www.kaggle.com/url_reference |
|keywords| topics tagged with the dataset |
|num_of_columns| number of features in the dataset |
|views| number of views |
|downloads| number of downloads |
|download_per_view| download per view ratio |
|date_created| dataset creation date |
|last_updated| date of the last update |
I would like to thank all my GA instructors for their continuous help and support
All data were taken from https://www.kaggle.com, collected on 30 Jan 2021.
Using this dataset, one could try to predict attributes of upcoming datasets, such as the number of votes, number of downloads, or medal type.
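A minimal sketch for loading the data with pandas; the file name "kaggle_datasets.csv" is a placeholder, as the actual file name is not given above.
import pandas as pd

df = pd.read_csv("kaggle_datasets.csv")  # placeholder file name
print(df.shape)  # expected (5717, 15)
print(df[["title", "vote_counts", "downloads", "medal"]].head())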
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Test Coco Import is a dataset for object detection tasks; it contains Pallets and QRcodes annotations for 599 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
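For example, a hedged sketch using the roboflow Python package is shown below; the API key, workspace name, project slug, and version number are placeholders rather than values taken from this page.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")                                  # placeholder key
project = rf.workspace("your-workspace").project("test-coco-import")  # placeholder workspace and slug
dataset = project.version(1).download("coco")                          # downloads images and COCO annotations locally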
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This work is grounded in the '20-year Data of Energy Station at Kitakyushu Campus' dataset. It encompasses a comprehensive analysis of the actual heating and cooling energy consumption of individual buildings within the energy station. Such long-term data enable researchers and policymakers to gain a holistic understanding of how heating and cooling indicators differ across buildings, supporting their decision-making in various scenarios.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Overview
Hessian QM9 is the first database of equilibrium configurations and numerical Hessian matrices, consisting of 41,645 molecules from the QM9 dataset at the $\omega$B97x/6-31G* level. Molecular Hessians were calculated in vacuum, as well as in water, tetrahydrofuran, and toluene using an implicit solvation model. A pre-print article associated with this dataset is available here.
Data records
The dataset is stored in Hugging Face's dataset format. For each of the four implicit solvent environments (vacuum, THF, toluene, and water), the data is divided into separate datasets containing vibrational analysis of 41,645 optimized geometries. Labels follow the QM9 molecule labelling system given by Ramakrishnan et al. Please note that only molecules containing H, C, N, and O were considered. This exclusion was due to the limited number of molecules containing fluorine in the QM9 dataset, which was not sufficient to build a good description of the chemical environment for fluorine atoms. Including these molecules may have reduced the overall precision of any models trained on our data.
Load the dataset
Use the following Python script to load the dataset dictionary:
from datasets import load_from_disk
dataset = load_from_disk(root_directory)
print(dataset)
Expected output:
DatasetDict({
    vacuum: Dataset({
        features: ['energy', 'positions', 'atomic_numbers', 'forces', 'frequencies', 'normal_modes', 'hessian', 'label'],
        num_rows: 41645
    }),
    thf: Dataset({
        features: ['energy', 'positions', 'atomic_numbers', 'forces', 'frequencies', 'normal_modes', 'hessian', 'label'],
        num_rows: 41645
    }),
    toluene: Dataset({
        features: ['energy', 'positions', 'atomic_numbers', 'forces', 'frequencies', 'normal_modes', 'hessian', 'label'],
        num_rows: 41645
    }),
    water: Dataset({
        features: ['energy', 'positions', 'atomic_numbers', 'forces', 'frequencies', 'normal_modes', 'hessian', 'label'],
        num_rows: 41645
    })
})
DFT Methods
All DFT calculations were carried out using the NWChem software package. The density functional used was $\omega$B97x with a 6-31G* basis set to create data compatible with the ANI-1/ANI-1x/ANI-2x datasets. The self-consistent field (SCF) cycle was converged when changes in total energy and density were less than 1e-6 eV. All molecules in the set are neutral with a multiplicity of 1. The Mura-Knowles radial quadrature and Lebedev angular quadrature were used in the integration. Structures were optimized in vacuum and three solvents (tetrahydrofuran, toluene, and water) using an implicit solvation model. The Hessian matrices, vibrational frequencies, and normal modes were computed for a subset of 41,645 molecular geometries using the finite differences method.
Example model weights
An example model trained on Hessian data is included in this dataset. Full details of the model will be provided in an upcoming publication. The model is an E(3)-equivariant graph neural network built with the e3x package; specific architecture details will be given in that publication. To load the model weights, use:
import jax.numpy as jnp
params = jnp.load('params_train_f128_i5_b16.npz', allow_pickle=True)['params'].item()
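As a small follow-on sketch, one record's Hessian can be pulled out of the loaded DatasetDict; the field names come from the feature list shown in the expected output, and the NumPy conversion is just for convenience.
import numpy as np

record = dataset["vacuum"][0]          # first molecule in the vacuum split
hessian = np.array(record["hessian"])  # numerical Hessian for that geometry
print(record["label"], hessian.shape)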
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains six years (1 January 2016 to 31 December 2021) of electric power load data for different types of buildings in an industrial park in Suzhou, China. The data were obtained from smart meters and are provided at three time resolutions (5 min, 30 min, and 1 hour). The presented dataset can be used for various research tasks, including load prediction, load pattern recognition, anomaly detection, and demand response strategy development. Additionally, such high-resolution data are valuable for researchers studying the characteristics of electric power load across different types of buildings in an industrial park.
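A hedged sketch for working with the multi-resolution series in pandas, assuming the 5-minute data is exported to a CSV with "timestamp" and "load_kw" columns (both the column names and the file name are placeholders).
import pandas as pd

load = pd.read_csv("building_load_5min.csv", parse_dates=["timestamp"], index_col="timestamp")
hourly = load["load_kw"].resample("1h").mean()  # aggregate 5-minute readings to hourly means
print(hourly.head())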
40,000 lines of Shakespeare from a variety of Shakespeare's plays. Featured in Andrej Karpathy's blog post 'The Unreasonable Effectiveness of Recurrent Neural Networks': http://karpathy.github.io/2015/05/21/rnn-effectiveness/.
To use for e.g. character modelling:
import tensorflow as tf
import tensorflow_datasets as tfds

d = tfds.load(name='tiny_shakespeare')['train']
d = d.map(lambda x: tf.strings.unicode_split(x['text'], 'UTF-8'))
# train split includes vocabulary for other splits
vocabulary = sorted(set(next(iter(d)).numpy()))
d = d.map(lambda x: {'cur_char': x[:-1], 'next_char': x[1:]})
d = d.unbatch()
seq_len = 100
batch_size = 2
d = d.batch(seq_len)
d = d.batch(batch_size)
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('tiny_shakespeare', split='train')
for ex in ds.take(4):
print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
tedious
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high costs and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific DOcument Retrieval using Multi-level Aspect-based quEries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research.
Documentation for the DORIS-MAE dataset is publicly available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset. This upload contains both DORIS-MAE dataset version 1 and ada-002 vector embeddings for all queries and related abstracts (used in candidate pool creation). DORIS-MAE dataset version 1 comprises four main sub-datasets, each serving distinct purposes.
The Query dataset contains 100 human-crafted complex queries spanning across five categories: ML, NLP, CV, AI, and Composite. Each category has 20 associated queries. Queries are broken down into aspects (ranging from 3 to 9 per query) and sub-aspects (from 0 to 6 per aspect, with 0 signifying no further breakdown required). For each query, a corresponding candidate pool of relevant paper abstracts, ranging from 99 to 138, is provided.
The Corpus dataset is composed of 363,133 abstracts from computer science papers, published between 2011 and 2021, and sourced from arXiv. Each entry includes the title, original abstract, URL, primary and secondary categories, as well as citation information retrieved from Semantic Scholar. A masked version of each abstract is also provided, facilitating the automated creation of queries.
The Annotation dataset includes generated annotations for all 165,144 question pairs, each comprising an aspect/sub-aspect and a corresponding paper abstract from the query's candidate pool. It includes the original text generated by ChatGPT (version chatgpt-3.5-turbo-0301) explaining its decision-making process, along with a three-level relevance score (0, 1, or 2) representing ChatGPT's final decision.
Finally, the Test Set dataset contains human annotations for a random selection of 250 question pairs used in hypothesis testing. It includes each of the three human annotators' final decisions, recorded as a three-level relevance score (0, 1, or 2).
The file "ada_embedding_for_DORIS-MAE_v1.pickle" contains text embeddings for the DORIS-MAE dataset, generated by OpenAI's ada-002 model. The structure of the file is as follows:
ada_embedding_for_DORIS-MAE_v1.pickle
├── "Query"
│   ├── query_id_1 (Embedding of query_1)
│   ├── query_id_2 (Embedding of query_2)
│   ├── query_id_3 (Embedding of query_3)
│   └── ...
└── "Corpus"
    ├── corpus_id_1 (Embedding of abstract_1)
    ├── corpus_id_2 (Embedding of abstract_2)
    ├── corpus_id_3 (Embedding of abstract_3)
    └── ...
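A minimal sketch for reading this file; it assumes, per the layout above, that the top-level object is a dictionary with "Query" and "Corpus" entries keyed by query and corpus IDs.
import pickle

with open("ada_embedding_for_DORIS-MAE_v1.pickle", "rb") as f:
    embeddings = pickle.load(f)

query_embeddings = embeddings["Query"]    # query_id -> ada-002 embedding of the query
corpus_embeddings = embeddings["Corpus"]  # corpus_id -> ada-002 embedding of the abstract
print(len(query_embeddings), len(corpus_embeddings))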
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset shows Malaysia's major imports by country of origin, in USD.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is derived from APIBench dataset by running the following script:
import json
import os

import dotenv
import pandas as pd
from datasets import Dataset
from huggingface_hub import login
dotenv.load_dotenv()
os.system("git clone https://github.com/ShishirPatil/gorilla.git")
login(os.environ["HF_API_KEY"])
def read_jsonl_as_df(file_path):
    data = []
    with open(file_path…
See the full description on the dataset page: https://huggingface.co/datasets/rbiswasfc/APIBench.
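Once built and pushed, the derived dataset can be pulled straight from the Hub; this usage sketch assumes the default configuration, since split names are not specified above.
from datasets import load_dataset

apibench = load_dataset("rbiswasfc/APIBench")  # loads the derived dataset from the Hub
print(apibench)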
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
High-quality version of the CELEBA dataset, consisting of 30000 images in 1024 x 1024 resolution.
Note: CelebAHQ dataset may contain potential bias. The fairness indicators example goes into detail about several considerations to keep in mind while using the CelebAHQ dataset.
WARNING: This dataset currently requires you to prepare images on your own.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('celeb_a_hq', split='train')
for ex in ds.take(4):
print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/celeb_a_hq-1024-2.0.0.png
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data comprises load cell force data collected from the SmartBay buoy moored in Galway Bay. The Strainstall load shackle measures tensile loads. This dataset contains measurements of load from the mooring chains attached to the SmartBay buoy.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1,217,960 global import shipment records of valves, with prices, volumes, and current buyer-supplier relationships, based on an actual global export trade database.