Dataset Card for Python-DPO
This dataset is a smaller version of the Python-DPO-Large dataset and was created using Argilla.
Load with datasets
To load this dataset with datasets, you'll just need to install datasets as pip install datasets --upgrade and then use the following code:
from datasets import load_dataset
ds = load_dataset("NextWealth/Python-DPO")
Data Fields
Each data instance contains:
instruction: The problem description/requirements… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Benchmark Dataset for Deep Learning for 3D Topology Optimization
This dataset represents voxelized 3D topology optimization problems and solutions. The solutions have been generated in cooperation with the Ariane Group and Synera using the Altair OptiStruct implementation of SIMP within the Synera software. The SELTO dataset consists of four different 3D datasets for topology optimization, called disc simple, disc complex, sphere simple and sphere complex. Each of these datasets is further split into a training and a validation subset.
The following paper provides full documentation and examples:
Dittmer, S., Erzmann, D., Harms, H., Maass, P., SELTO: Sample-Efficient Learned Topology Optimization (2022) https://arxiv.org/abs/2209.05098.
The Python library DL4TO (https://github.com/dl4to/dl4to) can be used to download and access all SELTO dataset subsets.
Each TAR.GZ file container consists of multiple enumerated pairs of CSV files. Each pair describes a unique topology optimization problem and contains an associated ground truth solution. Each problem-solution pair consists of two files, where one contains voxel-wise information and the other contains scalar information. For example, the i-th sample is stored in the files i.csv and i_info.csv, where i.csv contains all voxel-wise information and i_info.csv contains all scalar information. We define all spatially varying quantities at the center of the voxels, rather than on the vertices or surfaces. This allows for a shape-consistent tensor representation.
For the i-th sample, the columns of i_info.csv correspond to the following scalar information:
- E - Young's modulus [Pa]
- ν - Poisson's ratio [-]
- σ_ys - a yield stress [Pa]
- h - discretization size of the voxel grid [m]
The columns of i.csv correspond to the following voxel-wise information:
- x, y, z - the indices that state the location of the voxel within the voxel mesh
- Ω_design - design space information for each voxel. This is a ternary variable that indicates the type of density constraint on the voxel: 0 and 1 indicate that the density is fixed at 0 or 1, respectively; -1 indicates the absence of constraints, i.e., the density in that voxel can be freely optimized
- Ω_dirichlet_x, Ω_dirichlet_y, Ω_dirichlet_z - homogeneous Dirichlet boundary conditions for each voxel. These are binary variables that define whether the voxel is subject to homogeneous Dirichlet boundary constraints in the respective dimension
- F_x, F_y, F_z - floating point variables that define the three spatial components of external forces applied to each voxel. All forces are body forces given in [N/m^3]
- density - defines the binary voxel-wise density of the ground truth solution to the topology optimization problem
How to Import the Dataset
with DL4TO: With the Python library DL4TO (https://github.com/dl4to/dl4to) it is straightforward to download and access the dataset as a customized PyTorch torch.utils.data.Dataset object. As shown in the tutorial, this can be done via:
from dl4to.datasets import SELTODataset
dataset = SELTODataset(root=root, name=name, train=train)
Here, root is the path where the dataset should be saved, name is the name of the SELTO subset and can be one of "disc_simple", "disc_complex", "sphere_simple" and "sphere_complex", and train is a boolean that indicates whether the corresponding training or validation subset should be loaded. See here for further documentation on the SELTODataset class.
without DL4TO: After downloading and unzipping, any of the i.csv files can be manually imported into Python as a Pandas dataframe object:
import pandas as pd

root = ...  # path to the unzipped SELTO subset
file_path = f'{root}/{i}.csv'  # i is the index of the sample to load
columns = ['x', 'y', 'z', 'Ω_design', 'Ω_dirichlet_x', 'Ω_dirichlet_y', 'Ω_dirichlet_z', 'F_x', 'F_y', 'F_z', 'density']
df = pd.read_csv(file_path, names=columns)
Similarly, we can import an i_info.csv file via:
file_path = f'{root}/{i}_info.csv'
info_column_names = ['E', 'ν', 'σ_ys', 'h']
df_info = pd.read_csv(file_path, names=info_column_names)
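Since each i_info.csv file holds a single row of scalar values, the individual quantities can then be unpacked directly from the dataframe (a small usage sketch, assuming one row per file):
# Unpack the scalar material and discretization parameters E, ν, σ_ys, h.
E, nu, sigma_ys, h = df_info.iloc[0]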
We can extract PyTorch tensors from the Pandas dataframe df using the following function:
import torch

def get_torch_tensors_from_dataframe(df, dtype=torch.float32):
    # Grid shape: the last row holds the largest (x, y, z) indices.
    shape = df[['x', 'y', 'z']].iloc[-1].values.astype(int) + 1
    voxels = [df['x'].values, df['y'].values, df['z'].values]

    # Design space information (ternary: -1, 0, 1).
    Ω_design = torch.zeros(1, *shape, dtype=int)
    Ω_design[:, voxels[0], voxels[1], voxels[2]] = torch.from_numpy(df['Ω_design'].values.astype(int))

    # Homogeneous Dirichlet boundary conditions per spatial dimension.
    Ω_Dirichlet = torch.zeros(3, *shape, dtype=dtype)
    Ω_Dirichlet[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_x'].values, dtype=dtype)
    Ω_Dirichlet[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_y'].values, dtype=dtype)
    Ω_Dirichlet[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['Ω_dirichlet_z'].values, dtype=dtype)

    # Body forces [N/m^3] per spatial dimension.
    F = torch.zeros(3, *shape, dtype=dtype)
    F[0, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_x'].values, dtype=dtype)
    F[1, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_y'].values, dtype=dtype)
    F[2, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['F_z'].values, dtype=dtype)

    # Ground truth density of the topology optimization solution.
    density = torch.zeros(1, *shape, dtype=dtype)
    density[:, voxels[0], voxels[1], voxels[2]] = torch.tensor(df['density'].values, dtype=dtype)

    return Ω_design, Ω_Dirichlet, F, density
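For example, the function can be applied to the dataframe df loaded above (a short usage sketch):
Ω_design, Ω_Dirichlet, F, density = get_torch_tensors_from_dataframe(df)
print(Ω_design.shape, Ω_Dirichlet.shape, F.shape, density.shape)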
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 5/5/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably in this project.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available is separate .csv files for each land cover type. Oterh data can be found for the minimum, monthly, and seasonal transit time found in their respective folders. These data were produced using the python code found in the "supporting_code" folder given the ease of working with .nc and EASE grid in the xarray python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a particular function:
01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.
02_functions.R: This script contains the custom functions for this analysis, primarily to work with importing the seasonal transit data. Load this using the source() function in the 01_start.R script.
03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.
04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each landcover type. You load in these data from the 01_start.R script using the source() function.
05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each landcover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.
06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Open Context (https://opencontext.org) publishes free and open access research data for archaeology and related disciplines. An open source (but bespoke) Django (Python) application supports these data publishing services. The software repository is here: https://github.com/ekansa/open-context-py
The Open Context team runs ETL (extract, transform, load) workflows to import data contributed by researchers from various source relational databases and spreadsheets. Open Context uses a PostgreSQL (https://www.postgresql.org) relational database to manage these imported data in a graph-style schema. The Open Context Python application interacts with the PostgreSQL database via the Django Object-Relational Model (ORM).
This database dump includes all published structured data used by Open Context (table names that start with 'oc_all_'). The binary media files referenced by these structured data records are stored elsewhere. Binary media files for some projects, still in preparation, are not yet archived with long-term digital repositories.
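Once the dump has been restored into a local PostgreSQL instance, the published tables can be listed from Python (a minimal sketch, assuming the psycopg2 driver; the connection parameters below are placeholders, not part of this description):
import psycopg2

# Placeholder connection settings; adjust to your restored database.
conn = psycopg2.connect(dbname="opencontext", user="postgres", password="...", host="localhost")
with conn.cursor() as cur:
    # List all published Open Context tables (names starting with 'oc_all_').
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = 'public' AND table_name LIKE 'oc_all_%'"
    )
    for (table_name,) in cur.fetchall():
        print(table_name)
conn.close()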
These data comprehensively reflect the structured data currently published and publicly available on Open Context. Other data (such as user and group information) used to run the Website are not included.
IMPORTANT
This database dump contains data from roughly 190+ different projects. Each project dataset has its own metadata and citation expectations. If you use these data, you must cite each data contributor appropriately, not just this Zenodo archived database dump.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('cifar10', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/cifar10-3.0.2.png
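If you prefer working with NumPy arrays, the same split can also be materialized with the tfds.as_numpy helper (a short sketch; as_supervised=True returns (image, label) pairs for CIFAR-10):
import tensorflow_datasets as tfds

# Load CIFAR-10 as (image, label) pairs and convert a few examples to NumPy.
ds = tfds.load('cifar10', split='train', as_supervised=True)
for image, label in tfds.as_numpy(ds.take(1)):
    print(image.shape, label)  # expected: (32, 32, 3) and an integer class id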
This is a Python package for manipulating 2-dimensional tabular data structures (aka data frames). It is close in spirit to pandas or SFrame; however we put specific emphasis on speed and big data support. As the name suggests, the package is closely related to R's data.table and attempts to mimic its core algorithms and API.
The wheel file for installing datatable v0.11.0
!pip install ../input/python-datatable/datatable-0.11.0-cp37-cp37m-manylinux2010_x86_64.whl > /dev/null
import datatable as dt
data = dt.fread("filename").to_pandas()
https://github.com/h2oai/datatable
https://datatable.readthedocs.io/en/latest/index.html
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('dolma', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
THUDM/webglm-qa in ChatML format. Python code used for conversion:
from datasets import load_dataset
import pandas
import re
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="Felladrin/Llama-160M-Chat-v1"
)
dataset = load_dataset("THUDM/webglm-qa", split="train")
def format(columns): references = " ".join( [ f"- {columns['references'][i].strip()}" for i in… See the full description on the dataset page: https://huggingface.co/datasets/Felladrin/ChatML-WebGLM-QA.
Dataset Card for "Magicoder-Evol-Instruct-110K-python"
from datasets import load_dataset
dataset = load_dataset("pxyyy/Magicoder-Evol-Instruct-110K", split="train") # Replace with your dataset and split
def contains_python(entry):
    for c in entry["messages"]:
        if "python" in c["content"].lower():
            return True
    return False
    # return "python" in entry["messages"].lower()  # Replace 'column_name' with the column to search
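The function can then be used to keep only the Python-related conversations (a usage sketch, using the standard datasets.Dataset.filter API):
python_only = dataset.filter(contains_python)
print(python_only)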
The MNIST database of handwritten digits.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities. Details can be found in the attached report.
Annotation
The annotation files are provided as Parquet files. They can be read using Python with the pandas and pyarrow libraries. The split into train, validation, and test sets follows the split of the original datasets.
Installation
pip install pandas pyarrow
Example
import pandas as pd
df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
print(df.iloc[0])
dataset              AudioSet
filename             train/---2_BBVHAA.mp3
captions_visual      [a man in a black hat and glasses.]
captions_auditory    [a man speaks and dishes clank.]
tags                 [Speech]
Description
The annotation file consists of the following fields:
filename: Name of the corresponding file (video or audio file)
dataset: Source dataset associated with the data point
captions_visual: A list of captions related to the visual content of the video. Can be NaN in case of no visual content
captions_auditory: A list of captions related to the auditory content of the video
tags: A list of tags, classifying the sound of a file. It can be NaN if no tags are provided
Data files
The raw data files for most datasets are not released due to licensing issues. They must be downloaded from the source. However, due to missing files, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de
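For instance, rows that actually carry visual captions can be selected from the annotation dataframe loaded above (a minimal sketch based on the field descriptions; treating missing captions as NaN is an assumption):
# Keep only annotation rows that have visual captions (drop NaN entries).
df_visual = df[df['captions_visual'].notna()]
print(len(df_visual), "clips with visual captions")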
Dataset Card for Census Income (Adult)
This dataset is a precise version of Adult or Census Income. This dataset from UCI somehow happens to occupy two links, but we checked and confirmed that they are identical. We used the following Python script to create this Hugging Face dataset.
import pandas as pd
from datasets import Dataset, DatasetDict, Features, Value, ClassLabel
url1 = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
url2 =… See the full description on the dataset page: https://huggingface.co/datasets/cestwc/census-income.
Real dataset of 14 long horizon manipulation tasks. A mix of human play data and single robot arm data performing the same tasks.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('mimic_play', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
The StudyAbroadGPT-Dataset is a collection of conversational data focused on university application requirements for various programs, including MBA, MS in Computer Science, Data Science, and Bachelor of Medicine. The dataset includes interactions between humans asking questions about application processes (e.g., "How do I write a strong SOP for MS in Data Science at MIT?") and an assistant providing detailed responses. Covering prestigious institutions such as MIT, Oxford, Cambridge, and Stanford, this dataset serves as a valuable resource for understanding the informational needs of prospective students applying to study abroad.
Dataset Structure
The dataset is organized as a list of JSON objects, where each object represents a single conversation. Each conversation contains an array of turns, structured as follows:
"from": Specifies the speaker, either "human" or "assistant". "value": Contains the text of the query or response.
Example (JSON):
{
  "conversations": [
    {"from": "human", "value": "What documents do I need for applying to MBA?"},
    {"from": "assistant", "value": "## Introduction To embark on your MBA journey, it's crucial to gather the necessary documents..."}
  ]
}
Data Collection
This dataset was synthetically generated to simulate realistic conversations about study abroad applications. It is designed to reflect common queries and provide detailed, informative responses related to university application requirements.
Preprocessing
The dataset is provided in its raw form, consisting of unprocessed conversational text. Depending on their specific use case, such as natural language processing (NLP) tasks, users may need to perform additional preprocessing steps like tokenization or stopword removal.
Usage
You can load the dataset using the Hugging Face Datasets library with the following code:
from datasets import load_dataset
dataset = load_dataset("millat/StudyAbroadGPT-Dataset")
Accessing Conversations
To access and iterate through the conversations, use this example:
for conversation in dataset["train"]:
    for turn in conversation["conversations"]:
        print(f"{turn['from']}: {turn['value']}")
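Building on this loop, and following the preprocessing note above, the assistant responses can be tokenized before model training (a minimal sketch; the bert-base-uncased checkpoint is only an illustrative assumption, not part of the dataset):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative choice

def tokenize_assistant_turns(example):
    # Collect and tokenize the assistant responses of one conversation.
    texts = [turn["value"] for turn in example["conversations"] if turn["from"] == "assistant"]
    return tokenizer(" ".join(texts), truncation=True)

tokenized = dataset["train"].map(tokenize_assistant_turns)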
Examples
Here's a sample conversation excerpt from the dataset:
json { "conversations": [ {"from": "human", "value": "What are the GMAT requirements for Oxford?"}, {"from": "assistant", "value": "## Introduction Oxford University, one of the world's most prestigious institutions, requires..."} ] }
Use Cases
- Training Conversational Agents: Build chatbots to assist with university application queries.
- Analyzing Trends: Study application requirements across different programs and institutions.
- NLP Development: Create natural language understanding models tailored to educational domains.
License
This dataset is licensed under the MIT License.
Citation
If you use this dataset in your research, please cite it as follows:
@misc{StudyAbroadGPT-Dataset,
  author = {MD MILLAT HOSEN},
  title = {StudyAbroadGPT-Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/millat/StudyAbroadGPT-Dataset}}
}
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('spoc_robot', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
OpenAsp Dataset
OpenAsp is an Open Aspect-based Multi-Document Summarization dataset derived from the DUC and Multi-News summarization datasets.
Dataset Access
To generate OpenAsp, you require access to the DUC dataset from which OpenAsp is derived.
Steps:
- Grant access to the DUC dataset by following the NIST instructions here.
  - You should receive two user-password pairs (for DUC01-02 and DUC06-07).
  - You should receive a file named fwdrequestingducdata.zip.
- Clone this repository by running the following command: git clone https://github.com/liatschiff/OpenAsp.git
- Optionally create a conda or virtualenv environment:
conda create -n openasp 'python>3.10,<3.11'
conda activate openasp
Install the Python requirements; this currently requires Python 3.8-3.10 (later Python versions have issues with spacy)
pip install -r requirements.txt
copy fwdrequestingducdata.zip into the OpenAsp repo directory
run the prepare script command:
python prepare_openasp_dataset.py --nist-duc2001-user '<2001-user>' --nist-duc2001-password '<2001-pwd>' --nist-duc2006-user '<2006-user>' --nist-duc2006-password '<2006-pwd>'
load the dataset using huggingface datasets
from glob import glob
import os
import gzip
import shutil
from datasets import load_dataset
openasp_files = os.path.join('openasp-v1', '*.jsonl.gz')
data_files = {
    os.path.basename(fname).split('.')[0]: fname
    for fname in glob(openasp_files)
}

for ftype, fname in data_files.copy().items():
    with gzip.open(fname, 'rb') as gz_file:
        with open(fname[:-3], 'wb') as output_file:
            shutil.copyfileobj(gz_file, output_file)
    data_files[ftype] = fname[:-3]
# load OpenAsp as huggingface's dataset
openasp = load_dataset('json', data_files=data_files)
# print first sample from every split
for split in ['train', 'valid', 'test']:
    sample = openasp[split][0]

    # print title, aspect_label, summary and documents for the sample
    title = sample['title']
    aspect_label = sample['aspect_label']
    summary = '\n'.join(sample['summary_text'])
    input_docs_text = ['\n'.join(d['text']) for d in sample['documents']]

    print('* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *')
    print(f'Sample from {split}\nSplit title={title}\nAspect label={aspect_label}')
    print(f'\naspect-based summary:\n{summary}')
    print('\ninput documents:\n')

    for i, doc_txt in enumerate(input_docs_text):
        print(f'---- doc #{i} ----')
        print(doc_txt[:256] + '...')

    print('* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n')
Troubleshooting
1. Dataset failed loading with load_dataset() - you may want to delete the huggingface datasets cache folder.
2. 401 Client Error: Unauthorized - your DUC credentials are incorrect; please verify them (case sensitive, no extra spaces, etc.).
3. Dataset created but prints a warning about content verification - you may be using a different version of NLTK or the spacy model, which affects the sentence tokenization process. You must use the exact versions pinned in requirements.txt.
4. IndexError: list index out of range - similar to (3); try to reinstall the requirements with exact package versions.
Under The Hood
The prepare_openasp_dataset.py script downloads the DUC and Multi-News source files, uses the sacrerouge package to prepare the datasets, and uses the openasp_v1_dataset_metadata.json file to extract the relevant aspect summaries and compile the final OpenAsp dataset.
License
This repository, including openasp_v1_dataset_metadata.json and prepare_openasp_dataset.py, is released under the Apache license.
The OpenAsp dataset summary and source documents for each sample, which are generated by running the script, are licensed under the respective generic summarization dataset licenses - the Multi-News license and the DUC license.