Gutenberg-BookCorpus-Cleaned-Data-English
This dataset was cleaned and preprocessed with the Gutenberg_English_Preprocessor class method (given below) from the reference Kaggle dataset 75,000+ Gutenberg Books and Metadata 2025. It contains only English-language content whose rights are listed as "Public domain in the USA", so you can use it freely anywhere. The reference Gutenberg metadata is also available and can be downloaded with the following CLI command: pip… See the full description on the dataset page: https://huggingface.co/datasets/incredible45/Gutenberg-BookCorpus-Cleaned-Data-English.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
linux_commands.csv:
Description: This dataset provides a comprehensive list of Linux commands commonly used in Unix-based operating systems. Each entry includes details such as the command name, a brief description of its functionality, and any additional parameters or options.
cmd_commands.csv:
Description: The "cmd_commands.csv" dataset presents a collection of commands for the Windows Command Prompt (cmd). It covers a range of command-line operations and system management tasks specific to the Windows operating system. Entries include the command name, a brief description, and relevant usage information.
macos_commands.csv:
Description: This dataset compiles a set of commands tailored for macOS command-line interfaces. It encompasses commands commonly used in Terminal on Apple's macOS operating system. Each entry in the dataset includes the command, a concise description of its purpose, and any pertinent options or arguments.
vbscript_commands.csv:
Description: The "vbscript_commands.csv" dataset contains a list of VBScript commands and functionalities. VBScript, or Visual Basic Scripting Edition, is a scripting language developed by Microsoft. This dataset provides insights into various VBScript commands, their applications, and usage details, making it a valuable resource for scripting on Windows-based systems.
Original Data Source: Linux/CMD/MacOS Commands
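Since each file is described as a table of command names, descriptions, and options, a keyword lookup over one of them can be sketched as follows. This is only an illustration: the column names (`command`, `description`, `options`) and the two sample rows are assumptions made for the example, not taken from the actual CSV files, so check the real headers before reusing it.

```python
import csv
import io

# Hypothetical stand-in for linux_commands.csv. The real file's column
# names and rows may differ; these are assumed for illustration only.
SAMPLE_CSV = """command,description,options
ls,List directory contents,-l -a
grep,Search text for lines matching a pattern,-i -r
"""


def load_commands(source):
    """Read CSV rows from a file-like object into a list of dicts."""
    return list(csv.DictReader(source))


def search_commands(rows, keyword, field="description"):
    """Return rows whose `field` contains `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [row for row in rows if keyword in (row.get(field) or "").lower()]


rows = load_commands(io.StringIO(SAMPLE_CSV))
matches = search_commands(rows, "search")
print([row["command"] for row in matches])
```

To run it against a downloaded file, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle, for example `open("linux_commands.csv", newline="", encoding="utf-8")`, and adjust the field names to match the file's header row.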