The HR dataset is a collection of employee data that includes information on various factors that may impact employee performance. To explore the employee performance factors using Python, we begin by importing the necessary libraries such as Pandas, NumPy, and Matplotlib, then load the HR dataset into a Pandas DataFrame and perform basic data cleaning and preprocessing steps such as handling missing values and checking for duplicates.
We also use various data visualizations to explore the relationships between different variables and employee performance: for example, scatter plots to examine the relationship between job satisfaction and performance ratings, or bar charts to compare average performance ratings across genders or positions.
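A minimal sketch of these steps, assuming a hypothetical file name ('hr_dataset.csv') and column names ('job_satisfaction', 'performance_rating', 'gender') that should be adjusted to the actual HR dataset:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names; adjust them to the actual HR dataset
hr_df = pd.read_csv('hr_dataset.csv')

# Basic cleaning: drop duplicate rows and inspect missing values
hr_df = hr_df.drop_duplicates()
print(hr_df.isnull().sum())

# Scatter plot: job satisfaction vs. performance rating
hr_df.plot.scatter(x='job_satisfaction', y='performance_rating', alpha=0.5)
plt.title('Job Satisfaction vs. Performance Rating')
plt.show()

# Bar chart: average performance rating by gender
hr_df.groupby('gender')['performance_rating'].mean().plot.bar()
plt.ylabel('Average Performance Rating')
plt.show()
```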
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.
First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.
Before creating visualizations, it’s good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.
We can create various visualizations, such as:
- A bar chart to compare the average release speed of different pitch types.
- A line plot to visualize trends over time based on game dates.
- A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).
Here is a sample code to demonstrate how to create these visualizations using Matplotlib and Seaborn:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the data
df = pd.read_csv('judge.csv')
# Display the first few rows of the dataframe
print(df.head())
# Set the style of seaborn
sns.set(style="whitegrid")
# 1. Average Release Speed by Pitch Type
plt.figure(figsize=(12, 6))
avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
sns.barplot(x=avg_speed.values, y=avg_speed.index, palette="viridis")
plt.title('Average Release Speed by Pitch Type')
plt.xlabel('Average Release Speed (mph)')
plt.ylabel('Pitch Type')
plt.show()
# 2. Trends in Release Speed Over Time
# First, convert the 'game_date' to datetime
df['game_date'] = pd.to_datetime(df['game_date'])
plt.figure(figsize=(14, 7))
sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', ci=None)
plt.title('Trends in Release Speed Over Time')
plt.xlabel('Game Date')
plt.ylabel('Average Release Speed (mph)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
# 3. Scatter Plot of Release Speed vs. Events
plt.figure(figsize=(12, 6))
sns.scatterplot(data=df, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
plt.title('Release Speed vs. Events')
plt.xlabel('Release Speed (mph)')
plt.ylabel('Event Type')
plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering data for specific players or seasons. If you have any specific comparisons in mind or additional data to visualize, let me know!
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
The Open-Orca Augmented FLAN Collection is a revolutionary dataset that unlocks new levels of language understanding and machine learning model performance. This dataset was created to support research on natural language processing, machine learning models, and language understanding through leveraging the power of reasoning trace-enhancement techniques. By enabling models to understand complex relationships between words, phrases, and even entire sentences in a more robust way than ever before, this dataset provides researchers expanded opportunities for furthering the progress of linguistics research. With its unique combination of features including system prompts, questions from users and responses from systems, this dataset opens up exciting possibilities for deeper exploration into the cutting edge concepts underlying advanced linguistics applications. Experience a new level of accuracy and performance - explore Open-Orca Augmented FLAN Collection today!
For more datasets, click here.
This guide provides an introduction to the Open-Orca Augmented FLAN Collection dataset and outlines how researchers can utilize it for their language understanding and natural language processing (NLP) work. The Open-Orca dataset includes system prompts, questions posed by users, and responses from the system.
Getting Started
The first step is to download the data set from Kaggle at https://www.kaggle.com/openai/open-orca-augmented-flan and save it in a project directory of your choice on your computer or cloud storage. Once you have downloaded the data set, launch Jupyter Notebook or Google Colab, whichever you prefer to work with.
Exploring & Preprocessing Data
To get a better understanding of the features in this dataset, import them into a Pandas DataFrame as shown below. You can use other libraries as needed:
import pandas as pd  # library for loading the dataset into Python
df = pd.read_csv('train.csv')  # load the train CSV file into a Pandas DataFrame
df[['system_prompt','question','response']].head()  # view the top 5 rows of these columns
After importing, check each feature using basic descriptive statistics such as a Pandas groupby or value_counts statement. These give greater clarity over the values present in each feature. The command below shows the count of each element in the system_prompt column of the train CSV file:
df['system_prompt'].value_counts().head()  # shows the count of each element in the 'system_prompt' column
Example output: 'User says hello guys': 587; 'System asks How are you?': 555; 'User says I am doing good': 487; and so on.
Data Transformation: After inspecting and exploring the different features, you may want to make certain changes that best suit your needs before training modeling algorithms on this dataset.
Common transformation steps include removing punctuation marks: since punctuation may not add any value to downstream computation, we can strip it with a regex replace such as .str.replace('[^A-Za-z ]+', ' ', regex=True), as sketched below.
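A small hedged snippet applying that replacement to the question column (column name taken from the dataset description above):
```python
# Hypothetical cleaning step: strip non-letter characters from the 'question' column
df['question_clean'] = (
    df['question']
    .astype(str)
    .str.replace('[^A-Za-z ]+', ' ', regex=True)  # drop punctuation and digits
    .str.lower()
    .str.strip()
)
print(df[['question', 'question_clean']].head())
```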
- Automated Question Answering: Leverage the dataset to train and develop question answering models that can provide tailored answers to specific user queries while retaining language understanding abilities.
- Natural Language Understanding: Use the dataset as an exploratory tool for fine-tuning natural language processing applications, such as sentiment analysis, document categorization, parts-of-speech tagging and more.
- Machine Learning Optimizations: The dataset can be used to build highly customized machine learning pipelines that allow users to harness the power of conditioning data with pre-existing rules or models for improved accuracy and performance in automated tasks.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](ht...
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Linux Terminal Commands Dataset
Overview
The Linux Terminal Commands Dataset is a comprehensive collection of 600 unique Linux terminal commands (cmd-001 to cmd-600), curated for cybersecurity professionals, system administrators, data scientists, and machine learning engineers. This dataset is designed to support advanced use cases such as penetration testing, system administration, forensic analysis, and training machine learning models for command-line automation and anomaly detection. The commands span 10 categories: Navigation, File Management, Viewing, System Info, Permissions, Package Management, Networking, User Management, Process, and Editor. Each entry includes a command, its category, a description, an example output, and a reference to the relevant manual page, ensuring usability for both human users and automated systems.
Key Features
- Uniqueness: 600 distinct commands with no overlap, covering basic to unconventional tools.
- Sophistication: Includes advanced commands for SELinux, eBPF tracing, network forensics, and filesystem debugging.
- Unconventional Tools: Features obscure utilities like bpftrace, tcpflow, zstd, and aa-status for red teaming and system tinkering.
- ML-Ready: Structured in JSON Lines (.jsonl) format for easy parsing and integration into machine learning pipelines.
- Professional Focus: Tailored for cybersecurity (e.g., auditing, hardening), system administration (e.g., performance tuning), and data science (e.g., log analysis).
Dataset Structure The dataset is stored in a JSON Lines file (linux_terminal_commands_dataset.jsonl), where each line represents a single command with the following fields:
| Field | Description |
|:------|:------------|
| id | Unique identifier (e.g., cmd-001 to cmd-600). |
| command | The Linux terminal command (e.g., setfacl -m u:user:rw file.txt). |
| category | One of 10 categories (e.g., Permissions, Networking). |
| description | A concise explanation of the command's purpose and functionality. |
| example_output | Sample output or expected behavior (e.g., [No output if successful]). |
| man_reference | URL to the official manual page (e.g., https://man7.org/linux/man-pages/...). |
Category Distribution
| Category | Count |
|:---------|------:|
| Navigation | 11 |
| File Management | 56 |
| Viewing | 35 |
| System Info | 51 |
| Permissions | 28 |
| Package Management | 12 |
| Networking | 56 |
| User Management | 19 |
| Process | 42 |
| Editor | 10 |
Usage
Prerequisites
- Python 3.6+: For parsing and analyzing the dataset.
- Linux Environment: Most commands require a Linux system (e.g., Ubuntu, CentOS, Fedora) for execution.
- Optional Tools: Install tools like pandas for data analysis or jq for JSON processing.
Loading the Dataset
Use Python to load and explore the dataset:
```python
import json
import pandas as pd

# Read the JSON Lines file into a list of dicts
dataset = []
with open("linux_terminal_commands_dataset.jsonl", "r") as file:
    for line in file:
        dataset.append(json.loads(line))

df = pd.DataFrame(dataset)

# Commands per category
print(df.groupby("category").size())

# Filter the Networking commands
networking_cmds = df[df["category"] == "Networking"]
print(networking_cmds[["id", "command", "description"]])
```
Example Applications
- Cybersecurity: Use bpftrace or tcpdump commands for real-time system and network monitoring. Audit permissions with setfacl, chcon, or aa-status for system hardening.
- System Administration: Monitor performance with slabtop, pidstat, or systemd-analyze. Manage filesystems with btrfs, xfs_repair, or cryptsetup.
- Machine Learning: Train NLP models to predict command categories or generate command sequences (see the sketch after this list). Use example outputs for anomaly detection in system logs.
- Pentesting: Leverage nping, tcpflow, or ngrep for network reconnaissance. Explore find / -perm /u+s to identify potential privilege escalation vectors.
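As a hedged sketch of the machine-learning use case, here is a simple scikit-learn pipeline that predicts a command's category from its text, reusing the DataFrame df built in the loading snippet above; the model choice and parameters are illustrative only:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Combine command and description into a single text feature
text = df['command'] + ' ' + df['description']

X_train, X_test, y_train, y_test = train_test_split(
    text, df['category'], test_size=0.2, random_state=42, stratify=df['category']
)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```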
Executing Commands
Warning: Some commands (e.g., mkfs.btrfs, fuser -k, cryptsetup) can modify or destroy data. Always test in a sandboxed environment. To execute a command, run it directly in a terminal, for example:
semanage fcontext -l
Installation
- Clone the repository:
git clone https://github.com/sunnythakur25/linux-terminal-commands-dataset.git
cd linux-terminal-commands-dataset
- Ensure the dataset file (linux_terminal_commands_dataset.jsonl) is in the project directory.
- Install dependencies for analysis (optional):
pip install pandas
Contribution Guidelines We welcome contributions to expand the dataset or improve its documentation. To contribute:
- Fork the Repository: Create a fork on GitHub.
- Add Commands: Ensure new commands are unique, unconventional, and include all required fields (id, command, category, etc.).
- Test Commands: Verify commands work on a Linux system and provide accurate example outputs.
- Submit a Pull Request: Include a clear description of your changes and their purpose.
- Follow Standards: Use JSON Lines format. Reference man7.org for manual pages. Categorize c...
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
AQQAD, ABDELRAHIM (2023), "insurance_claims", Mendeley Data, V2, doi: 10.17632/992mh7dk9y.2
https://data.mendeley.com/datasets/992mh7dk9y/2
Latest version Version 2 Published: 22 Aug 2023 DOI: 10.17632/992mh7dk9y.2
Data Acquisition: - Obtain the dataset titled "Insurance_claims" from the following Mendeley repository: https://data.mendeley.com/drafts/992mh7dk9y - Download and store the dataset locally for easy access during subsequent steps.
Data Loading & Initial Exploration: - Use Python's Pandas library to load the dataset into a DataFrame. Code used:
import pandas as pd
insurance_df = pd.read_csv('insurance_claims.csv')
Data Cleaning & Pre-processing: - Handle missing values, if any. Strategies may include imputation or deletion based on the nature of the missing data. - Identify and handle outliers. In this research, particularly, outliers in the 'umbrella_limit' column were addressed. - Normalize or standardize features if necessary.
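A minimal sketch of this cleaning step; the '?' placeholder for missing values and the 99th-percentile cap are assumptions about the file, not steps confirmed by the authors:
```python
import pandas as pd

insurance_df = pd.read_csv('insurance_claims.csv')

# Assumption: missing values are encoded as '?' in this file
insurance_df = insurance_df.replace('?', pd.NA)
print(insurance_df.isna().sum())

# One possible outlier strategy: cap extreme 'umbrella_limit' values at the 99th percentile
cap = insurance_df['umbrella_limit'].quantile(0.99)
insurance_df['umbrella_limit'] = insurance_df['umbrella_limit'].clip(upper=cap)
```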
Exploratory Data Analysis (EDA): - Utilize visualization libraries such as Matplotlib and Seaborn in Python for graphical exploration. - Examine distributions, correlations, and patterns in the data, especially between features and the target variable 'fraud_reported'. - Identify features that exhibit distinct patterns for fraudulent and non-fraudulent claims.
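A hedged sketch of the graphical exploration, assuming 'fraud_reported' is the target column as described above:
```python
import matplotlib.pyplot as plt
import seaborn as sns

# Class balance of the target variable
sns.countplot(data=insurance_df, x='fraud_reported')
plt.title('Fraudulent vs. Non-Fraudulent Claims')
plt.show()

# Correlations among numeric features only
numeric_cols = insurance_df.select_dtypes(include='number')
sns.heatmap(numeric_cols.corr(), cmap='coolwarm', center=0)
plt.title('Correlation Between Numeric Features')
plt.tight_layout()
plt.show()
```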
Feature Engineering & Selection: - Create or transform existing features to improve model performance. - Use techniques like Recursive Feature Elimination (RFECV) to identify and retain only the most informative features.
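One way to apply RFECV with scikit-learn; the one-hot encoding, the simple fillna imputation, and the 'Y'/'N' label mapping are assumptions about this dataset's encoding:
```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Assumed encoding: one-hot encode predictors, map the target to 0/1
X = pd.get_dummies(insurance_df.drop(columns=['fraud_reported']))
X = X.fillna(0)  # simple imputation so the estimator receives no missing values
y = (insurance_df['fraud_reported'] == 'Y').astype(int)

selector = RFECV(
    estimator=RandomForestClassifier(random_state=42),
    step=1,
    cv=5,
    scoring='f1',
)
selector.fit(X, y)
print('Optimal number of features:', selector.n_features_)
print('Selected features:', list(X.columns[selector.support_]))
```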
Modeling: - Split the dataset into training and test sets to ensure the model's generalizability. - Implement machine learning algorithms such as Support Vector Machine, RandomForest, and Voting Classifier using libraries like Scikit-learn. - Handle class imbalance issues using methods like Synthetic Minority Over-sampling Technique (SMOTE).
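A sketch of the split / resample / fit sequence, continuing from the X and y built above. RandomForest is shown as one example; SVM or a Voting Classifier would slot in the same way:
```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample only the training split so the test set stays untouched
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

rf_model = RandomForestClassifier(n_estimators=200, random_state=42)
rf_model.fit(X_train_res, y_train_res)
```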
Model Evaluation: - Evaluate the performance of each model using metrics like precision, recall, F1-score, ROC-AUC score, and confusion matrix. - Fine-tune the models based on the results. Hyperparameter tuning can be performed using techniques like Grid Search or Random Search.
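The corresponding evaluation step for the model fitted above, using the metrics listed:
```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_pred = rf_model.predict(X_test)
y_prob = rf_model.predict_proba(X_test)[:, 1]

print(classification_report(y_test, y_pred))   # precision, recall, F1 per class
print(confusion_matrix(y_test, y_pred))
print('ROC-AUC:', roc_auc_score(y_test, y_prob))
```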
Model Interpretation: - Use methods like SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) to interpret and understand the predictions made by the model.
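A minimal SHAP sketch for the tree model above; note that the shape of the returned values differs between shap versions, so treat this as illustrative:
```python
import shap

explainer = shap.TreeExplainer(rf_model)
shap_values = explainer.shap_values(X_test)

# Older shap versions return one array per class for classifiers; plot the positive class
values_to_plot = shap_values[1] if isinstance(shap_values, list) else shap_values
shap.summary_plot(values_to_plot, X_test)
```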
Deployment & Prediction: - Utilize the best-performing model to make predictions on unseen data. - If the intention is to deploy the model in a real-world scenario, convert the trained model into a format suitable for deployment (e.g., using libraries like joblib or pickle).
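For persistence, a short joblib sketch (the file name is arbitrary):
```python
import joblib

joblib.dump(rf_model, 'fraud_model.joblib')       # save the trained model to disk
loaded_model = joblib.load('fraud_model.joblib')  # reload it later for serving predictions
print(loaded_model.predict(X_test[:5]))
```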
Software & Tools: - Programming Language: Python (run in Google Colab) - Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, Imbalanced-learn, LIME, and SHAP. - Environment: Jupyter Notebook, Google Colab, or any Python IDE.
https://creativecommons.org/publicdomain/zero/1.0/
By Leigh Dodds [source]
The dataset offers insights into various literary works that take place in Bath, providing an opportunity for readers and researchers to explore the rich connections between literature and this historical city. Whether you are interested in local stories or looking for inspiration for your next visit to Bath, this dataset serves as a useful resource.
Each entry includes detailed information such as the unique identifier assigned by LibraryThing (URI), which allows users to access further metadata and book covers using LibraryThing's APIs. Additionally, if available, ISBNs are provided for easy identification of specific editions or versions of each book.
With columns formatted consistently as uri, title, isbn, and author, the dataset ensures clarity and enables efficient data analysis.
Dataset Overview
Columns
This dataset consists of the following columns, each providing important details about a book:
- uri: The unique identifier for each book in the LibraryThing database.
- title: The title of the book.
- isbn: The International Standard Book Number (ISBN) for the book if known.
- author: The author of the book.
Getting Started
Before diving into analyzing or exploring this dataset, it's important to understand its structure and familiarize yourself with its columns and values.
To get started:
- Load/import it into your preferred data analysis tool or programming language (e.g., Python pandas library).
- Follow along with code examples provided below for common tasks using pandas library.
Example Code: Getting Basic Insights
import pandas as pd
# Load CSV file into a pandas DataFrame
data = pd.read_csv('Library_Thing_Books_Set_in_Bath.csv')
# Print basic insights about columns and values
print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])
print("Column names:", list(data.columns))
print("Sample data:")
print(data.head())
Exploring the Data
Once you have loaded the dataset into your preferred tool, you can begin exploring and analyzing its contents. Here are a few common tasks to get you started:
1. Checking Unique Book Count:
unique_books = data['title'].nunique()
print("Number of unique books:", unique_books)
2. Finding Books by a Specific Author:
author_name = 'Jane Austen'
books_by_author = data[data['author'] == author_name]
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: Library_Thing_Books_Set_in_Bath.csv
| Column name | Description |
|:------------|:------------|
| uri | The unique identifier for each book in the dataset. (String) |
| title | The title of the book. (String) |
| isbn | The International Standard Book Number (ISBN) for the book, which is a unique identifier for published books. (String) |
| author | The author of the book. (String) |
If you use this dataset in your research, please credit the original author, Leigh Dodds.
Portfolio_Adidas_Dataset: a set of real-world data analysis tasks completed using the Python Pandas and Matplotlib libraries.
Background Information: In this portfolio, we use Python Pandas & Python Matplotlib to analyze and answer business questions about five products' worth of sales data. The data contains hundreds of thousands of footwear store purchases broken down by product type, cost, region, state, city, and so on.
We start by cleaning our data. Tasks during this section include:
Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore 5 high-level business questions related to our data:
To answer these questions we walk through many different openpyxl, pandas, and matplotlib methods. They include:
I always wanted to access a data set related to the coronavirus (country-wise), but I could not find a properly documented one. So I created one manually, thinking this dataset would be really helpful for others.
Now I knew I wanted to create a dataset, but I did not know how. So I started to search the internet for country-wise coronavirus case counts. Obviously, Wikipedia was my first search, but the results were not satisfactory. So I surfed the internet for quite some time until I stumbled upon a great website you have probably heard of: Worldometer. This was exactly the website I was looking for. It had more details than Wikipedia, and more rows, meaning more countries with more details about their cases.
Once I found the data, my next hard task was to extract it. Of course, I could not get the data in raw form, and I did not email the site to request it. Instead, I learned a new skill that is very important for a data scientist; I read somewhere that to obtain data from websites you need to use this technique. Any guesses? Keep reading and you will find out in the next paragraph.
[Image: web scraping and data mining with Python]
You are right, it's web scraping. I learned this so that I could convert the data into CSV format. Below I share the scraper code that I wrote; I also found a way to directly convert the pandas DataFrame to a CSV (comma-separated values) file and store it on my computer. Just go through my code and you will see what I'm talking about.
Below is the code that I used to scrape the data from the website:
[Screenshot: the author's web-scraping code]
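Since the original scraper survives only as a screenshot, here is a minimal sketch of one way to pull a country-wise table from a Worldometer-style page into a CSV; the URL, request headers, and table position are assumptions, not the author's original code:
```python
import requests
import pandas as pd
from io import StringIO

# Assumed URL of the country-wise table
url = 'https://www.worldometers.info/coronavirus/'
html = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30).text

# pandas can parse HTML tables directly; the main table is assumed to be the first one
tables = pd.read_html(StringIO(html))
covid_df = tables[0]

# Convert the DataFrame straight to a CSV file on disk
covid_df.to_csv('corona_country_wise.csv', index=False)
print(covid_df.head())
```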
Now I couldn't have got the data without Worldometer. So special thanks to the website. It is because of them I was able to get the data. This data was scraped on 25th March at 3:45 PM. I will try to update the data every day.
As far as I know, I don't have any specific questions to ask. Find your own ways to use the data, and let me know via a kernel if you discover something interesting.
By Homeland Infrastructure Foundation [source]
The Submarine Pipeline Lines in the USACE IENC dataset provides comprehensive information about the locations and characteristics of submarine pipelines used for transporting oil or gas. These submarine or land pipelines are composed of interconnected pipes that are either laid on or buried beneath the seabed or land surfaces.
This dataset is a part of the Inland Electronic Navigational Charts (IENCs) and has been derived from reliable data sources utilized for maintaining navigation channels. It serves as a valuable resource for researchers, analysts, and policymakers interested in studying and monitoring the infrastructure related to oil and gas transportation.
For each submarine pipeline, this dataset includes various attributes such as its category type, product being transported (e.g., oil or gas), unique name or identifier, current status (active or decommissioned), additional information about its purpose or characteristics, minimum scale at which it can be visible on a map, length in meters, source of data used to create the dataset, and details regarding who provided the source data.
The Category_o column categorizes each pipeline based on its type, providing insights into different classifications within this infrastructure sector. Similarly, the Product column specifies whether the pipeline carries oil or gas.
Moreover,this dataset's Object_Nam field contains distinct names assigned to each submarine pipeline within the USACE IENC database. These names facilitate easy identification and reference when studying specific sections of this extensive network.
The Status attribute indicates whether a particular pipeline is currently active for transport purposes or has been decommissioned. This distinction holds significance for analyzing operational capacity and overall functionality.
The Informatio field presents additional details that further enhance our understanding of specific aspects of these submarine pipelines, such as their construction methods, purpose, functionality, and maintenance requirements.
Scale_Mini denotes the minimum scale at which each individual pipeline can be visualized accurately on a map, enabling users to effectively browse different levels of detail based on their requirements.
Finally,the Shape_Leng attribute provides the length of each submarine pipeline in meters, which is helpful for assessing distances, evaluating potential risks or vulnerabilities, and estimating transportation efficiency.
It is important to note that this dataset's information has been sourced from the USACE IENC dataset, ensuring its reliability and relevance to navigation channels. By leveraging this comprehensive collection of submarine pipeline data, stakeholders can gain valuable insights into the infrastructure supporting oil and gas transportation systems
Dataset Overview
The dataset contains several columns with information about each submarine pipeline. Here is an overview of each column:
- Category_o: The category or type of the submarine pipeline.
- Product: The product being transported through the submarine pipeline, such as oil or gas.
- Object_Nam: The name or identifier of the submarine pipeline.
- Status: The current status of the submarine pipeline, such as active or decommissioned.
- Informatio: Additional information or details about the submarine pipeline.
- Scale_Mini: The minimum scale at which the submarine pipeline is visible on a map.
- Source_Dat: The source of data used to create this dataset.
- Source_Ind: The individual or organization that provided the source data.
- Source_D_1: Additional source information or details about this specific data.
- Shape_Leng: The length of the submarine pipeline in meters.
Accessing and Analyzing Data
To access and start analyzing this dataset, you can follow these steps:
Download: First, download the Submarine Pipeline Lines_USACE_IENC.csv file from its source.
The downloaded file should be saved in your project directory.
Open CSV File: Open your preferred programming environment (e.g., Python with Pandas) and read/load this CSV file into a dataframe.
Data Exploration: Explore the dataset by examining its columns, rows, and general structure. Use pandas functions like head(), info(), or describe(), as in the sketch below.
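A small hedged sketch of these steps, assuming the file name from the Download step and the column names listed above:
```python
import pandas as pd

pipelines = pd.read_csv('Submarine Pipeline Lines_USACE_IENC.csv')

# General structure
print(pipelines.head())
pipelines.info()
print(pipelines.describe())

# Example: count of pipelines by product and status
print(pipelines.groupby(['Product', 'Status']).size())

# Example: total pipeline length (meters) per category
print(pipelines.groupby('Category_o')['Shape_Leng'].sum().sort_values(ascending=False))
```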