9 datasets found
  1. Human resources dataset

    • kaggle.com
    zip
    Updated Mar 15, 2023
    Cite
    Khanh Nguyen (2023). Human resources dataset [Dataset]. https://www.kaggle.com/datasets/khanhtang/human-resources-dataset
    Explore at:
    zip (17041 bytes)
    Dataset updated
    Mar 15, 2023
    Authors
    Khanh Nguyen
    Description
    • The HR dataset is a collection of employee data that includes information on various factors that may impact employee performance. To explore the employee performance factors using Python, we begin by importing the necessary libraries such as Pandas, NumPy, and Matplotlib, then load the HR dataset into a Pandas DataFrame and perform basic data cleaning and preprocessing steps such as handling missing values and checking for duplicates.

    • The analysis also uses various data visualizations to explore the relationships between different variables and employee performance: for example, scatterplots to examine the relationship between job satisfaction and performance ratings, or bar charts to compare average performance ratings across genders or positions.
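
    A minimal sketch of this workflow, assuming a hypothetical file name (hr_dataset.csv) and hypothetical column names (job_satisfaction, performance_rating, gender); adjust these to match the actual files in the Kaggle download:

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the HR data (file name assumed)
    df = pd.read_csv('hr_dataset.csv')

    # Basic cleaning: drop exact duplicates and rows with missing values
    df = df.drop_duplicates().dropna()

    # Scatterplot: job satisfaction vs. performance rating (column names assumed)
    df.plot.scatter(x='job_satisfaction', y='performance_rating')
    plt.title('Job Satisfaction vs. Performance Rating')
    plt.show()

    # Bar chart: average performance rating by gender (column name assumed)
    df.groupby('gender')['performance_rating'].mean().plot.bar()
    plt.title('Average Performance Rating by Gender')
    plt.tight_layout()
    plt.show()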

  2. Compare Baseball Player Statistics using Visualiza

    • kaggle.com
    zip
    Updated Sep 28, 2024
    Cite
    Abdelaziz Sami (2024). Compare Baseball Player Statistics using Visualiza [Dataset]. https://www.kaggle.com/datasets/abdelazizsami/compare-baseball-player-statistics-using-visualiza
    Explore at:
    zip (1030978 bytes)
    Dataset updated
    Sep 28, 2024
    Authors
    Abdelaziz Sami
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.

    1. Load the Data

    First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.

    2. Explore the Data

    Before creating visualizations, it’s good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.

    3. Visualization

    We can create various visualizations, such as:

    • A bar chart to compare the average release speed of different pitch types.
    • A line plot to visualize trends over time based on game dates.
    • A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).

    Example Code

    Here is a sample code to demonstrate how to create these visualizations using Matplotlib and Seaborn:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Load the data
    df = pd.read_csv('judge.csv')
    
    # Display the first few rows of the dataframe
    print(df.head())
    
    # Set the style of seaborn
    sns.set(style="whitegrid")
    
    # 1. Average Release Speed by Pitch Type
    plt.figure(figsize=(12, 6))
    avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
    sns.barplot(x=avg_speed.values, y=avg_speed.index, palette="viridis")
    plt.title('Average Release Speed by Pitch Type')
    plt.xlabel('Average Release Speed (mph)')
    plt.ylabel('Pitch Type')
    plt.show()
    
    # 2. Trends in Release Speed Over Time
    # First, convert the 'game_date' to datetime
    df['game_date'] = pd.to_datetime(df['game_date'])
    
    plt.figure(figsize=(14, 7))
    sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', ci=None)
    plt.title('Trends in Release Speed Over Time')
    plt.xlabel('Game Date')
    plt.ylabel('Average Release Speed (mph)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
    
    # 3. Scatter Plot of Release Speed vs. Events
    plt.figure(figsize=(12, 6))
    sns.scatterplot(data=df, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
    plt.title('Release Speed vs. Events')
    plt.xlabel('Release Speed (mph)')
    plt.ylabel('Event Type')
    plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.show()
    

    Explanation of the Code

    • Data Loading: The CSV file is loaded into a Pandas DataFrame.
    • Average Release Speed: A bar chart shows the average release speed for each pitch type.
    • Trends Over Time: A line plot illustrates the trend in release speed over time, which can indicate changes in performance or strategy.
    • Scatter Plot: A scatter plot visualizes the relationship between release speed and different events, providing insight into performance outcomes.

    Conclusion

    These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering data for specific players or seasons. If you have any specific comparisons in mind or additional data to visualize, let me know!

  3. OpenOrca

    • kaggle.com
    • opendatalab.com
    • +1 more
    zip
    Updated Nov 22, 2023
    Cite
    The Devastator (2023). OpenOrca [Dataset]. https://www.kaggle.com/datasets/thedevastator/open-orca-augmented-flan-dataset/versions/2
    Explore at:
    zip (2548102631 bytes)
    Dataset updated
    Nov 22, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Open-Orca Augmented FLAN Dataset

    Unlocking Advanced Language Understanding and ML Model Performance

    By Huggingface Hub [source]

    About this dataset

    The Open-Orca Augmented FLAN Collection is a revolutionary dataset that unlocks new levels of language understanding and machine learning model performance. This dataset was created to support research on natural language processing, machine learning models, and language understanding through leveraging the power of reasoning trace-enhancement techniques. By enabling models to understand complex relationships between words, phrases, and even entire sentences in a more robust way than ever before, this dataset provides researchers expanded opportunities for furthering the progress of linguistics research. With its unique combination of features including system prompts, questions from users and responses from systems, this dataset opens up exciting possibilities for deeper exploration into the cutting edge concepts underlying advanced linguistics applications. Experience a new level of accuracy and performance - explore Open-Orca Augmented FLAN Collection today!


    How to use the dataset

    This guide provides an introduction to the Open-Orca Augmented FLAN Collection dataset and outlines how researchers can utilize it for their language understanding and natural language processing (NLP) work. The Open-Orca dataset includes system prompts, questions posed by users, and responses from the system.

    Getting Started

    The first step is to download the data set from Kaggle at https://www.kaggle.com/openai/open-orca-augmented-flan and save it in a project directory of your choice on your computer or in cloud storage. Once you have downloaded the data set, launch the ‘Jupyter Notebook’ or ‘Google Colab’ environment you want to work with.

    Exploring & Preprocessing Data: To get a better understanding of the features in this dataset, import them into a Pandas DataFrame as shown below. You can use other libraries as per your need:

    import pandas as pd  # library used for importing datasets into Python

    df = pd.read_csv('train.csv')  # import the train CSV file into a Pandas DataFrame

    df[['system_prompt', 'question', 'response']].head()  # view the top 5 rows of these columns
    

    After importing, check each feature using basic descriptive statistics such as a Pandas groupby or value_counts. These give greater clarity over the elements present in each feature. The command below shows the count of each element in the system_prompt column of the train CSV file:

    df['system_prompt'].value_counts().head()  # count of each element present under 'system_prompt'

    # Output (illustrative):
    # User says hello guys         587
    # System asks How are you?     555
    # User says I am doing good    487
    # ... and so on
    

    Data Transformation: After inspecting and exploring the different features, you may need certain changes that best suit your purposes before training modeling algorithms on this dataset. A common transformation step is removing punctuation marks: since punctuation may not add any value to computation, it can be removed with a regex replace such as .str.replace('[^A-Za-z ]+', '', regex=True), as sketched below.
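
    A minimal sketch of that transformation, assuming the train.csv file and question column shown earlier; the exact pattern to strip is a judgment call:

    import pandas as pd

    df = pd.read_csv('train.csv')

    # Strip punctuation from the question text, keeping only letters and spaces;
    # adjust the pattern if digits or other characters should be retained.
    df['question_clean'] = df['question'].str.replace('[^A-Za-z ]+', '', regex=True)

    print(df[['question', 'question_clean']].head())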

    Research Ideas

    • Automated Question Answering: Leverage the dataset to train and develop question answering models that can provide tailored answers to specific user queries while retaining language understanding abilities.
    • Natural Language Understanding: Use the dataset as an exploratory tool for fine-tuning natural language processing applications, such as sentiment analysis, document categorization, parts-of-speech tagging and more.
    • Machine Learning Optimizations: The dataset can be used to build highly customized machine learning pipelines that allow users to harness the power of conditioning data with pre-existing rules or models for improved accuracy and performance in automated tasks

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. [See Other Information](ht...

  4. Linux Terminal Commands Dataset

    • kaggle.com
    zip
    Updated May 21, 2025
    + more versions
    Cite
    SUNNY THAKUR (2025). Linux Terminal Commands Dataset [Dataset]. https://www.kaggle.com/datasets/cyberprince/linux-terminal-commands-dataset
    Explore at:
    zip (32599 bytes)
    Dataset updated
    May 21, 2025
    Authors
    SUNNY THAKUR
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Linux Terminal Commands Dataset

    Overview

    The Linux Terminal Commands Dataset is a comprehensive collection of 600 unique Linux terminal commands (cmd-001 to cmd-600), curated for cybersecurity professionals, system administrators, data scientists, and machine learning engineers. This dataset is designed to support advanced use cases such as penetration testing, system administration, forensic analysis, and training machine learning models for command-line automation and anomaly detection. The commands span 10 categories: Navigation, File Management, Viewing, System Info, Permissions, Package Management, Networking, User Management, Process, and Editor. Each entry includes a command, its category, a description, an example output, and a reference to the relevant manual page, ensuring usability for both human users and automated systems.

    Key Features

    • Uniqueness: 600 distinct commands with no overlap, covering basic to unconventional tools.
    • Sophistication: Includes advanced commands for SELinux, eBPF tracing, network forensics, and filesystem debugging.
    • Unconventional Tools: Features obscure utilities like bpftrace, tcpflow, zstd, and aa-status for red teaming and system tinkering.
    • ML-Ready: Structured in JSON Lines (.jsonl) format for easy parsing and integration into machine learning pipelines.
    • Professional Focus: Tailored for cybersecurity (e.g., auditing, hardening), system administration (e.g., performance tuning), and data science (e.g., log analysis).

    Dataset Structure

    The dataset is stored in a JSON Lines file (linux_terminal_commands_dataset.jsonl), where each line represents a single command with the following fields:

    | Field | Description |
    |:------|:------------|
    | id | Unique identifier (e.g., cmd-001 to cmd-600). |
    | command | The Linux terminal command (e.g., setfacl -m u:user:rw file.txt). |
    | category | One of 10 categories (e.g., Permissions, Networking). |
    | description | A concise explanation of the command's purpose and functionality. |
    | example_output | Sample output or expected behavior (e.g., [No output if successful]). |
    | man_reference | URL to the official manual page (e.g., https://man7.org/linux/man-pages/...). |

    Category Distribution

    | Category | Count |
    |:---------|------:|
    | Navigation | 11 |
    | File Management | 56 |
    | Viewing | 35 |
    | System Info | 51 |
    | Permissions | 28 |
    | Package Management | 12 |
    | Networking | 56 |
    | User Management | 19 |
    | Process | 42 |
    | Editor | 10 |

    Usage

    Prerequisites

    • Python 3.6+: For parsing and analyzing the dataset.
    • Linux Environment: Most commands require a Linux system (e.g., Ubuntu, CentOS, Fedora) for execution.
    • Optional Tools: Install tools like pandas for data analysis or jq for JSON processing.

    Loading the Dataset

    Use Python to load and explore the dataset:

    import json
    import pandas as pd

    # Load dataset
    dataset = []
    with open("linux_terminal_commands_dataset.jsonl", "r") as file:
        for line in file:
            dataset.append(json.loads(line))

    # Convert to DataFrame
    df = pd.DataFrame(dataset)

    # Example: view category distribution
    print(df.groupby("category").size())

    # Example: filter Networking commands
    networking_cmds = df[df["category"] == "Networking"]
    print(networking_cmds[["id", "command", "description"]])

    Example Applications

    Cybersecurity: Use bpftrace or tcpdump commands for real-time system and network monitoring. Audit permissions with setfacl, chcon, or aa-status for system hardening.

    System Administration: Monitor performance with slabtop, pidstat, or systemd-analyze. Manage filesystems with btrfs, xfs_repair, or cryptsetup.

    Machine Learning: Train NLP models to predict command categories or generate command sequences. Use example outputs for anomaly detection in system logs.

    Pentesting: Leverage nping, tcpflow, or ngrep for network reconnaissance. Explore find / -perm /u+s to identify potential privilege escalation vectors.

    Executing Commands

    Warning: Some commands (e.g., mkfs.btrfs, fuser -k, cryptsetup) can modify or destroy data. Always test in a sandboxed environment. To execute a command:

    # Example: list SELinux file contexts
    semanage fcontext -l

    Installation

    1. Clone the repository:

       git clone https://github.com/sunnythakur25/linux-terminal-commands-dataset.git
       cd linux-terminal-commands-dataset

    2. Ensure the dataset file (linux_terminal_commands_dataset.jsonl) is in the project directory.
    3. Install dependencies for analysis (optional):

       pip install pandas

    Contribution Guidelines

    We welcome contributions to expand the dataset or improve its documentation. To contribute:

    • Fork the Repository: Create a fork on GitHub.
    • Add Commands: Ensure new commands are unique, unconventional, and include all required fields (id, command, category, etc.).
    • Test Commands: Verify commands work on a Linux system and provide accurate example outputs.
    • Submit a Pull Request: Include a clear description of your changes and their purpose.
    • Follow Standards: Use JSON Lines format. Reference man7.org for manual pages. Categorize c...

  5. Insurance_claims

    • kaggle.com
    • data.mendeley.com
    zip
    Updated Oct 19, 2025
    Cite
    Miannotti (2025). Insurance_claims [Dataset]. https://www.kaggle.com/datasets/mian91218/insurance-claims
    Explore at:
    zip (68984 bytes)
    Dataset updated
    Oct 19, 2025
    Authors
    Miannotti
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    AQQAD, ABDELRAHIM (2023), “insurance_claims ”, Mendeley Data, V2, doi: 10.17632/992mh7dk9y.2

    https://data.mendeley.com/datasets/992mh7dk9y/2

    Latest version: Version 2, published 22 Aug 2023. DOI: 10.17632/992mh7dk9y.2

    Data Acquisition: - Obtain the dataset titled "Insurance_claims" from the following Mendeley repository: https://data.mendeley.com/drafts/992mh7dk9y - Download and store the dataset locally for easy access during subsequent steps.

    Data Loading & Initial Exploration:

    • Use Python's Pandas library to load the dataset into a DataFrame. Code used:

    import pandas as pd

    # Load the dataset file
    insurance_df = pd.read_csv('insurance_claims.csv')

    • Inspect the initial rows, data types, and summary statistics to get an understanding of the dataset's structure.

    Data Cleaning & Pre-processing: - Handle missing values, if any. Strategies may include imputation or deletion based on the nature of the missing data. - Identify and handle outliers. In this research, particularly, outliers in the 'umbrella_limit' column were addressed. - Normalize or standardize features if necessary.

    Exploratory Data Analysis (EDA): - Utilize visualization libraries such as Matplotlib and Seaborn in Python for graphical exploration. - Examine distributions, correlations, and patterns in the data, especially between features and the target variable 'fraud_reported'. - Identify features that exhibit distinct patterns for fraudulent and non-fraudulent claims.

    Feature Engineering & Selection: - Create or transform existing features to improve model performance. - Use techniques like Recursive Feature Elimination (RFECV) to identify and retain only the most informative features.

    Modeling: - Split the dataset into training and test sets to ensure the model's generalizability. - Implement machine learning algorithms such as Support Vector Machine, RandomForest, and Voting Classifier using libraries like Scikit-learn. - Handle class imbalance issues using methods like Synthetic Minority Over-sampling Technique (SMOTE).
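
    A minimal sketch of this modeling step, assuming the target column fraud_reported is encoded as 'Y'/'N' (an assumption) and using scikit-learn with imbalanced-learn's SMOTE; the original study's exact feature preparation and tuning may differ:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from imblearn.over_sampling import SMOTE

    df = pd.read_csv('insurance_claims.csv')

    # Target and features: one-hot encode categoricals (label values assumed to be 'Y'/'N')
    y = (df['fraud_reported'] == 'Y').astype(int)
    X = pd.get_dummies(df.drop(columns=['fraud_reported']))

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Oversample the minority class in the training set only, then fit a RandomForest
    X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_res, y_res)

    # Evaluate on the untouched test set
    print(classification_report(y_test, model.predict(X_test)))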

    Model Evaluation: - Evaluate the performance of each model using metrics like precision, recall, F1-score, ROC-AUC score, and confusion matrix. - Fine-tune the models based on the results. Hyperparameter tuning can be performed using techniques like Grid Search or Random Search.

    Model Interpretation: - Use methods like SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) to interpret and understand the predictions made by the model.

    Deployment & Prediction: - Utilize the best-performing model to make predictions on unseen data. - If the intention is to deploy the model in a real-world scenario, convert the trained model into a format suitable for deployment (e.g., using libraries like joblib or pickle).

    Software & Tools: - Programming Language: Python (run in Google Colab) - Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, Imbalanced-learn, LIME, and SHAP. - Environment: Jupyter Notebook, Google Colab, or any Python IDE.

  6. Books Set in Bath

    • kaggle.com
    zip
    Updated Dec 6, 2023
    Cite
    The Devastator (2023). Books Set in Bath [Dataset]. https://www.kaggle.com/datasets/thedevastator/books-set-in-bath/code
    Explore at:
    zip (7510 bytes)
    Dataset updated
    Dec 6, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Books Set in Bath

    A catalog of books set in the city of Bath

    By Leigh Dodds [source]

    About this dataset

    The dataset offers insights into various literary works that take place in Bath, providing an opportunity for readers and researchers to explore the rich connections between literature and this historical city. Whether you are interested in local stories or looking for inspiration for your next visit to Bath, this dataset serves as a useful resource.

    Each entry includes detailed information such as the unique identifier assigned by LibraryThing (URI), which allows users to access further metadata and book covers using LibraryThing's APIs. Additionally, if available, ISBNs are provided for easy identification of specific editions or versions of each book.

    With columns formatted consistently as uri, title, isbn, and author, the dataset ensures clarity and enables efficient data analysis.

    How to use the dataset

    Dataset Overview

    Columns

    This dataset includes the following columns, which provide important details about each book:

    • uri: The unique identifier for each book in the LibraryThing database.
    • title: The title of the book.
    • isbn: The International Standard Book Number (ISBN) for the book if known.
    • author: The author of the book.

    Getting Started

    Before diving into analyzing or exploring this dataset, it's important to understand its structure and familiarize yourself with its columns and values.

    To get started:

    • Load/import it into your preferred data analysis tool or programming language (e.g., Python pandas library).
    • Follow along with code examples provided below for common tasks using pandas library.

    Example Code: Getting Basic Insights

    import pandas as pd

    # Load CSV file into pandas DataFrame
    data = pd.read_csv('Library_Thing_Books_Set_in_Bath.csv')

    # Print basic insights about columns and values
    print('Number of rows:', data.shape[0])
    print('Number of columns:', data.shape[1])
    print('Column names:', list(data.columns))
    print('Sample data:')
    print(data.head())
    

    Exploring the Data

    Once you have loaded the dataset into your preferred tool, you can begin exploring and analyzing its contents. Here are a few common tasks to get you started:

    1. Checking Unique Book Count:

    unique_books = data['title'].nunique()
    print('Number of unique books:', unique_books)
    

    2. Finding Books by a Specific Author:

    author_name = 'Jane Austen'
    books_by_author = data[data['author'] == author_name]
    print(books_by_author[['title', 'isbn']])
    

    Research Ideas

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: Library_Thing_Books_Set_in_Bath.csv

    | Column name | Description |
    |:------------|:------------|
    | uri | The unique identifier for each book in the dataset. (String) |
    | title | The title of the book. (String) |
    | isbn | The International Standard Book Number (ISBN) for the book, which is a unique identifier for published books. (String) |
    | author | The author of the book. (String) |

    Acknowledgements

    If you use this dataset in your research, please credit the original author, Leigh Dodds.

  7. Adidas_Sales_Analysis

    • kaggle.com
    zip
    Updated Mar 11, 2023
    Cite
    Archis Rudra (2023). Adidas_Sales_Analysis [Dataset]. https://www.kaggle.com/datasets/archisrudra/adidas-sales-analysis/versions/1
    Explore at:
    zip (1863030 bytes)
    Dataset updated
    Mar 11, 2023
    Authors
    Archis Rudra
    Description

    Portfolio_Adidas_Dataset

    A set of real-world data analysis tasks completed using the Python Pandas and Matplotlib libraries.

    Background Information: In this portfolio, we use Python Pandas and Matplotlib to analyze and answer business questions about five products' worth of sales data. The data contains hundreds of thousands of footwear store purchases, broken down by product type, cost, region, state, city, and so on.

    We start by cleaning our data; a short code sketch of these steps follows the list below. Tasks during this section include:

    1. Drop NaN values from DataFrame
    2. Removing column based on a condition
    3. Changing the column name
    4. Removing rows based on a condition
    5. Reindexing rows based on a condition
    6. Adding Month and Year column (to_datetime)
    7. Conversion of data types from string to integer (to_numeric)
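
    A minimal sketch of these cleaning steps, with a hypothetical file name (adidas_sales.csv) and hypothetical column names ('Retailer ID', 'Invoice Date', 'Price per Unit'); adjust them to the actual dataset:

    import pandas as pd

    # Load the sales data (file name assumed)
    df = pd.read_csv('adidas_sales.csv')

    # 1. Drop NaN values
    df = df.dropna()

    # 3. Change a column name (example rename; column names assumed)
    df = df.rename(columns={'Retailer ID': 'retailer_id'})

    # 6. Add Month and Year columns (to_datetime; date column name assumed)
    df['Invoice Date'] = pd.to_datetime(df['Invoice Date'])
    df['Month'] = df['Invoice Date'].dt.month
    df['Year'] = df['Invoice Date'].dt.year

    # 7. Convert a price column from string to numeric (to_numeric; column name assumed)
    df['Price per Unit'] = pd.to_numeric(df['Price per Unit'], errors='coerce')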

    Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore 5 high-level business questions related to our data:

    1. Which year had the highest number of sales?
    2. What product sold the most? Why do you think it sold the most?
    3. What was the average price for each product? And the overall average price of all products?
    4. What was the best retailer for sales? How much was earned by that retailer?
    5. Which sales method is most efficient?

    To answer these questions we walk through many different openpyxl, pandas, and matplotlib methods (a short sketch follows this list). They include:

    1. Using groupby to perform aggregate analysis
    2. Plotting bar charts, lines graphs, and pie charts to visualize our results
    3. Labeling our graphs
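
    A minimal sketch of the groupby-plus-plotting pattern, again with hypothetical file and column names ('Invoice Date', 'Total Sales'):

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv('adidas_sales.csv')  # file name assumed

    # Aggregate: total sales per year (column names assumed)
    df['Invoice Date'] = pd.to_datetime(df['Invoice Date'])
    df['Year'] = df['Invoice Date'].dt.year
    sales_by_year = df.groupby('Year')['Total Sales'].sum()

    # Visualize and label the result
    sales_by_year.plot.bar()
    plt.title('Total Sales by Year')
    plt.ylabel('Total Sales')
    plt.tight_layout()
    plt.show()
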
  8. Coronavirus 2019-2020 latest dataset

    • kaggle.com
    zip
    Updated Apr 12, 2020
    Cite
    Tanu N Prabhu (2020). Coronavirus 2019-2020 latest dataset [Dataset]. https://www.kaggle.com/tanuprabhu/coronavirus-20192020-latest-dataset
    Explore at:
    zip (866 bytes)
    Dataset updated
    Apr 12, 2020
    Authors
    Tanu N Prabhu
    Description

    Context

    I always wanted to access a data set related to the coronavirus (country-wise), but I could not find a properly documented one, so I created one manually, thinking it would be really helpful for others.

    Content

    I knew I wanted to create a dataset, but I did not know how. So I started searching the internet for country-wise coronavirus cases. Wikipedia was my first stop, but the results were not satisfactory. I kept searching for quite some time until I stumbled upon a great website you have probably heard of: Worldometer. This was exactly the website I was looking for. It had more detail than Wikipedia, and more rows, meaning more countries with more details about their cases.

    Once I found the data, my next task was to download it. Of course, I could not get it in raw form, and I did not email the site about the data. Instead, I learned a new skill that is very important for a data scientist; I had read somewhere that to obtain data from websites you need to use this technique. Any guesses? Keep reading and you will find out in the next paragraph.


    You are right: it's web scraping. I learned this so that I could convert the data into CSV format. Below I share the scraper code that I wrote; I also found a way to convert the pandas DataFrame directly to a CSV (comma-separated values) file and store it on my computer. Go through the code and you will see what I'm talking about.

    Below is the code that I used to scrape the data from the website (only available as a screenshot in the original Kaggle description).
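
    Since the original code is only available as a screenshot, here is a minimal sketch of one way to do the same thing, assuming the Worldometer coronavirus page as the source; requests plus pandas.read_html is one common approach, not necessarily the author's exact code:

    import io

    import pandas as pd
    import requests

    URL = 'https://www.worldometers.info/coronavirus/'  # assumed source page

    # Fetch the page (a browser-like User-Agent helps avoid simple blocks)
    resp = requests.get(URL, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
    resp.raise_for_status()

    # Parse the first HTML table on the page into a DataFrame
    tables = pd.read_html(io.StringIO(resp.text))
    df = tables[0]

    # Save the country-wise figures to CSV
    df.to_csv('coronavirus_country_wise.csv', index=False)
    print(df.head())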

    Acknowledgements

    I couldn't have gotten the data without Worldometer, so special thanks to the website; it is because of them that I was able to get the data. This data was scraped on 25th March at 3:45 PM. I will try to update the data every day.

    Inspiration

    I don't have any specific questions to ask. Find your own ways to use the data, and let me know via a kernel if you find something interesting.

  9. Submarine Pipeline Locations in USACE Dataset

    • kaggle.com
    zip
    Updated Dec 8, 2023
    Cite
    The Devastator (2023). Submarine Pipeline Locations in USACE Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/submarine-pipeline-locations-in-usace-dataset
    Explore at:
    zip (17924 bytes)
    Dataset updated
    Dec 8, 2023
    Authors
    The Devastator
    Description

    Submarine Pipeline Locations in USACE Dataset

    Locations and details of submarine pipelines for oil or gas transport

    By Homeland Infrastructure Foundation [source]

    About this dataset

    The Submarine Pipeline Lines in the USACE IENC dataset provides comprehensive information about the locations and characteristics of submarine pipelines used for transporting oil or gas. These submarine or land pipelines are composed of interconnected pipes that are either laid on or buried beneath the seabed or land surfaces.

    This dataset is a part of the Inland Electronic Navigational Charts (IENCs) and has been derived from reliable data sources utilized for maintaining navigation channels. It serves as a valuable resource for researchers, analysts, and policymakers interested in studying and monitoring the infrastructure related to oil and gas transportation.

    For each submarine pipeline, this dataset includes various attributes such as its category type, product being transported (e.g., oil or gas), unique name or identifier, current status (active or decommissioned), additional information about its purpose or characteristics, minimum scale at which it can be visible on a map, length in meters, source of data used to create the dataset, and details regarding who provided the source data.

    The Category_o column categorizes each pipeline based on its type, providing insights into different classifications within this infrastructure sector. Similarly, the Product column specifies whether oil or gas is carried through these pipelines.

    Moreover, this dataset's Object_Nam field contains distinct names assigned to each submarine pipeline within the USACE IENC database. These names facilitate easy identification and reference when studying specific sections of this extensive network.

    The Status attribute indicates whether a particular pipeline is currently active for transport purposes or has been decommissioned. This distinction holds significance for analyzing operational capacity and overall functionality.

    The Informatio field presents additional details that further enhance our understanding of specific aspects of these submarine pipelines, such as their construction methods, purpose, functionality, and maintenance requirements.

    Scale_Mini denotes the minimum scale at which each individual pipeline can be visualized accurately on a map, enabling users to effectively browse different levels of detail based on their requirements.

    Finally, the Shape_Leng attribute provides the length of each submarine pipeline in meters, which is helpful for assessing distances, evaluating potential risks or vulnerabilities, and estimating transportation efficiency.

    It is important to note that this dataset's information has been sourced from the USACE IENC dataset, ensuring its reliability and relevance to navigation channels. By leveraging this comprehensive collection of submarine pipeline data, stakeholders can gain valuable insights into the infrastructure supporting oil and gas transportation systems.

    How to use the dataset

    Dataset Overview

    The dataset contains several columns with information about each submarine pipeline. Here is an overview of each column:

    • Category_o: The category or type of the submarine pipeline.
    • Product: The product being transported through the submarine pipeline, such as oil or gas.
    • Object_Nam: The name or identifier of the submarine pipeline.
    • Status: The current status of the submarine pipeline, such as active or decommissioned.
    • Informatio: Additional information or details about the submarine pipeline.
    • Scale_Mini: The minimum scale at which the submarine pipeline is visible on a map.
    • Source_Dat: The source of data used to create this dataset.
    • Source_Ind: The individual or organization that provided the source data.
    • Source_D_1: Additional source information or details about this specific data.
    • Shape_Leng: The length of the submarine pipeline in meters.

    Accessing and Analyzing Data

    To access and start analyzing this dataset, you can follow these steps:

    • Download: First, download the Submarine Pipeline Lines_USACE_IENC.csv file from its source.

    • The downloaded file should be saved in your project directory.

    • Open CSV File: Open your preferred programming environment (e.g., Python with Pandas) and read/load this CSV file into a dataframe.

    • Data Exploration: Explore the dataset by examining its columns, rows, and general structure. Use pandas functions like head(), info(), or `descr...
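
    A minimal sketch of these steps in Python with Pandas, using the file name from the download step above:

    import pandas as pd

    # Load the downloaded CSV into a DataFrame
    pipelines = pd.read_csv('Submarine Pipeline Lines_USACE_IENC.csv')

    # Explore the general structure
    print(pipelines.head())        # first rows
    pipelines.info()               # column types and non-null counts
    print(pipelines.describe())    # summary statistics (e.g., Shape_Leng)

    # Example: count pipelines by status and product (columns listed in the overview above)
    print(pipelines.groupby(['Status', 'Product']).size())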
