14 datasets found
  1. o

    Linkedin Data Scientist/Analyst jobs (Berlin 2024)

    • opendatabay.com
    .undefined
    Updated Jun 26, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasimple (2025). Linkedin Data Scientist/Analyst jobs (Berlin 2024) [Dataset]. https://www.opendatabay.com/data/ai-ml/8b069e64-ff57-4d82-bd60-1fbef0b49c07
    Explore at:
    .undefinedAvailable download formats
    Dataset updated
    Jun 26, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Data Science and Analytics
    Description

    Dataset of 422 jobs from LinkedIn to analyse data job market with search terms ("data analyst", "data scientist" & "data engineer")

    Specifically interested in the application of NLP to extract in-demand tools in the market

    Columns

    'job_title' 'company_name' 'post_date' 'repost_date' 'email', 'number_of_employees' 'job_desc' 'num_applicants' 'job_type' 'job_level' 'job_remote' 'language' 'salary' 'sector' 'link', 'search_term' Please note: This is only an initial dataset, further uploads with more rows with different search terms will be made in the future. For suggests or requests please make a comment.

    License

    CC-BY-NC

    Original Data Source: Linkedin Data Scientist/Analyst jobs (Berlin 2024)

  2. BCG Data Science Simulation

    • kaggle.com
    Updated Feb 12, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    PAVITR KUMAR SWAIN (2025). BCG Data Science Simulation [Dataset]. https://www.kaggle.com/datasets/pavitrkumar/bcg-data-science-simulation
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 12, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    PAVITR KUMAR SWAIN
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description
    ** Feature Engineering for Churn Prediction**

    🚀**# BCG Data Science Job Simulation | Forage** This notebook focuses on feature engineering techniques to enhance a dataset for churn prediction modeling. As part of the BCG Data Science Job Simulation, I transformed raw customer data into valuable features to improve predictive performance.

    📊 What’s Inside? ✅ Data Cleaning: Removing irrelevant columns to reduce noise ✅ Date-Based Feature Extraction: Converting raw dates into useful insights like activation year, contract length, and renewal month ✅ New Predictive Features:

    consumption_trend → Measures if a customer’s last-month usage is increasing or decreasing total_gas_and_elec → Aggregates total energy consumption ✅ Final Processed Dataset: Ready for churn prediction modeling

    📂Dataset Used: 📌 clean_data_after_eda.csv → Original dataset after Exploratory Data Analysis (EDA) 📌 clean_data_with_new_features.csv → Final dataset after feature engineering

    🛠 Technologies Used: 🔹 Python (Pandas, NumPy) 🔹 Data Preprocessing & Feature Engineering

    🌟 Why Feature Engineering? Feature engineering is one of the most critical steps in machine learning. Well-engineered features improve model accuracy and uncover deeper insights into customer behavior.

    🚀 This notebook is a great reference for anyone learning data preprocessing, feature selection, and predictive modeling in Data Science!

    📩 Connect with Me: 🔗 GitHub Repo: https://github.com/Pavitr-Swain/BCG-Data-Science-Job-Simulation 💼 LinkedIn: https://www.linkedin.com/in/pavitr-kumar-swain-ab708b227/

    🔍 Let’s explore churn prediction insights together! 🎯

  3. Advanced: Saudi Arabian Aramco Stocks Dataset 🐪

    • kaggle.com
    Updated May 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Azhar Saleem (2024). Advanced: Saudi Arabian Aramco Stocks Dataset 🐪 [Dataset]. https://www.kaggle.com/datasets/azharsaleem/advanced-saudi-arabian-aramco-stocks-dataset/data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 3, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Azhar Saleem
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Saudi Arabia
    Description

    Saudi Arabian Oil Company Aramco, Stocks

    👨‍💻 Author: Azhar Saleem

    "https://github.com/azharsaleem18" target="_blank"> https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github" alt="GitHub Profile"> "https://www.kaggle.com/azharsaleem" target="_blank"> https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle" alt="Kaggle Profile"> "https://www.linkedin.com/in/azhar-saleem/" target="_blank"> https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin" alt="LinkedIn Profile">
    "https://www.youtube.com/@AzharSaleem19" target="_blank"> https://img.shields.io/badge/YouTube-Profile-red?style=for-the-badge&logo=youtube" alt="YouTube Profile"> "https://www.facebook.com/azhar.saleem1472/" target="_blank"> https://img.shields.io/badge/Facebook-Profile-blue?style=for-the-badge&logo=facebook" alt="Facebook Profile"> "https://www.tiktok.com/@azhar_saleem18" target="_blank"> https://img.shields.io/badge/TikTok-Profile-blue?style=for-the-badge&logo=tiktok" alt="TikTok Profile">
    "https://twitter.com/azhar_saleem18" target="_blank"> https://img.shields.io/badge/Twitter-Profile-blue?style=for-the-badge&logo=twitter" alt="Twitter Profile"> "https://www.instagram.com/azhar_saleem18/" target="_blank"> https://img.shields.io/badge/Instagram-Profile-blue?style=for-the-badge&logo=instagram" alt="Instagram Profile"> "mailto:azharsaleem6@gmail.com"> https://img.shields.io/badge/Email-Contact%20Me-red?style=for-the-badge&logo=gmail" alt="Email Contact">

    Dataset Description

    Welcome to the Enhanced Saudi Arabian Oil Company (Aramco) Stock Dataset! This dataset has been meticulously prepared from Yahoo Finance and further enriched with several engineered features to elevate your data analysis, machine learning, and financial forecasting projects. It captures the daily trading figures of Aramco stocks, presented in Saudi Riyal (SAR), providing a robust foundation for comprehensive market analysis.

    Columns in the Dataset

    • Date: The trading day for the data recorded (ISO 8601 format).
    • Open: The price at which the stock first traded upon the opening of an exchange on a given trading day.
    • High: The highest price at which the stock traded during the trading day.
    • Low: The lowest price at which the stock traded during the trading day.
    • Close: The price at which the stock last traded upon the close of an exchange on a given trading day.
    • Volume: The total number of shares traded during the trading day.
    • Dividends: The dividend value paid out per share on the trading day.
    • Stock Splits: The number of stock splits occurring on the trading day.
    • Lag Features (Lag_Close, Lag_High, Lag_Low): Previous day's closing, highest, and lowest prices.
    • Rolling Window Statistics (e.g., Rolling_Mean_7, Rolling_Std_7): 7-day and 30-day moving averages and standard deviations of the Close price.
    • Technical Indicators (RSI, MACD, Bollinger Bands): Key metrics used in trading to analyze short-term price movements.
    • Change Features (Change_Close, Change_Volume): Day-over-day changes in Close price and trading volume.
    • Date-Time Features (Weekday, Month, Year, Quarter): Extracted components of the trading day.
    • Volume_Normalized: The standardized trading volume using z-score normalization to adjust for scale differences.

    Potential Uses

    This dataset is tailored for a wide array of applications:

    • Financial Analysis: Explore historical performance, volatility, and market trends.
    • Forecasting Models: Utilize features like lagged prices and rolling statistics to predict future stock prices.
    • Machine Learning: Develop regression models or classification frameworks to predict market movements.
    • Deep Learning: Leverage LSTM networks for more sophisticated time-series forecasting.
    • Time-Series Analysis: Dive deep into trend analysis, seasonality, and cyclical behavior of stock prices.

    Whether you are a data scientist, a financial analyst, or a hobbyist interested in the stock market, this dataset provides a rich playground for analysis and model building. Its comprehensive feature set allows for the development of robust predictive models and offers unique insights into one of the world’s most significant oil companies. Unlock the potential of financial data with this carefully crafted dataset.

  4. d

    TagX | 100000+ Job Postings data | Job listings data | Human Resource (HR)...

    • datarade.ai
    Updated Feb 13, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TagX (2023). TagX | 100000+ Job Postings data | Job listings data | Human Resource (HR) data | Linkedin, Indeed , Monster, Glassdoor | Global [Dataset]. https://datarade.ai/data-products/100000-job-description-dataset-extract-entities-analyze-tagx
    Explore at:
    .json, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Feb 13, 2023
    Dataset authored and provided by
    TagX
    Area covered
    Japan, Egypt, Djibouti, Belize, India, Belarus, Czech Republic, Mongolia, Taiwan, Denmark
    Description

    TagX has a curated dataset of jobs available in the market which can be used for various applications like Machine learning, Artificial Intelligence, and Data Science.

    The dataset can be used to predict demand, forecast predictions, current market analysis, and historical data analysis can also be performed.

    Some of the job categories that can be found are :

    Management Occupations
    Business and Financial Operations Occupations
    Computer and Mathematical Occupations Architecture and Engineering Occupations Life, Physical, and Social Science Occupations Community and Social Service Occupations Legal Occupations Educational Instruction and Library Occupations Arts, Design, Entertainment, Sports, and Media Occupations Healthcare Practitioners and Technical Occupations Healthcare Support Occupations Protective Service Occupations Food Preparation and Serving Related Occupations Building and Grounds Cleaning and Maintenance Occupations Personal Care and Service Occupations Sales and Related Occupations Office and Administrative Support Occupations Farming, Fishing, and Forestry Occupations Construction and Extraction Occupations Installation, Maintenance, and Repair Occupations Production Occupations Transportation and Material Moving Occupations Military Specific Occupations

  5. P

    EDGE-IIOTSET Dataset

    • paperswithcode.com
    Updated Oct 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). EDGE-IIOTSET Dataset [Dataset]. https://paperswithcode.com/dataset/edge-iiotset
    Explore at:
    Dataset updated
    Oct 16, 2023
    Description

    ABSTRACT In this project, we propose a new comprehensive realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely, centralized and federated learning. Specifically, the proposed testbed is organized into seven layers, including, Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as, ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, Digital twin, ONOS SDN controller, Mosquitto MQTT brokers, Modbus TCP/IP, ...etc. The IoT data are generated from various IoT devices (more than 10 types) such as Low-cost digital sensors for sensing temperature and humidity, Ultrasonic sensor, Water level detection sensor, pH Sensor Meter, Soil Moisture sensor, Heart Rate Sensor, Flame Sensor, ...etc.). However, we identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats, including, DoS/DDoS attacks, Information gathering, Man in the middle attacks, Injection attacks, and Malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, network traffic, and propose new 61 features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.

    Instructions:

    Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.

    Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...

    Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowable after asking the leader author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.

    The details of the Edge-IIoT dataset were published in following the paper. For the academic/public use of these datasets, the authors have to cities the following paper:

    Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809

    Link to paper : https://ieeexplore.ieee.org/document/9751703

    The directories of the Edge-IIoTset dataset include the following:

    •File 1 (Normal traffic)

    -File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.

    -File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.3 (Heart_Rate): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.

    -File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.

    -File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.

    -File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.

    -File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.

    -File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.

    -File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.

    -File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.

    •File 2 (Attack traffic):

    -File 2.1 (Attack traffic (CSV files)): This file includes 13 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific for each attack.

    -File 2.2 (Attack traffic (PCAP files)): This file includes 13 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific for each attack.

    •File 3 (Selected dataset for ML and DL):

    -File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.

    -File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.

    Step 1: Downloading The Edge-IIoTset dataset From the Kaggle platform from google.colab import files

    !pip install -q kaggle

    files.upload()

    !mkdir ~/.kaggle

    !cp kaggle.json ~/.kaggle/

    !chmod 600 ~/.kaggle/kaggle.json

    !kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"

    !unzip DNN-EdgeIIoT-dataset.csv.zip

    !rm DNN-EdgeIIoT-dataset.csv.zip

    Step 2: Reading the Datasets' CSV file to a Pandas DataFrame: import pandas as pd

    import numpy as np

    df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

    Step 3 : Exploring some of the DataFrame's contents: df.head(5)

    print(df['Attack_type'].value_counts())

    Step 4: Dropping data (Columns, duplicated rows, NAN, Null..): from sklearn.utils import shuffle

    drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4","arp.dst.proto_ipv4",

     "http.file_data","http.request.full_uri","icmp.transmit_timestamp",
    
     "http.request.uri.query", "tcp.options","tcp.payload","tcp.srcport",
    
     "tcp.dstport", "udp.port", "mqtt.msg"]
    

    df.drop(drop_columns, axis=1, inplace=True)

    df.dropna(axis=0, how='any', inplace=True)

    df.drop_duplicates(subset=None, keep="first", inplace=True)

    df = shuffle(df)

    df.isna().sum()

    print(df['Attack_type'].value_counts())

    Step 5: Categorical data encoding (Dummy Encoding): import numpy as np

    from sklearn.model_selection import train_test_split

    from sklearn.preprocessing import StandardScaler

    from sklearn import preprocessing

    def encode_text_dummy(df, name):

    dummies = pd.get_dummies(df[name])

    for x in dummies.columns:

    dummy_name = f"{name}-{x}"
    
    df[dummy_name] = dummies[x]
    

    df.drop(name, axis=1, inplace=True)

    encode_text_dummy(df,'http.request.method')

    encode_text_dummy(df,'http.referer')

    encode_text_dummy(df,"http.request.version")

    encode_text_dummy(df,"dns.qry.name.len")

    encode_text_dummy(df,"mqtt.conack.flags")

    encode_text_dummy(df,"mqtt.protoname")

    encode_text_dummy(df,"mqtt.topic")

    Step 6: Creation of the preprocessed dataset df.to_csv('preprocessed_DNN.csv', encoding='utf-8')

    For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com

    More information about Dr. Mohamed Amine Ferrag is available at:

    https://www.linkedin.com/in/Mohamed-Amine-Ferrag

    https://dblp.uni-trier.de/pid/142/9937.html

    https://www.researchgate.net/profile/Mohamed_Amine_Ferrag

    https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao

    https://www.scopus.com/authid/detail.uri?authorId=56115001200

    https://publons.com/researcher/1322865/mohamed-amine-ferrag/

    https://orcid.org/0000-0002-0632-3172

    Last Updated: 27 Mar. 2023

  6. data-base-linkedin

    • kaggle.com
    Updated Aug 7, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gaston Pereyra (2020). data-base-linkedin [Dataset]. https://www.kaggle.com/gastonpereyra/databaselinkedin/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 7, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Gaston Pereyra
    Description

    Dataset

    This dataset was created by Gaston Pereyra

    Contents

  7. P

    Novel COVID-19 Chestxray Repository Dataset

    • paperswithcode.com
    Updated Sep 8, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Pratik Bhowal; Subhankar Sen; Jin Hee Yoon Zong Woo Geem; Ram Sarkar (2021). Novel COVID-19 Chestxray Repository Dataset [Dataset]. https://paperswithcode.com/dataset/novel-covid-19-chestxray-repository
    Explore at:
    Dataset updated
    Sep 8, 2021
    Authors
    Pratik Bhowal; Subhankar Sen; Jin Hee Yoon Zong Woo Geem; Ram Sarkar
    Description

    Authors of the Dataset:

    Pratik Bhowal (B.E., Dept of Electronics and Instrumentation Engineering, Jadavpur University Kolkata, India) [LinkedIn], [Github] Subhankar Sen (B.Tech, Dept of Computer Science Engineering, Manipal University Jaipur, India) [LinkedIn], [Github], [Google Scholar] Jin Hee Yoon (faculty of the Dept. of Mathematics and Statistics at Sejong University, Seoul, South Korea) [LinkedIn], [Google Scholar] Zong Woo Geem (faculty of College of IT Convergence at Gachon University, South Korea) [LinkedIn], [Google Scholar] Ram Sarkar( Professor at Dept. of Computer Science Engineering, Jadavpur Univeristy Kolkata, India) [LinkedIn], [Google Scholar]

    Overview The authors have created a new dataset known as Novel COVID-19 Chestxray Repository by the fusion of publicly available chest-xray image repositories. In creating this combined dataset, three different datasets obtained from the Github and Kaggle databases,created by the authors of other research studies in this field, were utilized.In our study,frontal and lateral chest X-ray images are used since this view of radiography is widely used by radiologist in clinical diagnosis.In the following section, authors have summarized how this dataset is created.

    COVID-19 Radiography Database: The first release of this dataset reports 219 COVID-19,1345 viral pneumonia and 1341 normal radiographic chest X-ray images. This dataset was created by a team of researchers from Qatar University, Doha, Qatar, and the University of Dhaka, Bangladesh in collaboration with medical doctors and specialists from Pakistan and Malaysia.This database is regularly updated with the emergence of new cases of COVID-19 patients worldwide.Related Paper:https://arxiv.org/abs/2003.13145

    COVID-Chestxray set:Joseph Paul Cohen and Paul Morrison and Lan Dao have created a public image repository on Github which consists both CT scans and digital chest x-rays.The data was collected mainly from retrospective cohorts of pediatric patients from Guangzhou Women and Children’s medical center.With the aid of metadata information provided along with the dataset,we were able to extract 521 COVID-19 positive,239 viral and bacterial pneumonias;which are of the following three broad categories:Middle East Respiratory Syndrome (MERS),Severe Acute Respiratory Syndrome (SARS), and Acute Respiratory Distress syndrome (ARDS);and 218 normal radiographic chest X-ray images of varying image resolutions. Related Paper: https://arxiv.org/abs/2006.11988

    Actualmed COVID chestxray dataset:Actualmed-COVID-chestxray-dataset comprises of 12 COVID-19 positive and 80 normal radiographic chest x-ray images.

    The combined dataset includes chest X-ray images of COVID-19,Pneumonia and Normal (healthy) classes, with a total of 752, 1584, and 1639 images respectively. Information about the Novel COVID-19 Chestxray Database and its parent image repositories is provided in Table 1.

    Table 1: Dataset Description | Dataset| COVID-19 |Pneumonia | Normal | | ------------- | ------------- | ------------- | -------------| | COVID Chestxray set | 521 |239|218| | COVID-19 Radiography Database(first release) | 219 |1345|1341| | Actualmed COVID chestxray dataset| 12 |0|80| | Total|752|1584|1639|

    DATA ACCESS AND USE: Academic/Non-Commercial Use Dataset License : Database: Open Database, Contents: Database Contents

  8. F

    Software Development Job Postings on Indeed in the United States

    • fred.stlouisfed.org
    json
    Updated Jul 15, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Software Development Job Postings on Indeed in the United States [Dataset]. https://fred.stlouisfed.org/series/IHLIDXUSTPSOFTDEVE
    Explore at:
    jsonAvailable download formats
    Dataset updated
    Jul 15, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-pre-approvalhttps://fred.stlouisfed.org/legal/#copyright-pre-approval

    Area covered
    United States
    Description

    Graph and download economic data for Software Development Job Postings on Indeed in the United States (IHLIDXUSTPSOFTDEVE) from 2020-02-01 to 2025-07-11 about software, jobs, and USA.

  9. FMCG Daily Sales Data (2022-2024)

    • kaggle.com
    Updated Jul 9, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Beata Faron (2025). FMCG Daily Sales Data (2022-2024) [Dataset]. https://www.kaggle.com/datasets/beatafaron/fmcg-daily-sales-data-to-2022-2024
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Kaggle
    Authors
    Beata Faron
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This synthetic dataset simulates daily-level FMCG sales transactions for three consecutive years (2022, 2023, 2024), designed for practicing time series forecasting, demand planning, and machine learning in realistic business conditions.

    Inspired by real-world scenarios (e.g. Nestlé, Unilever, P&G), it includes: - Product hierarchy: SKU → Brand → Segment → Category - Sales channels: Retail / Discount / E-commerce - Regions: Central, North, and South (Poland) - Daily sales quantities, prices, promotions, stock, delivery lag (lead time) - Pack types: Single / Multipack / Carton - Seasonality and product introductions: - New SKUs are introduced in 2024 only - Prices gradually increase over the years

    Possible Use Cases - Weekly sales forecasting - Promotion effect analysis - Seasonality and trend modeling - New product forecasting (cold start) - Feature engineering for ML models

    Created by: Beata Faron
    LinkedIn profile
    Data Scientist working on demand forecasting, NLP, and business-oriented ML.

    Image by iuriimotov on Freepik

  10. Singapore Covid19 DataSet

    • kaggle.com
    Updated Jun 27, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TSS (2020). Singapore Covid19 DataSet [Dataset]. https://www.kaggle.com/zactey/singapore-covid19-dataset-w-location-coordinates/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 27, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    TSS
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Singapore
    Description

    SingaporeCovidAnalysis

    View it on Heroku https://singaporecovid19.herokuapp.com/

    About Me:

    Hi, everyone nice to meet you. I am Zac, a mechanical working in Singapore. My aim is to become a Data Scientist in the future. I am planning to go for a Master in Science in Data Science in the future. I will post my practice projects in github from time to time.

    My Email: shin1803@hotmail.com My Linkedin: https://www.linkedin.com/in/zac-tey-005646136/

    Summary of Project

    This is a Data analysis with visual representation using Streamlit, hosted on Heroku. It's about the statistics of Covid-19 in Singapore, last updated on April 2020. I am aware of the uncompleteness of the data (such as large amounts of NaN). However this is the best that I could find at the moment. If you have a better and larger dataset, please feel free to share around. Thank You.

    Useful Links You Can Refer for this Project

    Streamlit Documentation: https://docs.streamlit.io/en/stable/ https://github.com/streamlit/streamlit

    Deploying to Heroku with Gits: https://devcenter.heroku.com/articles/git

    Nice Read that I found useful: https://towardsdatascience.com/streamlit-101-an-in-depth-introduction-fc8aad9492f2

  11. oyo-reviews-dataset

    • kaggle.com
    zip
    Updated Jun 24, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Deepkumar patel (2023). oyo-reviews-dataset [Dataset]. https://www.kaggle.com/datasets/deeppatel9095/oyo-reviews-dataset
    Explore at:
    zip(32300432 bytes)Available download formats
    Dataset updated
    Jun 24, 2023
    Authors
    Deepkumar patel
    Description

    The inspiration behind creating the OYO Review Dataset for sentiment analysis was to explore the sentiment and opinions expressed in hotel reviews on the OYO Hotels platform. Analyzing the sentiment of customer reviews can provide valuable insights into the overall satisfaction of guests, identify areas for improvement, and assist in making data-driven decisions to enhance the hotel experience. By collecting and curating this dataset, Deep Patel, Nikki Patel, and Nimil aimed to contribute to the field of sentiment analysis in the context of the hospitality industry. Sentiment analysis allows us to classify the sentiment expressed in textual data, such as reviews, into positive, negative, or neutral categories. This analysis can help hotel management and stakeholders understand customer sentiments, identify common patterns, and address concerns or issues that may affect the reputation and customer satisfaction of OYO Hotels. The dataset provides a valuable resource for training and evaluating sentiment analysis models specifically tailored to the hospitality domain. Researchers, data scientists, and practitioners can utilize this dataset to develop and test various machine learning and natural language processing techniques for sentiment analysis, such as classification algorithms, sentiment lexicons, or deep learning models. Overall, the goal of creating the OYO Review Dataset for sentiment analysis was to facilitate research and analysis in the area of customer sentiments and opinions in the hotel industry. By understanding the sentiment of hotel reviews, businesses can strive to improve their services, enhance customer satisfaction, and make data-driven decisions to elevate the overall guest experience.

    Deep Patel: https://www.linkedin.com/in/deep-patel-55ab48199/ Nikki Patel: https://www.linkedin.com/in/nikipatel9/ Nimil lathiya: https://www.linkedin.com/in/nimil-lathiya-059a281b1/

  12. LinkedIn Industry List

    • kaggle.com
    Updated Jan 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Remco Strijdonk (2025). LinkedIn Industry List [Dataset]. https://www.kaggle.com/datasets/remcostrijdonk/linkedin-industry-list/discussion?sort=undefined
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 11, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Remco Strijdonk
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Remco Strijdonk

    Released under MIT

    Contents

    A list of LinkedIn industries in all languages

  13. linkedin bulk

    • kaggle.com
    Updated Oct 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lakshay Handa (2024). linkedin bulk [Dataset]. https://www.kaggle.com/datasets/lakshaykahai/linkedin-bulk/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Lakshay Handa
    Description

    Dataset

    This dataset was created by Lakshay Handa

    Contents

  14. linkedin

    • kaggle.com
    Updated Apr 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    kuncha manjula (2024). linkedin [Dataset]. https://www.kaggle.com/kunchamanjula/linkedin/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 6, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    kuncha manjula
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by kuncha manjula

    Released under Apache 2.0

    Contents

  15. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Datasimple (2025). Linkedin Data Scientist/Analyst jobs (Berlin 2024) [Dataset]. https://www.opendatabay.com/data/ai-ml/8b069e64-ff57-4d82-bd60-1fbef0b49c07

Linkedin Data Scientist/Analyst jobs (Berlin 2024)

Explore at:
.undefinedAvailable download formats
Dataset updated
Jun 26, 2025
Dataset authored and provided by
Datasimple
Area covered
Data Science and Analytics
Description

Dataset of 422 jobs from LinkedIn to analyse data job market with search terms ("data analyst", "data scientist" & "data engineer")

Specifically interested in the application of NLP to extract in-demand tools in the market

Columns

'job_title' 'company_name' 'post_date' 'repost_date' 'email', 'number_of_employees' 'job_desc' 'num_applicants' 'job_type' 'job_level' 'job_remote' 'language' 'salary' 'sector' 'link', 'search_term' Please note: This is only an initial dataset, further uploads with more rows with different search terms will be made in the future. For suggests or requests please make a comment.

License

CC-BY-NC

Original Data Source: Linkedin Data Scientist/Analyst jobs (Berlin 2024)

Search
Clear search
Close search
Google apps
Main menu