Dataset of 422 jobs from LinkedIn to analyse data job market with search terms ("data analyst", "data scientist" & "data engineer")
Specifically interested in the application of NLP to extract in-demand tools in the market
Columns
'job_title', 'company_name', 'post_date', 'repost_date', 'email', 'number_of_employees', 'job_desc', 'num_applicants', 'job_type', 'job_level', 'job_remote', 'language', 'salary', 'sector', 'link', 'search_term'
Please note: this is only an initial dataset; further uploads with more rows and different search terms will be made in the future. For suggestions or requests, please leave a comment.
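Since job descriptions live in the 'job_desc' column, a minimal NLP pass for in-demand tools can start with keyword matching. A sketch of that idea, where the tool list is an illustrative assumption and not part of the dataset:

```python
import re

# Illustrative tool vocabulary -- an assumption, not taken from the dataset.
TOOLS = ["python", "sql", "tableau", "power bi", "spark", "aws"]

def extract_tools(job_desc: str) -> list:
    """Return the known tools mentioned in a job description (case-insensitive)."""
    text = job_desc.lower()
    return [t for t in TOOLS if re.search(r"\b" + re.escape(t) + r"\b", text)]

desc = "We need a Data Analyst skilled in SQL, Python and Power BI."
print(extract_tools(desc))  # ['python', 'sql', 'power bi']
```

Counting these matches across all 422 rows gives a first ranking of in-demand tools; more robust work would use a proper tokenizer or named-entity model.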
CC-BY-NC
Original Data Source: Linkedin Data Scientist/Analyst jobs (Berlin 2024)
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🚀 **BCG Data Science Job Simulation | Forage**
This notebook focuses on feature engineering techniques to enhance a dataset for churn prediction modeling. As part of the BCG Data Science Job Simulation, I transformed raw customer data into valuable features to improve predictive performance.
📊 What’s Inside?
✅ Data Cleaning: Removing irrelevant columns to reduce noise
✅ Date-Based Feature Extraction: Converting raw dates into useful insights like activation year, contract length, and renewal month
✅ New Predictive Features:
- consumption_trend → Measures whether a customer’s last-month usage is increasing or decreasing
- total_gas_and_elec → Aggregates total energy consumption
✅ Final Processed Dataset: Ready for churn prediction modeling
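The two engineered features can be sketched in pandas; the column names below (cons_last_month, cons_12m, cons_gas_12m) are illustrative stand-ins, not the notebook's exact schema:

```python
import pandas as pd

# Toy frame standing in for the cleaned customer data; the column names
# are illustrative assumptions, not the exact ones in clean_data_after_eda.csv.
df = pd.DataFrame({
    "cons_last_month": [120, 80],
    "cons_12m": [1200, 1100],
    "cons_gas_12m": [300, 0],
})

# consumption_trend: last month's usage relative to the monthly average;
# values above 1 suggest increasing usage, below 1 decreasing.
df["consumption_trend"] = df["cons_last_month"] / (df["cons_12m"] / 12)

# total_gas_and_elec: aggregate energy consumption across both products.
df["total_gas_and_elec"] = df["cons_12m"] + df["cons_gas_12m"]
```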
📂 Datasets Used:
📌 clean_data_after_eda.csv → Original dataset after Exploratory Data Analysis (EDA)
📌 clean_data_with_new_features.csv → Final dataset after feature engineering
🛠 Technologies Used:
🔹 Python (Pandas, NumPy)
🔹 Data Preprocessing & Feature Engineering
🌟 Why Feature Engineering? Feature engineering is one of the most critical steps in machine learning. Well-engineered features improve model accuracy and uncover deeper insights into customer behavior.
🚀 This notebook is a great reference for anyone learning data preprocessing, feature selection, and predictive modeling in Data Science!
📩 Connect with Me: 🔗 GitHub Repo: https://github.com/Pavitr-Swain/BCG-Data-Science-Job-Simulation 💼 LinkedIn: https://www.linkedin.com/in/pavitr-kumar-swain-ab708b227/
🔍 Let’s explore churn prediction insights together! 🎯
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Welcome to the Enhanced Saudi Arabian Oil Company (Aramco) Stock Dataset! This dataset has been meticulously prepared from Yahoo Finance and further enriched with several engineered features to elevate your data analysis, machine learning, and financial forecasting projects. It captures the daily trading figures of Aramco stocks, presented in Saudi Riyal (SAR), providing a robust foundation for comprehensive market analysis.
Date: The trading day for the data recorded (ISO 8601 format).
Open: The price at which the stock first traded upon the opening of an exchange on a given trading day.
High: The highest price at which the stock traded during the trading day.
Low: The lowest price at which the stock traded during the trading day.
Close: The price at which the stock last traded upon the close of an exchange on a given trading day.
Volume: The total number of shares traded during the trading day.
Dividends: The dividend value paid out per share on the trading day.
Stock Splits: The number of stock splits occurring on the trading day.
Lag Features (Lag_Close, Lag_High, Lag_Low): Previous day's closing, highest, and lowest prices.
Rolling Window Statistics (e.g., Rolling_Mean_7, Rolling_Std_7): 7-day and 30-day moving averages and standard deviations of the Close price.
Technical Indicators (RSI, MACD, Bollinger Bands): Key metrics used in trading to analyze short-term price movements.
Change Features (Change_Close, Change_Volume): Day-over-day changes in Close price and trading volume.
Date-Time Features (Weekday, Month, Year, Quarter): Extracted components of the trading day.
Volume_Normalized: The standardized trading volume using z-score normalization to adjust for scale differences.
This dataset is tailored for a wide array of applications:
Financial Analysis: Explore historical performance, volatility, and market trends.
Forecasting Models: Utilize features like lagged prices and rolling statistics to predict future stock prices.
Machine Learning: Develop regression models or classification frameworks to predict market movements.
Deep Learning: Leverage LSTM networks for more sophisticated time-series forecasting.
Time-Series Analysis: Dive deep into trend analysis, seasonality, and cyclical behavior of stock prices.
Whether you are a data scientist, a financial analyst, or a hobbyist interested in the stock market, this dataset provides a rich playground for analysis and model building. Its comprehensive feature set allows for the development of robust predictive models and offers unique insights into one of the world’s most significant oil companies. Unlock the potential of financial data with this carefully crafted dataset.
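Several of the engineered columns described above can be reproduced in pandas. The snippet below is a sketch on a toy Close/Volume series, not the exact pipeline used to build the dataset:

```python
import pandas as pd

# Toy Close/Volume series; the real dataset applies the same transforms
# to the full Aramco trading history.
df = pd.DataFrame({
    "Close":  [32.0, 32.5, 32.2, 33.0, 33.4, 33.1, 33.8, 34.0],
    "Volume": [1.0e6, 1.2e6, 0.9e6, 1.5e6, 1.1e6, 1.3e6, 1.0e6, 1.4e6],
})

df["Lag_Close"] = df["Close"].shift(1)                # previous day's close
df["Rolling_Mean_7"] = df["Close"].rolling(7).mean()  # 7-day moving average
df["Rolling_Std_7"] = df["Close"].rolling(7).std()    # 7-day standard deviation
df["Change_Close"] = df["Close"].diff()               # day-over-day change
# Volume_Normalized: z-score standardization of trading volume
df["Volume_Normalized"] = (df["Volume"] - df["Volume"].mean()) / df["Volume"].std()
```

The 30-day windows and the technical indicators (RSI, MACD, Bollinger Bands) follow the same rolling-window pattern with different window sizes and formulas.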
TagX has curated a dataset of jobs available in the market that can be used for various applications such as machine learning, artificial intelligence, and data science.
The dataset can be used for demand prediction, market forecasting, current market analysis, and historical data analysis.
Some of the job categories that can be found are:
Management Occupations
Business and Financial Operations Occupations
Computer and Mathematical Occupations
Architecture and Engineering Occupations
Life, Physical, and Social Science Occupations
Community and Social Service Occupations
Legal Occupations
Educational Instruction and Library Occupations
Arts, Design, Entertainment, Sports, and Media Occupations
Healthcare Practitioners and Technical Occupations
Healthcare Support Occupations
Protective Service Occupations
Food Preparation and Serving Related Occupations
Building and Grounds Cleaning and Maintenance Occupations
Personal Care and Service Occupations
Sales and Related Occupations
Office and Administrative Support Occupations
Farming, Fishing, and Forestry Occupations
Construction and Extraction Occupations
Installation, Maintenance, and Repair Occupations
Production Occupations
Transportation and Material Moving Occupations
Military Specific Occupations
ABSTRACT In this project, we propose a new comprehensive, realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine-learning-based intrusion detection systems in two different modes, namely centralized and federated learning. The proposed testbed is organized into seven layers: the Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we employ new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, the OPNFV platform, Hyperledger Sawtooth, digital twins, the ONOS SDN controller, Mosquitto MQTT brokers, and Modbus TCP/IP. The IoT data are generated from more than 10 types of IoT devices, such as low-cost digital sensors for sensing temperature and humidity, ultrasonic sensors, water level detection sensors, pH sensor meters, soil moisture sensors, heart rate sensors, and flame sensors. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from the 1176 features found. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.
Instructions:
Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.
Please kindly visit kaggle link for the updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...
Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is allowed after asking the lead author, Dr Mohamed Amine Ferrag, who has asserted his rights under copyright.
The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, the authors must cite the following paper:
Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809
Link to paper : https://ieeexplore.ieee.org/document/9751703
The directories of the Edge-IIoTset dataset include the following:
•File 1 (Normal traffic)
-File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.
-File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.
-File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.
-File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.
-File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.
-File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.
-File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.
-File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.
-File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.
-File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.
•File 2 (Attack traffic):
-File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document corresponds to one attack.
-File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document corresponds to one attack.
•File 3 (Selected dataset for ML and DL):
-File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.
-File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.
Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform

from google.colab import files
!pip install -q kaggle
files.upload()
!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"
!unzip DNN-EdgeIIoT-dataset.csv.zip
!rm DNN-EdgeIIoT-dataset.csv.zip
Step 2: Reading the dataset's CSV file into a pandas DataFrame

import pandas as pd
import numpy as np

df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)

Step 3: Exploring some of the DataFrame's contents

df.head(5)
print(df['Attack_type'].value_counts())
Step 4: Dropping data (columns, duplicated rows, NaN, null values)

from sklearn.utils import shuffle

drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp",
                "http.request.uri.query", "tcp.options", "tcp.payload", "tcp.srcport",
                "tcp.dstport", "udp.port", "mqtt.msg"]
df.drop(drop_columns, axis=1, inplace=True)
df.dropna(axis=0, how='any', inplace=True)
df.drop_duplicates(subset=None, keep="first", inplace=True)
df = shuffle(df)
df.isna().sum()
print(df['Attack_type'].value_counts())
Step 5: Categorical data encoding (dummy encoding)

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import preprocessing

def encode_text_dummy(df, name):
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = f"{name}-{x}"
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)

encode_text_dummy(df, 'http.request.method')
encode_text_dummy(df, 'http.referer')
encode_text_dummy(df, 'http.request.version')
encode_text_dummy(df, 'dns.qry.name.len')
encode_text_dummy(df, 'mqtt.conack.flags')
encode_text_dummy(df, 'mqtt.protoname')
encode_text_dummy(df, 'mqtt.topic')
Step 6: Creation of the preprocessed dataset

df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, on his email: mohamed.amine.ferrag@gmail.com
More information about Dr. Mohamed Amine Ferrag is available at:
https://www.linkedin.com/in/Mohamed-Amine-Ferrag
https://dblp.uni-trier.de/pid/142/9937.html
https://www.researchgate.net/profile/Mohamed_Amine_Ferrag
https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao
https://www.scopus.com/authid/detail.uri?authorId=56115001200
https://publons.com/researcher/1322865/mohamed-amine-ferrag/
https://orcid.org/0000-0002-0632-3172
Last Updated: 27 Mar. 2023
This dataset was created by Gaston Pereyra
Authors of the Dataset:
Pratik Bhowal (B.E., Dept. of Electronics and Instrumentation Engineering, Jadavpur University, Kolkata, India) [LinkedIn], [GitHub]
Subhankar Sen (B.Tech, Dept. of Computer Science Engineering, Manipal University Jaipur, India) [LinkedIn], [GitHub], [Google Scholar]
Jin Hee Yoon (faculty of the Dept. of Mathematics and Statistics at Sejong University, Seoul, South Korea) [LinkedIn], [Google Scholar]
Zong Woo Geem (faculty of the College of IT Convergence at Gachon University, South Korea) [LinkedIn], [Google Scholar]
Ram Sarkar (Professor, Dept. of Computer Science Engineering, Jadavpur University, Kolkata, India) [LinkedIn], [Google Scholar]
Overview
The authors have created a new dataset, known as the Novel COVID-19 Chestxray Repository, by fusing publicly available chest X-ray image repositories. In creating this combined dataset, three different datasets obtained from GitHub and Kaggle, created by the authors of other research studies in this field, were utilized. Frontal and lateral chest X-ray images are used, since these radiographic views are widely used by radiologists in clinical diagnosis. The following section summarizes how this dataset was created.
COVID-19 Radiography Database: The first release of this dataset reports 219 COVID-19, 1345 viral pneumonia, and 1341 normal radiographic chest X-ray images. It was created by a team of researchers from Qatar University, Doha, Qatar, and the University of Dhaka, Bangladesh, in collaboration with medical doctors and specialists from Pakistan and Malaysia. The database is regularly updated as new COVID-19 cases emerge worldwide. Related paper: https://arxiv.org/abs/2003.13145
COVID-Chestxray set: Joseph Paul Cohen, Paul Morrison, and Lan Dao created a public image repository on GitHub consisting of both CT scans and digital chest X-rays. The data was collected mainly from retrospective cohorts of pediatric patients from Guangzhou Women and Children’s Medical Center. With the aid of the metadata provided along with the dataset, we were able to extract 521 COVID-19 positive images; 239 viral and bacterial pneumonias falling into three broad categories: Middle East Respiratory Syndrome (MERS), Severe Acute Respiratory Syndrome (SARS), and Acute Respiratory Distress Syndrome (ARDS); and 218 normal radiographic chest X-ray images of varying resolutions. Related paper: https://arxiv.org/abs/2006.11988
Actualmed COVID chestxray dataset: This dataset comprises 12 COVID-19 positive and 80 normal radiographic chest X-ray images.
The combined dataset includes chest X-ray images of COVID-19, Pneumonia, and Normal (healthy) classes, with totals of 752, 1584, and 1639 images respectively. Information about the Novel COVID-19 Chestxray Database and its parent image repositories is provided in Table 1.
Table 1: Dataset Description

| Dataset | COVID-19 | Pneumonia | Normal |
| ------- | -------- | --------- | ------ |
| COVID Chestxray set | 521 | 239 | 218 |
| COVID-19 Radiography Database (first release) | 219 | 1345 | 1341 |
| Actualmed COVID chestxray dataset | 12 | 0 | 80 |
| Total | 752 | 1584 | 1639 |
DATA ACCESS AND USE: Academic/Non-Commercial Use. Dataset License: Database: Open Database; Contents: Database Contents.
https://fred.stlouisfed.org/legal/#copyright-pre-approval
Graph and download economic data for Software Development Job Postings on Indeed in the United States (IHLIDXUSTPSOFTDEVE) from 2020-02-01 to 2025-07-11 about software, jobs, and USA.
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates daily-level FMCG sales transactions for three consecutive years (2022, 2023, 2024), designed for practicing time series forecasting, demand planning, and machine learning in realistic business conditions.
Inspired by real-world scenarios (e.g. Nestlé, Unilever, P&G), it includes:
- Product hierarchy: SKU → Brand → Segment → Category
- Sales channels: Retail / Discount / E-commerce
- Regions: Central, North, and South (Poland)
- Daily sales quantities, prices, promotions, stock, delivery lag (lead time)
- Pack types: Single / Multipack / Carton
- Seasonality and product introductions:
  - New SKUs are introduced in 2024 only
  - Prices gradually increase over the years
Possible Use Cases
- Weekly sales forecasting
- Promotion effect analysis
- Seasonality and trend modeling
- New product forecasting (cold start)
- Feature engineering for ML models
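As a starting point for the weekly forecasting use case, the daily records can be resampled to weekly totals. The toy frame below is an assumption standing in for one SKU's daily sales:

```python
import pandas as pd

# Hypothetical daily sales for a single SKU (two full Mon-Sun weeks).
daily = pd.DataFrame(
    {"quantity": range(1, 15)},
    index=pd.date_range("2022-01-03", periods=14, freq="D"),
)

# Aggregate to weekly totals, with each week labeled by its Sunday end date.
weekly = daily["quantity"].resample("W-SUN").sum()
```

On the real dataset the same resample would be applied per SKU/channel/region group before fitting a forecasting model.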
Created by: Beata Faron
LinkedIn profile
Data Scientist working on demand forecasting, NLP, and business-oriented ML.
https://creativecommons.org/publicdomain/zero/1.0/
View it on Heroku https://singaporecovid19.herokuapp.com/
Hi everyone, nice to meet you. I am Zac, a mechanical engineer working in Singapore. My aim is to become a Data Scientist, and I am planning to pursue a Master of Science in Data Science. I will post my practice projects on GitHub from time to time.
My Email: shin1803@hotmail.com My Linkedin: https://www.linkedin.com/in/zac-tey-005646136/
This is a data analysis with visual representation using Streamlit, hosted on Heroku. It covers statistics of Covid-19 in Singapore, last updated in April 2020. I am aware of the incompleteness of the data (such as large amounts of NaN values); however, this is the best that I could find at the moment. If you have a better and larger dataset, please feel free to share it. Thank you.
Streamlit Documentation: https://docs.streamlit.io/en/stable/ https://github.com/streamlit/streamlit
Deploying to Heroku with Gits: https://devcenter.heroku.com/articles/git
Nice Read that I found useful: https://towardsdatascience.com/streamlit-101-an-in-depth-introduction-fc8aad9492f2
The inspiration behind creating the OYO Review Dataset for sentiment analysis was to explore the sentiment and opinions expressed in hotel reviews on the OYO Hotels platform. Analyzing the sentiment of customer reviews can provide valuable insights into the overall satisfaction of guests, identify areas for improvement, and assist in making data-driven decisions to enhance the hotel experience. By collecting and curating this dataset, Deep Patel, Nikki Patel, and Nimil aimed to contribute to the field of sentiment analysis in the context of the hospitality industry.
Sentiment analysis allows us to classify the sentiment expressed in textual data, such as reviews, into positive, negative, or neutral categories. This analysis can help hotel management and stakeholders understand customer sentiments, identify common patterns, and address concerns or issues that may affect the reputation and customer satisfaction of OYO Hotels.
The dataset provides a valuable resource for training and evaluating sentiment analysis models specifically tailored to the hospitality domain. Researchers, data scientists, and practitioners can utilize this dataset to develop and test various machine learning and natural language processing techniques for sentiment analysis, such as classification algorithms, sentiment lexicons, or deep learning models.
Overall, the goal of creating the OYO Review Dataset for sentiment analysis was to facilitate research and analysis in the area of customer sentiments and opinions in the hotel industry. By understanding the sentiment of hotel reviews, businesses can strive to improve their services, enhance customer satisfaction, and make data-driven decisions to elevate the overall guest experience.
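As an illustration of the positive/negative/neutral classification described above, here is a toy lexicon-based scorer. The word lists are assumptions made for the sketch, not derived from the OYO dataset; real work on these reviews would train a classifier instead:

```python
# Tiny illustrative lexicons -- assumptions for this sketch only.
POSITIVE = {"clean", "comfortable", "friendly", "great", "excellent"}
NEGATIVE = {"dirty", "rude", "noisy", "bad", "terrible"}

def sentiment(review: str) -> str:
    """Classify a review as positive, negative, or neutral by lexicon counts."""
    words = [w.strip(".,!?") for w in review.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Great location and friendly staff!"))      # positive
print(sentiment("The room was dirty and the staff rude."))  # negative
```

A trained model (e.g. TF-IDF features with a linear classifier) would handle negation, misspellings, and domain-specific vocabulary that a fixed lexicon misses.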
Deep Patel: https://www.linkedin.com/in/deep-patel-55ab48199/ Nikki Patel: https://www.linkedin.com/in/nikipatel9/ Nimil Lathiya: https://www.linkedin.com/in/nimil-lathiya-059a281b1/
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Remco Strijdonk
Released under MIT
A list of LinkedIn industries in all languages
This dataset was created by Lakshay Handa
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by kuncha manjula
Released under Apache 2.0