MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The dataset contains US retail companies with a company size of 200-500 workers. For each company, all workers were scraped as well.
For more details about the scraping code, you can check my article or GitHub code.
Full profiles of 10,000 people in Israel (download here, data schema here), with more than 40 data points, including:
- Full Name
- Education
- Location
- Work Experience History
- and many more!
Millions more profiles of people in Israel are also available; visit the LinkDB product page here.
Our LinkDB database is an exhaustive database of publicly accessible LinkedIn people and company profiles. It contains close to 500 million people and company profiles globally.
https://choosealicense.com/licenses/cc/
Data Source
https://www.kaggle.com/datasets/andrewmvd/breast-cancer-cell-segmentation
Dataset Card Authors
Mahadi Hassan
Dataset Card Contact
mahadise01@gmail.com
LinkedIn: https://www.linkedin.com/in/mahadise01
Github: https://github.com/Mahadih534
https://choosealicense.com/licenses/cc/
Data Source
https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection
Dataset Card Authors
Mahadi Hassan
Dataset Card Contact
mahadise01@gmail.com
LinkedIn: https://www.linkedin.com/in/mahadise01
Github: https://github.com/Mahadih534
https://choosealicense.com/licenses/cc0-1.0/
https://cdla.io/permissive-1-0/
The dataset contains information on 30,000+ job postings collected from LinkedIn up to the year 2023. It provides concise information on the job title, company, location, and other key attributes of each posting. This data can be used to gain insights into employment trends and dynamics, identify key skills and experiences that are in high demand, and optimize job postings to attract the right candidates.
Taxonomy of the Dataset
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13623947%2F85fde0e9bcd9e6532b63e65ca1e5b58a%2FWhatsApp%20Image%202024-02-27%20at%2012.12.59.jpeg?generation=1709016197299811&alt=media
https://choosealicense.com/licenses/cc/
Data Source
https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia
Dataset Card Authors
Mahadi Hassan
Dataset Card Contact
mahadise01@gmail.com
LinkedIn: https://www.linkedin.com/in/mahadise01
Github: https://github.com/Mahadih534
https://choosealicense.com/licenses/cc/
Data Source
https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images
Dataset Card Authors
Mahadi Hassan
Dataset Card Contact
mahadise01@gmail.com
LinkedIn: https://www.linkedin.com/in/mahadise01
Github: https://github.com/Mahadih534
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Precipitation Prediction in LA’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varunnagpalspyz/precipitation-prediction-in-la on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This Dataset is part of a basic DIY Machine Learning project offered by my college, Indian Institute of Technology, Guwahati (IIT G). The main aim of this project was to get familiar with the workflow and various techniques involved in a Machine Learning project.
The dataset is fairly simple and contains various features regarding precipitation.
- PRCP = Precipitation (tenths of mm)
- TMAX = Maximum temperature (tenths of degrees C)
- TMIN = Minimum temperature (tenths of degrees C)
- PGTM = Peak gust time (hours and minutes, i.e., HHMM)
- AWND = Average daily wind speed (tenths of meters per second)
- TAVG = Average temperature (tenths of degrees C)
- WDFx = Direction of fastest x-minute wind (degrees)
- WSFx = Fastest x-minute wind speed (tenths of meters per second)
- WT = Weather Type
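Because most of the fields above are stored in tenths of a unit, a first preprocessing step is usually to rescale them. The following is a minimal sketch under that assumption; the helper names (`to_standard_units`, `parse_pgtm`) and the sample row are hypothetical and not part of the dataset or the author's notebook.

```python
# Fields the description lists as stored in tenths of a unit.
TENTHS_FIELDS = {"PRCP", "TMAX", "TMIN", "TAVG", "AWND"}


def is_tenths_field(name):
    """True for fixed tenths fields and for the WSFx family (WSF2, WSF5, ...)."""
    return name in TENTHS_FIELDS or name.startswith("WSF")


def to_standard_units(record):
    """Return a copy of `record` with tenths-based fields divided by 10."""
    converted = {}
    for name, value in record.items():
        if is_tenths_field(name) and value is not None:
            converted[name] = value / 10
        else:
            converted[name] = value
    return converted


def parse_pgtm(hhmm):
    """Split a peak gust time given as HHMM (e.g. 1435) into (hour, minute)."""
    return divmod(int(hhmm), 100)


# Hypothetical sample row, not taken from the dataset.
sample = {"PRCP": 25, "TMAX": 183, "TMIN": 94, "AWND": 31, "WSF2": 89, "PGTM": 1435}
converted = to_standard_units(sample)
gust_hour, gust_minute = parse_pgtm(sample["PGTM"])
```

After conversion, `PRCP` is in mm, temperatures are in degrees C, and wind speeds are in meters per second; `PGTM` and the wind direction fields are left untouched.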
All Credits go to the Coding Club of Indian Institute of Technology, Guwahati (IIT Guwahati). Instagram: https://www.instagram.com/codingclubiitg/ LinkedIn : https://www.linkedin.com/company/coding-club-iitg/
Hope that this dataset + my notebook (https://www.kaggle.com/varunnagpalspyz/precipitation-prediction/notebook) helps all beginners like me.
--- Original source retains full ownership of the source dataset ---
http://opendatacommons.org/licenses/dbcl/1.0/
Over 500 jobs scraped from the job section of LinkedIn.
| Attribute | Feature's Meaning |
|:----------|:------------------|
| location | The location of the job |
| designation | The designation of the job |
| name | Name of the company |
| industry | Industry in which the company operates |
| employees_count | Count of employees |
| linkedin_followers | Number of followers on LinkedIn |
| involvement | The nature of involvement in the job, for instance: full-time, part-time |
| level | The seniority level, like Mid-Senior level |
| total_applicants | Total number of applicants |
| Skills | Skills required for the job |
Authors of the Dataset:
- Pratik Bhowal (B.E., Dept. of Electronics and Instrumentation Engineering, Jadavpur University Kolkata, India) [LinkedIn], [Github]
- Subhankar Sen (B.Tech, Dept. of Computer Science Engineering, Manipal University Jaipur, India) [LinkedIn], [Github], [Google Scholar]
- Jin Hee Yoon (faculty of the Dept. of Mathematics and Statistics at Sejong University, Seoul, South Korea) [LinkedIn], [Google Scholar]
- Zong Woo Geem (faculty of College of IT Convergence at Gachon University, South Korea) [LinkedIn], [Google Scholar]
- Ram Sarkar (Professor at Dept. of Computer Science Engineering, Jadavpur University Kolkata, India) [LinkedIn], [Google Scholar]
Overview: The authors have created a new dataset, known as the Novel COVID-19 Chestxray Repository, by fusing publicly available chest X-ray image repositories. In creating this combined dataset, three different datasets obtained from GitHub and Kaggle, created by the authors of other research studies in this field, were utilized. In our study, frontal and lateral chest X-ray images are used, since this view of radiography is widely used by radiologists in clinical diagnosis. In the following section, the authors summarize how this dataset was created.
COVID-19 Radiography Database: The first release of this dataset reports 219 COVID-19, 1345 viral pneumonia, and 1341 normal radiographic chest X-ray images. This dataset was created by a team of researchers from Qatar University, Doha, Qatar, and the University of Dhaka, Bangladesh, in collaboration with medical doctors and specialists from Pakistan and Malaysia. This database is regularly updated with the emergence of new cases of COVID-19 patients worldwide. Related Paper: https://arxiv.org/abs/2003.13145
COVID-Chestxray set: Joseph Paul Cohen, Paul Morrison, and Lan Dao have created a public image repository on GitHub which consists of both CT scans and digital chest X-rays. The data was collected mainly from retrospective cohorts of pediatric patients from Guangzhou Women and Children’s Medical Center. With the aid of the metadata provided along with the dataset, we were able to extract 521 COVID-19 positive images; 239 viral and bacterial pneumonia images, which fall into three broad categories: Middle East Respiratory Syndrome (MERS), Severe Acute Respiratory Syndrome (SARS), and Acute Respiratory Distress Syndrome (ARDS); and 218 normal radiographic chest X-ray images of varying image resolutions. Related Paper: https://arxiv.org/abs/2006.11988
Actualmed COVID chestxray dataset: The Actualmed-COVID-chestxray-dataset comprises 12 COVID-19 positive and 80 normal radiographic chest X-ray images.
The combined dataset includes chest X-ray images of the COVID-19, Pneumonia, and Normal (healthy) classes, with a total of 752, 1584, and 1639 images respectively. Information about the Novel COVID-19 Chestxray Database and its parent image repositories is provided in Table 1.
Table 1: Dataset Description
| Dataset | COVID-19 | Pneumonia | Normal |
| ------------- | ------------- | ------------- | ------------- |
| COVID Chestxray set | 521 | 239 | 218 |
| COVID-19 Radiography Database (first release) | 219 | 1345 | 1341 |
| Actualmed COVID chestxray dataset | 12 | 0 | 80 |
| Total | 752 | 1584 | 1639 |
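The totals in Table 1 can be checked by summing the per-source counts column by column. This is only an arithmetic sketch over the numbers quoted in the table; the variable names are illustrative.

```python
# Per-source image counts as listed in Table 1: (COVID-19, Pneumonia, Normal).
sources = {
    "COVID Chestxray set": (521, 239, 218),
    "COVID-19 Radiography Database (first release)": (219, 1345, 1341),
    "Actualmed COVID chestxray dataset": (12, 0, 80),
}

# zip(*...) transposes the rows so that each tuple holds one class column.
totals = tuple(sum(column) for column in zip(*sources.values()))
grand_total = sum(totals)

print(totals)       # per-class totals across the three sources
print(grand_total)  # number of images in the combined dataset
```

The per-class sums match the "Total" row (752, 1584, 1639), which gives 3975 images overall.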
DATA ACCESS AND USE: Academic/Non-Commercial Use Dataset License : Database: Open Database, Contents: Database Contents
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Deep-NLP’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/samdeeplearning/deepnlp on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Sheet_1.csv contains 80 user responses, in the response_text column, to a therapy chatbot. Bot said: 'Describe a time when you have acted as a resource for someone else'. User responded. If a response is 'not flagged', the user can continue talking to the bot. If it is 'flagged', the user is referred to help.
Sheet_2.csv contains 125 resumes, in the resume_text column. Resumes were queried from Indeed.com with keyword 'data scientist', location 'Vermont'. If a resume is 'not flagged', the applicant can submit a modified resume version at a later date. If it is 'flagged', the applicant is invited to interview.
Classify new resumes/responses as flagged or not flagged.
There are two sets of data here - resumes and responses. Split the data into a train set and a test set to test the accuracy of your classifier. Bonus points for using the same classifier for both problems.
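The train/test split described above can be sketched with the standard library alone. This is a minimal illustration, not the original author's code: the function names, the 20% test fraction, the always-"not flagged" baseline, and the placeholder rows are all assumptions, and the real rows would come from the response_text or resume_text columns.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle (text, label) rows and split them per label, so both parts
    keep roughly the same flagged / not flagged ratio."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def accuracy(predict, rows):
    """Fraction of rows for which predict(text) equals the true label."""
    return sum(predict(text) == label for text, label in rows) / len(rows)


# Hypothetical placeholder rows standing in for the 80 chatbot responses.
rows = [(f"response {i}", "flagged" if i % 4 == 0 else "not flagged") for i in range(80)]
train, test = train_test_split(rows)


def baseline(text):
    """Trivial classifier that never flags anything (a floor to beat)."""
    return "not flagged"


baseline_accuracy = accuracy(baseline, test)
```

Any real classifier can be passed to `accuracy` in place of `baseline`; using the same function for both the resume and response sets keeps the two evaluations comparable.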
Good luck.
Thank you to Parsa Ghaffari (Aylien), without whom these visuals (cover photo is in Parsa Ghaffari's excellent LinkedIn article on English, Spanish and German positive v. negative sentiment analysis) would not exist.
You can use any of the code in that kernel anywhere, on or off Kaggle. Ping me at @_samputnam for questions.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This Dataset is part of a basic DIY Machine Learning project offered by my college, Indian Institute of Technology, Guwahati (IIT G). The main aim of this project was to get familiar with the workflow and various techniques involved in a Machine Learning project.
The dataset is fairly simple and contains various features regarding precipitation.
- PRCP = Precipitation (tenths of mm)
- TMAX = Maximum temperature (tenths of degrees C)
- TMIN = Minimum temperature (tenths of degrees C)
- PGTM = Peak gust time (hours and minutes, i.e., HHMM)
- AWND = Average daily wind speed (tenths of meters per second)
- TAVG = Average temperature (tenths of degrees C)
- WDFx = Direction of fastest x-minute wind (degrees)
- WSFx = Fastest x-minute wind speed (tenths of meters per second)
- WT = Weather Type
All Credits go to the Coding Club of Indian Institute of Technology, Guwahati (IIT Guwahati). Instagram: https://www.instagram.com/codingclubiitg/ LinkedIn : https://www.linkedin.com/company/coding-club-iitg/
Hope that this dataset + my notebook (https://www.kaggle.com/varunnagpalspyz/precipitation-prediction/notebook) helps all beginners like me.
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
The LinkedIn and World Bank Group collaboration is a prime example of how technology companies can work with development institutions to bring new data and insights to developing countries to address pressing development challenges. The opportunities and challenges presented by the global economy require the public and private sectors to join forces, share information, share resources, and work towards a common vision to make a meaningful, positive and scalable impact.
The datasets presented here are ones that underlie the visuals at linkedindata.worldbank.org. The datasets cover four categories of metrics: 1) Industry Employment Shifts, 2) Talent Migration, 3) Industry Skills Needs, and 4) Skill Penetration. LinkedIn and the World Bank Group plan to refresh the data annually at a minimum. The datasets are annual time series and go back to 2015.
Each category of the metrics is provided in a separate file with a cover sheet listing the variables names, definitions, and caveats. Country coverage varies slightly between metrics because of different data extraction and quality control rules. Countries with at least 100,000 LinkedIn members are included in the datasets. If more countries cross this threshold in the future, new countries can be added during the annual refresh.
Updated 30 January 2023
There has been some confusion around licensing for this data set. Dr. Carla Patalano and Dr. Rich Huebner are the original authors of this dataset.
We provide a license to anyone who wishes to use this dataset for learning or teaching. For the purposes of sharing, please follow this license:
CC-BY-NC-ND This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://rpubs.com/rhuebner/hrd_cb_v14
PLEASE NOTE -- I recently updated the codebook - please use the above link. A few minor discrepancies were identified between the codebook and the dataset. Please feel free to contact me through LinkedIn (www.linkedin.com/in/RichHuebner) to report discrepancies and make requests.
HR data can be hard to come by, and HR professionals generally lag behind with respect to analytics and data visualization competency. Thus, Dr. Carla Patalano and I set out to create our own HR-related dataset, which is used in one of our graduate MSHRM courses called HR Metrics and Analytics, at New England College of Business. We created this data set ourselves. We use the data set to teach HR students how to use and analyze the data in Tableau Desktop - a data visualization tool that's easy to learn.
This version provides a variety of features that are useful for both data visualization AND creating machine learning / predictive analytics models. We are working on expanding the data set even further by generating even more records and a few additional features. We will be keeping this as one file/one data set for now. There is a possibility of creating a second file perhaps down the road where you can join the files together to practice SQL/joins, etc.
Note that this dataset isn't perfect. By design, there are some issues that are present. It is primarily designed as a teaching data set - to teach human resources professionals how to work with data and analytics.
We have reduced the complexity of the dataset down to a single data file (v14). The CSV revolves around a fictitious company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, and performance score.
Recent additions to the data include:
- Absences
- Most Recent Performance Review Date
- Employee Engagement Score
Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over 200 Human Resource Management students at the college. Students in the course learn data visualization techniques with Tableau Desktop and use this data set to complete a series of assignments.
We've included some open-ended questions that you can explore and try to address through creating Tableau visualizations, or R or Python analyses. Good luck and enjoy the learning!
There are so many other interesting questions that could be addressed through this interesting data set. Dr. Patalano and I look forward to seeing what we can come up with.
If you have any questions or comments about the dataset, please do not hesitate to reach out to me on LinkedIn: http://www.linkedin.com/in/RichHuebner
You can also reach me via email at: Richard.Huebner@go.cambridgecollege.edu
http://www.gnu.org/licenses/agpl-3.0.html
I developed artificial intelligence software that predicts your age and gender. It has a 93% accuracy rate. I'm 21 years old and it predicted my age correctly! I adjusted the algorithm and prepared the code. The system is a deep learning model built on neural networks, using convolutional layers from Convolutional Neural Networks. I am pleased to present this software for humanity. Doctoral students can use it in their theses, and companies can use this software as well! Upload your photo and let it guess your age and gender!
Kind regards,
Emirhan BULUT
Head of AI & AI Inventor
Python 3.9.8
TensorFlow
Keras
OpenCV
Matplotlib
NumPy
Pandas
Scikit-learn - (SKLEARN)
https://raw.githubusercontent.com/emirhanai/Age-and-Sex-Prediction-from-Image---Convolutional-Neural-Network-with-Artificial-Intelligence/main/Age%20and%20Sex%20Prediction%20from%20Image%20-%20Convolutional%20Neural%20Network%20with%20Artificial%20Intelligence.png
Name-Surname: Emirhan BULUT
Contact (Email) : emirhan@isap.solutions
LinkedIn : https://www.linkedin.com/in/artificialintelligencebulut/
Kaggle: https://www.kaggle.com/emirhanai
Official Website: https://www.emirhanbulut.com.tr
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Face mask segmentation mask dataset for more efficient detection and localization.
Contact: https://www.linkedin.com/in/pericnikola/
Big thanks to all users on Pexels and Unsplash - find their user names in the names of the images.
Why I made this? I was bored.
No animals were hurt during the creation of this dataset (dataset was presented to them and they had absolutely no idea what to do with it).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Welcome to the Enhanced Saudi Arabian Oil Company (Aramco) Stock Dataset! This dataset has been meticulously prepared from Yahoo Finance and further enriched with several engineered features to elevate your data analysis, machine learning, and financial forecasting projects. It captures the daily trading figures of Aramco stocks, presented in Saudi Riyal (SAR), providing a robust foundation for comprehensive market analysis.
- Date: The trading day for the data recorded (ISO 8601 format).
- Open: The price at which the stock first traded upon the opening of an exchange on a given trading day.
- High: The highest price at which the stock traded during the trading day.
- Low: The lowest price at which the stock traded during the trading day.
- Close: The price at which the stock last traded upon the close of an exchange on a given trading day.
- Volume: The total number of shares traded during the trading day.
- Dividends: The dividend value paid out per share on the trading day.
- Stock Splits: The number of stock splits occurring on the trading day.
- Lag Features (Lag_Close, Lag_High, Lag_Low): Previous day's closing, highest, and lowest prices.
- Rolling Window Statistics (e.g., Rolling_Mean_7, Rolling_Std_7): 7-day and 30-day moving averages and standard deviations of the Close price.
- Technical Indicators (RSI, MACD, Bollinger Bands): Key metrics used in trading to analyze short-term price movements.
- Change Features (Change_Close, Change_Volume): Day-over-day changes in Close price and trading volume.
- Date-Time Features (Weekday, Month, Year, Quarter): Extracted components of the trading day.
- Volume_Normalized: The standardized trading volume using z-score normalization to adjust for scale differences.
This dataset is tailored for a wide array of applications:
- Financial Analysis: Explore historical performance, volatility, and market trends.
- Forecasting Models: Utilize features like lagged prices and rolling statistics to predict future stock prices.
- Machine Learning: Develop regression models or classification frameworks to predict market movements.
- Deep Learning: Leverage LSTM networks for more sophisticated time-series forecasting.
- Time-Series Analysis: Dive deep into trend analysis, seasonality, and cyclical behavior of stock prices.
Whether you are a data scientist, a financial analyst, or a hobbyist interested in the stock market, this dataset provides a rich playground for analysis and model building. Its comprehensive feature set allows for the development of robust predictive models and offers unique insights into one of the world’s most significant oil companies. Unlock the potential of financial data with this carefully crafted dataset.
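To make the engineered columns listed above more concrete, here is a minimal sketch of how lag, rolling-mean, change, and z-score features can be derived from a price series in plain Python. It is not the dataset author's code: the function names and sample numbers are made up, a 3-day window stands in for the 7- and 30-day windows, and whether the dataset's standard deviations use the population or sample formula is not stated (the population form is assumed here).

```python
from statistics import mean, pstdev


def lag(values, k=1):
    """Shift the series by k steps; the first k entries have no history."""
    return [None] * k + values[:-k]


def rolling_mean(values, window):
    """Mean over the trailing `window` entries; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window : i + 1]))
    return out


def change(values):
    """Day-over-day difference; the first day has no previous value."""
    return [None] + [b - a for a, b in zip(values, values[1:])]


def zscore(values):
    """Standardize with the population standard deviation (an assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up numbers for illustration, not real Aramco prices or volumes.
close = [10.0, 11.0, 12.0, 11.0, 13.0]
volume = [100, 200, 300, 400, 500]

lag_close = lag(close)                  # analogue of Lag_Close
rolling_mean_3 = rolling_mean(close, 3) # analogue of Rolling_Mean_7 with a shorter window
change_close = change(close)            # analogue of Change_Close
volume_normalized = zscore(volume)      # analogue of Volume_Normalized
```

Because lag and rolling features are undefined for the first rows, any model trained on them needs to drop or impute those leading `None` values.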
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🚀 BCG Data Science Job Simulation | Forage
This notebook focuses on feature engineering techniques to enhance a dataset for churn prediction modeling. As part of the BCG Data Science Job Simulation, I transformed raw customer data into valuable features to improve predictive performance.
📊 What’s Inside?
✅ Data Cleaning: Removing irrelevant columns to reduce noise
✅ Date-Based Feature Extraction: Converting raw dates into useful insights like activation year, contract length, and renewal month
✅ New Predictive Features:
- consumption_trend → Measures if a customer’s last-month usage is increasing or decreasing
- total_gas_and_elec → Aggregates total energy consumption
✅ Final Processed Dataset: Ready for churn prediction modeling
📂 Dataset Used:
📌 clean_data_after_eda.csv → Original dataset after Exploratory Data Analysis (EDA)
📌 clean_data_with_new_features.csv → Final dataset after feature engineering
🛠 Technologies Used:
🔹 Python (Pandas, NumPy)
🔹 Data Preprocessing & Feature Engineering
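The date-based and aggregate features described above could look roughly like the sketch below. It is an assumption-heavy illustration rather than the notebook's code: the argument names are hypothetical, contract length is measured in days, and `consumption_trend` is defined here as last-month usage minus the average month of the past year, which is only one plausible reading of "increasing or decreasing".

```python
from datetime import date


def date_features(activation, end, renewal):
    """Derive activation year, contract length (days), and renewal month."""
    return {
        "activation_year": activation.year,
        "contract_length_days": (end - activation).days,
        "renewal_month": renewal.month,
    }


def consumption_trend(last_month, last_12_months):
    """Assumed definition: positive when last month's usage is above the
    average month of the past year, negative when it is below."""
    return last_month - last_12_months / 12


def total_gas_and_elec(gas, elec):
    """Aggregate total energy consumption across both utilities."""
    return gas + elec


# Hypothetical customer record, not taken from the dataset.
features = date_features(date(2013, 6, 15), date(2016, 6, 15), date(2015, 6, 23))
trend = consumption_trend(last_month=150, last_12_months=1200)
total = total_gas_and_elec(gas=3000, elec=1200)
```

With pandas, the same ideas map onto the `.dt.year` and `.dt.month` accessors and column arithmetic, but the plain-Python version keeps the definitions explicit.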
🌟 Why Feature Engineering? Feature engineering is one of the most critical steps in machine learning. Well-engineered features improve model accuracy and uncover deeper insights into customer behavior.
🚀 This notebook is a great reference for anyone learning data preprocessing, feature selection, and predictive modeling in Data Science!
📩 Connect with Me: 🔗 GitHub Repo: https://github.com/Pavitr-Swain/BCG-Data-Science-Job-Simulation 💼 LinkedIn: https://www.linkedin.com/in/pavitr-kumar-swain-ab708b227/
🔍 Let’s explore churn prediction insights together! 🎯
By RomanianDATA Tribe [source]
Tourism is a vital component for many economies around the world, benefiting their revenues, and their job market, pushing improvements towards a country's infrastructure, and engaging cultural exchange between foreigners and citizens.
And as the summer holiday season starts, we decided to put together some data regarding tourism in Romania.
We gathered the data with ease from the CEIC Data platform, which we highly recommend using; they have a plethora of very well-structured datasets covering more than 213 economies, 23 industries, and 18 macroeconomic sectors, compiled from 2,200 sources worldwide, which include up to 8 million macro and industry time series and are available via either web or API.
Create a visualization that shows how Romania's tourism has changed over the years.
After you finish the challenge, make sure to fill in ✍️ the participation tracker, then share your makeover data visualization on LinkedIn using #RomanianDATA, #RomanianDATATribe and tagging RomanianDATA Tribe.
This dataset contains data on tourism in Romania from 2004 to 2022. The data includes the number of foreign visitors and the type of transport used by them. The transport types include air, land, and sea.
- The data could be used to study trends in tourism in Romania over time.
- The data could be used to study the effects of different types of transport on tourism in Romania.
- The data could be used to study the effects of different types of visitors on tourism in Romania.
If you use this dataset in your research, please credit the original authors.
License
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.
File: Romania - Tourism data RDT June 2022.csv
| Column name | Description |
|:----------------------|:-------------------------------------------------------------|
| Date | The date of the observation. (Date) |
| Type of transport | The type of transport used by the foreign visitors. (String) |
| Foreign Visitors | The number of foreign visitors. (Integer) |
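As a starting point for the visualization challenge, the three columns above can be aggregated into yearly totals per transport type. The sketch below uses only the standard library; the function name is illustrative, the sample rows are made up, and the ISO-style `YYYY-MM-DD` date format is an assumption, since the file's actual date format is not stated here.

```python
import csv
import io
from collections import defaultdict


def visitors_by_year_and_transport(csv_file):
    """Sum 'Foreign Visitors' per (year, 'Type of transport') pair."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        year = row["Date"][:4]  # assumes dates start with a four-digit year
        totals[(year, row["Type of transport"])] += int(row["Foreign Visitors"])
    return dict(totals)


# Made-up rows for illustration only, not values from the real file.
sample = io.StringIO(
    "Date,Type of transport,Foreign Visitors\n"
    "2004-01-01,Air,100\n"
    "2004-02-01,Air,150\n"
    "2004-01-01,Land,400\n"
    "2005-01-01,Sea,20\n"
)
yearly = visitors_by_year_and_transport(sample)
```

To run it on the real file, pass `open("Romania - Tourism data RDT June 2022.csv", newline="")` instead of the in-memory sample, adjusting the year extraction if the dates are formatted differently.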
If you use this dataset in your research, please credit RomanianDATA Tribe.