https://brightdata.com/license
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.
Dataset Features
Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and job market dynamics.
Customizable Subsets for Specific Needs
Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.
Popular Use Cases
Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
https://brightdata.com/license
The LinkedIn posts dataset is a comprehensive collection of user-generated content on LinkedIn, featuring key fields such as post ID, user ID, URL, title, post text, date posted, hashtags, and engagement metrics like the number of likes and comments. This dataset also includes additional elements such as embedded links, images, videos, top visible comments, and links to more posts by the user or relevant content. It is ideal for social media analysts, marketers, and researchers looking to analyze user behavior, content trends, and engagement on LinkedIn.
The global number of LinkedIn users was forecast to increase continuously between 2024 and 2028 by a total of 171.9 million users (+22.3 percent). After the sixth consecutive year of growth, the LinkedIn user base is estimated to reach 942.84 million users, a new peak, in 2028. User figures, shown here for the platform LinkedIn, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of LinkedIn users in regions like Asia and South America.
https://brightdata.com/license
The LinkedIn Jobs Listing dataset emerges as a comprehensive resource for individuals navigating the contemporary job market. With a focus on critical employment details, the dataset encapsulates key facets of job listings, including titles, company names, locations, and employment specifics such as seniority levels and functions. This wealth of information is instrumental for job seekers looking to align their skills and aspirations with the right opportunities. The inclusion of direct application links and real-time application numbers enhances the dataset's utility, offering users a streamlined approach to engaging with potential employers. Beyond aiding job seekers, the dataset serves as a valuable tool for analysts and researchers, providing nuanced insights into industry trends and the evolving demands of the job market. The temporal aspect, captured through job posting timestamps, allows for the observation of job trends over time. Moreover, the dataset's integration of company details, including unique identifiers and LinkedIn profile links, enables a deeper exploration of hiring organizations. Whether for job seekers or analysts, the LinkedIn Jobs Listing dataset emerges as a versatile and informative repository, empowering users with the knowledge to make informed decisions in their professional pursuits.
As of early 2025, LinkedIn had an audience reach of *** million users in the *************. The country was by far the leading market of the professional job networking service, with runner-up India having an audience of *** million.
LinkedIn: the company
Launched in 2003, LinkedIn is a professional networking service where jobseekers can post their CVs, and employers or recruiters can post job ads and search for prospective candidates. In December 2016, Microsoft acquired LinkedIn, making it a wholly owned subsidiary. In 2020, the platform generated over ***** billion U.S. dollars in revenue. Despite its great success, the company has not always seen positive numbers only, and in 2018, LinkedIn reported an operating loss of *** million U.S. dollars.
LinkedIn marketing
Greater exposure, lead generation and increased thought leadership are all key benefits of social media marketing, and LinkedIn is a popular marketing tool in the B2B segment. Whereas the company predominantly generates revenue by selling access to member information to professional parties, LinkedIn is the second-most popular social media platform used by B2B marketers, ranking only behind Facebook.
Success.ai’s User Profiles Data for Nonprofit and NGO Leaders provides businesses, organizations, and researchers with comprehensive access to global leaders in the nonprofit and NGO sectors. With data sourced from over 700 million verified LinkedIn profiles, this dataset includes actionable insights and contact details for executives, program managers, administrators, and decision-makers. Whether your goal is to partner with nonprofits, support global causes, or conduct research into social impact, Success.ai ensures your outreach is backed by accurate, enriched, and continuously updated data.
Why Choose Success.ai’s User Profiles Data for Nonprofit and NGO Leaders?
Comprehensive Professional Profiles
Access verified LinkedIn profiles of nonprofit leaders, NGO managers, program directors, grant writers, and administrative executives. AI-driven validation ensures 99% accuracy for efficient communication and minimized bounce rates.
Global Coverage Across Nonprofit Sectors
Includes profiles from nonprofits, humanitarian organizations, environmental groups, social enterprises, and advocacy organizations. Covers key markets across North America, Europe, APAC, South America, and Africa for global reach.
Continuously Updated Dataset
Reflects real-time professional updates, organizational changes, and emerging trends in the nonprofit landscape to keep your targeting relevant and effective.
Tailored for Nonprofit Insights
Enriched profiles include work histories, organizational affiliations, areas of expertise, and social impact projects for deeper engagement opportunities.
Data Highlights:
700M+ Verified LinkedIn Profiles: Access a vast network of nonprofit and NGO professionals worldwide.
100M+ Work Emails: Direct communication with executives, managers, and decision-makers in the nonprofit sector.
Enriched Organizational Data: Gain insights into leadership structures, mission focuses, and operational scales.
Industry-Specific Segmentation: Target nonprofits focused on healthcare, education, environmental sustainability, human rights, and more.
Key Features of the Dataset:
Nonprofit and NGO Leader Profiles
Identify and connect with executives, program managers, fundraisers, and policy directors in global nonprofit and NGO sectors. Engage with individuals who drive decision-making and operational strategies for impactful organizations.
Detailed Organizational Insights
Leverage firmographic data, including organizational size, mission, regional activity, and funding sources, to align with specific nonprofit goals.
Advanced Filters for Precision Targeting
Refine searches by region, mission type, role, or organizational focus for tailored outreach. Customize campaigns based on social impact priorities, such as climate action, gender equality, or economic development.
AI-Driven Enrichment
Enhanced datasets provide actionable insights into professional accomplishments, partnerships, and leadership achievements for targeted engagement.
Strategic Use Cases:
Partnership Development and Outreach
Identify nonprofits and NGOs for collaboration on social impact projects, sponsorships, or grant distribution. Build relationships with decision-makers driving advocacy, fundraising, and community initiatives.
Donor Engagement and Fundraising
Target nonprofit leaders responsible for managing fundraising campaigns and donor relationships. Tailor outreach efforts to align with specific causes and funding priorities.
Research and Analysis
Analyze leadership trends, mission focuses, and regional nonprofit activities to inform program design and funding strategies. Use insights to evaluate the effectiveness of social impact initiatives and partnerships.
Recruitment and Talent Acquisition
Target HR professionals and administrators seeking qualified staff, consultants, or volunteers for nonprofits and NGOs. Offer talent solutions for specialized roles in program management, advocacy, and administration.
Why Choose Success.ai?
Best Price Guarantee
Access industry-leading, verified User Profiles Data at unmatched pricing to ensure your campaigns are cost-effective and impactful.
Seamless Integration
Easily integrate verified nonprofit data into your CRM or marketing platforms with APIs or downloadable formats.
AI-Validated Accuracy
Rely on 99% accuracy to minimize wasted outreach efforts and maximize engagement outcomes.
Customizable Solutions
Tailor datasets to focus on specific nonprofit types, geographical regions, or areas of social impact to meet your strategic objectives.
Strategic APIs for Enhanced Campaigns:
Data Enrichment API
Update your internal records with verified nonprofit leader profiles to enhance targeting and engagement.
Lead Generation API
Automate lead generation for a consistent pipeline of nonprofit and NGO professionals, scaling your outreach efforts efficiently.
Success.ai’s User Profiles Data for Nonprofit and NGO Leader...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
From the LinkedIn users who identify themselves as working in the “computer software” industry on their profile and who listed at least two skills, we sampled about 340,000 users and their skills. To remove skills that are too rare or mistyped, we dropped skills that appeared fewer than 100 times in our dataset.
# skill_count.tsv
This file contains the frequency of each skill. The first column is the name of the skill; the second column is the number of users who listed the skill; the third column is the total number of users in our dataset.
# skill_pair_count.tsv
This file contains the co-occurrence counts for each pair of skills. The first two columns are skill names and the third column is the frequency of co-occurrence.
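One way the two files could be combined is to score each skill pair by pointwise mutual information (PMI), which compares how often two skills co-occur with how often they would co-occur by chance. The sketch below is only an illustration: the sample rows are made up, the function names are hypothetical, and it assumes the files have no header row and that the third column of skill_count.tsv holds the same total on every row.

```python
import csv
import io
import math

# Made-up sample rows in the column layout described above (not real data).
SKILL_COUNT_TSV = "python\t400\t1000\njava\t500\t1000\nsql\t250\t1000\n"
SKILL_PAIR_TSV = "python\tsql\t200\njava\tsql\t100\n"


def load_skill_counts(text):
    """Return ({skill: user_count}, total_users) from skill_count.tsv-style text."""
    counts, total = {}, 0
    for name, count, total_users in csv.reader(io.StringIO(text), delimiter="\t"):
        counts[name] = int(count)
        total = int(total_users)  # assumed identical on every row
    return counts, total


def pmi(pair_count, count_a, count_b, total):
    """PMI in bits: log2( P(a, b) / (P(a) * P(b)) )."""
    return math.log2(pair_count * total / (count_a * count_b))


def pair_pmi(count_text, pair_text):
    """Return {(skill_a, skill_b): pmi} for every row of the pair file."""
    counts, total = load_skill_counts(count_text)
    scores = {}
    for a, b, n in csv.reader(io.StringIO(pair_text), delimiter="\t"):
        scores[(a, b)] = pmi(int(n), counts[a], counts[b], total)
    return scores


scores = pair_pmi(SKILL_COUNT_TSV, SKILL_PAIR_TSV)
```

A positive score means the pair is listed together more often than independence would predict; a negative score means less often. To run this on the real files, the in-memory strings would be replaced by the contents of skill_count.tsv and skill_pair_count.tsv.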
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
The LinkedIn and World Bank Group collaboration is a prime example of how technology companies can work with development institutions to bring new data and insights to developing countries to address pressing development challenges. The opportunities and challenges presented by the global economy require the public and private sectors to join forces, share information, share resources, and work towards a common vision to make a meaningful, positive and scalable impact.
The datasets presented here are ones that underlie the visuals at linkedindata.worldbank.org. The datasets cover four categories of metrics: 1) Industry Employment Shifts, 2) Talent Migration, 3) Industry Skills Needs, and 4) Skill Penetration. LinkedIn and the World Bank Group plan to refresh the data annually at a minimum. The datasets are annual time series and go back to 2015.
Each category of metrics is provided in a separate file with a cover sheet listing the variable names, definitions, and caveats. Country coverage varies slightly between metrics because of different data extraction and quality control rules. Countries with at least 100,000 LinkedIn members are included in the datasets. If more countries cross this threshold in the future, they can be added during the annual refresh.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains user reviews and ratings for the LinkedIn mobile application, extracted from its Google Store page. It provides valuable insights into the public's perception of the app over an extended period. The collection of reviews offers a basis for understanding user sentiment, identifying trends, and pinpointing common pain points experienced by users of the LinkedIn app. The dataset is particularly useful for product development teams, market analysts, and researchers interested in user feedback and app performance analysis.
This dataset is typically provided as a data file, commonly in CSV format. It comprises approximately 320,000 individual review records. The review_id column alone contains 322,641 unique values. The data structure is tabular, with each row representing a single review and columns providing specific details about that review. Specific numbers for rows/records are available and consistent with the total count.
This dataset is ideal for a variety of analytical applications and use cases, including:
* Sentiment Analysis: Extracting sentiments and trends from user feedback to gauge overall satisfaction and identify shifts in public opinion.
* Version Performance Tracking: Identifying which versions of the LinkedIn app received the most positive or negative feedback.
* Topic Modelling: Utilising natural language processing (NLP) techniques like topic modelling to uncover specific pain points, frequently requested features, or common praise for the application.
* Product Improvement: Informing product development and user experience (UX) design by directly addressing user feedback.
* Market Research: Understanding user perceptions of a leading professional networking platform.
The dataset covers reviews for the LinkedIn app, which has a global user base with over 970 million registered members from more than 200 countries and territories. The reviews themselves were extracted from its Google Store page. The time range for the reviews spans from 7th April 2011 to 18th November 2023. There are specific notes on data availability for certain groups/years visible in the timestamp distribution.
CC0
This dataset is intended for:
* Data Scientists & Analysts: For performing sentiment analysis, natural language processing, and trend analysis on app reviews.
* App Developers & Product Managers: To gain direct user feedback for product iteration, bug identification, and feature prioritisation.
* Market Researchers: To understand user behaviour, competitive landscape, and public perception within the social media and professional networking domain.
* Academic Researchers: For studies on user feedback, app development cycles, and the evolution of digital platform perception.
Original Data Source: 📝 320K LinkedIn App Google Store Reviews
Salutary Data is a boutique B2B contact and company data provider that's committed to delivering high-quality data for sales intelligence, lead generation, marketing, recruiting, employee data / HR, identity resolution, and ML / AI. Our database currently consists of 148MM+ highly curated B2B contacts (US only), along with 4M+ companies, and is updated regularly to ensure we have the most up-to-date information.
We can enrich your in-house data (CRM Enrichment, Lead Enrichment, etc.) and provide you with a custom dataset (such as a lead list) tailored to your target audience specifications and data use case. We also support large-scale data licensing to software providers and agencies that intend to redistribute our data to their customers and end users.
What makes Salutary unique?
- We offer our clients a truly unique, one-stop aggregation of best-of-breed data sources. Our supplier network consists of numerous established, high-quality suppliers that are rigorously vetted.
- We leverage third-party verification vendors to ensure phone numbers and emails are accurate and connect to the right person. Additionally, we deploy automated and manual verification techniques to ensure we have the latest job information for contacts.
- We're reasonably priced and easy to work with.
Products: API Suite, Web UI, Full and Custom Data Feeds
Services:
Data Enrichment - We assess the fill rate gaps and profile your customer file for the purpose of appending fields, updating information, and/or rendering net new “look alike” prospects for your campaigns.
ABM Match & Append - Send us your domain or other company-related files, and we’ll match your Account Based Marketing targets and provide you with B2B contacts to campaign. Optionally include your suppression file to avoid any redundant records.
Verification (“Cleaning/Hygiene”) Services - Address the 2% per month aging issue on contact records! We will identify duplicate records and contacts no longer at the company, remove email hard bounces, and update/replace titles or phones. This is right up our alley and leverages our existing internal and external processes and systems.
The number of LinkedIn users in Kenya was forecast to increase continuously between 2024 and 2028 by a total of 1.7 million users (+51.83 percent). After the ninth consecutive year of growth, the LinkedIn user base is estimated to reach 4.97 million users, a new peak, in 2028. Notably, the number of LinkedIn users has increased continuously over the past years. User figures, shown here for the platform LinkedIn, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of LinkedIn users in countries like Rwanda and Zambia.
At source mate, we understand the value of accurate and up-to-date data in today's competitive landscape. Our CVs and B2B LinkedIn data are meticulously collected, verified, and updated, ensuring their integrity and relevance.
We gather information from various trusted sources, such as our websites, job boards, professional networks, and career websites, to create a comprehensive database of potential candidates actively seeking employment opportunities.
Here's why our job seeker data sets are unparalleled in the industry:
Comprehensive and Targeted: Our data sets cover a wide range of industries, job titles, locations, and experience levels. Whether you're looking for entry-level professionals, mid-level managers, or specialized experts, we have the data to meet your specific requirements. Our data is highly segmented and customizable, enabling you to target your ideal candidates with precision.
Fresh and Updated: We understand the importance of timely information. Our dedicated team ensures that our job seeker data is regularly updated and refreshed to maintain its accuracy and relevance. This means you'll have access to the latest contact details, job preferences, skills, and qualifications of potential candidates, enabling you to engage with them at the right time and with personalized messaging.
GDPR Compliant: Privacy and data protection are paramount to us. We strictly adhere to the General Data Protection Regulation (GDPR) guidelines, ensuring that all data we provide is collected and processed lawfully and ethically. We respect the privacy rights of individuals and maintain the highest standards of data security and confidentiality.
Easy Integration: Our data sets are provided in a format that is easily integrable with your existing systems and platforms. Whether you want to import the data into your CRM, applicant tracking system, or any other software, our user-friendly formats facilitate seamless integration, saving you time and effort.
Reliable Customer Support: We pride ourselves on delivering exceptional customer service. Our dedicated support team is available to assist you at every step of the process, from helping you select the right data sets to answering any queries or concerns you may have. We strive to ensure your experience with source mate is smooth, efficient, and successful.
https://cdla.io/sharing-1-0/
Context: This dataset offers insights into the usage patterns of social media apps for 1,000 users across seven popular platforms: Facebook, Instagram, Twitter, Snapchat, TikTok, LinkedIn, and Pinterest. It tracks various metrics such as daily time spent on the app, number of posts made, likes received, and new followers gained.
Dataset Features:
User_ID: Unique identifier for each user.
App: The social media platform being used.
Daily_Minutes_Spent: Total time a user spends on the app each day, ranging from 5 to 500 minutes.
Posts_Per_Day: Number of posts a user creates per day, ranging from 0 to 20.
Likes_Per_Day: Total number of likes a user receives on their posts each day, ranging from 0 to 200.
Follows_Per_Day: The number of new followers a user gains daily, ranging from 0 to 50.
Context & Use Cases: This dataset could be particularly useful for social media analysts, digital marketers, or researchers interested in understanding user engagement trends across different platforms. It provides insights into how much time users spend, how actively they post, and the level of engagement they receive (in terms of likes and followers).
Conclusion & Outcome: Analyzing this dataset could yield several outcomes:
Engagement Patterns: Identifying which platforms have higher engagement in terms of time spent or likes received.
Active Users: Determining which users are the most active across various platforms based on the number of posts and followers gained.
User Retention: Studying the correlation between time spent and follower growth, providing insight into user retention strategies for different platforms.
Overall, the dataset allows for exploration of social media usage trends and helps drive decision-making for marketing strategies, content creation, and platform engagement.
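The engagement-pattern comparison mentioned above could be sketched as a per-platform average of one metric. This is a minimal illustration using only the standard library; the column names follow the feature list, while the sample rows, the function name, and the assumption that the file is a CSV with a header row are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows using the column names from the feature list.
SAMPLE_CSV = """User_ID,App,Daily_Minutes_Spent,Posts_Per_Day,Likes_Per_Day,Follows_Per_Day
U1,LinkedIn,30,1,10,2
U2,LinkedIn,50,2,20,4
U3,TikTok,120,5,150,30
U4,TikTok,80,3,90,10
"""


def average_by_app(csv_text, column):
    """Group rows by the App column and average the chosen numeric column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["App"]].append(float(row[column]))
    return {app: mean(values) for app, values in groups.items()}


avg_minutes = average_by_app(SAMPLE_CSV, "Daily_Minutes_Spent")
# Platform with the highest average daily time spent in the sample.
most_engaging = max(avg_minutes, key=avg_minutes.get)
```

The same helper works for Likes_Per_Day or Follows_Per_Day by changing the column argument.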
The number of LinkedIn users in Africa was forecast to increase continuously between 2024 and 2028 by a total of 37 million users (+68.13 percent). After the ninth consecutive year of growth, the LinkedIn user base is estimated to reach 91.29 million users, a new peak, in 2028. Notably, the number of LinkedIn users has increased continuously over the past years. User figures, shown here for the platform LinkedIn, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of LinkedIn users in regions like South America and the Caribbean.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was created by Harry Field and contains the labelled images for capturing the game state of a draughts/checkers 8x8 board.
This was a fun project to develop a mobile draughts application enabling users to interact with draughts-based software via their mobile device's camera.
The data captured consists of: * White Pieces * White Kings * Black Pieces * Black Kings * Bottom left corner square * Top left corner square * Top right corner square * Bottom right corner square
Corner squares are captured so the board locations of the detected pieces can be estimated.
Image: Results of Yolov5 model after training with this dataset (https://github.com/ShippingTycoon/roboflow-draughts/blob/main/PXL_20210603_093949805_jpg.rf.30e2a64a0a646e8ea8e121727cf0f1ee.jpg?raw=true)
From this data, the locations of other squares can be estimated and game state can be captured. The image below shows the data of a different board configuration being captured. Blue circles refer to squares, numbers refer to square index and the coloured circles refer to pieces.
Image: https://github.com/ShippingTycoon/roboflow-draughts/blob/main/pieces.png?raw=true
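How the remaining squares are estimated from the four corner squares is not spelled out here. One simple possibility is bilinear interpolation between the detected corner-square centers, sketched below. This is an assumption, not the author's stated method: it ignores perspective distortion (a homography would handle a tilted camera better), and all function names and coordinates are hypothetical.

```python
def lerp(p, q, t):
    """Linear interpolation between 2D points p and q at fraction t."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def square_centers(top_left, top_right, bottom_left, bottom_right, size=8):
    """Estimate a size x size grid of square centers; row 0 is the top row."""
    grid = []
    for r in range(size):
        v = r / (size - 1)
        left = lerp(top_left, bottom_left, v)     # center on the left edge
        right = lerp(top_right, bottom_right, v)  # center on the right edge
        grid.append([lerp(left, right, c / (size - 1)) for c in range(size)])
    return grid


def nearest_square(grid, point):
    """Return the (row, col) whose estimated center is closest to point."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[r])))
    return min(
        cells,
        key=lambda rc: (grid[rc[0]][rc[1]][0] - point[0]) ** 2
        + (grid[rc[0]][rc[1]][1] - point[1]) ** 2,
    )


# Hypothetical pixel centers of the four detected corner squares.
grid = square_centers((0, 0), (70, 0), (0, 70), (70, 70))
# A detected piece centered at pixel (31, 52) is assigned to a square.
piece_square = nearest_square(grid, (31, 52))
```

Each detected piece can then be mapped to a board square by passing the center of its bounding box to nearest_square.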
Once game state is captured, integration with other software becomes possible. In this example, I created a simple move suggestion mobile application seen working here.
The developed application is a proof of concept and is not available to the public. Further development is required in training the model across multiple draughts boards and implementing features to add value to the physical draughts game.
The dataset consists of 759 images, and a Yolov5 model was trained on it using a 70/20/10 split.
The output of Yolov5 was parsed and filtered to correct for duplicated/overlapping detections before game state could be determined.
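The exact filtering rule is not described. A common way to remove duplicated or overlapping detections is to keep the highest-confidence box and drop any other box whose intersection-over-union (IoU) with an already kept box exceeds a threshold. The sketch below assumes that approach; the dictionary layout, the 0.5 threshold, and the function names are hypothetical, and the suppression is class-agnostic on the assumption that one square holds at most one piece.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(detections, threshold=0.5):
    """Keep the most confident detection among boxes that overlap heavily."""
    kept = []
    for det in sorted(detections, key=lambda d: d["conf"], reverse=True):
        if all(iou(det["box"], k["box"]) < threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical parsed detections: two boxes on the same piece, one elsewhere.
detections = [
    {"label": "white_king", "conf": 0.6, "box": (1, 1, 11, 11)},
    {"label": "white_piece", "conf": 0.9, "box": (0, 0, 10, 10)},
    {"label": "black_piece", "conf": 0.8, "box": (50, 50, 60, 60)},
]
filtered = suppress_overlaps(detections)
```

In the sample, the lower-confidence white_king box overlaps the white_piece box heavily and is dropped, leaving one detection per piece.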
I hope you find this dataset useful and if you have any questions feel free to drop me a message on LinkedIn as per the link above.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Deep-NLP’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/samdeeplearning/deepnlp on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Sheet_1.csv contains 80 user responses, in the response_text column, to a therapy chatbot. Bot said: 'Describe a time when you have acted as a resource for someone else'. User responded. If a response is 'not flagged', the user can continue talking to the bot. If it is 'flagged', the user is referred to help.
Sheet_2.csv contains 125 resumes, in the resume_text column. Resumes were queried from Indeed.com with keyword 'data scientist', location 'Vermont'. If a resume is 'not flagged', the applicant can submit a modified resume version at a later date. If it is 'flagged', the applicant is invited to interview.
Classify new resumes/responses as flagged or not flagged.
There are two sets of data here - resumes and responses. Split the data into a train set and a test set to test the accuracy of your classifier. Bonus points for using the same classifier for both problems.
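A minimal sketch of the train/test workflow described above, using only the standard library. The example rows are made up, the helper names are hypothetical, and the majority-class predictor is just a baseline to compare a real classifier against, not a suggested solution.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_label(train_rows):
    """Most frequent label in the training rows (baseline prediction)."""
    labels = [label for _, label in train_rows]
    return max(set(labels), key=labels.count)


def accuracy(predict, test_rows):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(1 for text, label in test_rows if predict(text) == label)
    return correct / len(test_rows)


# Made-up (text, label) pairs standing in for rows of either sheet.
ROWS = [(f"example text {i}", "flagged" if i < 3 else "not flagged") for i in range(10)]

train_rows, test_rows = train_test_split(ROWS)
baseline = majority_label(train_rows)
baseline_accuracy = accuracy(lambda text: baseline, test_rows)
```

Because the same three functions take any list of (text, label) pairs, they can be reused for both the responses and the resumes; a trained classifier would replace the lambda passed to accuracy.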
Good luck.
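As a starting point for the task above, here is a minimal standard-library sketch of a train/test split and an accuracy check. The sample rows are invented placeholders (not taken from Sheet_1.csv or Sheet_2.csv), and the majority-class predictor is only a baseline that a real classifier should beat; all function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[str, str]  # (text, label); label is 'flagged' or 'not flagged'


def split_rows(rows: Sequence[Row], test_fraction: float = 0.2,
               seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle reproducibly, then hold out a fraction of rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def majority_label(train: Sequence[Row]) -> str:
    """Baseline 'classifier': always predict the most common training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Placeholder data standing in for the response_text / resume_text columns.
rows = [(f"example text {i}", "flagged" if i % 4 == 0 else "not flagged")
        for i in range(10)]

train, test = split_rows(rows)
baseline = majority_label(train)
score = accuracy([baseline] * len(test), [label for _, label in test])
```

Because the same functions work on any list of (text, label) pairs, the same evaluation can be reused for both the resumes and the responses.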
Thank you to Parsa Ghaffari (Aylien), without whom these visuals (cover photo is in Parsa Ghaffari's excellent LinkedIn article on English, Spanish and German positive v. negative sentiment analysis) would not exist.
You can use any of the code in that kernel anywhere, on or off Kaggle. Ping me at @_samputnam for questions.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about rental properties in São Paulo, Brazil. The data was extracted from the QuintoAndar platform using web scraping techniques on May 1st, 2023. The dataset includes several useful pieces of information, such as the property's address, district, area, number of bedrooms, garage availability, monthly rent, type of property, and total cost.
The dataset can be used for various analyses, such as understanding the average rental prices in different districts or identifying the most common types of properties in certain areas. Additionally, the data can be used to train machine learning models that predict rental prices based on property characteristics.
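For instance, the average monthly rent per district could be computed along these lines. This is a sketch only: the column names `district` and `rent` and the sample values are assumptions for illustration, since the exact CSV headers are not given here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Mapping


def average_rent_by_district(rows: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Group rows by district and average the monthly rent in each group."""
    rents: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        # 'district' and 'rent' are assumed column names.
        rents[row["district"]].append(float(row["rent"]))
    return {district: mean(values) for district, values in rents.items()}


# Invented sample rows, shaped like the output of csv.DictReader.
sample = [
    {"district": "A", "rent": "2000"},
    {"district": "A", "rent": "3000"},
    {"district": "B", "rent": "1500"},
]
averages = average_rent_by_district(sample)
```

With the real file, the rows could be read with `csv.DictReader` and passed to the same function after adjusting the column names.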
It's important to note that since the data was obtained through web scraping techniques, there may be errors or incomplete information. Therefore, it's recommended that users of the dataset verify the information before using it for analysis or model training. Nevertheless, this dataset is a valuable source of information for anyone interested in analyzing the real estate market in São Paulo.
Link to the web scraping project: QuintoAndar-WebScrapping
ABSTRACT: In this project, we propose a new comprehensive, realistic cyber security dataset of IoT and IIoT applications, called Edge-IIoTset, which can be used by machine learning-based intrusion detection systems in two different modes, namely centralized and federated learning. Specifically, the proposed testbed is organized into seven layers: Cloud Computing Layer, Network Functions Virtualization Layer, Blockchain Network Layer, Fog Computing Layer, Software-Defined Networking Layer, Edge Computing Layer, and IoT and IIoT Perception Layer. In each layer, we propose new emerging technologies that satisfy the key requirements of IoT and IIoT applications, such as the ThingsBoard IoT platform, OPNFV platform, Hyperledger Sawtooth, digital twin, ONOS SDN controller, Mosquitto MQTT brokers, and Modbus TCP/IP, among others. The IoT data are generated from various IoT devices (more than 10 types), such as low-cost digital sensors for sensing temperature and humidity, ultrasonic sensor, water level detection sensor, pH sensor meter, soil moisture sensor, heart rate sensor, and flame sensor, among others. We identify and analyze fourteen attacks related to IoT and IIoT connectivity protocols, which are categorized into five threats: DoS/DDoS attacks, information gathering, man-in-the-middle attacks, injection attacks, and malware attacks. In addition, we extract features obtained from different sources, including alerts, system resources, logs, and network traffic, and propose 61 new features with high correlations from 1176 found features. After processing and analyzing the proposed realistic cyber security dataset, we provide a primary exploratory data analysis and evaluate the performance of machine learning approaches (i.e., traditional machine learning as well as deep learning) in both centralized and federated learning modes.
Instructions:
Great news! The Edge-IIoT dataset has been featured as a "Document in the top 1% of Web of Science." This indicates that it is ranked within the top 1% of all publications indexed by the Web of Science (WoS) in terms of citations and impact.
Please visit the Kaggle link for updates: https://www.kaggle.com/datasets/mohamedamineferrag/edgeiiotset-cyber-sec...
Free use of the Edge-IIoTset dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is permitted after asking the lead author, Dr Mohamed Amine Ferrag, who has asserted his right under the Copyright.
The details of the Edge-IIoT dataset were published in the following paper. For academic/public use of these datasets, authors have to cite the following paper:
Mohamed Amine Ferrag, Othmane Friha, Djallel Hamouda, Leandros Maglaras, Helge Janicke, "Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning", IEEE Access, April 2022 (IF: 3.37), DOI: 10.1109/ACCESS.2022.3165809
Link to paper : https://ieeexplore.ieee.org/document/9751703
The directories of the Edge-IIoTset dataset include the following:
•File 1 (Normal traffic)
-File 1.1 (Distance): This file includes two documents, namely, Distance.csv and Distance.pcap. The IoT sensor (Ultrasonic sensor) is used to capture the IoT data.
-File 1.2 (Flame_Sensor): This file includes two documents, namely, Flame_Sensor.csv and Flame_Sensor.pcap. The IoT sensor (Flame Sensor) is used to capture the IoT data.
-File 1.3 (Heart_Rate): This file includes two documents, namely, Heart_Rate.csv and Heart_Rate.pcap. The IoT sensor (Heart Rate Sensor) is used to capture the IoT data.
-File 1.4 (IR_Receiver): This file includes two documents, namely, IR_Receiver.csv and IR_Receiver.pcap. The IoT sensor (IR (Infrared) Receiver Sensor) is used to capture the IoT data.
-File 1.5 (Modbus): This file includes two documents, namely, Modbus.csv and Modbus.pcap. The IoT sensor (Modbus Sensor) is used to capture the IoT data.
-File 1.6 (phValue): This file includes two documents, namely, phValue.csv and phValue.pcap. The IoT sensor (pH-sensor PH-4502C) is used to capture the IoT data.
-File 1.7 (Soil_Moisture): This file includes two documents, namely, Soil_Moisture.csv and Soil_Moisture.pcap. The IoT sensor (Soil Moisture Sensor v1.2) is used to capture the IoT data.
-File 1.8 (Sound_Sensor): This file includes two documents, namely, Sound_Sensor.csv and Sound_Sensor.pcap. The IoT sensor (LM393 Sound Detection Sensor) is used to capture the IoT data.
-File 1.9 (Temperature_and_Humidity): This file includes two documents, namely, Temperature_and_Humidity.csv and Temperature_and_Humidity.pcap. The IoT sensor (DHT11 Sensor) is used to capture the IoT data.
-File 1.10 (Water_Level): This file includes two documents, namely, Water_Level.csv and Water_Level.pcap. The IoT sensor (Water sensor) is used to capture the IoT data.
•File 2 (Attack traffic):
-File 2.1 (Attack traffic (CSV files)): This file includes 14 documents, namely, Backdoor_attack.csv, DDoS_HTTP_Flood_attack.csv, DDoS_ICMP_Flood_attack.csv, DDoS_TCP_SYN_Flood_attack.csv, DDoS_UDP_Flood_attack.csv, MITM_attack.csv, OS_Fingerprinting_attack.csv, Password_attack.csv, Port_Scanning_attack.csv, Ransomware_attack.csv, SQL_injection_attack.csv, Uploading_attack.csv, Vulnerability_scanner_attack.csv, XSS_attack.csv. Each document is specific to one attack.
-File 2.2 (Attack traffic (PCAP files)): This file includes 14 documents, namely, Backdoor_attack.pcap, DDoS_HTTP_Flood_attack.pcap, DDoS_ICMP_Flood_attack.pcap, DDoS_TCP_SYN_Flood_attack.pcap, DDoS_UDP_Flood_attack.pcap, MITM_attack.pcap, OS_Fingerprinting_attack.pcap, Password_attack.pcap, Port_Scanning_attack.pcap, Ransomware_attack.pcap, SQL_injection_attack.pcap, Uploading_attack.pcap, Vulnerability_scanner_attack.pcap, XSS_attack.pcap. Each document is specific to one attack.
•File 3 (Selected dataset for ML and DL):
-File 3.1 (DNN-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating deep learning-based intrusion detection systems.
-File 3.2 (ML-EdgeIIoT-dataset): This file contains a selected dataset for the use of evaluating traditional machine learning-based intrusion detection systems.
Step 1: Downloading the Edge-IIoTset dataset from the Kaggle platform
from google.colab import files
!pip install -q kaggle
# Upload your kaggle.json API token when prompted
files.upload()
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d mohamedamineferrag/edgeiiotset-cyber-security-dataset-of-iot-iiot -f "Edge-IIoTset dataset/Selected dataset for ML and DL/DNN-EdgeIIoT-dataset.csv"
!unzip DNN-EdgeIIoT-dataset.csv.zip
!rm DNN-EdgeIIoT-dataset.csv.zip
Step 2: Reading the dataset's CSV file into a Pandas DataFrame
import pandas as pd
import numpy as np
df = pd.read_csv('DNN-EdgeIIoT-dataset.csv', low_memory=False)
Step 3: Exploring some of the DataFrame's contents
df.head(5)
print(df['Attack_type'].value_counts())
Step 4: Dropping data (columns, duplicated rows, NaN, null, ...)
from sklearn.utils import shuffle
# Columns removed before training
drop_columns = ["frame.time", "ip.src_host", "ip.dst_host", "arp.src.proto_ipv4", "arp.dst.proto_ipv4",
                "http.file_data", "http.request.full_uri", "icmp.transmit_timestamp",
                "http.request.uri.query", "tcp.options", "tcp.payload", "tcp.srcport",
                "tcp.dstport", "udp.port", "mqtt.msg"]
df.drop(drop_columns, axis=1, inplace=True)
# Remove rows with missing values, then exact duplicate rows
df.dropna(axis=0, how='any', inplace=True)
df.drop_duplicates(subset=None, keep="first", inplace=True)
df = shuffle(df)
# Check that no missing values remain
df.isna().sum()
print(df['Attack_type'].value_counts())
Step 5: Categorical data encoding (dummy encoding)
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import preprocessing
def encode_text_dummy(df, name):
    # Replace column `name` with one indicator column per distinct value
    dummies = pd.get_dummies(df[name])
    for x in dummies.columns:
        dummy_name = f"{name}-{x}"
        df[dummy_name] = dummies[x]
    df.drop(name, axis=1, inplace=True)
encode_text_dummy(df,'http.request.method')
encode_text_dummy(df,'http.referer')
encode_text_dummy(df,"http.request.version")
encode_text_dummy(df,"dns.qry.name.len")
encode_text_dummy(df,"mqtt.conack.flags")
encode_text_dummy(df,"mqtt.protoname")
encode_text_dummy(df,"mqtt.topic")
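To make the effect of dummy encoding concrete, the following standard-library sketch mimics what `encode_text_dummy` does to a single column, using plain dictionaries instead of a DataFrame. The sample rows and the `tcp.len` values are invented for illustration, and the indicator values are written as 0/1 integers here (recent pandas versions return boolean columns from `get_dummies` by default).

```python
from typing import Dict, List


def dummy_encode(rows: List[Dict[str, object]], name: str) -> List[Dict[str, object]]:
    """Replace column `name` with one 0/1 column per distinct value.

    New columns are named '<name>-<value>', matching the naming scheme
    used by encode_text_dummy.
    """
    values = sorted({str(row[name]) for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {key: val for key, val in row.items() if key != name}
        for value in values:
            new_row[f"{name}-{value}"] = int(str(row[name]) == value)
        encoded.append(new_row)
    return encoded


# Invented sample rows for illustration only.
rows = [
    {"http.request.method": "GET", "tcp.len": 120},
    {"http.request.method": "POST", "tcp.len": 64},
    {"http.request.method": "GET", "tcp.len": 0},
]
encoded = dummy_encode(rows, "http.request.method")
```

Each original categorical column therefore expands into as many columns as it has distinct values, which is worth keeping in mind for high-cardinality fields.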
Step 6: Creation of the preprocessed dataset
df.to_csv('preprocessed_DNN.csv', encoding='utf-8')
For more information about the dataset, please contact the lead author of this project, Dr Mohamed Amine Ferrag, by email at mohamed.amine.ferrag@gmail.com
More information about Dr. Mohamed Amine Ferrag is available at:
https://www.linkedin.com/in/Mohamed-Amine-Ferrag
https://dblp.uni-trier.de/pid/142/9937.html
https://www.researchgate.net/profile/Mohamed_Amine_Ferrag
https://scholar.google.fr/citations?user=IkPeqxMAAAAJ&hl=fr&oi=ao
https://www.scopus.com/authid/detail.uri?authorId=56115001200
https://publons.com/researcher/1322865/mohamed-amine-ferrag/
https://orcid.org/0000-0002-0632-3172
Last Updated: 27 Mar. 2023
This dataset provides comprehensive real-time job listing data aggregated from multiple job boards and company websites. It includes detailed job information such as titles, descriptions, requirements, salaries, locations, and company details. The data is continuously updated to provide the most current job opportunities. Users can leverage this dataset for job search applications, market research, salary analysis, and career development tools. Whether you're building a job search platform, conducting employment market analysis, or developing career guidance tools, this dataset provides current and reliable job market data. The dataset is delivered in JSON format via a REST API.
https://brightdata.com/license
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.
Dataset Features
Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.
Customizable Subsets for Specific Needs
Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.
Popular Use Cases
Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.