
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains information about the world's biggest companies.
Among them you can find companies founded in the US, the UK, Europe, Asia, South America, South Africa, and Australia.
For each company, the dataset gives the year it was founded, its revenue and net income for the years 2018-2020, and its industry.
I have included 2 csv files: the raw csv file if you want to practice cleaning the data, and the clean csv ready to be analyzed.
The third dataset includes the names of all the companies in the previous datasets and 2 additional columns: number of employees and name of the founder.
In addition, there is a tesla.csv file containing share prices for Tesla.

CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.
Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.
Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.
Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:
Background and Motivation
In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.
While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.
In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
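The embedding-space search described above can be sketched as follows. This is only an illustration, not code from the CompanyKG repository: the company names and the three-dimensional vectors are made up, and real text embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(seed, embeddings, top_k=2):
    """Rank all other companies by cosine similarity to the seed company."""
    scores = [
        (name, cosine_similarity(embeddings[seed], vec))
        for name, vec in embeddings.items()
        if name != seed
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_k]

# Toy embeddings (hypothetical names and values, for illustration only).
embeddings = {
    "seed_fintech": [0.9, 0.1, 0.0],
    "payments_co": [0.8, 0.2, 0.1],
    "grocery_chain": [0.0, 0.1, 0.9],
    "lender_co": [0.7, 0.3, 0.0],
}

ranked = most_similar("seed_fintech", embeddings)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the two finance-like companies rank above the grocery chain because their vectors point in nearly the same direction as the seed.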
However, a graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.
Source Code and Tutorial:
https://github.com/llcresearch/CompanyKG2
Paper: to be published

https://brightdata.com/license
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.
Dataset Features
- Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
- Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
- Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.
Customizable Subsets for Specific Needs
Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.
Popular Use Cases
- Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
- Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
- Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
- Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
- AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.

Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!

Success.ai’s LinkedIn Data Solutions offer unparalleled access to a vast dataset of 700 million public LinkedIn profiles and 70 million LinkedIn company records, making it one of the most comprehensive and reliable LinkedIn datasets available on the market today. Our employee data and LinkedIn data are ideal for businesses looking to streamline recruitment efforts, build highly targeted lead lists, or develop personalized B2B marketing campaigns.
Whether you’re looking for recruiting data, conducting investment research, or seeking to enrich your CRM systems with accurate and up-to-date LinkedIn profile data, Success.ai provides everything you need with pinpoint precision. By tapping into LinkedIn company data, you’ll have access to over 40 critical data points per profile, including education, professional history, and skills.
Key Benefits of Success.ai’s LinkedIn Data: Our LinkedIn data solution offers more than just a dataset. With GDPR-compliant data, AI-enhanced accuracy, and a price match guarantee, Success.ai ensures you receive the highest-quality data at the best price in the market. Our datasets are delivered in Parquet format for easy integration into your systems, and with millions of profiles updated daily, you can trust that you’re always working with fresh, relevant data.
Global Reach and Industry Coverage: Our LinkedIn data covers professionals across all industries and sectors, providing you with detailed insights into businesses around the world. Our geographic coverage spans 259M profiles in the United States, 22M in the United Kingdom, 27M in India, and thousands of profiles in regions such as Europe, Latin America, and Asia Pacific. With LinkedIn company data, you can access profiles of top companies from the United States (6M+), United Kingdom (2M+), and beyond, helping you scale your outreach globally.
Why Choose Success.ai’s LinkedIn Data: Success.ai stands out for its tailored approach and white-glove service, making it easy for businesses to receive exactly the data they need without managing complex data platforms. Our dedicated Success Managers will curate and deliver your dataset based on your specific requirements, so you can focus on what matters most—reaching the right audience. Whether you’re sourcing employee data, LinkedIn profile data, or recruiting data, our service ensures a seamless experience with 99% data accuracy.
Key Use Cases:
LinkedIn URL: Access direct links to LinkedIn profiles for immediate insights. Full Name: Verified first and last names. Job Title: Current job titles, and prior experience. Company Information: Company name, LinkedIn URL, domain, and location. Work and Per...

Our dataset provides detailed and precise insights into the business, commercial, and industrial aspects of any given area in the USA (including Point of Interest (POI) data and foot traffic). The dataset is divided into 150 m x 150 m areas (geohash 7) and has over 50 variables.
- Use it for different applications: Our combined dataset, which includes POI and foot traffic data, can be employed for various purposes. Different data teams use it to guide retailers and FMCG brands in site selection, fuel marketing intelligence, analyze trade areas, and assess company risk. Our dataset has also proven to be useful for real estate investment.
- Get reliable data: Our datasets have been processed, enriched, and tested so your data team can use them more quickly and accurately.
- Ideal for training ML models: The high quality of our geographic information layers results from more than seven years of work dedicated to the deep understanding and modeling of geospatial Big Data. Among the features that distinguish this dataset is the use of anonymized and user-compliant mobile device GPS location, enriched with other alternative and public data.
- Easy to use: Our dataset is user-friendly and can be easily integrated into your current models. Also, we can deliver your data in different formats, like .csv, according to your analysis requirements.
- Get personalized guidance: In addition to providing reliable datasets, we advise your analysts on their correct implementation. Our data scientists can guide your internal team on the optimal algorithms and models to get the most out of the information we provide (without compromising the security of your internal data).
Answer questions like:
- What places does my target user visit in a particular area?
- Which are the best areas to place a new POS?
- What is the average yearly income of users in a particular area?
- What is the influx of visits that my competition receives?
- What is the volume of traffic surrounding my current POS?
This dataset is useful for getting insights from industries like:
- Retail & FMCG
- Banking, Finance, and Investment
- Car Dealerships
- Real Estate
- Convenience Stores
- Pharma and medical laboratories
- Restaurant chains and franchises
- Clothing chains and franchises
Our dataset includes more than 50 variables, such as:
- Number of pedestrians seen in the area.
- Number of vehicles seen in the area.
- Average speed of movement of the vehicles seen in the area.
- Points of Interest (POIs) (in number and type) seen in the area (supermarkets, pharmacies, recreational locations, restaurants, offices, hotels, parking lots, wholesalers, financial services, pet services, shopping malls, among others).
- Average yearly income range (anonymized and aggregated) of the devices seen in the area.
Notes to better understand this dataset:
- POI confidence means the average confidence of POIs in the area. In this case, POIs are any kind of location, such as a restaurant, a hotel, or a library.
- Category confidences, for example "food_drinks_tobacco_retail_confidence", indicate how confident we are in the existence of food/drink/tobacco retail locations in the area.
- We added predictions for The Home Depot and Lowe's Home Improvement stores in the dataset sample. These predictions were the result of a machine-learning model that was trained with the data. Knowing where the current stores are, we can find the most similar areas for new stores to open.
How efficient is a Geohash?
Geohash is a faster, cost-effective geofencing option that reduces input data load and provides actionable information. Its benefits include faster querying, reduced cost, minimal configuration, and ease of use. Geohash ranges from 1 to 12 characters. The dataset can be split into variable-size geohashes, with the default being geohash7 (150m x 150m).
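To make the geohash cells concrete, here is a minimal sketch of the standard public geohash encoding (bit interleaving plus a base-32 alphabet). It is not the vendor's code; the function name `geohash_encode` and the sample coordinates are illustrative assumptions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

# Hypothetical point; the default precision of 7 matches the dataset's cell size.
cell = geohash_encode(37.7749, -122.4194)
coarser_cell = cell[:5]  # a shorter prefix names the larger cell containing it
```

Because a shorter geohash is always a prefix of the longer ones inside it, rows keyed by geohash7 can be rolled up to coarser cells by truncating the key.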

https://dataintelo.com/privacy-and-policy
The enterprise database market size is projected to see significant growth over the coming years, with a valuation of USD 91.5 billion in 2023, and is expected to reach USD 171.1 billion by 2032, growing at a compound annual growth rate (CAGR) of 7.2% during the forecast period. This growth is driven by the increasing demand for efficient data management solutions across various industries and the rise in digital transformation initiatives that require robust database systems. The growth factors include advancements in cloud computing, the growing need for real-time data analytics, and the integration of artificial intelligence and machine learning in data management.
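As a quick consistency check on the figures above, the compound annual growth rate can be recomputed from the two valuations. This sketch assumes the forecast compounds over the nine years from 2023 to 2032; the function name `cagr` is illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1

# USD 91.5 billion in 2023 growing to USD 171.1 billion by 2032.
rate = cagr(91.5, 171.1, 2032 - 2023)
print(f"{rate:.1%}")  # prints 7.2%
```

The result agrees with the 7.2% CAGR quoted in the report summary.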
One of the primary growth factors in the enterprise database market is the increasing adoption of cloud-based solutions. Organizations are rapidly moving towards cloud environments due to their scalability, cost-effectiveness, and flexibility. Cloud databases offer better accessibility and reduced infrastructure costs, making them an attractive option for businesses of all sizes. Additionally, with the proliferation of data generated from various sources such as social media, IoT devices, and online transactions, the need for scalable and efficient data storage solutions is more critical than ever. Cloud-based databases provide the requisite infrastructure to handle this data surge efficiently, further propelling market growth.
Another significant driver for the enterprise database market is the rise of big data analytics. As businesses strive to harness the power of data for insights and decision-making, the demand for robust database systems capable of handling large volumes of data has intensified. Enterprises are looking for databases that not only store data but also enable advanced analytics to derive actionable insights. This trend is particularly prevalent in industries like retail, healthcare, and BFSI, where data-driven decisions can lead to improved customer experiences, better risk management, and optimized operations. The integration of artificial intelligence and machine learning with enterprise databases is further enhancing their capabilities, allowing for predictive analytics and automating data processing tasks.
The growing emphasis on data security and compliance is also contributing to the expansion of the enterprise database market. With the increasing incidences of data breaches and stringent regulatory requirements, organizations are prioritizing secure database solutions that offer robust data protection measures. Databases with built-in security features such as encryption, access control, and regular auditing are in high demand. Furthermore, industry-specific compliance standards like GDPR in Europe and HIPAA in the US are driving businesses to invest in databases that ensure compliance and mitigate the risk of penalties, thus fueling market growth.
Regionally, North America is expected to dominate the enterprise database market due to the presence of major technology companies and early adoption of advanced technologies. The Asia Pacific region, however, is anticipated to witness the fastest growth rate during the forecast period, driven by rapid industrialization, the proliferation of SMEs, and increasing investments in digital infrastructure by countries like China, India, and Japan. The growing focus on smart cities and digital transformation initiatives in these countries is further boosting the demand for enterprise databases. Europe also holds a significant share of the market, with widespread adoption of cloud technologies and heightened focus on data privacy and security driving market expansion.
Industrial Databases play a crucial role in the enterprise database market, particularly as industries undergo digital transformation. These databases are designed to manage and process large volumes of industrial data generated from various sources such as manufacturing processes, supply chain operations, and IoT devices. The ability to handle real-time data analytics and provide actionable insights is essential for industries aiming to optimize operations and enhance productivity. As industries continue to adopt smart manufacturing practices, the demand for industrial databases that offer scalability, reliability, and integration with advanced technologies like AI and machine learning is on the rise. This trend is expected to contribute significantly to the growth of the enterprise database market, as businesses seek to leverage data for competitive advantage and operational efficiency.
At CompanyData.com (BoldData), we provide trusted, verified company data sourced directly from official trade registers. For Denmark, we offer detailed information on over 939,691 active businesses, giving you access to one of Europe’s most transparent and digitally advanced markets.
Our Denmark database includes essential firmographic fields such as company name, registration number, industry classification, company size, revenue estimates and ownership hierarchies. We also offer valuable contact information including executive names, job titles, email addresses and mobile numbers, supporting your outreach and engagement strategies.
Whether you need data for compliance checks, KYC and AML verification, sales prospecting, CRM enrichment, market analysis or AI training, our verified Danish company data is accurate, complete and ready to use.
Choose from multiple delivery options tailored to your workflow:
- Custom-built lists based on your specific criteria
- Full national databases for in-depth market research
- Real time access through our API
- Flexible file formats including Excel and CSV
- Enrichment services to update and enhance your existing records
With access to 939,691 verified companies across more than 200 countries, CompanyData.com (BoldData) delivers the global scale and local precision needed to grow your business. From startups to multinational enterprises, we help clients reduce risk, unlock new markets and make smarter decisions with data they can rely on.
Looking to expand in Denmark or connect with businesses worldwide? Partner with CompanyData.com for accurate, compliant and ready-to-use company data.

ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
NEW!: Use the new Business Account Number lookup tool.
SUMMARY This dataset includes the locations of businesses that pay taxes to the City and County of San Francisco. Each registered business may have multiple locations and each location is a single row. The Treasurer & Tax Collector’s Office collects this data through business registration applications, account update/closure forms, and taxpayer filings. Business locations marked as “Administratively Closed” have not filed or communicated with TTX for 3 years, or were marked as closed following a notification from another City and County Department.
The data is collected to help enforce the Business and Tax Regulations Code including, but not limited to: Article 6, Article 12, Article 12-A, and Article 12-A-1. http://sftreasurer.org/registration.
HOW TO USE THIS DATASET
To learn more about using this dataset watch this video. To update your listing or look up your BAN see this FAQ: Registered Business Locations Explainer

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the 10 countries with the highest incidence of COVID-19 in the world as of October 22, 2020 (on the eve of the second wave of the pandemic) were considered, and those represented in the Global 500 ranking for 2020 were selected: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating were selected, separately for 2020 and 2019. Arithmetic averages and the change (increase) were calculated for indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators across all countries of the sample were then found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic.
The data is collected in a single Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. It is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the cells in the dataset contain formulas rather than ready-made numbers, adding or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that visualize the data.
The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented as a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and after the second wave of the pandemic to check the reliability of the earlier forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted indicators, but also their qualitative interpretation, reflecting the presence and level of risks that the pandemic and the COVID-19 crisis pose for international entrepreneurship.

The Department of State keeps a record of every filing for every incorporated business in the state of New York. This dataset contains information on all active corporations as of the last business day of the specified month and year.

Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Numbers of enterprises and local units produced from a snapshot of the Inter-Departmental Business Register (IDBR) taken on 14 March 2025.

https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset offers a comprehensive and varied analysis of an organization's employees, focusing on areas such as employee attrition, personal and job-related factors, and financials. Included are numerous parameters such as Age, Gender, Marital Status, Business Travel frequency, Daily Rate of pay, Department, Distance From Home, and Education level. Also included is a series of parameters related to the job being performed, such as Job Involvement (level), Job Level (relative to similar roles within the same organization), Job Role (the individual's function or task), and total working hours per week, month, or year, whether overtime or standard hours for a given role. Further fields cover Percent Salary Hike during the employee's tenure with the company (from promotion or otherwise), Performance Rating based on specific criteria established by leadership, Relationship Satisfaction (among peers at the workplace, while also taking into account family members outside work who can influence stress levels), Monthly Income (at its starting point once hired, compared against the monthly rate with overtime hours included if applicable), and Number of Companies Worked at previously, if any. Lastly, the Retirement Status, commonly known as Attrition, is highlighted; it covers whether the employee intended to stay with one employer through retirement age or whether attrition took place earlier than expected, including for reasons beyond the employee's control. Through this dataset you can gain insight into major aspects of today's workforce management philosophies, which have changed drastically over time due to advancements in technology.
- Understand the variables that make up this dataset. The dataset includes several personal and job-related variables such as Age, Gender, Marital Status, Business Travel, Daily Rate, Department, Distance From Home, Education, Education Field, Employee Count, Employee Number, Environment Satisfaction, Hourly Rate and so on. Knowing what each variable means individually will help when exploring employee attrition as a whole.
- Analyze the data for patterns as well as outliers or anomalies, either at an individual level or across all of the data points together. Identifying these patterns or discrepancies can offer insight into factors that are related to employee attrition.
- Visualize the data using charts and graphs to make it easy to see which relationships might be driving higher levels of employees leaving the organization over time. Dimensions like age or job role can be key factors in employee attrition rates, and visually displaying how they relate to one another can clarify what needs to change within an organization in order to reduce attrition rates.
- Explore relationships between pairs of variables through correlation analysis. Correlations measure how strongly two variables are related. When looking at employee retention, it is important to analyze correlations both for individual pairs and across all variables together, showing which pairings have more influence than others on employee decisions.
- Use descriptive analytics methods such as scatter plots, histograms, and boxplots, together with aggregated values from each field such as average age or average monthly income. These analytics help build a deeper understanding of where changes need to be made internally.
- Utilize predictive analytics with more advanced techniques such as regression, clustering, and decision trees in order to identify trends from past data points, then build models on those insights from different perspectives, helping organizations prepare for potentially high levels of employee departures.
- Identifying performance profiles of employees at risk for attrition through predictive analytics and using this insight to create personalized development plans or retention strategies.
- Using the data to assess the impact of different financial incentives or variations in job role/structure on employee attitudes, satisfaction and ultimately attrition rates.
- Analyzing different age groups' responses to various perks or turnover patterns in order to understand how organizations can better engage different demographic segments
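The correlation and aggregation steps above can be sketched in pandas. This is a minimal illustration on a few made-up rows, not the real data; the column names (`Age`, `MonthlyIncome`, `DistanceFromHome`, `Attrition`) and the Yes/No encoding of `Attrition` are assumptions about how the file is laid out.

```python
import pandas as pd

# Tiny made-up sample; column names and values are assumptions, not the real data.
df = pd.DataFrame({
    "Age": [25, 32, 41, 29, 50, 38],
    "MonthlyIncome": [2500, 4200, 8000, 3100, 12000, 6000],
    "DistanceFromHome": [20, 5, 3, 18, 2, 7],
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
})

# Encode the target as 0/1 so it can take part in numeric correlations.
df["AttritionFlag"] = (df["Attrition"] == "Yes").astype(int)

# Pairwise Pearson correlations across the numeric columns.
numeric_cols = ["Age", "MonthlyIncome", "DistanceFromHome", "AttritionFlag"]
corr = df[numeric_cols].corr()

# Correlation of each variable with attrition, strongest (by magnitude) first.
attrition_corr = (
    corr["AttritionFlag"]
    .drop("AttritionFlag")
    .sort_values(key=abs, ascending=False)
)

# Descriptive aggregates per attrition group (average age, average income).
group_means = df.groupby("Attrition")[["Age", "MonthlyIncome"]].mean()

print(attrition_corr)
print(group_means)
```

A correlation here only indicates association in the sample; it does not by itself show that a variable causes employees to leave.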
If you use this dataset in your research, pl...

http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains HR information for employees of a multinational corporation (MNC). It includes 2 Million (20 Lakhs) employee records with details about personal identifiers, job-related attributes, performance, employment status, and salary information. The dataset can be used for HR analytics, including workforce distribution, attrition analysis, salary trends, and performance evaluation.
This data is available as a CSV file. We are going to analyse this dataset using Pandas. This analysis will be helpful for those working in the HR domain.
Q.1) What is the distribution of Employee Status (Active, Resigned, Retired, Terminated) ?
Q.2) What is the distribution of work modes (On-site, Remote) ?
Q.3) How many employees are there in each department ?
Q.4) What is the average salary by Department ?
Q.5) Which job title has the highest average salary ?
Q.6) What is the average salary in different Departments based on Job Title ?
Q.7) How many employees Resigned & Terminated in each department ?
Q.8) How does salary vary with years of experience ?
Q.9) What is the average performance rating by department ?
Q.10) Which country has the highest concentration of employees ?
Q.11) Is there a correlation between performance rating and salary ?
Q.12) How has the number of hires changed over time (per year) ?
Q.13) Compare salaries of Remote vs. On-site employees — is there a significant difference ?
Q.14) Find the top 10 employees with the highest salary in each department.
Q.15) Identify departments with the highest attrition rate (Resigned %).
1) Unnamed: 0 – Index column (auto-generated, not useful for analysis, will be deleted).
2) Employee_ID – Unique identifier assigned to each employee (e.g., EMP0000001).
3) Full_Name – Full name of the employee.
4) Department – Department in which the employee works (e.g., IT, HR, Marketing, Operations).
5) Job_Title – Designation or role of the employee (e.g., Software Engineer, HR Manager).
6) Hire_Date – The date when the employee was hired by the company.
7) Location – Geographical location of the employee (city, country).
8) Performance_Rating – Performance evaluation score (numeric scale, higher is better).
9) Experience_Years – Number of years of professional experience the employee has.
10) Status – Current employment status (e.g., Active, Resigned).
11) Work_Mode – Mode of working (e.g., On-site, Hybrid, Remote).
12) Salary_INR – Annual salary of the employee in Indian Rupees.
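A few of the questions above can be sketched in pandas using the column names from this list. The rows below are made up for illustration, and the `Hire_Date` format (ISO `YYYY-MM-DD`) is an assumption; with the real CSV you would load the file with `pd.read_csv` and drop the `Unnamed: 0` column first.

```python
import pandas as pd

# Made-up rows that follow the column list above; the real file is far larger.
df = pd.DataFrame({
    "Employee_ID": ["EMP0000001", "EMP0000002", "EMP0000003",
                    "EMP0000004", "EMP0000005", "EMP0000006"],
    "Department": ["IT", "IT", "HR", "HR", "IT", "Marketing"],
    "Hire_Date": ["2019-03-01", "2020-07-15", "2019-11-20",
                  "2021-01-05", "2021-06-30", "2020-02-10"],
    "Status": ["Active", "Resigned", "Active", "Resigned", "Active", "Active"],
    "Salary_INR": [900000, 1200000, 600000, 700000, 1500000, 800000],
})

# Q.1: distribution of employee status.
status_counts = df["Status"].value_counts()

# Q.4: average salary by department.
avg_salary = df.groupby("Department")["Salary_INR"].mean()

# Q.12: number of hires per year.
hires_per_year = pd.to_datetime(df["Hire_Date"]).dt.year.value_counts().sort_index()

# Q.14: top N earners in each department (N = 10 in the question; 2 fits this sample).
top_n = (
    df.sort_values("Salary_INR", ascending=False)
      .groupby("Department")
      .head(2)
)

# Q.15: attrition rate (Resigned %) per department, highest first.
attrition_pct = (
    df["Status"].eq("Resigned")
      .groupby(df["Department"])
      .mean()
      .mul(100)
      .sort_values(ascending=False)
)

print(status_counts, avg_salary, hires_per_year, top_n, attrition_pct, sep="\n\n")
```

The same `groupby` pattern extends to the other questions, for example grouping by both `Department` and `Job_Title` for Q.6.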

Use Coresignal's Company API to explore and filter our extensive, regularly updated Companies dataset directly. Easily integrate this API into your workflow or use it to look up specific company records on demand. This tool is perfect for enhancing investing and lead generation efforts.
Two ways to use Company API
Search. Use specific parametric filters, such as location, industry, size, or specific keywords to narrow down your search and pull URL lists.
Enrichment. Enrich your data using specific URLs or IDs to pull full records thanks to the 1:1 type matching.

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Jamaica number dataset provides correct, up-to-date mobile numbers for telemarketing and direct marketing. As of 2024, there are about 3.27 million mobile phone connections in Jamaica, somewhat more than the total population of around 2.83 million. The List To Data website can help you get quick replies from new clients, and the dataset is also effective for SMS marketing. It may also open up opportunities to earn from other countries. This contact number library is a good choice for reaching people in specific places, and it can help you enhance your marketing and find new B2C clients easily.
Jamaica phone data can help your business grow. It provides real, active phone numbers so you can easily reach people in Jamaica. You can select whom to contact based on their location, what their company does, or how big their company is. Sellers can deliver sales promotions and offers to consumers and connect quickly with a large group of customers in a selected area. List To Data includes contact leads for both businesses and individuals.
The Jamaica phone number list is a tool for reaching new customers through phone calls. With 95% precision, this contact book offers contact numbers for many people who might want your services. Visit the List To Data website to get the most recent phone numbers for any business; you can also pick from other packages on the website that fit your needs and budget. Whether your business is big or small, the mobile number data is intended to help throughout, and the team supplies contact numbers according to your needs.

The Small Business Administration maintains the Dynamic Small Business Search (DSBS) database. As a small business registers in the System for Award Management, there is an opportunity to fill out the small business profile. The information provided populates DSBS. DSBS is another tool contracting officers use to identify potential small business contractors for upcoming contracting opportunities. Small businesses can also use DSBS to identify other small businesses for teaming and joint venturing.

The Annual Respondents Database (ARD) is constructed from a compulsory business survey. Until 1997 it was created out of the Annual Censuses of Production and Construction (ACOP and ACOC); these were combined into the Annual Business Inquiry (ABI) in 1998. The ARD is a census of large businesses, and a sample of smaller ones. Smaller firms may receive a "short form". These do not require detailed breakdowns of totals. Hence for certain variables the values may be imputed from third party sources or estimated rather than returned by respondents.
This dataset is created for the Economic Analysis and Satellite Accounts Division for research purposes. To create the ARD, the other surveys are converted into a single consistent format linked by the Inter-Departmental Business Register references over time. Northern Ireland data is held up to 2001. From 2002, the ABI is collected and stored separately in Northern Ireland. Special permission is required to use new NI ABI data.
ABI background
The ABI is the financial information survey conducted by the Office for National Statistics (ONS). This is a statutory survey conducted under the Statistics of Trade Act 1947. Organisations are obliged under this legislation to provide a response. Businesses are sampled from the ONS business register current at the time of drawing the sample: first the CSO Business Register, which ran until 1993; then the Inter-Departmental Business Register, which has run from 1994 onwards. The ONS holds firms' responses to the ABI in the Annual Respondents Database (ARD).
The ABI replaced the following annual survey systems in 1998:
- Annual Employment Survey (AES)
- Annual Censuses of Production and Construction (ACOP/ACOC), which include the Purchases Inquiry (PI)
- The six annual Distribution and Services (DSI) inquiries (Annual Wholesale Inquiry; Annual Retail Inquiry; Annual Motor Trades Inquiry; Annual Catering Inquiry; Annual Property Inquiry; and Annual Service Trades Inquiry)
Until 1997 the data were limited to the production and construction industries surveyed by the ACOP and ACOC (construction from 1993 only). The incorporation of the DSI inquiries for six additional sectors is reflected in the number of individual business contributors rising from approximately 15,000 for 1980 to 1996 to approximately 50,000 for 1997/98 and to over 70,000 for 1999.
The ABI is one of the most comprehensive surveys undertaken of business organisations in the UK, covering over 100 key economic variables, and approximately two-thirds of the UK economy. Detailed variables for turnover, employment, costs, capital and the derivation of sales and profits are included. A firm-level measure of Gross Value Added (GVA) is also generated so that the productivity of organisations can be evaluated.
The ABI samples UK businesses and other such establishments according to their employment size and industry sector. It is a census of large businesses, and a stratified sample of small and medium sized enterprises. The stratified sampling framework means that smaller firms move in and out of the survey. The forms are customised for industry sectors and sub-sectors. The statistics produced from the sample data are used primarily to assist in the generation of the National Accounts and the measurement of Gross Domestic Product (GDP).
A number of different form-types are used in the survey. Long form-types are sent to all businesses with an employment of 250 or more and also to a proportion of selected businesses with lower employment. Short form-types are sent to the remaining selected businesses. The forms differ in that long form-types ask for a detailed breakdown of purchases; employment costs; taxes, duties and levies etc, whereas short form-types just ask for the totals of these variables.
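The form-type rule described above can be sketched as follows. This is only an illustration of the stated rule: the 250 threshold comes from the text, while the share of smaller firms receiving a long form (`long_share`), the random selection method and the firm identifiers are hypothetical, since the text does not give them.

```python
import random

LONG_FORM_THRESHOLD = 250  # employment cut-off stated in the text


def assign_form_types(employment_by_firm, long_share, seed=0):
    """Return {firm_id: "long" or "short"} for the selected businesses.

    long_share is a hypothetical proportion of smaller firms that receive
    a long form; the actual proportion is not given in the text.
    """
    rng = random.Random(seed)
    large = [f for f, emp in employment_by_firm.items() if emp >= LONG_FORM_THRESHOLD]
    small = [f for f, emp in employment_by_firm.items() if emp < LONG_FORM_THRESHOLD]

    # Pick the assumed share of smaller firms that also get the long form.
    n_long_small = round(long_share * len(small))
    long_small = set(rng.sample(small, n_long_small))

    forms = {f: "long" for f in large}
    forms.update({f: ("long" if f in long_small else "short") for f in small})
    return forms


# Made-up firms and employment figures.
firms = {"A": 1200, "B": 250, "C": 40, "D": 12, "E": 180, "F": 75}
forms = assign_form_types(firms, long_share=0.5)
print(forms)
```

Because short forms return only totals, detailed breakdowns for the firms marked "short" would have to be imputed or estimated, as noted earlier.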
The data are collected in two parts: Part 1 is an employment record, collected as soon as possible after 12th December. Part 2 is for financial information, which may be submitted up to twelve months after the financial year end.
Geographical references: postcodes
The postcodes available in these data are pseudo-anonymised postcodes. The real postcodes are not available due to the potential risk of identification of the observations. However, these replacement postcodes retain the inherent nested characteristics of real postcodes, and will allow researchers to aggregate observations to other geographic units, e.g. wards, super output areas, etc. In the dataset, the variable of the replacement postcode is 'new_PC'.
Linking to other business studies
These data contain Inter-Departmental Business Register reference numbers. These are anonymous but unique reference numbers assigned to business organisations.
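As a rough sketch of aggregating on the nested structure of 'new_PC': the example below assumes the replacement postcodes keep the usual "outward inward" layout separated by a space, so that the outward part acts as a coarser unit. That layout, the sample values and the `employment` field are assumptions for illustration; aggregating to wards or super output areas would need a lookup table that is not described here.

```python
from collections import defaultdict

# Made-up records; "new_PC" is the variable named in the text, while the
# values and the "employment" field are invented for illustration.
records = [
    {"new_PC": "ZX1 4AB", "employment": 12},
    {"new_PC": "ZX1 4AC", "employment": 30},
    {"new_PC": "ZX1 7QQ", "employment": 5},
    {"new_PC": "QY9 2LM", "employment": 260},
]


def outward_code(postcode):
    """Return the part before the space (assumed to be the coarser, nesting unit)."""
    return postcode.split(" ")[0]


# Sum employment over all observations that share an outward code.
totals = defaultdict(int)
for row in records:
    totals[outward_code(row["new_PC"])] += row["employment"]

print(dict(totals))
```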

https://dataintelo.com/privacy-and-policy
The global open source database market size was valued at approximately USD 15.5 billion in 2023 and is projected to reach around USD 40.6 billion by 2032, expanding at a compound annual growth rate (CAGR) of 11.5% during the forecast period. The growth of this market is primarily driven by the increasing adoption of open-source databases by both SMEs and large enterprises due to their cost-effectiveness and flexibility.
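The growth figures above can be checked with the standard compound annual growth rate formula. This is a small sketch that assumes 2023 is the base year and 2032 the end year (nine compounding periods); the source does not state the exact forecast window.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value and period."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding start_value at the given annual rate."""
    return start_value * (1 + rate) ** years


start, end = 15.5, 40.6   # USD billion, from the text
years = 2032 - 2023       # assumes 2023 is the base year

implied_rate = cagr(start, end, years)
projected_2032 = project(start, 0.115, years)

print(f"implied CAGR: {implied_rate:.2%}")
print(f"2032 value at 11.5%: {projected_2032:.1f} billion USD")
```

Under these assumptions the implied rate comes out a little below the quoted 11.5%, and compounding at 11.5% lands slightly above USD 40.6 billion; the gap is presumably due to rounding or a different base year in the source report.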
A significant growth factor for the open source database market is the rising demand for data analytics and business intelligence across various industries. Organizations are increasingly leveraging big data to gain actionable insights, enhance decision-making processes, and improve operational efficiency. Open source databases provide the scalability and performance required to handle large volumes of data, making them an attractive option for businesses looking to maximize their data-driven strategies. Additionally, the continuous advancements and contributions from the open-source community help in keeping these databases at the cutting edge of technology.
Another driving factor is the cost-efficiency associated with open-source databases. Unlike proprietary databases, which can be expensive due to licensing fees, open-source databases are usually free to use, offering a significant cost advantage. This factor is especially crucial for small and medium enterprises (SMEs), which often operate with limited budgets. The lower total cost of ownership, combined with the flexibility to customize the database according to specific needs, makes open-source solutions highly appealing for businesses of all sizes.
The increasing trend of digital transformation is also playing a crucial role in the growth of the open source database market. As businesses across various sectors accelerate their digital initiatives, the need for robust, scalable, and efficient data management solutions becomes paramount. Open-source databases provide the agility and innovation that organizations require to keep up with the rapidly changing digital landscape. Moreover, the support for cloud deployment further enhances their appeal, providing businesses with the scalability and flexibility needed to adapt to evolving technological demands.
From a regional perspective, North America holds a significant share in the open source database market, driven by the presence of major technology companies and a highly developed IT infrastructure. The region's focus on technological innovation and early adoption of advanced technologies contributes to its dominant position. Europe follows closely, with increasing investments in digital transformation initiatives. The Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by rapid technological advancements, a burgeoning IT sector, and increased adoption of open-source solutions by businesses.
Relational database software plays a crucial role in the open-source database market, offering structured data management solutions that are essential for various business applications. These databases are known for their ability to handle complex queries and transactions, making them ideal for industries that require high levels of data integrity and consistency. The flexibility and robustness of relational database software allow organizations to efficiently manage large volumes of structured data, which is critical for applications such as financial systems, enterprise resource planning, and customer relationship management. As businesses continue to prioritize data-driven decision-making, the demand for relational database software is expected to grow, further driving the expansion of the open-source database market.
The open source database market is segmented into SQL, NoSQL, and NewSQL databases. SQL databases are the most widely used and have been the backbone of data management for decades. They offer robust transaction management and are ideal for structured data storage and retrieval. The ongoing improvements in SQL databases, such as enhanced performance and security features, continue to make them a preferred choice for many organizations. Additionally, the availability of various SQL-based open-source solutions like MySQL, PostgreSQL, and MariaDB provides organizations with reliable options to manage their data effectively.
NoSQL databases are gainin

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Bank Enterprise Surveys (WBES) are nationally representative firm-level surveys with top managers and owners of businesses in over 150 economies that provide insight into many business environment topics such as access to finance, corruption, infrastructure, and performance, among others. Data are used to create over 100 indicators that benchmark the business environment across the globe. Each country is surveyed every 3 years. In addition to country-level aggregated data, firm-level data are available to registered users on the Enterprise Surveys site at http://www.enterprisesurveys.org/.
Details on the methodology are available at https://www.enterprisesurveys.org/en/methodology
