84 datasets found
  1. WebAutomation Employee Data | Github Developer Profiles | Global 40M+ Developer Records | Explore Developer Repositories, Contributions and more

    • datarade.ai
    .json, .csv
    Updated Dec 5, 2022
    Cite
    Webautomation (2022). WebAutomation Employee Data | Github Developer Profiles | Global 40M+ Developer Records | Explore Developer Repositories, Contributions and more [Dataset]. https://datarade.ai/data-products/webautomation-github-developer-profiles-dataset-global-webautomation
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Dec 5, 2022
    Dataset authored and provided by
    Webautomation
    Area covered
    Montserrat, Falkland Islands (Malvinas), Estonia, Paraguay, Guadeloupe, Suriname, Uruguay, Canada, Greenland, Ukraine
    Description

    Extensive Developer Coverage: Our employee dataset includes a diverse range of developer profiles from GitHub, spanning various skill levels, industries, and expertise. Access information on developers from all corners of the software development world.

    Developer Profiles: Explore detailed developer profiles, including user bios, locations, company affiliations, and skills. Understand developer backgrounds, experiences, and areas of expertise.

    Repositories and Contributions: Access information about the repositories created by developers and their contributions to open-source projects. Analyze the projects they've worked on, their coding activity, and the impact they've made on the developer community.

    Programming Languages: Gain insights into the programming languages that developers are proficient in. Identify skilled developers in specific programming languages that align with your project needs.

    Customizable Data Delivery: The dataset is available in flexible formats, such as CSV, JSON, or API integration, allowing seamless integration with your existing data infrastructure. Customize the data to meet your specific research and analysis requirements.

  2. Human Resources Data Set Sample

    • kaggle.com
    zip
    Updated Aug 10, 2024
    Cite
    Tarkhon (2024). Human Resources Data Set Sample [Dataset]. https://www.kaggle.com/datasets/tarkhon/human-resources-data-set-sample/data
    Explore at:
    Available download formats: zip (8,268,330 bytes)
    Dataset updated
    Aug 10, 2024
    Authors
    Tarkhon
    Description

    This dataset provides a detailed SQL-based employee database, which is ideal for practicing SQL queries and performing database-related operations. The dataset is structured to simulate a real-world organizational database, featuring various tables related to employee information, job roles, departments, and more.

    The dataset is sourced from the GitHub repository https://github.com/cmoeser5/Employee-Database-SQL. It is intended for educational purposes, particularly for learning and practicing SQL.

    Tables Included:

    • employees: contains records of employees with fields such as employee ID, name, job title, and department.
    • departments: lists departments within the organization with fields including department ID and department name.
    • jobs: includes details about job roles with fields such as job ID, job title, and job description.
    • salaries: provides salary information for employees, including employee ID, salary amount, and salary date.
    • titles: contains historical job title data for employees, including employee ID, job title, and title date.
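
    For illustration, here is a minimal Python sqlite3 sketch of the kind of practice this schema supports. The column names below are assumptions based on the descriptions above, not necessarily the dataset's real DDL.

        import sqlite3

        # Toy schema loosely following the tables described above (names assumed)
        conn = sqlite3.connect(":memory:")
        conn.executescript("""
            CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
            CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT,
                                    job_title TEXT, dept_id INTEGER);
            CREATE TABLE salaries (emp_id INTEGER, salary REAL, salary_date TEXT);
            INSERT INTO departments VALUES (1, 'Engineering'), (2, 'HR');
            INSERT INTO employees VALUES (1, 'Ann', 'Engineer', 1),
                                         (2, 'Bo', 'HR Manager', 2);
            INSERT INTO salaries VALUES (1, 95000, '2024-01-01'),
                                        (2, 70000, '2024-01-01');
        """)

        # Typical practice query: average salary per department
        for row in conn.execute("""
                SELECT d.dept_name, AVG(s.salary) AS avg_salary
                FROM employees e
                JOIN departments d ON d.dept_id = e.dept_id
                JOIN salaries s ON s.emp_id = e.emp_id
                GROUP BY d.dept_name"""):
            print(row)  # e.g. ('Engineering', 95000.0)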

  3. Employee Database for SQL Case Study

    • kaggle.com
    zip
    Updated Jun 21, 2025
    Cite
    Riddhi N Divecha (2025). Employee Database for SQL Case Study [Dataset]. https://www.kaggle.com/datasets/riddhindivecha/employee-database-for-sql-case-study/code
    Explore at:
    Available download formats: zip (890 bytes)
    Dataset updated
    Jun 21, 2025
    Authors
    Riddhi N Divecha
    Description

    SQL Case Study Project: Employee Database Analysis 📊

    I recently completed a comprehensive SQL project involving a simulated employee database with multiple tables:

    • 🏢 DEPARTMENT
    • 👨‍💼 EMPLOYEE
    • 💼 JOB
    • 🌍 LOCATION

    In this project, I practiced and applied a wide range of SQL concepts:

    
    ✅ Simple Queries
    ✅ Filtering with WHERE conditions
    ✅ Sorting with ORDER BY
    ✅ Aggregation using GROUP BY and HAVING
    ✅ Multi-table JOINs
    ✅ Conditional Logic using CASE
    ✅ Subqueries and Set Operators

    💡 Key Highlights:

    • Salary grade classifications
    • Department-level insights
    • Employee trends based on hire dates
    • Advanced queries like Nth highest salary (see the sketch below)
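
    For illustration, a minimal sqlite3 sketch of the "Nth highest salary" pattern mentioned above; the EMPLOYEE table and its columns here are assumptions, not the project's actual schema:

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL)")
        conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                         [(1, "Ann", 90000), (2, "Bo", 120000), (3, "Cy", 110000)])

        # Nth highest salary via a correlated subquery (here N = 2):
        n = 2
        row = conn.execute("""
            SELECT DISTINCT salary FROM employee e1
            WHERE ? - 1 = (SELECT COUNT(DISTINCT salary)
                           FROM employee e2 WHERE e2.salary > e1.salary)
        """, (n,)).fetchone()
        print(row)  # (110000.0,) -> the 2nd highest salary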

    🛠️ Tools Used: Azure Data Studio

    📂 You can find the entire project and scripts here:
    👉 https://github.com/RiddhiNDivecha/Employee-Database-Analysis

    This project helped me sharpen my SQL skills and understand business logic more deeply in a practical context.

    💬 I’m open to feedback and happy to connect with fellow data enthusiasts!

    #SQL #DataAnalytics #PortfolioProject #CaseStudy #LearningByDoing #DataScience #SQLProject

  4. Number of Active Employees by Industry

    • catalog.data.gov
    • data.ct.gov
    • +3 more
    Updated Nov 22, 2025
    Cite
    data.ct.gov (2025). Number of Active Employees by Industry [Dataset]. https://catalog.data.gov/dataset/number-of-active-employees-by-industry
    Explore at:
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    data.ct.gov
    Description

    Number of active employees, aggregating information from multiple data providers. This series is based on firm-level payroll data from Paychex and Intuit, worker-level data on employment and earnings from Earnin, and firm-level timesheet data from Kronos. This data is compiled by Opportunity Insights.

    Data notes from Opportunity Insights:
    • Data Source: Paychex, Intuit, Earnin, Kronos
    • Update Frequency: Weekly
    • Date Range: January 15th 2020 until the most recent date available. The most recent date available for the full series depends on the combination of Paychex, Intuit and Earnin data. We extend the national trend of aggregate employment and employment by income quartile by using Kronos timecard data and Paychex data for workers paid on a weekly paycycle to forecast beyond the end of the Paychex, Intuit and Earnin data.
    • Data Frequency: Daily, presented as a 7-day moving average
    • Indexing Period: January 4th - January 31st
    • Indexing Type: Change relative to the January 2020 index period, not seasonally adjusted

    More detailed documentation on Opportunity Insights data can be found here: https://github.com/OpportunityInsights/EconomicTracker/blob/main/docs/oi_tracker_data_documentation.pdf

  5. Fake Employee Dataset

    • kaggle.com
    zip
    Updated Nov 20, 2023
    Cite
    Oyekanmi Olamilekan (2023). Fake Employee Dataset [Dataset]. https://www.kaggle.com/datasets/oyekanmiolamilekan/fake-employee-dataset
    Explore at:
    Available download formats: zip (162,874 bytes)
    Dataset updated
    Nov 20, 2023
    Authors
    Oyekanmi Olamilekan
    Description

    Creating a robust employee dataset for data analysis and visualization involves several key fields that capture different aspects of an employee's information. Here's a list of fields you might consider including:

    • Employee ID: a unique identifier for each employee
    • Name: first name and last name of the employee
    • Gender: male, female, non-binary, etc.
    • Date of Birth: birthdate of the employee
    • Email Address: contact email of the employee
    • Phone Number: contact number of the employee
    • Address: home or work address of the employee
    • Department: the department the employee belongs to (e.g., HR, Marketing, Engineering, etc.)
    • Job Title: the specific job title of the employee
    • Manager ID: ID of the employee's manager
    • Hire Date: date when the employee was hired
    • Salary: employee's salary or compensation
    • Employment Status: full-time, part-time, contractor, etc.
    • Employee Type: regular, temporary, contract, etc.
    • Education Level: highest level of education attained by the employee
    • Certifications: any relevant certifications the employee holds
    • Skills: specific skills or expertise possessed by the employee
    • Performance Ratings: ratings or evaluations of employee performance
    • Work Experience: previous work experience of the employee
    • Benefits Enrollment: information on benefits chosen by the employee (e.g., healthcare plan, retirement plan, etc.)
    • Work Location: physical location where the employee works
    • Work Hours: regular working hours or shifts of the employee
    • Employee Status: active, on leave, terminated, etc.
    • Emergency Contact: contact information of the employee's emergency contact person
    • Employee Satisfaction Survey Responses: data from employee satisfaction surveys, if applicable

    Code Url: https://github.com/intellisenseCodez/faker-data-generator
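
    For illustration, a minimal sketch of how such records could be generated with the Python faker library; the field choices below are illustrative, not the repository's actual code:

        import csv
        import random
        from faker import Faker

        fake = Faker()
        with open("employees.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["employee_id", "name", "email", "department",
                             "hire_date", "salary"])
            for i in range(100):
                writer.writerow([
                    i + 1,
                    fake.name(),                                 # random full name
                    fake.email(),                                # random email address
                    random.choice(["HR", "Marketing", "Engineering"]),
                    fake.date_between(start_date="-10y"),        # hire date in last 10 years
                    random.randint(40000, 150000),               # salary
                ])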

  6. Task content of occupations based on the ESCO database

    • zenodo.org
    csv
    Updated Jul 9, 2024
    + more versions
    Cite
    Anna Matysiak; Wojciech Hardy; Lucas van der Velde (2024). Task content of occupations based on the ESCO database [Dataset]. http://doi.org/10.5281/zenodo.12699781
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anna Matysiak; Wojciech Hardy; Lucas van der Velde
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    When using this resource please cite the article for which it was developed (an accepted version is uploaded in this repository):

    Matysiak, A., Hardy, W. and van der Velde Lucas (2024). Structural Labour Market Change and Gender Inequality in Earnings. Work, Employment and Society, DOI: 10.1177/09500170241258953.

    The dataset contributes a categorisation of tasks conducted across occupations, with a distinction between social tasks directed "inward" (e.g. towards members of own organisation, co-workers, employees, etc.) and those directed "outward" (e.g. towards students, clients, patients, etc.). This provides more depth to the discussion on technology, labour market changes and gender differences in how these trends are experienced. The dataset builds on the ESCO database v1.0.8.

    The following task categories are available at occupation levels:

    • Social
      • Social Inward
      • Social Outward
    • Analytical*
    • Routine**
    • Manual

    * Additionally, a distinction between technical and creative/artistic tasks is provided although it is not used in Matysiak et al. (2024).

    ** In the initial files, some task items are categorised as Routine, while some are categorised as Non-Routine. In the subsequent steps for occupation-level information, the Routine task score consists of a difference between the Routine score and the Non-Routine score (see the paper for more information).

    The repository contains four data files at different stages of task development. For the codes, please see the accompanying GitHub repository. The ESCO database covers, i.a., skills/competences and attitudes, to which we jointly refer as task items (as is standard in the literature using other databases such as ONET). For detailed methodology and interpretation see Matysiak et al. (2024).

    1) esco_tasks.csv - encompasses all ESCO occupations and all task items with tags on task categorisation into broader categories. It also includes the split between the "essential" and "optional" task items and the variant "management-focused" and "care-focused" measures of social tasks as used in the robustness checks in the Matysiak et al. (2024) paper.

    2) esco_onet_tasks.csv - additionally includes pre-prepped task items from the ONET database, traditionally used to describe the task content of occupations. These data can be used to validate the ESCO measures.

    3) esco_onet_matysiaketal2024.csv - contains a subset of the variables from esco_onet_tasks.csv used for the Matysiak et al. (2024) paper.

    4) tasks_isco08_2018_stdlfs.csv - contains the final task measures after the standardisation and derivation procedures described in Matysiak et al. (2024).

    5a) Matysiak et al 2024 - Structural Labour Market Change and Gender Inequality in Earnings.pdf - the Accepted Manuscript version of the Matysiak et al. (2024) paper.

    5b) Appendix to Matysiak et al 2024 - Structural Labour Market Change and Gender Inequality in Earnings.pdf - the appendix with additional tables and figures for the paper.

    For all details on the procedures, applied crosswalks, methods, etc. please refer to the GitHub repository and the Matysiak et al. (2024) paper.

  7. SF Salaries (gender column included)

    • kaggle.com
    zip
    Updated Dec 11, 2016
    Cite
    Ronald Troncoso (2016). SF Salaries (gender column included) [Dataset]. https://www.kaggle.com/datasets/ronaldtroncoso20/sf-salaries-extended
    Explore at:
    Available download formats: zip (5,259,458 bytes)
    Dataset updated
    Dec 11, 2016
    Authors
    Ronald Troncoso
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    San Francisco
    Description

    This is an edited dataset that was originally uploaded by Kaggle, and this is the original link.

    This data allows us to take a good look into how government employees were compensated from 2011 to 2014.

    Edits

    This data contains the names, job titles, and compensation of San Francisco city employees. My only contribution to this dataset is that I added a gender column based on each person's first name. If you would like to see how I did this, please take a look at my GitHub repository: https://github.com/RonaldTroncoso/SF_Salary_Trends

    Here is the original for this data.

  8. EXIM Bank's Employees by Level

    • archive.data.gov.my
    Updated Dec 13, 2018
    Cite
    archive.data.gov.my (2018). EXIM Bank's Employees by Level [Dataset]. https://archive.data.gov.my/data/dataset/exim-bank-s-employees-by-level
    Explore at:
    Dataset updated
    Dec 13, 2018
    Dataset provided by
    Data.gov (https://data.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset shows the number of employees in EXIM Bank by level.

  9. bird-sql-train-xresults

    • huggingface.co
    Updated Aug 31, 2025
    Cite
    Akhil Dhavala (2025). bird-sql-train-xresults [Dataset]. https://huggingface.co/datasets/1sf/bird-sql-train-xresults
    Explore at:
    Dataset updated
    Aug 31, 2025
    Authors
    Akhil Dhavala
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is the training data with execution results of https://bird-bench.github.io/

    Sample:

    [ {
        "db_id": "book_publishing_company",
        "question": "Please list the first names of the employees who work as Managing Editor.",
        "evidence": "Managing Editor is a job description which refers to job_desc",
        "SQL": "SELECT T1.fname FROM employee AS T1 INNER JOIN jobs AS T2 ON T1.job_id = T2.job_id WHERE T2.job_desc = 'Managing Editor'",
        "execution_result": { "status": "success"…

    See the full description on the dataset page: https://huggingface.co/datasets/1sf/bird-sql-train-xresults.

  10. APS Employment Data 31 December 2013

    • researchdata.edu.au
    Updated Jun 19, 2017
    + more versions
    Cite
    Australian Public Service Commission (2017). APS Employment Data 31 December 2013 [Dataset]. https://researchdata.edu.au/aps-employment-data-december-2013/3801334
    Explore at:
    Dataset updated
    Jun 19, 2017
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Australian Public Service Commission
    License

    Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
    License information was derived automatically

    Description

    These tables present a summary of employment under the Public Service Act 1999 at 31 December 2013 and during the 2013 calendar year. The data is an update of that presented in the APS Statistical Bulletin 2012-13.

    Data becomes available at six-monthly intervals: through this summary for calendar year data, and through the annual State of the Service Report (SOSR) and APS Statistical Bulletin (the Bulletin). The data in these tables is sourced from the APS Employment Database (APSED), which contains data extracted from agencies' HR systems.

    The Australian Public Service Commission continues to work with agencies to improve the quality and timeliness of the data they provide to APSED. Each year extensive audits and error checking of APSED are undertaken to ensure that sound conclusions can be drawn from the data. Through this audit process, previously published data has been updated. The June 2013 data published in the SOSR and the Bulletin has been revised.

    As in the Bulletin, a headcount approach is used in these tables; that is, people working part-time are aggregated with people working full-time without weighting. Data also includes inoperative staff. Employees' classification in these tables refers to their base or substantive classification.

  11. Coronavirus (Covid-19) Data in the United States

    • github.com
    • openicpsr.org
    • +4 more
    csv
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
    Explore at:
    Available download formats: csv
    Dataset provided by
    New York Times
    License

    https://github.com/nytimes/covid-19-data/blob/master/LICENSE

    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

  12. Interactive Equity Analysis Tool and Data (formerly ETAs)

    • hub.arcgis.com
    • opendata.atlantaregional.com
    Updated Feb 27, 2019
    Cite
    Georgia Association of Regional Commissions (2019). Interactive Equity Analysis Tool and Data (formerly ETAs) [Dataset]. https://hub.arcgis.com/documents/0aabbeff23614f87a5e0450f4d751ba1
    Explore at:
    Dataset updated
    Feb 27, 2019
    Dataset provided by
    The Georgia Association of Regional Commissions
    Authors
    Georgia Association of Regional Commissions
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Atlanta Regional Commission (ARC) has created an interactive equity analysis tool to help address these questions, as featured on 33n, the ARC Research & Analytics Blog. In the past, the ARC used a static map of their Equitable Target Areas (ETA) for project evaluation. The ETA index was a tool that helped ARC better identify areas with minority or low-income populations to understand how proposed projects might impact these groups.

    Now, the Atlanta Regional Commission has created a transportation data platform called DASH where a more nuanced equity analysis can be visualized. This equity analysis helps ARC understand where people reside across all nine of the federally protected classes. Coming soon, there will be transportation data added to DASH. Combined with the data already included, the equity analysis provided by DASH will help guide regional transportation and land use planning and will be used as input for project prioritization and evaluation, monitoring resource allocation, and assisting in decision-making.

    ARC's Equity Analysis is widely used throughout the agency to demonstrate compliance with federal guidance, including Title VI of the Civil Rights Act of 1964, the Limited English Proficiency Executive Order, the Americans with Disabilities Act of 1990, the Environmental Justice Executive Order, and FHWA's and FTA's Title VI and Environmental Justice documents.

    Methodology

    The Equity Analysis methodology generates a composite score based on the concentrations of the criteria selected, which is used to meet the nondiscrimination requirements and recommendations of Title VI and EJ for ARC's plans, programs, and decision-making processes.

    The score calculation is determined by standard deviations relative to a criterion's regional average. This score classifies the concentration of the populations of interest under Title VI and EJ present in every census tract in the region. These population groups are represented by the nine equity analysis criteria: youth, older adults, females, racial minorities, ethnic minorities, foreign-born, limited English proficiency, people with disabilities, and low-income.

    The data for each of the criteria in the equity analysis are split into five "bins" based on the relative concentration across the region: well below average (score of 0); below average (score of 1); average (score of 2); above average (score of 3); and well above average (score of 4). A summary score of all nine indicators for each census tract (ranging from 0 to 36) is used to show regional concentrations of populations of interest under Title VI and EJ. A summary score of racial minority, ethnic minority, and low-income for each census tract is used in ARC's Project Evaluation Framework to prioritize projects in the Transportation Improvement Program (TIP). This view is the map default.

    Bin 2 for each indicator contains census tracts at or near (within a half standard deviation from) the regional average (mean) for that indicator. Bins 4, 3, 1, and 0 are then built out from the regional average: bins 1 and 3 go another full standard deviation out from bin 2, and bins 0 and 4 contain any remaining tracts further out from 1 or 3, respectively. A minimal sketch of this binning logic is shown below.
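
    For illustration, a minimal numpy sketch of the five-bin scoring described above; the tract values here are random placeholders, with real inputs being ACS tract-level estimates for each criterion:

        import numpy as np

        def equity_bin(values):
            # Distance of each tract from the regional mean, in SD units
            z = (values - values.mean()) / values.std()
            bins = np.full(values.shape, 2)      # bin 2: within +/-0.5 SD of the mean
            bins[(z > 0.5) & (z <= 1.5)] = 3     # bin 3: 0.5 to 1.5 SD above average
            bins[z > 1.5] = 4                    # bin 4: any remaining tracts above
            bins[(z < -0.5) & (z >= -1.5)] = 1   # bin 1: 0.5 to 1.5 SD below average
            bins[z < -1.5] = 0                   # bin 0: any remaining tracts below
            return bins

        tracts = np.random.rand(100, 9)          # 100 census tracts x 9 criteria
        scores = sum(equity_bin(tracts[:, i]) for i in range(9))  # composite, 0-36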
    This Equity Analysis supplants previous equity analysis iterations, including ARC's Equitable Target Areas (ETAs).

    The design of this methodology is supported by both FHWA's and FTA's Title VI recommendations to simply identify the protected classes using demographic data from the US Census Bureau as the first step in conducting equity analyses. Additionally, FTA's EJ guidance cautions recipients of federal funds not to rely too heavily on population thresholds to determine the impact of a program, plan, or policy on a population group, but rather to design a meaningful measure to identify the presence of all protected and considered population groups and then calculate the possibility of discrimination or disproportionately high and adverse effects on these populations.

    ARC plans to continue the conversation with its staff, partners, and Transportation Equity Advisory Group (TEAG) about measuring and evaluating transportation benefits and burdens, as well as layering the Equity Analysis with supplemental analyses such as access to essential services, affordability, and displacement.

    Data Disclaimer

    This webpage is a public resource using ACS data. The Atlanta Regional Commission (ARC) makes no warranty, representation, or guarantee as to the content, sequence, accuracy, timeliness, or completeness of any of the spatial data or database information provided herein. ARC and partner state, regional, local, and other agencies shall assume no liability for errors, omissions, or inaccuracies in the information provided, regardless of how caused, or for any decision made or action taken or not taken by any person relying on any information or data furnished within. ARC is committed to enforcing the provisions of Title VI of the Civil Rights Act of 1964 and taking positive and realistic affirmative steps to ensure the protection of rights and opportunities for all persons affected by its programs, services, and activities.

    CSV Download. GIS data available soon. Date: 2018.

    Equity Analysis contact: Aileen Daney, Senior Planner, Transportation Access & Mobility Group, 470.378.1579, adaney@atlantaregional.org

    Title VI policy and complaint contact: Brittany Zwald, Title VI Officer / Grants and Contracts Analyst, Finance Group, 470.378.1494, bzwald@atlantaregional.org

    For more information on ARC's Title VI program or to obtain a Title VI Policy and Complaint Form, please visit: https://atlantaregional.org/leadership-and-engagement/guidelines-compliance/title-vi-plan-and-program/

  13. An example dataset of interaction logs of software company employees

    • zenodo.org
    csv
    Updated Feb 17, 2021
    Cite
    Zeynep Yucel (2021). An example dataset of interaction logs of software company employees [Dataset]. http://doi.org/10.5281/zenodo.4500028
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 17, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Zeynep Yucel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a sample piece from a dataset of interaction logs recorded from software company employees.

    The data set is recorded at an overseas branch office of a Japanese software development company. The subjects are 18 employees of this branch office, who have varying qualifications and responsibilities. The data collection campaign is carried out with the consent of the company. The subjects are informed in a clear manner about the nature and method of the research, and agreed to participate in the experiments.


    The data set is composed of the employees' interaction logs, which are basically a registry of their interactions with the GUI. The recording software is a user activity monitoring tool called TaskPit [1], which is designed to be deployed particularly at software development companies. Specifically, it registers a log file containing the name of the active application (i.e. its exe name), the start and end time of its deployment, its window title, and the number of left clicks, right clicks, and key strokes on the active window.
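
    For illustration, a minimal pandas sketch of working with such a log; the column names and file layout here are assumptions based on the description above, not TaskPit's actual on-disk format:

        import pandas as pd

        # Hypothetical column layout for a TaskPit-style log (names assumed)
        cols = ["exe_name", "start_time", "end_time", "window_title",
                "left_clicks", "right_clicks", "key_strokes"]
        log = pd.read_csv("interaction_log.csv", names=cols,
                          parse_dates=["start_time", "end_time"])

        # Each row is one "action"; e.g., total keystrokes per application:
        print(log.groupby("exe_name")["key_strokes"].sum())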

    We consider each line of the log file to arise from an action of the subject. Moreover, each action is considered to be associated with a single task of the subject (e.g. Programming) and the tasks associated with those actions are exactly what our study intends to unveil.

    A coder, who is a senior student at the department of computer science of Okayama University, carried out manual annotation by assigning a single task to each action (i.e. each line of the log file). To that end, he evaluated the information contained in the columns of the log file and selected one task from the set of potential tasks. Here, by taking into account the background information of the subjects (i.e. being employees of software development company) and the expectations and requirements of our corporate partner, the set of potential tasks is tailored to be comprised of Programming, Test, Documentation, Administration, and Leisure. Note that by documentation we refer to reading, writing or editing of project documentation.

    The data set is used to test the task estimation method proposed in our article submitted to Empirical Software Engineering (currently under review) [2]. The codes generated during the current study are publicly available at our repository [3].

    References:

    [1] Suthipornopas P, Leelaprute P, Monden A, Uwano H, Kamei Y, Ubayashi N, Araki K, Yamada K, Matsumoto K (2017) Industry application of software development task measurement system: Taskpit. IEICE Transactions on Information and Systems (3):462–472

    [2] Pellegrin F, Yücel Z, Monden A, Leelaprute P, Estimating tasks of software company employees based on computer interaction logs, Empirical Software Engineering (under review)

    [3] Yücel Z, Software applications and custom codes. https://github.com/yucelzeynep/Task-estimation-from-activity-logs, 2020

  14. Womply State-level Business Revenue

    • data.ct.gov
    • datasets.ai
    • +2 more
    csv, xlsx, xml
    Updated May 9, 2022
    + more versions
    Cite
    Opportunity Insights (2022). Womply State-level Business Revenue [Dataset]. https://data.ct.gov/Business/Womply-State-level-Business-Revenue/kypk-e3qu
    Explore at:
    Available download formats: xlsx, csv, xml
    Dataset updated
    May 9, 2022
    Dataset authored and provided by
    Opportunity Insights
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Small business transactions and revenue data aggregated from several credit card processors, collected by Womply and compiled by Opportunity Insights. Transactions and revenue are reported based on the ZIP code where the business is located.

    Data provided for CT (FIPS code 9), MA (25), NJ (34), NY (36), and RI (44).

    Data notes from Opportunity Insights: Seasonally adjusted change since January 2020. Data is indexed in 2019 and 2020 as the change relative to the January index period. We then seasonally adjust by dividing year-over-year, which represents the difference between the change since January observed in 2020 compared to the change since January observed since 2019. We account for differences in the dates of federal holidays between 2019 and 2020 by shifting the 2019 reference data to align the holidays before performing the year-over-year division.
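
    For illustration, a minimal pandas sketch of this year-over-year seasonal adjustment, using synthetic daily series as placeholders for the real tracker data (the federal-holiday shift is noted in a comment but not implemented):

        import numpy as np
        import pandas as pd

        # Synthetic daily revenue series standing in for the real tracker data
        d2019 = pd.date_range("2019-01-01", "2019-12-31")
        d2020 = pd.date_range("2020-01-01", "2020-12-31")
        s2019 = pd.Series(100 + np.random.randn(len(d2019)), index=d2019)
        s2020 = pd.Series(100 + np.random.randn(len(d2020)), index=d2020)

        # Index each year as the change relative to its own January window
        idx19 = s2019 / s2019["2019-01-04":"2019-01-31"].mean()
        idx20 = s2020 / s2020["2020-01-04":"2020-01-31"].mean()

        # Seasonal adjustment: divide the 2020 change-since-January by the
        # 2019 one on matching days (the tracker additionally shifts the 2019
        # reference data to align federal holidays before dividing).
        adjusted = idx20.values[:365] / idx19.values[:365]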

    Small businesses are defined as those with annual revenue below the Small Business Administration’s thresholds. Thresholds vary by six-digit NAICS code, with the maximum number of employees that still counts as a small business ranging from 100 to 1,500 depending on the industry.

    County-level and metro-level data and breakdowns by High/Middle/Low income ZIP codes have been temporarily removed since the August 21st 2020 update due to revisions in the structure of the raw data we receive. We hope to add them back to the OI Economic Tracker soon.

    More detailed documentation on Opportunity Insights data can be found here: https://github.com/OpportunityInsights/EconomicTracker/blob/main/docs/oi_tracker_data_documentation.pdf

  15. Task content of occupations based on the ESCO database

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated May 10, 2024
    Cite
    Matysiak, Anna; Hardy, Wojciech; van der Velde, Lucas (2024). Task content of occupations based on the ESCO database [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_11092166
    Explore at:
    Dataset updated
    May 10, 2024
    Dataset provided by
    Faculty of Economic Sciences, University of Warsaw
    Authors
    Matysiak, Anna; Hardy, Wojciech; van der Velde, Lucas
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    When using this resource please cite the article for which it was developed:

    Matysiak, A., Hardy, W. and van der Velde Lucas (2024). Structural Labour Market Change and Gender Inequality in Earnings. Work, Employment and Society, vol. (), pp. - (to be filled in upon publication).

    The dataset contributes a categorisation of tasks conducted across occupations, with a distinction between social tasks directed "inward" (e.g. towards members of own organisation, co-workers, employees, etc.) and those directed "outward" (e.g. towards students, clients, patients, etc.). This provides more depth to the discussion on technology, labour market changes and gender differences in how these trends are experienced. The dataset builds on the ESCO database v1.0.8.

    The following task categories are available at occupation levels:

    • Social
      • Social Inward
      • Social Outward
    • Analytical*
    • Routine**
    • Manual

    * Additionally, a distinction between technical and creative/artistic tasks is provided although it is not used in Matysiak et al. (2024).

    ** In the initial files, some task items are categorised as Routine, while some are categorised as Non-Routine. In the subsequent steps for occupation-level information, the Routine task score consists of a difference between the Routine score and the Non-Routine score (see the paper for more information).

    The repository contains four data files at different stages of task development. For the codes, please see the accompanying GitHub repository. The ESCO database covers, i.a., skills/competences and attitudes, to which we jointly refer as task items (as is standard in the literature using other databases such as ONET). For detailed methodology and interpretation see Matysiak et al. (2024).

    1) esco_tasks.csv - encompasses all ESCO occupations and all task items with tags on task categorisation into broader categories. It also includes the split between the "essential" and "optional" task items and the variant "management-focused" and "care-focused" measures of social tasks as used in the robustness checks in the Matysiak et al. (2024) paper.

    2) esco_onet_tasks.csv - additionally includes pre-prepped task items from the ONET database, traditionally used to describe the task content of occupations. These data can be used to validate the ESCO measures.

    3) esco_onet_matysiaketal2024.csv - contains a subset of the variables from esco_onet_tasks.csv used for the Matysiak et al. (2024) paper.

    4) tasks_isco08_2018_stdlfs.csv - contains the final task measures after the standardisation and derivation procedures described in Matysiak et al. (2024).

    For all details on the procedures, applied crosswalks, methods, etc. please refer to the GitHub repository and the Matysiak et al. (2024) paper.

  16. Enterprise-Driven Open Source Software

    • data.europa.eu
    • zenodo.org
    unknown
    Updated Feb 6, 2020
    + more versions
    Cite
    Zenodo (2020). Enterprise-Driven Open Source Software [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-3653878?locale=da
    Explore at:
    Available download formats: unknown (8,339,687 bytes)
    Dataset updated
    Feb 6, 2020
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present a dataset of open source software developed mainly by enterprises rather than volunteers. This can be used to address known generalizability concerns and also to perform research on open source business software development. Based on the premise that an enterprise's employees are likely to contribute to a project developed by their organization using the email account provided by it, we mine domain names associated with enterprises from open data sources as well as through white- and blacklisting, and use them through three heuristics to identify 17,252 enterprise GitHub projects. We provide these as a dataset detailing their provenance and properties. A manual evaluation of a dataset sample shows an identification accuracy of 89%. Through an exploratory data analysis we found that projects are staffed by a plurality of enterprise insiders, who appear to be pulling more than their weight, and that in a small percentage of relatively large projects development happens exclusively through enterprise insiders.

    The main dataset is provided as a 17,252-record tab-separated file named enterprise_projects.txt with the following 27 fields:

    • url: the project's GitHub URL
    • project_id: the project's GHTorrent identifier
    • sdtc: true if selected using the same domain top committers heuristic (9,006 records)
    • mcpc: true if selected using the multiple committers from a valid enterprise heuristic (8,289 records)
    • mcve: true if selected using the multiple committers from a probable company heuristic (7,990 records)
    • star_number: number of GitHub watchers
    • commit_count: number of commits
    • files: number of files in current main branch
    • lines: corresponding number of lines in text files
    • pull_requests: number of pull requests
    • most_recent_commit: date of the most recent commit
    • committer_count: number of different committers
    • author_count: number of different authors
    • dominant_domain: the project's dominant email domain
    • dominant_domain_committer_commits: number of commits made by committers whose email matches the project's dominant domain
    • dominant_domain_author_commits: corresponding number for commit authors
    • dominant_domain_committers: number of committers whose email matches the project's dominant domain
    • dominant_domain_authors: corresponding number of commit authors
    • cik: SEC's EDGAR "central index key"
    • fg500: true if this is a Fortune Global 500 company (2,232 records)
    • sec10k: true if the company files SEC 10-K forms (4,178 records)
    • sec20f: true if the company files SEC 20-F forms (429 records)
    • project_name: GitHub project name
    • owner_login: GitHub project's owner login
    • company_name: company name as derived from the SEC and Fortune 500 data
    • owner_company: GitHub project's owner company name
    • license: SPDX license identifier

    The file cohost_project_details.txt provides the full set of 309,531 cohort projects that are not part of the enterprise data set but have comparable quality attributes, with the following fields:

    • url: the project's GitHub URL
    • project_id: the project's GHTorrent identifier
    • stars: number of GitHub watchers
    • commit_count: number of commits
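
    For illustration, a minimal pandas sketch using the field names listed above, e.g., to estimate the share of commits made by enterprise insiders per project:

        import pandas as pd

        # Load the main tab-separated file and compute, per project, the share
        # of commits made by committers matching the dominant email domain
        df = pd.read_csv("enterprise_projects.txt", sep="\t")
        df["insider_commit_share"] = (df["dominant_domain_committer_commits"]
                                      / df["commit_count"])
        print(df[["url", "insider_commit_share"]]
              .sort_values("insider_commit_share", ascending=False)
              .head())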

  17. APS Employment Data 31 December 2012

    • researchdata.edu.au
    Updated May 20, 2013
    + more versions
    Cite
    Australian Public Service Commission (2013). APS Employment Data 31 December 2012 [Dataset]. https://researchdata.edu.au/aps-employment-data-december-2012/2985559
    Explore at:
    Dataset updated
    May 20, 2013
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Australian Public Service Commission
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    These tables present a summary of employment under the Public Service Act 1999 at 31 December 2012 and during the 2012 calendar year. The data is an update of that presented in the APS Statistical Bulletin 2011-12.

    Data becomes available at six-monthly intervals: through this summary for calendar year data, and through the annual State of the Service Report (SOSR) and APS Statistical Bulletin (the Bulletin). The data in these tables is sourced from the APS Employment Database (APSED), which contains data extracted from agencies' HR systems.

    The Australian Public Service Commission continues to work with agencies to improve the quality and timeliness of the data they provide to APSED. Each year extensive audits and error checking of APSED are undertaken to ensure that sound conclusions can be drawn from the data. Through this audit process, previously published data has been updated. The June 2012 data published in the SOSR and the Bulletin has been revised.

    As in the Bulletin, a headcount approach is used in these tables; that is, people working part-time are aggregated with people working full-time without weighting. Data also includes inoperative staff. Employees' classification in these tables refers to their base or substantive classification. Abbreviations used in these tables and changes to administrative arrangements for the period July to December 2012 are explained in the link below. The Explanatory Notes at Appendix 1 of the Bulletin provide definitions of terms used in these tables.

    The Bulletin is available at http://www.apsc.gov.au/about-the-apsc/parliamentary/aps-statistical-bulletin/snapshots-december-2012

  18. Wisdom Data Usage

    • fsadata.github.io
    • data.wu.ac.at
    csv
    Updated Aug 28, 2018
    + more versions
    Cite
    (2018). Wisdom Data Usage [Dataset]. https://fsadata.github.io/wisdom-data-usage/
    Explore at:
    Available download formats: csv
    Dataset updated
    Aug 28, 2018
    Description

    The Food Standards Agency uses a system called Wisdom for document and record management. Wisdom is the default system where official records should be added, managed, and stored. We produce reports on usage levels by directorate to monitor system usage and ensure that staff are complying with the FSA Information Management policy. The reports are produced ad hoc, and each relates to usage in the month given in its title. The reports show the percentage of staff (represented as a decimal) using Wisdom against headcount figures.

  19. Average hourly earnings of female and male employees

    • researchdata.edu.au
    Updated Jul 5, 2018
    + more versions
    Cite
    Sustainable Development Goals (2018). Average hourly earnings of female and male employees [Dataset]. https://researchdata.edu.au/average-hourly-earnings-male-employees/2983495
    Explore at:
    Dataset updated
    Jul 5, 2018
    Dataset provided by
    Data.gov (https://data.gov/)
    Authors
    Sustainable Development Goals
    License

    Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
    License information was derived automatically

    Description

    2004 to 2017, annual data. Source: ABS Characteristics of Employment, cat. no. 6333.0.

  20. Data from: Hybrid LCA database generated using ecoinvent and EXIOBASE

    • data.niaid.nih.gov
    Updated Oct 9, 2021
    + more versions
    Cite
    Agez Maxime (2021). Hybrid LCA database generated using ecoinvent and EXIOBASE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3890378
    Explore at:
    Dataset updated
    Oct 9, 2021
    Dataset provided by
    CIRAIG
    Authors
    Agez Maxime
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hybrid LCA database generated using ecoinvent and EXIOBASE: new direct inputs (coming from EXIOBASE) that are deemed missing (e.g., services) are added to each process of the original ecoinvent database. Each process of the resulting hybrid database is thus not (or at least less) truncated, and the calculated life cycle emissions/impacts should therefore be closer to reality.

    For license reasons, only the added inputs for each process of ecoinvent are provided (and not all the inputs).

    Why are there two versions for hybrid-ecoinvent3.5?

    One of the versions corresponds to ecoinvent hybridized with the normal version of EXIOBASE, and the other is hybridized with a capital-endogenized version of EXIOBASE.

    What does capital endogenization do?

    It matches capital goods formation to the value chains of products where they are required. In a more LCA way of speaking, EXIOBASE in its normal version does not allocate capital use to value chains. It is as if ecoinvent processes had no inputs of buildings, etc. in their unit process inventories. For more detail on this, refer to (Södersten et al., 2019) or (Miller et al., 2019).

    So which version do I use?

    Using the version "with capitals" gives a more comprehensive coverage. Using the "without capitals" version means that if a process of ecoinvent misses inputs of capital goods (e.g., a process does not include the company laptops of the employees), it won't be added. It comes with its fair share of assumptions and uncertainties however.

    Why is it only available for hybrid-ecoinvent3.5?

    The work used for capital endogenization is not available for exiobase3.8.1.

    How do I use the dataset?

    First, to use it, you will need both the corresponding ecoinvent [cut-off] and EXIOBASE [product x product] versions. For the reference year of EXIOBASE to be used, take 2011 for hybrid-ecoinvent3.5, and 2019 for hybrid-ecoinvent3.6 and 3.7.1.

    In the four datasets of this package, only added inputs are given (i.e. inputs from EXIOBASE added to ecoinvent processes). Ecoinvent and EXIOBASE processes/sectors are not included, for copyright issues. You thus need both ecoinvent and EXIOBASE to calculate life cycle emissions/impacts.

    Module to get ecoinvent in a Python format: https://github.com/majeau-bettez/ecospold2matrix (make sure to take the most up-to-date branch)

    Module to get EXIOBASE in a Python format: https://github.com/konstantinstadler/pymrio (can also be installed with pip)

    If you want to use the "with capitals" version of the hybrid database, you also need to use the capital endogenized version of EXIOBASE, available here: https://zenodo.org/record/3874309. Choose the pxp version of the year you plan to study (which should match with the year of the EXIOBASE version). You then need to normalize the capital matrix (i.e., divide by the total output x of EXIOBASE). Then, you simply add the normalized capital matrix (K) to the technology matrix (A) of EXIOBASE (see equation below).

    Once you have all the data needed, you just need to apply a slightly modified version of the Leontief equation:

    \begin{equation}
    \textbf{q}^{hyb} =
    \begin{bmatrix} \textbf{C}^{lca}\cdot\textbf{S}^{lca} & \textbf{C}^{io}\cdot\textbf{S}^{io} \end{bmatrix}
    \cdot \left( \textbf{I} -
    \begin{bmatrix} \textbf{A}^{lca} & \textbf{C}^{d} \\ \textbf{C}^{u} & \textbf{A}^{io}+\textbf{K}^{io} \end{bmatrix}
    \right)^{-1}
    \cdot \begin{bmatrix} \textbf{y}^{lca} \\ 0 \end{bmatrix}
    \end{equation}

    q^hyb gives the hybridized impacts, i.e., the impacts of each process including the impacts generated by their new inputs.

    C^lca and C^io are the respective characterization matrices for ecoinvent and EXIOBASE.

    S^lca and S^io are the respective environmental extension matrices (or elementary flows, in LCA terms) for ecoinvent and EXIOBASE.

    I is the identity matrix.

    A^lca and A^io are the respective technology matrices for ecoinvent and EXIOBASE (the ones loaded with ecospold2matrix and pymrio).

    K^io is the capital matrix. If you do not use the endogenized version, do not include this matrix in the calculation.

    C^u (the upstream cut-offs) is the matrix provided in this dataset.

    C^d (the downstream cut-offs) is simply a matrix of zeros in the case of this application.

    Finally, you define your final demand (or functional unit/set of functional units, in LCA terms) as y^lca.
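
    For illustration, a minimal numpy sketch of this block computation with tiny random placeholder matrices; in practice, A^lca and A^io are loaded via ecospold2matrix and pymrio, and the C·S products come from the real characterization and extension matrices:

        import numpy as np

        # Toy dimensions: 3 ecoinvent processes, 2 EXIOBASE sectors (placeholders)
        n_lca, n_io = 3, 2
        A_lca = np.random.rand(n_lca, n_lca) * 0.1   # ecoinvent technology matrix
        A_io  = np.random.rand(n_io, n_io) * 0.1     # EXIOBASE technology matrix
        K_io  = np.random.rand(n_io, n_io) * 0.05    # capital matrix (omit if not endogenizing)
        Cu    = np.random.rand(n_io, n_lca) * 0.05   # upstream cut-offs (this dataset)
        Cd    = np.zeros((n_lca, n_io))              # downstream cut-offs: zeros here

        CS_lca = np.random.rand(1, n_lca)            # stands in for C^lca @ S^lca
        CS_io  = np.random.rand(1, n_io)             # stands in for C^io @ S^io

        # Block technology matrix and final demand (y^lca padded with zeros)
        A = np.block([[A_lca, Cd], [Cu, A_io + K_io]])
        y = np.concatenate([[1.0, 0.0, 0.0], np.zeros(n_io)])

        # q^hyb = [C S] (I - A)^-1 y
        x = np.linalg.solve(np.eye(n_lca + n_io) - A, y)
        q_hyb = np.concatenate([CS_lca, CS_io], axis=1) @ x
        print(q_hyb)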

    Can I use it with different versions/reference years of EXIOBASE?

    Technically speaking, yes it will work, because the temporal aspect does not intervene in the determination of the hybrid database presented here. However, keep in mind that there might be some inconsistencies. For example, you would need to multiply each of the inputs of the datasets by a factor to account for inflation. Prices of ecoinvent (which were used to compile the hybrid databases, for all versions presented here) are defined in €2005.

    What are the weird strings of numbers in the columns?

    Ecoinvent processes are identified through unique identifiers (uuids), whose metadata (i.e., name, location, price, etc.) can be retrieved with the appropriate metadata files in each dataset package.

    Why is the equation (I - A)^-1 and not A^-1 like in LCA?

    IO and LCA have the same computational background. In LCA, however, the convention is to represent both outputs and inputs in the technology matrix. That's why there is a diagonal of 1s (the outputs, i.e., functional units) and negative values elsewhere (the inputs). In IO, the technology matrix does not include outputs and registers only inputs, as positive values. In the end, it is just a difference of convention. If we call T the technology matrix of LCA and A the technology matrix of IO, we have T = I - A. When you load ecoinvent using ecospold2matrix, the resulting version of ecoinvent will already be in IO convention, so you won't have to bother with it.

    Pymrio does not provide a characterization matrix for EXIOBASE, what do I do?

    You can find an up-to-date characterization matrix (with Impact World+) for environmental extensions of EXIOBASE here: https://zenodo.org/record/3890339

    If you want to match characterization across both EXIOBASE and ecoinvent (which you should do), here you can find a characterization matrix with Impact World+ for ecoinvent: https://zenodo.org/record/3890367

    It's too complicated...

    The custom software that was used to develop these datasets already deals with some of the steps described. Go check it out: https://github.com/MaximeAgez/pylcaio. You can also generate your own hybrid version of ecoinvent using this software (you can play with some parameters like correction for double counting, inflation rate, change price data to be used, etc.). As of pylcaio v2.1, the resulting hybrid database (generated directly by pylcaio) can be exported to and manipulated in brightway2.

    Where can I get more information?

    The whole methodology is detailed in (Agez et al., 2021).
