2 datasets found
  1. Amazon Employee Access Challenge

    • kaggle.com
    Updated Aug 11, 2021
    Cite
    Luca Massaron (2021). Amazon Employee Access Challenge [Dataset]. https://www.kaggle.com/lucamassaron/amazon-employee-access-challenge/code
    Dataset updated
    Aug 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Luca Massaron
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    When an employee at any company starts work, they first need to obtain the computer access necessary to fulfill their role. This access may allow an employee to read/manipulate resources through various applications or web portals. It is assumed that employees fulfilling the functions of a given role will access the same or similar resources. It is often the case that employees figure out the access they need as they encounter roadblocks during their daily work (e.g. not able to log into a reporting portal). A knowledgeable supervisor then takes time to manually grant the needed access in order to overcome access obstacles. As employees move throughout a company, this access discovery/recovery cycle wastes a nontrivial amount of time and money.

    There is a considerable amount of data regarding an employee’s role within an organization and the resources to which they have access. Given the data related to current employees and their provisioned access, models can be built that automatically determine access privileges as employees enter and leave roles within a company. These auto-access models seek to minimize the human involvement required to grant or revoke employee access.

    Part of the competition "Amazon.com - Employee Access Challenge" (https://www.kaggle.com/c/amazon-employee-access-challenge), the data consists of real historical data collected from 2010 & 2011. Employees are manually allowed or denied access to resources over time. Your task is to create an algorithm capable of learning from this historical data to predict approval/denial for an unseen set of employees.

    Content

    The data comes from Amazon Inc. and was collected in 2010-2011 (published on the Kaggle platform). The training set consists of 32769 samples and the test set of 58922 samples. The training set has one label attribute named “ACTION”, whose value “1” indicates that an application was approved, whereas “0” indicates that it was rejected. As predictors of this state, there are eight features, indicating characteristics of the required resource and the role and work group of the employee at Amazon requesting access.

    train.csv - The training set. Each row has the ACTION (ground truth), RESOURCE, and information about the employee's role at the time of approval.

    test.csv - The test set for which predictions should be made. Each row asks whether an employee having the listed characteristics should have access to the listed resource.

    Column Name - Description
    ACTION - ACTION is 1 if the resource was approved, 0 if the resource was not
    RESOURCE - An ID for each resource
    MGR_ID - The EMPLOYEE ID of the manager of the current EMPLOYEE ID record; an employee may have only one manager at a time
    ROLE_ROLLUP_1 - Company role grouping category id 1 (e.g. US Engineering)
    ROLE_ROLLUP_2 - Company role grouping category id 2 (e.g. US Retail)
    ROLE_DEPTNAME - Company role department description (e.g. Retail)
    ROLE_TITLE - Company role business title description (e.g. Senior Engineering Retail Manager)
    ROLE_FAMILY_DESC - Company role family extended description (e.g. Retail Manager, Software Engineering)
    ROLE_FAMILY - Company role family description (e.g. Retail Manager)
    ROLE_CODE - Company role code; this code is unique to each role (e.g. Manager)
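
The column layout above can be illustrated with a minimal loading sketch. It assumes that train.csv has a header row using exactly the column names listed, with ACTION as the label. The sample rows are made-up placeholder values, not real records, and `load_rows` is a hypothetical helper name rather than part of any published code for this dataset.

```python
import csv
import io

# Made-up placeholder rows in the column layout described above (not real data).
SAMPLE_TRAIN_CSV = """ACTION,RESOURCE,MGR_ID,ROLE_ROLLUP_1,ROLE_ROLLUP_2,ROLE_DEPTNAME,ROLE_TITLE,ROLE_FAMILY_DESC,ROLE_FAMILY,ROLE_CODE
1,101,5001,11,21,31,41,51,61,71
0,102,5002,11,22,32,42,52,62,72
1,101,5003,12,21,31,41,51,61,71
"""


def load_rows(handle, label="ACTION"):
    """Split each CSV row into a feature dict and an integer label."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        labels.append(int(row.pop(label)))
        # Values stay as strings: they are identifiers, not quantities.
        features.append(row)
    return features, labels


features, labels = load_rows(io.StringIO(SAMPLE_TRAIN_CSV))

# Count distinct IDs per column, a common first look at categorical data.
distinct = {name: len({row[name] for row in features}) for name in features[0]}
print(labels)
print(distinct)
```

To read the real file instead of the inline sample, the same helper could be given an open file handle, for example one obtained with `open("train.csv", newline="")`, where the path is an assumption about where the file was saved.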

    Models are judged on area under the ROC curve (https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
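
As a rough illustration of this metric, the area under the ROC curve can be computed as the fraction of (approved, denied) pairs in which the approved example receives the higher score, with ties counted as half. The sketch below is a plain-Python version under that definition. The function name `roc_auc` and the labels and scores are made up for illustration, and this is not the competition's scoring code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up ACTION labels and predicted scores, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.8, 0.1]
print(roc_auc(y_true, y_score))
```

The pairwise loop is quadratic in the number of rows, so it suits small examples only. For data of the size described above, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would likely be a better fit.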

    Acknowledgements

    The data was donated by Amazon, and the original competition was hosted in collaboration with the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2013).

  2. Amazon Employee access

    • kaggle.com
    Updated Aug 4, 2021
    Cite
    SARMISTHA DASH (2021). Amazon Employee access [Dataset]. https://www.kaggle.com/sarmisthadash/amazon-employee-access/code
    Dataset updated
    Aug 4, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SARMISTHA DASH
    Description

    Dataset

    This dataset was created by SARMISTHA DASH

    Contents


Amazon Employee Access Challenge

92 scholarly articles cite this dataset (View in Google Scholar)
