89 datasets found
  1. HIA attribute dictionary

    • find.data.gov.scot
    • dtechtive.com
    Updated Feb 7, 2024
    + more versions
    Cite
    NatureScot (2024). HIA attribute dictionary [Dataset]. https://find.data.gov.scot/datasets/20061
    Explore at:
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    NatureScot
    Area covered
    United Kingdom of Great Britain and Northern Ireland
    Description

    This dataset combines all available field data for upland habitat impact assessment (HIA) survey work commissioned by SNH between 2004 and 2018, using the methods described in Macdonald et al. (1998). Data collation was carried out by Edward Wilson as part of the review of deer data (Project 017079), managed by Jenny Bryce of the Wildlife Management Activity Team. The aim was to bring together all the available data into a corporate dataset, standardise and clean the data, and make it accessible for internal and external use. It should be used as the primary source of HIA data as it supersedes (through data cleansing) the data held in spreadsheets.

  2. PACO-LVIS Dataset

    • kaggle.com
    • datasetninja.com
    zip
    Updated Jan 12, 2023
    Cite
    Swapnil Tripathi (2023). PACO-LVIS Dataset [Dataset]. https://www.kaggle.com/datasets/swaptr/pacolvis-dataset
    Explore at:
    zip (9244519306 bytes)
    Dataset updated
    Jan 12, 2023
    Authors
    Swapnil Tripathi
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    PACO is a detection dataset that provides richer annotations such as part masks, object categories, object-part categories, and attributes. It spans 75 object categories, 456 object-part categories and 55 attributes across two datasets: LVIS and Ego4D. It has 641K part masks annotated across 260K object boxes, with roughly half of them annotated with attributes. It provides evaluation metrics and benchmark results for three tasks on the datasets: part mask segmentation, object and part attribute prediction and zero-shot instance detection. PACO-github PACO-meta PACO-arxiv

    Parts and Attributes of Common Objects - Large Vocabulary Instance Segmentation (PACO-LVIS)

    The PACO-LVIS dataset is formed from the LVIS dataset of images. The images sourced from the dataset have been annotated, using Meta's internal platform Halo, with 75 object classes. The LVIS dataset provides pixel-level annotations of objects and their categories, making it useful for part mask segmentation, object and part attribute prediction, and zero-shot instance detection.

    | Stats | train/val/test |
    | :--- | :----: |
    | number of images | 45790/2410/9443 |
    | number of images with annotations | 45790/2410/9443 |
    | number of bounding boxes | 217117/10794/45861 |
    | number of object segments | 217117/10794/45861 |
    | number of part segments | 395071/20945/86041 |
    | number of bboxes with obj attributes | 58846/3140/12407 |
    | number of bboxes with part attributes | 52088/2812/11003 |

    Dataset Format

    import json

    # data: dictionary holding the parsed JSON annotations
    with open('annotations/paco_lvis_v1_train.json', 'r') as file:
      data = json.load(file)
    
    data["images"]: # a list of dictionaries, each dictionary corresponds to one image
    {
      'id':                  int,
      'file_name':              str,
      'width':                int,
      'height':                int,
      'neg_category_ids':           list,
      'not_exhaustive_category_ids':     list,
      'neg_category_ids_attrs':        list,
      'not_exhaustive_category_ids_attrs':  list,
      'license':               int,
    }
    data["annotations"]: # a list of dictionaries, each dictionary corresponds to one object or part bounding box
    {
      'id':            int,
      'bbox':           [x,y,width,height],
      'area':           float,
      'category_id':       int,
      'image_id':         int,
      'segmentation':       RLE,
      'attribute_ids':      List[int],
      'dom_color_ids':      List[int],
      'obj_ann_id':        int,
      'unknown_color':      0 or 1,
      'unknown_pattern_marking': 0 or 1,
      'unknown_material':     0 or 1,
      'unknown_transparency':   0 or 1,
      'instance_id':       int,  # PACO-EGO4D only
      'blur_level':        int,  # PACO-EGO4D only
    }
    data["categories"]: # a list of dictionaries, each dictionary corresponds to one object category
    {
      'supercategory':  'OBJECT',
      'id':        int,
      'name':       str,
      'image_count':   int,
      'instance_count':  int,
      'synset':      str,
      'frequency':    char,
    }
    data["part_categories"]: # a list of dictionaries, each dictionary corresponds to one part category
    {
      'supercategory': 'PART',
      'id':      int,
      'name':     str
    }
    data['attributes']: # a list of dictionaries, each dictionary corresponds to one attribute category
    {
      'supercategory': 'ATTR',
      'id':      int,
      'name':     str
    }
    data["attr_type_to_attr_idxs"]: # dictionary, key is the attribute name (one of: color, pattern, marking, material, transparency, value is the list of ids each attribute corresponds to)
    {
      'color':      range(30),
      'pattern_marking': range(30,41),
      'material':     range(41,55),
      'transparency':   range(55,59)
    }
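
    To make the schema above concrete, here is a minimal sketch (assuming the annotation file path used above and only the standard library) that tallies how many annotations carry attribute labels for each category; printing the top ten is an arbitrary choice.

    import json
    from collections import Counter

    with open('annotations/paco_lvis_v1_train.json', 'r') as file:
        data = json.load(file)

    # Map category ids to names (see data["categories"] above).
    cat_names = {c['id']: c['name'] for c in data['categories']}

    # Count annotations that carry a non-empty attribute_ids list.
    with_attrs = Counter()
    for ann in data['annotations']:
        if ann.get('attribute_ids'):
            with_attrs[ann['category_id']] += 1

    # Ten categories with the most attribute-annotated boxes.
    for cat_id, count in with_attrs.most_common(10):
        print(cat_names.get(cat_id, str(cat_id)), count)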
    

    Object, Part, and Attribute Classes

    Object Classes: 75

    'trash_can','handbag','ball','basket','bicycle','book','bottle','bowl','can','car_(automobile)','carton','cellular_telephone','chair','cup','dog','drill','drum_(musical_instrument)','glass_(drink_container)','guitar','hat','helmet','jar','knife','laptop_computer','mug','pan_(for_cooking)','plate','remote_control','scissors','shoe','slipper_(footwear)','stool','table','towel','wallet','watch','wrench','belt','bench','blender','box','broom',...
    
  3. High-fidelity Fraudulent Activity Dataset 2023

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Cite
    Shahzad Aslam (2023). High-fidelity Fraudulent Activity Dataset 2023 [Dataset]. https://www.kaggle.com/datasets/zeesolver/credit-card
    Explore at:
    zip (149953614 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    Shahzad Aslam
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description
    Context

    The credit card dataset comprises various attributes that capture essential information about individual transactions. Each entry in the dataset is uniquely identified by an 'ID', which aids in precise record-keeping and analysis. The 'V1-V28' features encompass a wide range of transaction-related details, including time, location, type, and several other parameters. These attributes collectively provide a comprehensive snapshot of each transaction. 'Amount' denotes the monetary value involved in the transaction, indicating the specific charge or credit associated with the card. Lastly, the 'Class' attribute plays a pivotal role in fraud detection, categorizing transactions into distinct classes like 'legitimate' and 'fraudulent'. This classification is instrumental in identifying potentially suspicious activities, helping financial institutions safeguard against fraudulent transactions. Together, these attributes form a crucial dataset for studying and mitigating risks associated with credit card transactions.

    Column Details

    ID:

    This is likely a unique identifier for a specific credit card transaction. It helps in keeping track of individual transactions and distinguishing them from one another.

    V1-V28:

    These are possibly features or attributes associated with the credit card transaction. They might include information such as time, amount, location, type of transaction, and various other details that can be used for analysis and fraud detection.

    Amount:

    This refers to the monetary value involved in the credit card transaction. It indicates how much money was either charged or credited to the card during that particular transaction.

    Class:

    This is an important attribute indicating the category or type of the transaction. It typically classifies transactions into different groups, like 'fraudulent' or 'legitimate'. This classification is crucial for identifying potentially suspicious or fraudulent activities.
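
    The attributes above map directly onto a quick exploratory check. Below is a minimal sketch; the file name 'creditcard.csv' is an assumption, so use the name of the CSV in the downloaded archive.

    import pandas as pd

    # Assumed file name; adjust to the actual CSV in the download.
    df = pd.read_csv('creditcard.csv')

    # 'Class' separates legitimate from fraudulent transactions.
    print(df['Class'].value_counts(normalize=True))

    # Transaction amount summary by class.
    print(df.groupby('Class')['Amount'].describe())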

  4. Updated Ljubljana Breast Cancer Data Set: reduced and cleaned version

    • data.mendeley.com
    Updated Oct 25, 2023
    Cite
    Gennady Chuiko (2023). Updated Ljubljana Breast Cancer Data Set: reduced and cleaned version [Dataset]. http://doi.org/10.17632/fgs9pyfv2z.2
    Explore at:
    Dataset updated
    Oct 25, 2023
    Authors
    Gennady Chuiko
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains information for Machine Learning algorithms to forecast recurrence events (RE) for patients with breast cancer stages I to III. The dataset contains 252 instances and six attributes, including a binary class indicating whether RE occurred. It has been reduced and denoised from the original Ljubljana Breast Cancer Data Set (LBCD), which holds 286 instances with ten attributes each (Zwitter M. and Soklic M. (1988). Breast Cancer. UCI Machine Learning Repository. https://archive.ics.uci.edu/dataset/14/breast+cancer). Attribute rankings produced by eight different Machine Learning algorithms, followed by statistical handling of the resulting 8-component ranking vectors, allowed the ten features to be reduced to the six most relevant ones. The five most pertinent features were {deg_malig, irradiat, node_caps, tumor_size, inv_nodes}; four attributes were found less relevant: {age, breast_quad, breast, menopause}. The CAIRAD filter (Co-appearance based Analysis for Incorrect Records and Attribute-values Detection; Rahman MG, Islam MZ, Bossomaier T, Gao J. CAIRAD: A co-appearance based analysis for incorrect records and attribute-values detection. Proc Int Jt Conf Neural Networks. 2012. https://doi.org/10.1109/IJCNN.2012.6252669) was used to detect noise in the attributes and the class feature. According to the filtering results, 34 instances of the LBCD had noise in half or more of their features; those were removed from the data. Noise in the class attribute is known to be riskier and more misleading than noise in the other attributes. Meanwhile, the class attribute had 35 (14%) missing values out of 252 after CAIRAD filtering, which was unacceptable given the comparable number (only 85 cases) of recurrence events in the class of the initial LBCD. The imputation (reconstruction) of the missing values was performed with the algorithm offered in:
    Bai BM, Mangathayaru N, Rani BP. An approach to find missing values in medical datasets. In: ACM International Conference Proceeding Series. Vol 24-26-Sept.; 2015. https://doi.org/10.1145/2832987.2833083. The noise remaining in the other attributes, ranging from 1% to 14%, was neglected. There are 252 instances in the dataset, of which 206 do not have RE and the remaining 46 have RE. Each instance is defined by six attributes, including its class. This dataset is an improved version of the initial LBCD and provides a significant performance advantage over the original LBCD for most Machine Learning classification algorithms. However, it is slightly more imbalanced than the LBCD, which is a drawback.
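
    As an illustration of how the six retained attributes might be used, here is a minimal sketch; the file name and the exact column labels (including the class column name) are assumptions, so adapt them to the published spreadsheet.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Assumed file and column names, for illustration only.
    df = pd.read_csv('ljubljana_reduced.csv')
    features = ['deg_malig', 'irradiat', 'node_caps', 'tumor_size', 'inv_nodes']

    X = pd.get_dummies(df[features])   # one-hot encode the categorical attributes
    y = df['class']                    # binary recurrence-event label (assumed name)

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    # Balanced accuracy is a reasonable choice given the 206/46 class imbalance noted above.
    print(cross_val_score(clf, X, y, cv=5, scoring='balanced_accuracy').mean())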

  5. Core Based Statistical Areas

    • data-usdot.opendata.arcgis.com
    • catalog.data.gov
    • +2more
    Updated Jul 1, 2008
    + more versions
    Cite
    U.S. Department of Transportation: ArcGIS Online (2008). Core Based Statistical Areas [Dataset]. https://data-usdot.opendata.arcgis.com/datasets/usdot::core-based-statistical-areas/about
    Explore at:
    Dataset updated
    Jul 1, 2008
    Dataset authored and provided by
    U.S. Department of Transportation: ArcGIS Online
    Area covered
    Description

    The Core Based Statistical Areas dataset was updated on September 22, 2025 from the U.S. Department of Commerce, U.S. Census Bureau, Geography Division and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). This resource is a member of a series.

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation.

    Metropolitan and Micropolitan Statistical Areas are together termed Core Based Statistical Areas (CBSAs). They are defined by the Office of Management and Budget (OMB) and consist of the county or counties (or equivalent entities) associated with at least one urban core of at least 10,000 population, plus adjacent counties having a high degree of social and economic integration with the core, as measured through commuting ties with the counties containing the core. The categories of CBSAs are: Metropolitan Statistical Areas, based on urban areas of 50,000 or more population; and Micropolitan Statistical Areas, based on urban areas of at least 10,000 but less than 50,000 population. The CBSA boundaries are those defined by OMB based on the 2020 Census and published in 2023.

    A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529014
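
    For readers working with the shapefile directly, a minimal sketch follows; the file name is an assumption (use the shapefile from the NTAD download), and the split on the LSAD field follows the convention in TIGER/Line CBSA products, where 'M1' marks metropolitan and 'M2' micropolitan areas.

    import geopandas as gpd

    # Assumed file name; point this at the extracted TIGER/Line shapefile.
    cbsa = gpd.read_file('Core_Based_Statistical_Areas.shp')

    metro = cbsa[cbsa['LSAD'] == 'M1']   # Metropolitan Statistical Areas
    micro = cbsa[cbsa['LSAD'] == 'M2']   # Micropolitan Statistical Areas
    print(len(metro), 'metropolitan;', len(micro), 'micropolitan')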

  6. Pet Cats Australia

    • kaggle.com
    zip
    Updated Jan 19, 2024
    Cite
    Joakim Arvidsson (2024). Pet Cats Australia [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/pet-cats-australia
    Explore at:
    zip (12773089 bytes)
    Dataset updated
    Jan 19, 2024
    Authors
    Joakim Arvidsson
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Australia
    Description

    This data file is published by the Movebank Data Repository (www.datarepository.movebank.org). As of the time of publication, a version of this published animal tracking dataset can be viewed on Movebank (www.movebank.org) in the study "Pet Cats Australia" (Movebank Study ID 64283289). Individual attributes in the data files are defined below, in the NERC Vocabulary Server at http://vocab.nerc.ac.uk/collection/MVB and in the Movebank Attribute Dictionary at www.movebank.org/node/2381. Metadata describing this data package are maintained at https://datacite.org.

    This data package includes the following data files: Pet Cats Australia.csv Pet Cats Australia-reference-data.csv

    Data package citation: Roetman P, Tindle H (2020) Data from: The small home ranges and large local ecological impacts of pet cats [Australia]. Movebank Data Repository. https://doi.org/10.5441/001/1.289p5s77

    These data are described in the following written publication: Kays R, Dunn RR, Parsons AW, Mcdonald B, Perkins T, Powers S, Shell L, McDonald JL, Cole H, Kikillus H, Woods L, Tindle H, Roetman P (2020) The small home ranges and large local ecological impacts of pet cats. Animal Conservation. https://doi.org/10.1111/acv.12563

    Terms of Use This data file is licensed by the Creative Commons Zero (CC0 1.0) license. The intent of this license is to facilitate the re-use of works. The Creative Commons Zero license is a "no rights reserved" license that allows copyright holders to opt out of copyright protections automatically extended by copyright and other laws, thus placing works in the public domain with as little legal restriction as possible. However, works published with this license must still be appropriately cited following professional and ethical standards for academic citation.

    We highly recommend that you contact the data creator if possible if you will be re-using or re-analyzing data in this file. Researchers will likely be interested in learning about new uses of their data, might also have important insights about how to properly analyze and interpret their data, and/or might have additional data they would be willing to contribute to your project. Feel free to contact us at support@movebank.org if you need assistance contacting data owners.

    See here for the full description of this license http://creativecommons.org/publicdomain/zero/1.0

    Data Attributes These definitions come from the Movebank Attribute Dictionary, available at http://vocab.nerc.ac.uk/collection/MVB and www.movebank.org/node/2381.

    algorithm marked outlier: Identifies events marked as outliers using a user-selected filter algorithm in Movebank. Outliers have the value TRUE. units: none entity described: event

    animal comments: Additional information about the animal that is not described by other reference data terms. example: sibling of #1423 units: none entity described: individual

    animal ID: An individual identifier for the animal, provided by the data owner. If the data owner does not provide an Animal ID, an internal Movebank animal identifier is sometimes shown. example: 91876A, Gary units: none entity described: individual same as: individual local identifier

    animal life stage: The age class or life stage of the animal at the beginning of the deployment. Can be years or months of age or terms such as 'adult', 'subadult' and 'juvenile'. Best practice is to define units in the values if needed (e.g. '2 years'). example: juvenile, adult units: not defined entity described: deployment

    animal reproductive condition: The reproductive condition of the animal at the beginning of the deployment. example: lactating units: none entity described: deployment

    animal sex: The sex of the animal. Allowed values are m = male f = female format: controlled list entity described: individual

    animal taxon: The scientific name of the species on which the tag was deployed, as defined by the Integrated Taxonomic Information System (ITIS, www.itis.gov). If the species name can not be provided, this should be the lowest level taxonomic rank that can be determined and that is used in the ITIS taxonomy. Additional information can be provided using the term 'taxon detail'. example: Buteo swainsoni format: controlled list entity described: individual same as: individual taxon canonical name

    attachment type: The way a tag is attached to an animal. Values are chosen from a controlled list: collar = The tag is attached by a collar around the animal's neck glue = The tag is attached to the animal using glue harness = The tag is attached to the animal using a harness implant = The tag is placed under the skin of the animal tape = The tag is attached to the animal using tape other = user specified format: controlled list entity described: deployment

    comments: Additional information about events that is not described by other event data terms. ...
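
    As a usage note, the 'algorithm marked outlier' attribute defined above can be used to drop flagged events before analysis. A minimal sketch follows; the CSV file name comes from the data package listing, but the exact column header in the download (spaces, hyphens, or capitalisation) is an assumption, so check the file.

    import pandas as pd

    events = pd.read_csv('Pet Cats Australia.csv')

    # Keep only events not flagged by the Movebank outlier filter (assumed header).
    flagged = events['algorithm marked outlier'].astype(str).str.lower() == 'true'
    clean = events[~flagged]
    print(len(events), 'events,', len(clean), 'after outlier removal')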

  7. Open Data Dictionary Template Individual

    • hub.arcgis.com
    • catalog.data.gov
    Updated Jan 5, 2023
    Cite
    City of Washington, DC (2023). Open Data Dictionary Template Individual [Dataset]. https://hub.arcgis.com/documents/cb6a686b1e344eeb8136d0103c942346
    Explore at:
    Dataset updated
    Jan 5, 2023
    Dataset authored and provided by
    City of Washington, DC
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This template covers section 2.5 Resource Fields: Entity and Attribute Information of the Data Discovery Form cited in the Open Data DC Handbook (2022). It completes documentation elements that are required for publication. Each field column (attribute) in the dataset needs a description clarifying the contents of the column. Data originators are encouraged to enter the code values (domains) of the column to help end-users translate the contents of the column where needed, especially when lookup tables do not exist.

  8. SES Water Domestic Consumption

    • hub.arcgis.com
    Updated Apr 26, 2024
    + more versions
    Cite
    SESWater2 (2024). SES Water Domestic Consumption [Dataset]. https://hub.arcgis.com/maps/f2cdc1248fcf4fd289ac1d3f25e75b3b_0/about
    Explore at:
    Dataset updated
    Apr 26, 2024
    Dataset authored and provided by
    SESWater2
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset offers valuable insights into yearly domestic water consumption across various Lower Super Output Areas (LSOAs) or Data Zones, accompanied by the count of water meters within each area. It is instrumental for analysing residential water use patterns, facilitating water conservation efforts, and guiding infrastructure development and policy making at a localised level.

    Key Definitions

    Aggregation: The process of summarising or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
    AMR Meter: Automatic meter reading (AMR) is the technology of automatically collecting consumption, diagnostic, and status data from a water meter remotely and periodically.
    Dataset: Structured and organised collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
    Data Zone: Data zones are the key geography for the dissemination of small area statistics in Scotland.
    Dumb Meter: A dumb meter or analogue meter is read manually. It does not have any external connectivity.
    Granularity: Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours.
    ID: Abbreviation for Identification; any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.
    LSOA: Lower Layer Super Output Areas (LSOA) are a geographic hierarchy designed to improve the reporting of small area statistics in England and Wales.
    Open Data Triage: The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.
    Schema: Structure for organising and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
    Smart Meter: A smart meter is an electronic device that records information and communicates it to the consumer and the supplier. It differs from automatic meter reading (AMR) in that it enables two-way communication between the meter and the supplier.
    Units: Standard measurements used to quantify and compare different physical quantities.
    Water Meter: Water metering is the practice of measuring water use. Water meters measure the volume of water used by residential and commercial building units that are supplied with water by a public water supply system.

    Data History

    Data Origin: Domestic consumption data is recorded using water meters. The consumption recorded is then sent back to water companies. This dataset is extracted from the water companies.

    Data Triage Considerations: This section discusses the careful handling of data to maintain anonymity and addresses the challenges associated with data updates, such as identifying household changes or meter replacements.

    Identification of Critical Infrastructure: This aspect is not applicable for the dataset, as the focus is on domestic water consumption and does not contain any information that reveals critical infrastructure details.

    Commercial Risks and Anonymisation

    Individual Identification Risks: There is a potential risk of identifying individuals or households if the consumption data is updated irregularly (e.g., every 6 months) and an out-of-cycle update occurs (e.g., after 2 months), which could signal a change in occupancy or ownership. Such patterns need careful handling to avoid accidental exposure of sensitive information.
    Meter and Property Association: Challenges arise in maintaining historical data integrity when meters are replaced but the property remains the same. Ensuring continuity in the data without revealing personal information is crucial.
    Interpretation of Null Consumption: Instances of null consumption could be misunderstood as a lack of water use, whereas they might simply indicate missing data. Distinguishing between these scenarios is vital to prevent misleading conclusions.
    Meter Re-reads: The dataset must account for instances where meters are read multiple times for accuracy.
    Joint Supplies & Multiple Meters per Household: Special consideration is required for households with multiple meters, as well as multiple households that share a meter, as this could complicate data aggregation.

    Schema Consistency with the Energy Industry: In formulating the schema for the domestic water consumption dataset, careful consideration was given to the potential risks to individual privacy. This evaluation included examining the frequency of data updates, the handling of property and meter associations, interpretations of null consumption, meter re-reads, joint supplies, and the presence of multiple meters within a single household, as described above. After a thorough assessment of these factors and their implications for individual privacy, it was decided to align the dataset's schema with the standards established within the energy industry. This decision was influenced by the energy sector's experience and established practices in managing similar risks associated with smart meters. This ensures a high level of data integrity and privacy protection.

    Schema: The dataset schema is aligned with those used in the energy industry, which has encountered similar challenges with smart meters. However, it is important to note that the energy industry has a much higher density of meter distribution, especially smart meters.

    Aggregation to Mitigate Risks: The dataset employs an elevated level of data aggregation to minimise the risk of individual identification. This approach is crucial in maintaining the utility of the dataset while ensuring individual privacy. The aggregation level is carefully chosen to remove identifiable risks without excluding valuable data, thus balancing data utility with privacy concerns.

    Data Freshness: Users should be aware that this dataset reflects historical consumption patterns and does not represent real-time data.

    Publish Frequency: Annually.

    Data Triage Review Frequency: An annual review is conducted to ensure the dataset's relevance and accuracy, with adjustments made based on specific requests or evolving data trends.

    Data Specifications

    For the domestic water consumption dataset, the data specifications are designed to ensure comprehensiveness and relevance, while maintaining clarity and focus. The specifications for this dataset include:
    • Each dataset encompasses recordings of domestic water consumption as measured and reported by the data publisher. It excludes commercial consumption.
    • Where it is necessary to estimate consumption, this is calculated based on actual meter readings.
    • Meters of all types (smart, dumb, AMR) are included in this dataset.
    • The dataset is updated and published annually.
    • Historical data may be made available to facilitate trend analysis and comparative studies, although it is not mandatory for each dataset release.

    Context

    Users are cautioned against using the dataset for immediate operational decisions regarding water supply management. The data should be interpreted considering potential seasonal and weather-related influences on water consumption patterns. The geographical data provided does not pinpoint locations of water meters within an LSOA. The dataset aims to cover a broad spectrum of households, from single-meter homes to those with multiple meters, to accurately reflect the diversity of water use within an LSOA.
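
    To illustrate the kind of per-area aggregation the schema supports, here is a minimal sketch; the file name and the column names (lsoa_code, consumption_m3, meter_count) are hypothetical placeholders, since the published schema follows the energy-industry convention described above.

    import pandas as pd

    # Hypothetical file and column names, for illustration only.
    df = pd.read_csv('ses_water_domestic_consumption.csv')

    # Average yearly consumption per meter in each LSOA / Data Zone.
    df['consumption_per_meter'] = df['consumption_m3'] / df['meter_count']
    print(df.sort_values('consumption_per_meter', ascending=False).head())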

  9. CRITEO FAIRNESS IN JOB ADS DATASET

    • kaggle.com
    zip
    Updated Jul 1, 2024
    Cite
    Md. Abdur Rahman (2024). CRITEO FAIRNESS IN JOB ADS DATASET [Dataset]. https://www.kaggle.com/datasets/borhanitrash/fairness-in-job-ads-dataset
    Explore at:
    zip (201430692 bytes)
    Dataset updated
    Jul 1, 2024
    Authors
    Md. Abdur Rahman
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Summary

    This dataset is released by Criteo to foster research and innovation on Fairness in Advertising and AI systems in general. See also Criteo pledge for Fairness in Advertising.

    The dataset is intended to learn click predictions models and evaluate by how much their predictions are biased between different gender groups.

    Data description

    The dataset contains pseudonymised user context and publisher features collected from a job-targeting campaign run for 5 months by the AdTech company Criteo. Each line represents a product that was shown to a user. Each user has an impression session in which they can see several products at the same time. Each product can be clicked or not clicked by the user. The dataset consists of 1,072,226 rows and 55 columns.

    • features
      • user_id is a unique identifier assigned to each user. This identifier has been anonymized and does not contain any information related to the real users.
      • product_id is a unique identifier assigned to each product, i.e. job offer.
      • impression_id is a unique identifier assigned to each impression, i.e. online session that can have several products at the same time.
      • cat0 to cat5 are anonymized categorical user features.
      • cat6 to cat12 are anonymized categorical product features.
      • num13 to num47 are anonymized numerical user features.
    • labels
      • protected_attribute is a binary feature that describes a user gender proxy, i.e. female is 0, male is 1. A detailed description of its meaning can be found below.
      • senior is a binary feature that describes the seniority of the job position, i.e. an assistant role is 0, a managerial role is 1. This feature was created during the data processing step from the product title feature: if the product title contains words describing a managerial role (e.g. 'president', 'ceo', and others), it is assigned to 1, otherwise to 0.
      • rank is a numerical feature that corresponds to the positional rank of the product on the display for given impression_id. Usually, the position on the display creates the bias with respect to the click: lower rank means higher position of the product on the display.
      • displayrandom is a binary feature that equals 1 if the display position on the banner of the products associated with the same impression_id was randomized. The click-rank metric should be computed on displayrandom = 1 to avoid positional bias.
      • click is a binary feature that equals 1 if the product product_id in the impression impression_id was clicked by the user user_id.

    Data statistics

    dimension | average
    click | 0.077
    protected attribute | 0.500
    senior | 0.704
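
    The averages above, and the group-level click-rate gap they imply, can be reproduced with a short script; the CSV file name is an assumption (use the file from the download), while the column names follow the feature list above.

    import pandas as pd

    # Assumed file name; adjust to the actual download.
    df = pd.read_csv('criteo_fairness.csv')

    # Reproduce the table of averages.
    print(df[['click', 'protected_attribute', 'senior']].mean())

    # Click-rate gap between the two protected-attribute groups.
    rates = df.groupby('protected_attribute')['click'].mean()
    print('click-rate gap:', abs(rates[1] - rates[0]))

    # As noted above, rank-based metrics should be restricted to randomized displays.
    randomized = df[df['displayrandom'] == 1]
    print('click rate on randomized displays:', randomized['click'].mean())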

    License

    The data is released under the CC-BY-NC-SA 4.0 license. You are free to Share and Adapt this data provided that you respect the Attribution, NonCommercial and ShareAlike conditions. Please read carefully the full license before using.

    Protected attribute

    As Criteo does not have access to user demographics we report a proxy of gender as protected attribute. This proxy is reported as binary for simplicity yet we acknowledge gender is not necessarily binary.

    The value of the proxy is computed as the majority of gender attributes of products seen in the user timeline. Product having a gender attribute are typically fashion and clothing. We acknowledge that this proxy does not necessarily represent how users relate to a given gender yet we believe it to be a realistic approximation for research purposes.

    We encourage research in Fairness defined with respect to other attributes as well.

    Limitations and interpretations

    We remark that the proposed gender proxy does not give a definition of gender. Since we do not have access to the sensitive information, this is the best solution we have identified at this stage to identify bias on pseudonymised data, and we encourage any discussion of better approximations. This proxy is reported as binary for simplicity, yet we acknowledge gender is not necessarily binary. Although our research focuses on gender, this should not diminish the importance of investigating other types of algorithmic discrimination. While this dataset provides an important application of fairness-aware algorithms in a high-risk domain, there are several fundamental limitations that cannot be addressed easily through data collection or curation processes. These limitations in...

  10. Pet Cats USA

    • kaggle.com
    zip
    Updated Jan 19, 2024
    Cite
    Joakim Arvidsson (2024). Pet Cats USA [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/pet-cats-usa
    Explore at:
    zip (6985705 bytes)
    Dataset updated
    Jan 19, 2024
    Authors
    Joakim Arvidsson
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This data file is published by the Movebank Data Repository (www.datarepository.movebank.org). As of the time of publication, a version of this published animal tracking dataset can be viewed on Movebank (www.movebank.org) in the study "Pet Cats United States" (Movebank Study ID 10857031). Individual attributes in the data files are defined below, in the NERC Vocabulary Server at http://vocab.nerc.ac.uk/collection/MVB and in the Movebank Attribute Dictionary at www.movebank.org/node/2381. Metadata describing this data package are maintained at https://datacite.org.

    This data package includes the following data files: Pet Cats United States.csv Pet Cats United States-reference-data.csv

    Data package citation: Kays R, Dunn RR, Parsons A, Mcdonald B, Perkins T, Powers S, Shell L. Data from: The small home ranges and large local ecological impacts of pet cats [United States]. Movebank Data Repository. https://doi.org/10.5441/001/1.bg0v5531

    These data are described in the following written publication: Kays R, Dunn RR, Parsons AW, Mcdonald B, Perkins T, Powers S, Shell L, McDonald JL, Cole H, Kikillus H, Woods L, Tindle H, Roetman P (2020) The small home ranges and large local ecological impacts of pet cats. Animal Conservation. https://doi.org/10.1111/acv.12563

    Terms of Use This data file is licensed by the Creative Commons Zero (CC0 1.0) license. The intent of this license is to facilitate the re-use of works. The Creative Commons Zero license is a "no rights reserved" license that allows copyright holders to opt out of copyright protections automatically extended by copyright and other laws, thus placing works in the public domain with as little legal restriction as possible. However, works published with this license must still be appropriately cited following professional and ethical standards for academic citation.

    We highly recommend that you contact the data creator if possible if you will be re-using or re-analyzing data in this file. Researchers will likely be interested in learning about new uses of their data, might also have important insights about how to properly analyze and interpret their data, and/or might have additional data they would be willing to contribute to your project. Feel free to contact us at support@movebank.org if you need assistance contacting data owners.

    See here for the full description of this license http://creativecommons.org/publicdomain/zero/1.0

    Data Attributes These definitions come from the Movebank Attribute Dictionary, available at http://vocab.nerc.ac.uk/collection/MVB and www.movebank.org/node/2381.

    algorithm marked outlier: Identifies events marked as outliers using a user-selected filter algorithm in Movebank. Outliers have the value TRUE. units: none entity described: event

    animal comments: Additional information about the animal that is not described by other reference data terms. example: sibling of #1423 units: none entity described: individual

    animal ID: An individual identifier for the animal, provided by the data owner. If the data owner does not provide an Animal ID, an internal Movebank animal identifier is sometimes shown. example: 91876A, Gary units: none entity described: individual same as: individual local identifier

    animal life stage: The age class or life stage of the animal at the beginning of the deployment. Can be years or months of age or terms such as 'adult', 'subadult' and 'juvenile'. Best practice is to define units in the values if needed (e.g. '2 years'). example: juvenile, adult units: not defined entity described: deployment

    animal reproductive condition: The reproductive condition of the animal at the beginning of the deployment. example: lactating units: none entity described: deployment

    animal sex: The sex of the animal. Allowed values are m = male f = female format: controlled list entity described: individual

    animal taxon: The scientific name of the species on which the tag was deployed, as defined by the Integrated Taxonomic Information System (ITIS, www.itis.gov). If the species name can not be provided, this should be the lowest level taxonomic rank that can be determined and that is used in the ITIS taxonomy. Additional information can be provided using the term 'taxon detail'. example: Buteo swainsoni format: controlled list entity described: individual same as: individual taxon canonical name

    attachment type: The way a tag is attached to an animal. Values are chosen from a controlled list: collar = The tag is attached by a collar around the animal's neck glue = The tag is attached to the animal using glue harness = The tag is attached to the animal using a harness implant = The tag is placed under the skin of the animal tape = The tag is attached to the animal using tape other = user specified format: controlled list entity described: deployment

    data processing software: Name of the ...

  11. Diamonds Prices

    • kaggle.com
    zip
    Updated Jul 9, 2022
    Cite
    Ms. Nancy Al Aswad (2022). Diamonds Prices [Dataset]. https://www.kaggle.com/datasets/nancyalaswad90/diamonds-prices
    Explore at:
    zip (728251 bytes)
    Dataset updated
    Jul 9, 2022
    Authors
    Ms. Nancy Al Aswad
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    What is the Diamonds Prices Dataset?

    This document explores a dataset containing prices and attributes for approximately 54,000 round-cut diamonds. There are 53,940 diamonds in the dataset, with 10 features (carat, cut, color, clarity, depth, table, price, x, y, and z). Most variables are numeric, but cut, color, and clarity are ordered factor variables.

    The price column is in US dollars ($).

    The columns x, y, and z are the diamond's measurements: x = length in mm, y = width in mm, z = depth in mm.

    Acknowledgments

    When using this dataset in research, please credit the original authors.

    This dataset was uploaded mainly so that I can practice data analysis with my students: I work at a college and want my students to try out our study ideas on a large dataset. It may not be up to date (the collection years are noted), but it is a good resource for practising with data.
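
    A short exploratory sketch follows; the CSV file name is an assumption (use the file in the downloaded archive), and the cut ordering follows the usual Fair < Good < Very Good < Premium < Ideal convention for this dataset.

    import pandas as pd

    # Assumed file name; adjust to the actual CSV in the download.
    df = pd.read_csv('diamonds.csv')

    # cut, color, and clarity are ordered categorical variables.
    cut_order = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
    df['cut'] = pd.Categorical(df['cut'], categories=cut_order, ordered=True)

    # Median price (in US dollars) by cut quality.
    print(df.groupby('cut')['price'].median())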

  12. Large Scale International Boundaries

    • s.cnmilf.com
    • geodata.state.gov
    • +1more
    Updated Aug 30, 2025
    Cite
    U.S. Department of State (Point of Contact) (2025). Large Scale International Boundaries [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/large-scale-international-boundaries
    Explore at:
    Dataset updated
    Aug 30, 2025
    Dataset provided by
    United States Department of State: http://state.gov/
    Description

    Overview

    The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. The current edition is version 11.4 (published 24 February 2025). The 11.4 release contains updated boundary lines and data refinements designed to extend the functionality of the dataset. These data and generalized derivatives are the only international boundary lines approved for U.S. Government use. The contents of this dataset reflect U.S. Government policy on international boundary alignment, political recognition, and dispute status. They do not necessarily reflect de facto limits of control.

    National Geospatial Data Asset

    This dataset is a National Geospatial Data Asset (NGDAID 194) managed by the Department of State. It is a part of the International Boundaries Theme created by the Federal Geographic Data Committee.

    Dataset Source Details

    Sources for these data include treaties, relevant maps, and data from boundary commissions, as well as national mapping agencies. Where available and applicable, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery process includes analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.

    Cartographic Visualization

    The LSIB is a geospatial dataset that, when used for cartographic purposes, requires additional styling. The LSIB download package contains example style files for commonly used software applications. The attribute table also contains embedded information to guide the cartographic representation. Additional discussion of these considerations can be found in the Use of Core Attributes in Cartographic Visualization section below. Additional cartographic information pertaining to the depiction and description of international boundaries or areas of special sovereignty can be found in Guidance Bulletins published by the Office of the Geographer and Global Issues: https://data.geodata.state.gov/guidance/index.html

    Contact

    Direct inquiries to internationalboundaries@state.gov. Direct download: https://data.geodata.state.gov/LSIB.zip

    Attribute Structure

    The dataset uses the following attributes, divided into two categories:

    ATTRIBUTE NAME | ATTRIBUTE STATUS
    CC1 | Core
    CC1_GENC3 | Extension
    CC1_WPID | Extension
    COUNTRY1 | Core
    CC2 | Core
    CC2_GENC3 | Extension
    CC2_WPID | Extension
    COUNTRY2 | Core
    RANK | Core
    LABEL | Core
    STATUS | Core
    NOTES | Core
    LSIB_ID | Extension
    ANTECIDS | Extension
    PREVIDS | Extension
    PARENTID | Extension
    PARENTSEG | Extension

    These attributes have external data sources that update separately from the LSIB:

    ATTRIBUTE NAME | EXTERNAL SOURCE
    CC1 | GENC
    CC1_GENC3 | GENC
    CC1_WPID | World Polygons
    COUNTRY1 | DoS Lists
    CC2 | GENC
    CC2_GENC3 | GENC
    CC2_WPID | World Polygons
    COUNTRY2 | DoS Lists
    LSIB_ID | BASE
    ANTECIDS | BASE
    PREVIDS | BASE
    PARENTID | BASE
    PARENTSEG | BASE

    The core attributes listed above describe the boundary lines contained within the LSIB dataset. Removal of core attributes from the dataset will change the meaning of the lines. An attribute status of "Extension" represents a field containing data interoperability information. Other attributes not listed above include "FID", "Shape_length" and "Shape." These are components of the shapefile format and do not form an intrinsic part of the LSIB.

    Core Attributes

    The eight core attributes listed above contain unique information which, when combined with the line geometry, comprise the LSIB dataset. These core attributes are further divided into Country Code and Name Fields and Descriptive Fields.

    Country Code and Country Name Fields

    The "CC1" and "CC2" fields are machine-readable fields that contain political entity codes. These are two-character codes derived from the Geopolitical Entities, Names, and Codes Standard (GENC), Edition 3 Update 18. The "CC1_GENC3" and "CC2_GENC3" fields contain the corresponding three-character GENC codes and are extension attributes discussed below. The codes "Q2" or "QX2" denote a line in the LSIB representing a boundary associated with areas not contained within the GENC standard. The "COUNTRY1" and "COUNTRY2" fields contain the names of corresponding political entities. These fields contain names approved by the U.S. Board on Geographic Names (BGN) as incorporated in the "Independent States in the World" and "Dependencies and Areas of Special Sovereignty" lists maintained by the Department of State. To ensure maximum compatibility, names are presented without diacritics and certain names are rendered using common cartographic abbreviations. Names for lines associated with the code "Q2" are descriptive and not necessarily BGN-approved. Names rendered in all CAPITAL LETTERS denote independent states. Names rendered in normal text represent dependencies, areas of special sovereignty, or are otherwise presented for the convenience of the user.

    Descriptive Fields

    The following text fields are a part of the core attributes of the LSIB dataset and do not update from external sources. They provide additional information about each of the lines and are as follows:

    ATTRIBUTE NAME | CONTAINS NULLS
    RANK | No
    STATUS | No
    LABEL | Yes
    NOTES | Yes

    Neither the "RANK" nor "STATUS" fields contain null values; the "LABEL" and "NOTES" fields do. The "RANK" field is a numeric expression of the "STATUS" field. Combined with the line geometry, these fields encode the views of the United States Government on the political status of the boundary line.

    RANK | STATUS
    1 | International Boundary
    2 | Other Line of International Separation
    3 | Special Line

    A value of "1" in the "RANK" field corresponds to an "International Boundary" value in the "STATUS" field. Values of "2" and "3" correspond to "Other Line of International Separation" and "Special Line," respectively. The "LABEL" field contains required text to describe the line segment on all finished cartographic products, including but not limited to print and interactive maps. The "NOTES" field contains an explanation of special circumstances modifying the lines. This information can pertain to the origins of the boundary lines, limitations regarding the purpose of the lines, or the original source of the line.

    Use of Core Attributes in Cartographic Visualization

    Several of the core attributes provide information required for the proper cartographic representation of the LSIB dataset. The cartographic usage of the LSIB requires a visual differentiation between the three categories of boundary lines. Specifically, this differentiation must be between: International Boundaries (Rank 1); Other Lines of International Separation (Rank 2); and Special Lines (Rank 3). Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the "LABEL" field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Please consult the style files in the download package for examples of this depiction.

    The requirement to incorporate the contents of the "LABEL" field on cartographic products is scale dependent. If a label is legible at the scale of a given static product, a proper use of this dataset would encourage the application of that label. Using the contents of the "COUNTRY1" and "COUNTRY2" fields in the generation of a line segment label is not required. The "STATUS" field contains the preferred description for the three LSIB line types when they are incorporated into a map legend but is otherwise not to be used for labeling. Use of the "CC1," "CC1_GENC3," "CC2," "CC2_GENC3," "RANK," or "NOTES" fields for cartographic labeling purposes is prohibited.

    Extension Attributes

    Certain elements of the attributes within the LSIB dataset extend data functionality to make the data more interoperable or to provide clearer linkages to other datasets. The fields "CC1_GENC3" and "CC2_GENC3" contain the corresponding three-character GENC codes for the "CC1" and "CC2" attributes. The code "QX2" is the three-character counterpart of the code "Q2," which denotes a line in the LSIB representing a boundary associated with a geographic area not contained within the GENC standard. To allow for linkage between individual lines in the LSIB and the World Polygons dataset, the "CC1_WPID" and "CC2_WPID" fields contain a Universally Unique Identifier (UUID), version 4, which provides a stable description of each geographic entity in a boundary pair relationship. Each UUID corresponds to a geographic entity listed in the World Polygons dataset. These fields allow for linkage between individual lines in the LSIB and the overall World Polygons dataset. Five additional fields in the LSIB expand on the UUID concept and either describe features that have changed across space and time or indicate relationships between previous versions of the feature. The "LSIB_ID" attribute is a UUID value that defines a specific instance of a feature. Any change to the feature in a lineset requires a new "LSIB_ID." The "ANTECIDS," or antecedent ID, is a UUID that references line geometries from which a given line is descended in time. It is used when there is a feature that is entirely new, not when there is a new version of a previous feature. This is generally used to reference countries that have dissolved. The "PREVIDS," or Previous ID, is a UUID field that contains old versions of a line. This is an additive field that houses all Previous IDs. A new version of a feature is defined by any change to the
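
    The RANK-based styling rules above translate directly into code. Here is a minimal sketch (assuming geopandas plus matplotlib, and the direct-download archive LSIB.zip linked above) that separates the three line categories and draws Rank 1 most prominently.

    import geopandas as gpd

    # geopandas can read the zipped shapefile archive directly.
    lsib = gpd.read_file('LSIB.zip')

    boundaries = lsib[lsib['RANK'] == 1]   # International Boundary
    other_lines = lsib[lsib['RANK'] == 2]  # Other Line of International Separation
    special = lsib[lsib['RANK'] == 3]      # Special Line

    # Rank 1 most visually prominent; Ranks 2 and 3 progressively subordinate.
    ax = boundaries.plot(linewidth=1.2, color='black')
    other_lines.plot(ax=ax, linewidth=0.8, linestyle='--', color='grey')
    special.plot(ax=ax, linewidth=0.5, linestyle=':', color='grey')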

  13. NTAD Metadata Data Dictionary External Link Test

    • hub.arcgis.com
    Updated Aug 1, 2025
    + more versions
    Cite
    U.S. Department of Transportation: ArcGIS Online (2025). NTAD Metadata Data Dictionary External Link Test [Dataset]. https://hub.arcgis.com/datasets/a0dd55c959c44c39b33b55da19a6ba6d
    Explore at:
    Dataset updated
    Aug 1, 2025
    Dataset authored and provided by
    U.S. Department of Transportation: ArcGIS Online
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Area covered
    Description

    The National Transit Map - Routes dataset was compiled on June 02, 2025 from the Bureau of Transportation Statistics (BTS) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The National Transit Map (NTM) is a nationwide catalog of fixed-guideway and fixed-route transit service in America. It is compiled using General Transit Feed Specification (GTFS) Schedule data. The NTM Routes dataset shows transit routes; a route is a group of trips that is displayed to riders as a single service. To display the route alignment and trips for each route, this dataset combines the following GTFS files: routes.txt, trips.txt, and shapes.txt. The GTFS Schedule documentation is available at https://gtfs.org/schedule/. To improve the spatial accuracy of the NTM Routes, the Bureau of Transportation Statistics (BTS) adjusts transit routes using context from the submitted GTFS source data and/or from other publicly available information about the transit service. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529048
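
    The GTFS join described above can be sketched in a few lines; the ./gtfs/ directory is an assumption standing in for an extracted GTFS Schedule feed, and the column names are the standard GTFS ones.

    import pandas as pd

    routes = pd.read_csv('gtfs/routes.txt')
    trips = pd.read_csv('gtfs/trips.txt')
    shapes = pd.read_csv('gtfs/shapes.txt')

    # One row per shape point, tagged with its route: routes.txt + trips.txt + shapes.txt.
    route_shapes = (
        trips[['route_id', 'shape_id']]
        .drop_duplicates()
        .merge(routes[['route_id', 'route_short_name', 'route_type']], on='route_id')
        .merge(shapes, on='shape_id')
        .sort_values(['route_id', 'shape_id', 'shape_pt_sequence'])
    )
    print(route_shapes.head())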

  14. CALFIRE FPGIS Data Dictionary v4

    • hub.arcgis.com
    • data.ca.gov
    • +7more
    Updated Feb 11, 2021
    + more versions
    Cite
    California Department of Forestry and Fire Protection (2021). CALFIRE FPGIS Data Dictionary v4 [Dataset]. https://hub.arcgis.com/documents/0145dc102fa7411d898eba7ee97cf683
    Explore at:
    Dataset updated
    Feb 11, 2021
    Dataset authored and provided by
    California Department of Forestry and Fire Protection: http://calfire.ca.gov/
    Description

    Attribute field definitions for data created by Forest Practice GIS on plans and notices for timber harvesting either submitted to, approved, or accepted by, the California Department of Forestry and Fire Protection. Includes roads and hydrology within and adjacent to harvest areas.

  15. The LakeCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1)...

    • gimi9.com
    • s.cnmilf.com
    • +1more
    + more versions
    Cite
    The LakeCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: GeoChemPhys [Dataset]. https://gimi9.com/dataset/data-gov_476e2603871ea332a8a4e47594e9b6fe7f989fbf/
    Explore at:
    Area covered
    Contiguous United States, United States
    Description

    This dataset represents geochemical or geophysical attributes in surface or near surface geology within individual local and accumulated upstream catchments for NHDPlusV2 Waterbodies. Catchment boundaries in LakeCat are defined in one of two ways, on-network or off-network. The on-network catchment boundaries follow the catchments provided in the NHDPlusV2 and the metrics for these lakes mirror metrics from StreamCat, but will substitute the COMID of the NHDWaterbody for that of the NHDFlowline. The off-network catchment framework uses the NHDPlusV2 flow direction rasters to define non-overlapping lake-catchment boundaries and then links them through an off-network flow table. For information regarding how the Landscape layers were created see https://www.sciencebase.gov/catalog/item/53481333e4b06f6ce034aae7. Landscape Layers are partitioned into 4 tables based on the location of no-data cells within their rasters to correctly reflect the PctFull attributes within each table.

  16. The Insurance Company (TIC) Benchmark

    • kaggle.com
    zip
    Updated May 27, 2020
    Cite
    Kush Shah (2020). The Insurance Company (TIC) Benchmark [Dataset]. https://www.kaggle.com/datasets/kushshah95/the-insurance-company-tic-benchmark/code
    Explore at:
    Available download formats: zip (268454 bytes)
    Dataset updated
    May 27, 2020
    Authors
    Kush Shah
    Description

    This data set, used in the CoIL 2000 Challenge, contains information on customers of an insurance company. The data consist of 86 variables and include product usage and socio-demographic data.

    DETAILED DATA DESCRIPTION

    THE INSURANCE COMPANY (TIC) 2000

    (c) Sentient Machine Research 2000

    DISCLAIMER

    This dataset is owned and supplied by the Dutch data mining company Sentient Machine Research, and is based on real-world business data. You are allowed to use this dataset and accompanying information for non-commercial research and education purposes only. It is explicitly not allowed to use this dataset for commercial education or demonstration purposes. For any other use, please contact Peter van der Putten, info@smr.nl.

    This dataset has been used in the CoIL Challenge 2000 data mining competition. For papers describing results on this dataset, see the TIC 2000 homepage: http://www.wi.leidenuniv.nl/~putten/library/cc2000/

    REFERENCE P. van der Putten and M. van Someren (eds). CoIL Challenge 2000: The Insurance Company Case. Published by Sentient Machine Research, Amsterdam. Also a Leiden Institute of Advanced Computer Science Technical Report 2000-09. June 22, 2000. See http://www.liacs.nl/~putten/library/cc2000/

    RELEVANT FILES

    tic_2000_train_data.csv: Dataset used to train and validate prediction models and build a description (5822 customer records). Each record consists of 86 attributes, containing sociodemographic data (attributes 1-43) and product ownership (attributes 44-86). The sociodemographic data is derived from zip codes; all customers living in areas with the same zip code have the same sociodemographic attributes. Attribute 86, "CARAVAN: Number of mobile home policies", is the target variable.

    tic_2000_eval_data.csv: Dataset for predictions (4000 customer records). It has the same format as the training data, except the target attribute is missing. Participants are supposed to return the list of predicted targets only. All datasets are in CSV format. The meaning of the attributes and attribute values is given in the data dictionary file described below.

    tic_2000_target_data.csv: Targets for the evaluation set.

    dictionary.txt: Data description listing each attribute, its meaning, and the labels used for the numerically (dummy/label) encoded categories. A minimal loading sketch follows the links below.

    Original Task description Link: http://liacs.leidenuniv.nl/~puttenpwhvander/library/cc2000/problem.html UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Insurance+Company+Benchmark+%28COIL+2000%29
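
    A minimal loading sketch (assuming the Kaggle file names above and that the columns keep the original attribute order; not an official starter script):

    ```python
    import pandas as pd

    # Hypothetical local copies of the Kaggle files listed above.
    train = pd.read_csv("tic_2000_train_data.csv")        # 5822 rows x 86 columns
    evaluation = pd.read_csv("tic_2000_eval_data.csv")    # 4000 rows, target column absent
    targets = pd.read_csv("tic_2000_target_data.csv")     # targets for the evaluation set

    # Attributes 1-43 are sociodemographic, attributes 44-86 cover product ownership,
    # and attribute 86 ("CARAVAN", number of mobile home policies) is the target.
    X_train = train.iloc[:, :85]
    y_train = train.iloc[:, 85]

    print(X_train.shape, float(y_train.mean()))  # feature shape and target class balance
    ```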

  17. The LakeCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: Runoff

    • gimi9.com
    • datasets.ai
    • +1more
    Updated Apr 23, 2015
    Cite
    (2015). The LakeCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: Runoff [Dataset]. https://gimi9.com/dataset/data-gov_3933cb6a62dcfa342815ad2846c4fa83a8c7b912/
    Explore at:
    Dataset updated
    Apr 23, 2015
    Area covered
    Contiguous United States, United States
    Description

    This dataset represents the estimated surface water runoff within individual local catchments and accumulated upstream catchments for NHDPlusV2 waterbodies. Catchment boundaries in LakeCat are defined in one of two ways: on-network or off-network. The on-network catchment boundaries follow the catchments provided in the NHDPlusV2, and the metrics for these lakes mirror metrics from StreamCat but substitute the COMID of the NHDWaterbody for that of the NHDFlowline. The off-network catchment framework uses the NHDPlusV2 flow direction rasters to define non-overlapping lake-catchment boundaries and then links them through an off-network flow table. The landscape layer (raster) was developed with a water-balance model by Dave Wolock of the USGS and is detailed further in "Independent effects of temperature and precipitation on modeled runoff in the conterminous United States" (McCabe and Wolock, 2011). Runoff is defined as the flow per unit area delivered to streams and rivers, in millimeters per month. The runoff estimates were summarized to produce local catchment-level and watershed-level metrics as a continuous data type.
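
    For a continuous variable like runoff, the accumulated (watershed-level) value behaves like an area-weighted mean of the local catchment values. A toy sketch of that calculation (the table and column names are illustrative, not LakeCat's actual schema):

    ```python
    import pandas as pd

    # Toy local-catchment table: one row per catchment draining to the same waterbody.
    local = pd.DataFrame({
        "COMID":       [1, 2, 3],
        "CatAreaSqKm": [10.0, 5.0, 20.0],
        "RunoffCat":   [30.0, 45.0, 25.0],   # mm/month, local catchment mean
    })

    # The watershed-level metric is the area-weighted mean of the local values.
    ws_runoff = (local["RunoffCat"] * local["CatAreaSqKm"]).sum() / local["CatAreaSqKm"].sum()
    print(round(ws_runoff, 2))  # 29.29 mm/month
    ```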

  18. Street Centerline Data Dictionary

    • hub.arcgis.com
    • datasets.ai
    • +4more
    Updated Oct 7, 2017
    Cite
    Lake County Illinois GIS (2017). Street Centerline Data Dictionary [Dataset]. https://hub.arcgis.com/documents/b8eb69cc30994d13a6b7e9b32f1f9807
    Explore at:
    Dataset updated
    Oct 7, 2017
    Dataset authored and provided by
    Lake County Illinois GIS
    License

    https://www.arcgis.com/sharing/rest/content/items/89679671cfa64832ac2399a0ef52e414/data

    Description

    An in-depth description of the Street Centerline GIS dataset outlining terms of use, update frequency, attribute explanations, and more.

  19. Superstore

    • kaggle.com
    zip
    Updated Oct 3, 2022
    Cite
    Ibrahim Elsayed (2022). Superstore [Dataset]. https://www.kaggle.com/datasets/ibrahimelsayed182/superstore
    Explore at:
    Available download formats: zip (167457 bytes)
    Dataset updated
    Oct 3, 2022
    Authors
    Ibrahim Elsayed
    Description

    Context

    Sales data from a superstore in the USA; the dataset contains about 10,000 rows.

    Data Dictionary

    | Attribute | Definition | Example |
    | :--- | :--- | :--- |
    | Ship Mode | | Second Class |
    | Segment | Segment category | Consumer |
    | Country | | United States |
    | City | | Los Angeles |
    | State | | California |
    | Postal Code | | 90032 |
    | Region | | West |
    | Category | Category of product | Technology |
    | Sub-Category | | Phones |
    | Sales | Number of sales | 114.9 |
    | Quantity | | 3 |
    | Discount | | 0.45 |
    | Profit | | 14.1694 |
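
    A quick exploratory sketch using the columns listed above (the file name is a placeholder for wherever the downloaded CSV is saved):

    ```python
    import pandas as pd

    # Placeholder path for the downloaded Kaggle CSV.
    df = pd.read_csv("superstore.csv")

    # Profitability by product category and region, using columns from the
    # data dictionary above.
    summary = (
        df.groupby(["Category", "Region"])[["Sales", "Profit"]]
          .sum()
          .sort_values("Profit", ascending=False)
    )
    print(summary.head(10))
    ```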

    Acknowledgements

    All thanks to The Sparks Foundation for making this dataset available.

    Inspiration

    Get the data and try to draw insights from it. Good luck ❤️

    Don't forget to Upvote😊🥰

  20. The LakeCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: National Land Cover Database - Impervious Surfaces

    • gimi9.com
    • s.cnmilf.com
    • +1more
    Cite
    The LakeCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: National Land Cover Database - Impervious Surfaces [Dataset]. https://gimi9.com/dataset/data-gov_the-lakecat-dataset-accumulated-attributes-for-nhdplusv2-version-2-1-catchments-for-the-co/
    Explore at:
    Area covered
    United States
    Description

    This dataset represents impervious surface coefficients within individual local catchments and accumulated upstream catchments for NHDPlusV2 waterbodies, based on the National Land Cover Database (NLCD). AOI boundaries in LakeCat are defined in one of two ways: on-network or off-network. The on-network catchment boundaries follow the catchments provided in the NHDPlusV2, and the metrics for these lakes mirror metrics from StreamCat but substitute the COMID of the NHDWaterbody for that of the NHDFlowline. The off-network catchment framework uses the NHDPlusV2 flow direction rasters to define non-overlapping lake-AOI boundaries and then links them through an off-network flow table. This dataset is derived from the NLCD Impervious Surfaces raster, which describes percent imperviousness (continuous data type); values indicate the degree to which the area is composed of impervious anthropogenic materials (e.g., parking surfaces, roads, building roofs). The raster was produced from a decision-tree classification of 2001, 2004, 2006, 2008, 2011, 2013, 2016, and 2019 Landsat satellite data. This dataset will include additional years as they become available.
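
    To make the catchment summaries concrete, here is a toy sketch of how a percent-impervious raster might be summarized within one catchment, including the PctFull idea mentioned in the other LakeCat records (the no-data value of 255 and all array values are invented for illustration):

    ```python
    import numpy as np

    # Toy percent-imperviousness raster (0-100); 255 marks no-data cells (assumed sentinel).
    raster = np.array([[ 0., 12., 40., 255.],
                       [ 5., 60., 80., 255.],
                       [ 0.,  0., 25., 255.]])

    # Boolean mask of cells falling inside one (toy) catchment.
    catchment = np.array([[1, 1, 1, 0],
                          [1, 1, 1, 0],
                          [0, 1, 1, 1]], dtype=bool)

    cells = raster[catchment]           # all raster cells inside the catchment
    valid = cells[cells != 255]         # drop no-data cells

    pct_full = 100.0 * valid.size / cells.size  # share of catchment cells with data
    mean_imperv = valid.mean()                  # catchment-level mean percent impervious
    print(round(mean_imperv, 2), round(pct_full, 1))  # 27.75 88.9
    ```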
