41 datasets found
  1. Freesound Loop Dataset

    • zenodo.org
    bin, zip
    Updated Jul 31, 2020
    Cite
    Antonio Ramires; Frederic Font; Dmitry Bogdanov; Jordan B. L. Smith; Yi-Hsuan Yang; Joann Ching; Bo-Yu Chen; Yueh-Kao Wu; Hsu Wei-Han; Xavier Serra (2020). Freesound Loop Dataset [Dataset]. http://doi.org/10.5281/zenodo.3967852
    Explore at:
    bin, zip (available download formats)
    Dataset updated
    Jul 31, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antonio Ramires; Frederic Font; Dmitry Bogdanov; Jordan B. L. Smith; Yi-Hsuan Yang; Joann Ching; Bo-Yu Chen; Yueh-Kao Wu; Hsu Wei-Han; Xavier Serra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Freesound Loop Dataset

    This dataset contains 9,455 loops from Freesound.org and the corresponding annotations. These loops have tempo, key, genre and instrumentation annotations.

    Dataset Construction

    To collect this dataset, the following steps were performed:

    • Freesound was queried with "loop" and "bpm", so as to collect loops which have a beats-per-minute (BPM) annotation.

    • The sounds were analysed with the AudioCommons extractor, so as to obtain key information.

    • The textual metadata of each sound was analysed, to obtain the BPM proposed by the user and to obtain genre information (a small parsing sketch follows this list).

    • Annotators used a web interface to annotate around 3,000 loops.
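
    As a rough illustration of the metadata-parsing step above (the dataset's actual extraction rules are not documented here; the text and pattern below are purely hypothetical), a user-proposed BPM can be pulled from a sound's textual metadata along these lines:

    import re

    # Hypothetical textual metadata (title/description/tags) for one Freesound sound.
    metadata_text = "Funky drum loop 120 bpm, recorded at 44.1 kHz"

    # Look for a 2-3 digit integer immediately before the token "bpm".
    match = re.search(r"\b(\d{2,3})\s*bpm\b", metadata_text, flags=re.IGNORECASE)
    user_bpm = int(match.group(1)) if match else None
    print(user_bpm)  # -> 120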

    Dataset Organisation

    The dataset contains two folders and two files in the root directory:

    • 'FSL10K' encloses the audio files and their metadata and analysis. The audios are in the 'audio' folder and are named '

    • 'annotations' holds the expert provided annotation for the sounds in the dataset. The annotations are separated in a folder for each annotator and each annotation is stored as a .json file, named 'sound-

    Licenses

    All the sounds have some kind of Creative Commons license. The license of each sound in the dataset can be obtained from the 'FSL10K/metadata.json' file.

    Authors and Contact

    This dataset was developed by António Ramires et al.

    For any questions related to this dataset, please contact:

    António Ramires

    antonio.ramires@upf.edu

    aframires@gmail.com

    References

    Please cite this paper if you use this dataset:

    @inproceedings{ramires2020,
      author = "Antonio Ramires and Frederic Font and Dmitry Bogdanov and Jordan B. L. Smith and Yi-Hsuan Yang and Joann Ching and Bo-Yu Chen and Yueh-Kao Wu and Hsu Wei-Han and Xavier Serra",
      title = "The Freesound Loop Dataset and Annotation Tool",
      booktitle = "Proc. of the 21st International Society for Music Information Retrieval (ISMIR)",
      year = "2020"
    }

    Acknowledgements

    This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068 (MIP-Frontiers).

  2. SymbioLCD - Datasets - Dataset - data.govt.nz - discover and use data

    • portal.zero.govt.nz
    • catalogue.data.govt.nz
    Updated Jan 18, 2022
    + more versions
    Cite
    zero.govt.nz (2022). SymbioLCD - Datasets - Dataset - data.govt.nz - discover and use data [Dataset]. https://portal.zero.govt.nz/77d6ef04507c10508fcfc67a7c24be32/dataset/oai-figshare-com-article-14958228
    Explore at:
    Dataset updated
    Jan 18, 2022
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Overview: Three new datasets available here represent normal household areas with common objects - lounge, kitchen and garden - with varying trajectories.

    Description:

    • Lounge: The lounge dataset with common household objects.
    • Lounge_oc: The lounge dataset with object occlusions near the end of the trajectory.
    • Kitchen: The kitchen dataset with common household objects.
    • Kitchen_oc: The kitchen dataset with object occlusions near the end of the trajectory.
    • Garden: The garden dataset with common household objects.
    • Garden_oc: The garden dataset with object occlusions near the end of the trajectory.
    • convert.py: Python script to convert a video file into jpgs.

    Paper: The datasets were used for the paper "SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted Objects and Visual Bag-of-Words", accepted at the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems.

    Abstract: Loop closure detection is an essential tool of Simultaneous Localization and Mapping (SLAM) to minimize drift in its localization. Many state-of-the-art loop closure detection (LCD) algorithms use visual Bag-of-Words (vBoW), which is robust against partial occlusions in a scene but cannot perceive the semantics or spatial relationships between feature points. CNN object extraction can address those issues by providing semantic labels and spatial relationships between objects in a scene. Previous work has mainly focused on replacing vBoW with CNN-derived features. In this paper we propose SymbioLCD, a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW features for LCD candidate prediction. When used in tandem, the added elements of object semantics and spatial awareness create a more robust and symbiotic loop closure detection system. The proposed SymbioLCD uses scale-invariant spatial and semantic matching, Hausdorff distance with temporal constraints, and a Random Forest that utilizes combined information from both CNN-extracted objects and vBoW features for predicting accurate loop closure candidates. Evaluation of the proposed method shows it outperforms other Machine Learning (ML) algorithms - such as SVM, Decision Tree and Neural Network - and demonstrates that there is a strong symbiosis between CNN-extracted object information and vBoW features which assists accurate LCD candidate prediction. Furthermore, it is able to perceive loop closure candidates earlier than state-of-the-art SLAM algorithms, utilizing added spatial and semantic information from CNN-extracted objects.

    Citation: Please use the BibTeX below for citing the paper:

    @inproceedings{kim2021symbiolcd,
      title = {SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted Objects and Visual Bag-of-Words},
      author = {Jonathan Kim and Martin Urschler and Pat Riddle and J\"{o}rg Wicker},
      year = {2021},
      date = {2021-09-27},
      booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems},
      pubstate = {forthcoming},
      tppubtype = {inproceedings}
    }
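
    The abstract above mentions spatial matching of CNN-extracted objects via the Hausdorff distance. As a rough illustration only (the paper's scale-invariant matching, temporal constraints and feature pipeline are not reproduced, and the object centroids below are made up), a symmetric Hausdorff distance between two object layouts can be computed with SciPy:

    import numpy as np
    from scipy.spatial.distance import directed_hausdorff

    # Hypothetical 2D centroids of CNN-extracted objects in two candidate frames.
    objects_a = np.array([[120.0, 80.0], [300.0, 210.0], [50.0, 400.0]])
    objects_b = np.array([[118.0, 83.0], [305.0, 205.0], [60.0, 390.0]])

    # Symmetric Hausdorff distance = max of the two directed distances.
    hausdorff = max(directed_hausdorff(objects_a, objects_b)[0],
                    directed_hausdorff(objects_b, objects_a)[0])
    print(hausdorff)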

  3. Free Recall with Closed-Loop Stimulation at Encoding (Encoding Classifier)

    • openneuro.org
    Updated Oct 6, 2024
    + more versions
    Cite
    Haydn G. Herrema; Michael J. Kahana (2024). Free Recall with Closed-Loop Stimulation at Encoding (Encoding Classifier) [Dataset]. http://doi.org/10.18112/openneuro.ds005557.v1.0.0
    Explore at:
    Dataset updated
    Oct 6, 2024
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Haydn G. Herrema; Michael J. Kahana
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Free Recall with Closed-Loop Stimulation at Encoding (Encoding Classifier)

    Description

    This dataset contains behavioral events and intracranial electrophysiology recordings from a delayed free recall task with closed-loop stimulation at encoding, using a classifier trained on encoding data. The experiment consists of participants studying a list of words, presented visually one at a time, completing simple arithmetic problems that function as a distractor, and then freely recalling the words from the just-presented list in any order. The data was collected at clinical sites across the country as part of a collaboration with the Computational Memory Lab at the University of Pennsylvania. This dataset is a closed-loop stimulation version of the FR1 and FR2 datasets.

    This study contains closed-loop electrical stimulation of the brain during encoding. There is no stimulation during the distractor or retrieval phases. Stimulation is delivered to a single electrode at a time, and the stimulation parameters are included in the behavioral events tsv files, denoting the anode/cathode labels, amplitude, pulse frequency, pulse width, and pulse count.

    Classifier Details

    The L2 logistic regression classifier is trained to predict whether an encoded item will be subsequently recalled based on the neural features during encoding, using data from a participant's FR1 sessions. The bipolar recordings during the 0-1366 ms interval after word presentation are filtered with a Butterworth band-stop filter (58-62 Hz, 4th order) to remove 60 Hz line noise, and then a Morlet wavelet transformation (wavenumber = 5) is applied to the signal to estimate spectral power, using 8 log-spaced wavelets between 3-180 Hz (center frequencies 3.0, 5.4, 9.7, 17.4, 31.1, 55.9, 100.3, 180 Hz) and 1365 ms mirrored buffers. The powers are log-transformed prior to removal of the buffer, and then z-transformed based on the within-session mean and standard deviation across all encoding events. These z-transformed log power values represent the feature matrix, and the label vector is the recalled status of the encoded items. The penalty parameter is chosen based on the value that leads to the highest average AUC for all prior participants with at least two FR1 sessions, and is inversely weighted according to the class (i.e., recalled vs. not recalled) imbalance to ensure the best-fit values of the penalty parameter are comparable across different class distributions (recall rates). Class weights are computed as: (1/Na) / ((1/Na + 1/Nb) / 2), where Na and Nb are the number of events in each class.
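
    As a minimal sketch of the final classification step only (assuming scikit-learn; the spectral feature extraction and the cross-participant penalty selection described above are not reproduced, and the data here are random placeholders), the class weighting and L2 logistic regression might look like:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Placeholder stand-ins for the z-scored log-power feature matrix
    # (encoding events x electrodes*frequencies) and recalled/not-recalled labels.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 8 * 32))
    y = rng.integers(0, 2, size=300)

    # Class weights as described: (1/N_class) / ((1/N_recalled + 1/N_not_recalled) / 2).
    n_rec, n_not = int((y == 1).sum()), int((y == 0).sum())
    mean_inv = (1.0 / n_rec + 1.0 / n_not) / 2.0
    class_weight = {1: (1.0 / n_rec) / mean_inv, 0: (1.0 / n_not) / mean_inv}

    # L2-penalized logistic regression; C is an arbitrary placeholder here, not the
    # penalty actually chosen from prior participants' FR1 data.
    clf = LogisticRegression(penalty="l2", C=1e-3, class_weight=class_weight, solver="liblinear")
    clf.fit(X, y)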

    After at least 3 training sessions with a minimum of 15 lists, each participant's classifier is tested using leave-one-session-out (LOSO) cross validation, and the true AUC is compared to a 200-sample AUC distribution generated from classification of label-permuted data. p < 0.05 (one-sided) is used as the significance threshold for continuing to the closed-loop task.
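
    A minimal sketch of that evaluation, assuming scikit-learn and a generic classifier factory (the authors' exact permutation and p-value conventions may differ):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def loso_outputs(X, y, sessions, make_clf):
        """Classifier outputs under leave-one-session-out cross-validation."""
        probs = np.zeros(len(y), dtype=float)
        for s in np.unique(sessions):
            test = sessions == s
            clf = make_clf()
            clf.fit(X[~test], y[~test])
            probs[test] = clf.predict_proba(X[test])[:, 1]
        return probs

    def permutation_test_auc(X, y, sessions, make_clf, n_perm=200, seed=0):
        """True LOSO AUC compared against a null distribution from label-permuted data."""
        rng = np.random.default_rng(seed)
        true_auc = roc_auc_score(y, loso_outputs(X, y, sessions, make_clf))
        null_aucs = []
        for _ in range(n_perm):
            y_perm = rng.permutation(y)
            null_aucs.append(roc_auc_score(y_perm, loso_outputs(X, y_perm, sessions, make_clf)))
        p_value = (np.sum(np.array(null_aucs) >= true_auc) + 1) / (n_perm + 1)
        return true_auc, p_value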

    Closed-Loop Procedure

    Each session contains 26 lists (the first being a practice list) and there is no stimulation on the first 4 lists. The classifier output for each presented item on the first 4 lists is compared to the classifier output when tested on data from all previous sessions using a two-sample Kolmogorov-Smirnov test. The null hypothesis that the current session and the training data come from the same distribution must not be rejected (p > 0.05) for the closed-loop task to continue.

    The remaining 22 lists are equally divided into stimulation and no stimulation lists, with conditions balanced in each half of the session. On stimulation lists, classifier output is evaluated during the 0-1366 ms interval following word presentation onset. The input values are normalized using the mean and standard deviation across encoding events on all prior no stimulation lists in the session. If the classifier output is below the median classifier output from the training sessions, stimulation occurs immediately following the 1366 ms decoding interval and lasts for 500 ms. With a 750-1000 ms inter-stimulus interval, there is enough time for stimulation artifacts to subside before the next word onset (next classifier decoding).
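
    The two gating rules above reduce to a couple of small checks. A minimal sketch, assuming SciPy's two-sample Kolmogorov-Smirnov test (the actual task software is not reproduced here):

    from scipy.stats import ks_2samp

    def session_passes_ks_check(first_lists_outputs, training_outputs, alpha=0.05):
        """First-4-list check: the session's classifier outputs must be consistent with the
        training distribution, i.e. the KS null must NOT be rejected, to continue."""
        _, p_value = ks_2samp(first_lists_outputs, training_outputs)
        return p_value > alpha

    def stimulate_item(classifier_output, training_median):
        """On stimulation lists, stimulate when the decoded recall probability for the
        current item falls below the median classifier output from the training sessions."""
        return classifier_output < training_median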

    To Note

    • The iEEG recordings are labeled either "monopolar" or "bipolar." The monopolar recordings are referenced (typically a mastoid reference), but should always be re-referenced before analysis. The bipolar recordings are referenced according to a paired scheme indicated by the accompanying bipolar channels tables.
    • Each subject has a unique montage of electrode locations. MNI and Talairach coordinates are provided when available.
    • Recordings done with the Blackrock system are in units of 250 nV, while recordings done with the Medtronic system are estimated through testing to have units of 0.1 uV. We have completed the scaling to provide values in V.

    Contact

    For questions or inquiries, please contact sas-kahana-sysadmin@sas.upenn.edu.

  4. Simple download service (Atom) of the dataset: Features of the Poses Loop...

    • gimi9.com
    Cite
    Simple download service (Atom) of the dataset: Features of the Poses Loop PPRI | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-5d3d696d-a435-48b6-bd36-e8e1bd543532
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Features of the Poses Loop PPRI

  5. A benchmark dataset of Solidity smart contracts

    • zenodo.org
    Updated Mar 17, 2023
    Cite
    Tianyuan Hu (2023). A benchmark dataset of Solidity smart contracts [Dataset]. http://doi.org/10.5281/zenodo.7606610
    Explore at:
    Dataset updated
    Mar 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tianyuan Hu
    Description

    This benchmark dataset contains 4,364 real-world Solidity smart contracts, manually labeled with ten types of vulnerabilities.

    • DC (DelegateCall).

    The address.delegatecall() function allows a smart contract to dynamically load external contracts from address at runtime. If the attacker can control the external contract and affect the current contract status, the contract is vulnerable to DC.

    • IOU (Arithmetic/Integer Overflow and Underflow).

    An arithmetic overflow or underflow, often called Integer Overflow or Underflow (IOU), occurs when an arithmetic operation attempts to create a numeric variable value that is larger than the maximum value or smaller than the minimum value of the variable type. If the arithmetic operation may pass a variable type’s maximum or minimum value and is performed without using SafeMath, the contract is vulnerable to IOU.

    • NC (Nested Call).

    A function containing a loop has a high risk of exceeding its gas limit and causing an out-of-gas error. If the attacker can control the loop iteration and cause an out-of-gas error, the contract is vulnerable to NC.

    • RE (Reentrancy).

    A contract vulnerable to RE uses the call() function to transfer ether to an external contract. The external contract can reenter the vulnerable contract through its fallback function. If the state variable change happens after the call() function, the reentry will cause state inconsistency.

    • TD (Timestamp Dependency).

    The contract uses the timestamp as the deciding factor for critical operations, e.g., sending ether. If the attacker can get ether from the contract by manipulating the timestamp or affecting the critical operations, the contract is vulnerable to TD.

    • TO (TxOrigin).

    If the contract only uses tx.origin to verify the caller's identification for critical operations, it is vulnerable to TO.

    • TOD (Transaction Order Dependency).

    The contract may send out ether differently according to different values of a global state variable or different balance values of the contract. If the attackers can get ether from the contract by manipulating the transaction sequences, the contract is vulnerable to TOD.

    • UcC (Unchecked Call).

    The contract uses the function call() or send() without result checking. If the send() or call() function fails and leads to status inconsistency, the contract is vulnerable to UcC.

    • UpS (Unprotected Suicide).

    If an attacker can self-destruct the contract by calling the selfdestruct(address) function, the contract is vulnerable to UpS.

    • FE (Frozen Ether).

    If the contract can receive ether but cannot transfer it by itself, it is vulnerable to FE.

    To protect the smart contracts, the dataset is available upon request.

  6. Seattle 20 Second Freeway

    • catalog.data.gov
    • data.virginia.gov
    • +3more
    Updated Mar 16, 2025
    Cite
    US Department of Transportation (2025). Seattle 20 Second Freeway [Dataset]. https://catalog.data.gov/dataset/seattle-20-second-freeway
    Explore at:
    Dataset updated
    Mar 16, 2025
    Dataset provided by
    US Department of Transportation
    Area covered
    Seattle
    Description

    This set of data files is one of the four test data sets acquired by the USDOT Data Capture and Management program. It contains the following data for the six months from May 1, 2011 to October 31, 2011:

    • Raw and cleaned data for traffic detectors deployed by Washington Department of Transportation (WSDOT) along I-5 in Seattle. Data includes 20-second raw reports.
    • Incident response records from WSDOT's Washington Incident Tracking System (WITS).
    • A record of all messages and travel times posted on WSDOT's Active Traffic Management signs and conventional variable message signs on I-5.
    • Loop detector volume and occupancy data from arterials parallel to I-5, estimated travel times on arterials derived from Automatic License Plate Reader (ALPR) data, and arterial signal timing plans.
    • Scheduled and actual bus arrival times from King County Metro buses and Sound Transit buses.
    • Incidents on I-5 during the six month period.
    • Seattle weather data for the six month period.
    • A dataset of GPS breadcrumb data from commercial trucks described in the documentation is not available to the public because of data ownership and privacy issues.

    This legacy dataset was created before data.transportation.gov and is only currently available via the attached file(s). Please contact the dataset owner if there is a need for users to work with this data using the data.transportation.gov analysis features (online viewing, API, graphing, etc.) and the USDOT will consider modifying the dataset to fully integrate in data.transportation.gov. Note: All extras are attached in Seattle Freeway Travel Times https://data.transportation.gov/Automobiles/Seattle-Freeway-Travel-Times/9v5g-t8u8

  7. CISC-LIVE-LAB-3/dataset: v1.0.2

    • zenodo.org
    zip
    Updated Jan 31, 2024
    + more versions
    Cite
    Ammar N. Abbas; Winniewelsh (2024). CISC-LIVE-LAB-3/dataset: v1.0.2 [Dataset]. http://doi.org/10.5281/zenodo.10600674
    Explore at:
    zip (available download formats)
    Dataset updated
    Jan 31, 2024
    Dataset provided by
    Zenodo
    Authors
    Ammar N. Abbas; Winniewelsh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Human-in-the-Loop Decision Support in Process Control Rooms Dataset

    Overview

    This repository contains a comprehensive dataset for assessing cognitive states, workload, situational awareness, stress, and performance in human-in-the-loop process control rooms. The dataset includes objective and subjective measures from various data collection tools such as NASA-TLX, SART, eye tracking, EEG, a health monitoring watch, surveys, and think-aloud situational awareness assessments. It is based on an experimental study of a formaldehyde production plant, capturing participants' interactions in a controlled control-room experimental setting.

    Purpose

    The study compared three different setups of human system interfaces in four human-in-the-loop (HITL) configurations, incorporating two alarm design formats (prioritised vs non-prioritised) and three procedural guidance setups: paper procedures, digitised screen-based procedures, and an AI-based procedural guidance system.

    Key Features

    • Subject Area: Chemical Engineering, Control and Safety Engineering, Human Factors and Ergonomics, Human-Computer Interaction, and Artificial Intelligence
    • Data Format: Raw, Analyzed, Filtered
    • Type of Data: CSV File (.csv), Matlab File (.mat), Excel (.xlsx), Table
    • Data Collection: The dataset contains behavioural, cognitive, and performance data from 92 participants, including system data for each participant across three scenarios, each simulating typical control room monitoring, alarm handling, planning, and intervention tasks and subtasks. The participants consented to participate on the test day, after which the researchers trained them. They then performed tasks under three scenarios, each lasting 15-18 minutes. During these tests, participants wore a health monitoring watch and an eye tracker. They were asked situational awareness questions based on the SPAM methodology at specific points within each 15-minute scenario, namely at the 6th, 8th, and 12th minutes. These questions assessed the three levels of situational awareness: perception, comprehension, and projection. This feedback collection process differed for the group that used the AI-based decision support system: their questions were asked right after specific actions. For the overall study, the following performance-shaping factors are considered: type of decision support system (alarm display design, procedure format, AI support, interface design), communication, situational awareness, cognitive workload, experience/training, task complexity, and stress. Communication was excluded as a factor in the first and second scenarios, as it was absent there. The collected data was normalized using min-max normalization.

    Potential Applications

    The dataset provides an opportunity for various applications, including:

    • Developing human performance models and process safety models
    • Developing a digital twin simulating human-machine interaction in process control rooms
    • Optimizing human-AI interaction in safety-critical industries
    • Qualifying and quantifying the performance and effectiveness of AI-enhanced decision support systems incorporating Deep Reinforcement Learning (DRL) using a Specialized Reinforcement Learning Agent (SRLA) framework
    • Validating proposed solutions for the industry

    Usage

    The dataset is instrumental for researchers, decision-makers, system engineers, human factor engineers, and teams developing guidelines and standards. It is also applicable for validating proposed solutions for the industry and for researchers in similar or close domains.

    Data Structure

    The concatenated Excel file for the dataset may include the following detailed data:

    1. Demographic and Educational Background Data:

      • Participant Identifier: A unique alphanumeric code assigned to each participant for anonymity and tracking purposes.
      • Age: The age of each participant at the time of the experiment.
      • Gender: The gender of each participant, typically categorized as male, female, or other.
      • Educational Background: Details of participants' academic qualifications, including degree type (e.g., Masters, PhD), year of study, and field of study (e.g., Chemical Engineering, IT).
      • Dominant Hand: Information on whether participants are right or left-handed, which could influence their interaction with the simulation interface.
      • Familiarity with Industry and Control Room: Self-reported familiarity levels with the industry in general and control room environments specifically, on a scale from 1 to 5.
    2. SPAM Metrics:

      • Participant Identifier: Unique codes for participants (e.g., P04, P06), maintaining anonymity while allowing for individual analysis.
      • Group Assignment: Indicates the experimental group (e.g., G4, G3, G2, G1) to which participants belonged, reflecting different levels of decision support in the simulation.
      • Scenario Engagement: Identifies the specific scenarios (e.g., S1, S2, S3) each participant encountered, representing diverse challenges within the control room simulation.
      • SPAM Metrics: Participant ratings across three dimensions of the SPAM questionnaire - Perception, Understanding, and Projection, on a scale typically from 1 to 5.
      • SPAM Index: Composite scores derived from the SPAM, indicating overall situation awareness levels experienced by participants. Calculated as the average of the score on perception, understanding and projection.
    3. NASA-TLX Responses:

      • Participant Identifier: A unique alphanumeric code assigned to each participant for anonymity and tracking purposes.
      • Group Assignment: Indicates the experimental group (e.g., G1) to which participants were assigned, reflecting different levels of decision support in the simulation.
      • TLX Ratings: Participants' responses utilizing the NASA Task Load Index (NASA TLX) questionnaire, providing insights into the cognitive, physical, and emotional workload experienced by operators in simulated control room scenarios.
      • TLX Index: Composite scores derived from the NASA TLX, representing the overall workload experienced by the participant, calculated as an average of the ratings across the six dimensions.
    4. SART Data:

      • Participant Identifier: Unique codes for participants (e.g., P04, P06), maintaining anonymity while allowing for individual analysis.
      • Group Assignment: Indicates the experimental group (e.g., G1) to which participants belonged, reflecting different levels of decision support in the simulation.
      • SART Metrics: Participants' responses to the Situation Awareness Rating Technique (SART) questionnaire, capturing metrics reflecting the participants' situation awareness. The SART score is calculated using the equation U - (D - S). Situation Understanding (U) comprises Information Quantity, Information Quality, and Familiarity; Situation Demand (D) includes the situation's Instability, Complexity, and Variability; and the Supply of attentional resources (S) comprises Arousal, Concentration, Division of Attention, and Spare Capacity. (A small computational sketch of these indices follows this data-structure listing.)
    5. AI Decision Support System Feedback:

      • Participant Identifier: A unique alphanumeric code assigned to each participant for anonymity and tracking purposes.
      • AI System Ratings: Participants' feedback and ratings across different aspects of the AI decision support system, such as support, explainability, and trust, providing insights into the system's perceived strengths and areas for improvement.
      • Workload Impact Data: Information on the workload impact and the balance between AI benefits and additional workload, offering valuable perspectives on the practicality and efficiency of integrating AI systems in control room operations.
      • DRL (Deep Reinforcement Learning) Role: Emphasis on the importance of validating AI recommendations and the role of Deep Reinforcement Learning (DRL) in enhancing trust.
    6. Performance Metrics:

      • Participant Identifier: A unique alphanumeric code assigned to each participant for anonymity and tracking purposes.
      • Scenario Engagement: Details of the specific scenario (e.g., S1, S2, S3) each participant encountered, representing various challenges in the control room environment.
      • Task-Specific Performance Measures: Data capturing the participants' experiences and performance across different scenarios in a control room simulation, including task-specific performance measures and outcomes related to decision-making processes in safety-critical environments.

    This detailed breakdown provides a comprehensive view of the specific data elements that could be included in the concatenated Excel file, allowing for thorough analysis and exploration of the participants' experiences, cognitive states, workload, and decision-making processes in control room environments.
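
    The index formulas referenced in items 2-4 above (SPAM index, NASA-TLX index, SART score) and the min-max normalization mentioned under Data Collection reduce to simple arithmetic. A minimal sketch, with illustrative variable names rather than actual column names from the files:

    import numpy as np

    def spam_index(perception, understanding, projection):
        # SPAM index: average of the three SPAM dimension ratings.
        return np.mean([perception, understanding, projection])

    def nasa_tlx_index(six_dimension_ratings):
        # NASA-TLX index: average of the six workload dimension ratings.
        return np.mean(six_dimension_ratings)

    def sart_score(understanding_u, demand_d, supply_s):
        # SART situation awareness: U - (D - S).
        return understanding_u - (demand_d - supply_s)

    def min_max_normalize(values):
        # Min-max normalization applied to a column of collected data.
        values = np.asarray(values, dtype=float)
        return (values - values.min()) / (values.max() - values.min())

    print(sart_score(understanding_u=18, demand_d=12, supply_s=20))  # -> 26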


  8. Water Meter Jbktv 7vz5k Axdg Dataset

    • universe.roboflow.com
    zip
    Updated Mar 8, 2025
    Cite
    roboflow 20 VL (2025). Water Meter Jbktv 7vz5k Axdg Dataset [Dataset]. https://universe.roboflow.com/roboflow-20-vl/water-meter-jbktv-7vz5k-axdg
    Explore at:
    zip (available download formats)
    Dataset updated
    Mar 8, 2025
    Dataset provided by
    Roboflow
    Authors
    roboflow 20 VL
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Water Meter Jbktv 7vz5k Axdg Axdg Bounding Boxes
    Description

    Overview

    Introduction

    This dataset is designed to facilitate the task of detecting and recognizing characters on a water meter. It includes annotations for individual digits (0-9) as well as any other textual or numeric information present. The goal is to provide comprehensive labels for training object detection models to accurately identify and interpret the numeric readings.

    • Number: Refers to any sequence or individual numeric or alphabetic characters that are not single digits.
    • 0-9: Individual digits from 0 to 9 that appear as standalone characters on the water meter display.

    Object Classes

    Number

    Description

    The "Number" class includes any sequences or isolated numeric or alphabetic characters that are not part of the specific digit classes (0-9). These can be alphanumeric codes, labels, or any other text visible on the water meter that is not a single digit.

    Instructions

    • Annotate any sequence of numbers or text that is not a single digit.
    • Include all visible parts of the alphanumeric sequence.
    • Do not annotate individual digits that are clearly separate and belong to classes 0-9.

    0

    Description

    The digit "0" is represented as a single closed loop character, often circular or oval in shape, found on the water meter display.

    Instructions

    • Annotate the complete visible form of the digit "0".
    • Ensure the bounding box closely encloses the outer edges of the character.
    • Ignore partial digits that are not clearly readable as "0".

    1

    Description

    The digit "1" typically appears as a single vertical line, occasionally with a small base or top serif, depending on the font style.

    Instructions

    • Draw a bounding box around the full visible extent of the digit "1".
    • If the digit has serifs, include them within the box.
    • Do not annotate shadows or indistinct lines that could be mistaken for "1".

    2

    Description

    The digit "2" usually has a rounded top loop and a descending diagonal stroke ending in a horizontal base.

    Instructions

    • Capture the entire digit "2", from its rounded top to the bottom base.
    • Make sure the bounding box is snug around both the top loop and the bottom line.
    • Do not label portions that do not clearly define the character "2".

    3

    Description

    The digit "3" consists of two rounded loops stacked vertically with their centers aligned.

    Instructions

    • Enclose both rounded loops of the digit "3" within a bounding box.
    • Ensure there is minimal space between the edge of the loops and the box.
    • Disregard any artifacts that do not contribute to the full shape of "3".

    4

    Description

    The digit "4" appears with a vertical line intersected by a diagonal line forming a triangle and a horizontal base.

    Instructions

    • Include the vertical, diagonal, and horizontal components within the bounding box.
    • Verify the triangle and base are entirely contained.
    • Ignore marks that do not complete the "4" shape.

    5

    Description

    The digit "5" features a top horizontal line, a curved back, and a flat base, resembling an incomplete circle with a flat top.

    Instructions

    • Annotate from the top line through the back curve and base line.
    • Align the bounding box tightly around the curved sections.
    • Exclude markings that do not complete the recognizable "5" structure.

    6

    Description

    The digit "6" includes a closed loop at the bottom with an open top loop, appearing as a partially twisted circle.

    Instructions

    • Ensure the bounding box covers both the open top and closed bottom loops.
    • The box should encompass the whole digit, avoiding excess space.
    • Neglect incomplete loops not forming a full "6".

    7

    Description

    The digit "7" has a flat top line connected to a diagonal descending line, often lacking additional embellishments.

    Instructions

    • Frame both the horizontal top and the descending line.
    • Box should be tight, especially around the junction of the horizontal and diagonal lines.
    • Do not include extraneous lines that do not match "7".

    8

    Description

    The digit "8" consists of two equal-sized closed loops stacked vertically.

    Instructions

    • The bounding box should capture both loops fully, ensuring the character's symmetry is maintained.
    • Do not annotate shapes that do not distinctly form an "8".

    9

    Description

    The digit "9" appears as a top loop with a straight or slightly curved descending tail, resembling an upside-down "6".

    Instructions

    • Annotate both the top round and tail section within the bounding box.
    • Ensure the box encapsulates the full shape from top to bottom.
    • Exclude lines without a connecting loop or tail completing a "9".

  9. GulfFlow: A gridded surface current product for the Gulf of Mexico from...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jul 5, 2023
    + more versions
    Cite
    Jonathan M. Lilly; Paula Pérez-Brunius (2023). GulfFlow: A gridded surface current product for the Gulf of Mexico from consolidated drifter measurements [Dataset]. http://doi.org/10.5281/zenodo.4421958
    Explore at:
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jonathan M. Lilly; Paula Pérez-Brunius
    Area covered
    Gulf of Mexico (Gulf of America)
    Description

    This dataset comprises the mean and variance of the surface velocity field of the Gulf of Mexico, obtained from a large set of historical surface drifter data from the Gulf of Mexico—3770 trajectories spanning 28 years and more than a dozen data sources—which were uniformly processed, quality controlled, and assimilated into a spatially and temporally gridded dataset. A gridded product, called GulfFlow, is created by averaging all available data from the GulfDrifters dataset within quarter-degree spatial bins, and within overlapping monthlong temporal bins having a semimonthly spacing. The dataset spans monthly time bins centered on July 16, 1992 through July 1, 2020, for a total of 672 overlapping time slices. Odd-numbered slices correspond to calendar months, while even-numbered slices run from halfway through one month to halfway through the following month. A higher spatial resolution version, GulfFlow-1/12 degree, is created in the identical way but using 1/12 degree bins instead of quarter-degree bins.

    In addition to the average velocities within each 3D bin, the count of sources contributing to each bin is also distributed, as is the subgridscale velocity variance. The count variable is a four-dimensional array of integers, the fourth dimension of which has length 45. This variable gives the number of hourly observations from each source dataset contributing to each three-dimensional bin. Values 1–15 are the count of velocity observations from drifters from each of the 15 experiments that are flagged as having retained their drogues, values 16–30 are for observations from drifters that are flagged as having lost their drogues, and values 31–45 are for observations from drifters of an unknown drogue status.

    In defining averaged quantities, we represent the velocity as a vector, \(\mathbf{u} = [u\ v]^T\), where the superscript "T" denotes the transpose. Let an overbar, \(\overline{\mathbf{u}}\), denote an average over a spatial bin and over all times, while angled brackets, \(\langle\mathbf{u}\rangle\), denote an average over a spatial bin and a particular temporal bin. Thus \(\langle\mathbf{u}\rangle\) is a function of time while \(\overline{\mathbf{u}}\) is not. We refer to \(\langle\mathbf{u}\rangle\) as the local average, \(\overline{\mathbf{u}}\) as the global average, and \(\overline{\langle\mathbf{u}\rangle}\) as the double average. Given the inhomogeneity of the drifter data, it turns out that the global average is biased towards intensive but short-duration programs, hence the double average results in a much better representation of the true mean velocity field. The dataset includes the global average \(\overline{\langle\mathbf{u}\rangle}\), the local covariance defined as

    \(\boldsymbol{\epsilon} = \langle (\mathbf{u} - \langle\mathbf{u}\rangle)(\mathbf{u} - \langle\mathbf{u}\rangle)^T \rangle\)

    and \(\epsilon^2\), which is the trace of \(\overline{\boldsymbol{\epsilon}}\):

    \(\epsilon^2 = \mathrm{tr}\{\overline{\boldsymbol{\epsilon}}\}\)
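
    A small numerical sketch of these definitions for a single spatial bin (illustrative only; the actual GulfFlow processing, source weighting, and drogue handling are more involved):

    import numpy as np

    # Hypothetical hourly [u, v] samples falling in one spatial bin, grouped into
    # three temporal bins of very different sizes.
    rng = np.random.default_rng(1)
    u_by_time = [rng.standard_normal((n, 2)) for n in (40, 5, 120)]

    local_means = np.array([u.mean(axis=0) for u in u_by_time])  # <u> for each temporal bin
    global_mean = np.concatenate(u_by_time).mean(axis=0)         # biased toward well-sampled bins
    double_mean = local_means.mean(axis=0)                       # the double average

    # Local covariance per temporal bin, and epsilon^2 as the trace of its time average.
    local_cov = np.array([np.cov(u.T, bias=True) for u in u_by_time])
    eps_squared = np.trace(local_cov.mean(axis=0))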

    The data is distributed in two separate netCDF files, one for each grid resolution.

    The article describing this dataset:

    Lilly, J. M. and P. Pérez-Brunius (2021). A gridded surface current product for the Gulf of Mexico from consolidated drifter measurements. Earth System Science Data, 13: 645–669. https://doi.org/10.5194/essd-13-645-2021.

  10. Educational Attainment in North Carolina Public Schools: Use of statistical...

    • data.mendeley.com
    Updated Nov 14, 2018
    Cite
    Scott Herford (2018). Educational Attainment in North Carolina Public Schools: Use of statistical modeling, data mining techniques, and machine learning algorithms to explore 2014-2017 North Carolina Public School datasets. [Dataset]. http://doi.org/10.17632/6cm9wyd5g5.1
    Explore at:
    Dataset updated
    Nov 14, 2018
    Authors
    Scott Herford
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    North Carolina
    Description

    The purpose of data mining analysis is always to find patterns in the data using certain kinds of techniques, such as classification or regression. It is not always feasible to apply classification algorithms directly to a dataset. Before doing any work on the data, the data has to be pre-processed, and this process normally involves feature selection and dimensionality reduction. We tried to use clustering as a way to reduce the dimension of the data and create new features. Based on our project, after using clustering prior to classification, the performance has not improved much. The reason why it has not improved could be that the features we selected to perform clustering on are not well suited for it. Because of the nature of the data, classification tasks are going to provide more information to work with in terms of improving knowledge and overall performance metrics.

    From the dimensionality reduction perspective: it is different from Principal Component Analysis, which guarantees finding the best linear transformation that reduces the number of dimensions with a minimum loss of information. Using clusters as a technique for reducing the data dimension will lose a lot of information, since clustering techniques are based on a metric of 'distance'. At high dimensions, Euclidean distance loses pretty much all meaning. Therefore, "reducing" dimensionality by mapping data points to cluster numbers is not always good, since you may lose almost all the information.

    From the creating-new-features perspective: clustering analysis creates labels based on the patterns of the data, which brings uncertainties into the data. By using clustering prior to classification, the decision on the number of clusters will highly affect the performance of the clustering, and thus the performance of classification. If the subset of features we use clustering techniques on is well suited for it, it might increase the overall classification performance. For example, if the features we use k-means on are numerical and the dimension is small, the overall classification performance may be better. We did not lock in the clustering outputs using a random_state, in an effort to see if they were stable. Our assumption was that if the results vary highly from run to run, which they definitely did, maybe the data just does not cluster well with the methods selected at all. Basically, the ramification we saw was that our results are not much better than random when applying clustering to the data preprocessing.

    Finally, it is important to ensure a feedback loop is in place to continuously collect the same data in the same format from which the models were created. This feedback loop can be used to measure the model's real-world effectiveness and also to continue to revise the models from time to time as things change.
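
    A minimal sketch of the clustering-before-classification pipeline described above, using scikit-learn and synthetic data as stand-ins for the North Carolina school datasets:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in: numeric features plus a binary attainment-style label.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # Baseline: classify on the raw features.
    baseline = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()

    # Clustering as preprocessing: append the k-means cluster label as a new feature.
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
    X_aug = np.column_stack([X, labels])
    augmented = cross_val_score(RandomForestClassifier(random_state=0), X_aug, y, cv=5).mean()

    print(baseline, augmented)  # in the project described above, the difference was small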

  11. Water Meter Jbktv 7vz5k Ftoz Dataset

    • universe.roboflow.com
    zip
    Updated Mar 13, 2025
    Cite
    Roboflow 20-VL (2025). Water Meter Jbktv 7vz5k Ftoz Dataset [Dataset]. https://universe.roboflow.com/rf20-vl/water-meter-jbktv-7vz5k-ftoz
    Explore at:
    zip (available download formats)
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    Roboflow
    Authors
    Roboflow 20-VL
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Water Meter Jbktv 7vz5k Ftoz Ftoz Bounding Boxes
    Description

    Overview

    • Introduction
    • Object Classes

    • 0: The digit 0

    • 1: The digit 1

    • 2: The digit 2

    • 3: The digit 3

    • 4: The digit 4

    • 5: The digit 5

    • 6: The digit 6

    • 7: The digit 7

    • 8: The digit 8

    • 9: The digit 9

    Introduction

    This dataset contains images of water meter readings with the purpose of digitizing the numeric values. There are 10 classes representing the digits 0 through 9. Annotators will label the digits as they appear on the meters to facilitate accurate recognition.

    Object Classes

    0

    Description

    The digit "0" is characterized by its oval or circular shape, often with a distinctive horizontal thickness.

    Instructions

    • Annotate the entire visible area of the digit, ensuring the bounding box captures the full curvature without cutting into adjacent digits.
    • Do not label if only partially visible or distorted beyond recognition.

    1

    Description

    The digit "1" typically appears as a straight vertical line, sometimes with a short horizontal base.

    Instructions

    • Draw a bounding box around the full length of the digit, including any visible base. Ensure clear vertical alignment, and avoid labeling if obscured by glare or shadows.

    2

    Description

    The digit "2" has a curved top and straight middle section, finishing with a horizontal or diagonal stroke at the base.

    Instructions

    • Ensure that the bounding box covers the entire silhouette, from the curved top to the base.
    • Avoid labeling if any section is visibly missing or incomplete.

    3

    Description

    The digit "3" is identified by two stacked curved sections without intersecting lines.

    Instructions

    • Capture both curvatures completely within the bounding box.
    • Ensure that no parts are obscured by reflections or meter shadows before labeling.

    4

    Description

    The digit "4" often features intersecting horizontal and vertical lines with a triangle-like top section.

    Instructions

    • Include the intersecting lines and full top section within the bounding box.
    • Avoid labeling if structural lines are affected by wear or visibility issues.

    5

    Description

    The digit "5" combines a prominent upper loop with a lower horizontal stroke and a straight vertical line.

    Instructions

    • Enclose the loop and strokes fully within the bounding box.
    • Ensure separation from neighboring digits, and do not label if substantial parts are unclear or missing.

    6

    Description

    The digit "6" features a closed top loop with an extended lower curve that continues downward.

    Instructions

    • Draw the bounding box to encompass both the loop and the extended curve, ensuring clarity and complete visibility.
    • Avoid labeling partially obstructed digits.

    7

    Description

    The digit "7" is characterized by a horizontal top line connecting to a diagonal downward stroke.

    Instructions

    • Include the horizontal and diagonal lines entirely in the bounding box.
    • Check for clarity and distinction from shadows before labeling.

    8

    Description

    The digit "8" resembles two stacked circles or loops, one above the other.

    Instructions

    • Ensure the bounding box captures both loops.
    • Confirm that the digit is not overlapping with others and is entirely visible before labeling.

    9

    Description

    The digit "9" starts with a circular or elliptical loop at the top, leading into a straight downward stroke.

    Instructions

    • The bounding box should cover the complete loop and the straight line.
    • Do not annotate if any parts are obscured or not clearly discernible.

  12. Real Estate Price Prediction Dataset

    • paperswithcode.com
    Updated Mar 7, 2025
    + more versions
    Cite
    (2025). Real Estate Price Prediction Dataset [Dataset]. https://paperswithcode.com/dataset/real-estate-price-prediction
    Explore at:
    Dataset updated
    Mar 7, 2025
    Description

    Problem Statement

    Investors and buyers in the real estate market faced challenges in accurately assessing property values and market trends. Traditional valuation methods were time-consuming and lacked precision, making it difficult to make informed investment decisions. A real estate firm sought a predictive analytics solution to provide accurate property price forecasts and market insights.

    Challenge

    Developing a real estate price prediction system involved addressing the following challenges:

    Collecting and processing vast amounts of data, including historical property prices, economic indicators, and location-specific factors.

    Accounting for diverse variables such as neighborhood quality, proximity to amenities, and market demand.

    Ensuring the model’s adaptability to changing market conditions and economic fluctuations.

    Solution Provided

    A real estate price prediction system was developed using machine learning regression models and big data analytics. The solution was designed to:

    Analyze historical and real-time data to predict property prices accurately.

    Provide actionable insights on market trends, enabling better investment strategies.

    Identify undervalued properties and potential growth areas for investors.

    Development Steps

    Data Collection

    Collected extensive datasets, including property listings, sales records, demographic data, and economic indicators.

    Preprocessing

    Cleaned and structured data, removing inconsistencies and normalizing variables such as location, property type, and size.

    Model Development

    Built regression models using techniques such as linear regression, decision trees, and gradient boosting to predict property prices. Integrated feature engineering to account for location-specific factors, amenities, and market trends.
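
    As an illustration of this step only (scikit-learn, with made-up columns; the firm's actual features and models are not described beyond the techniques named above), a gradient-boosting price model could be wired up as:

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    # Hypothetical property records; column names are illustrative only.
    df = pd.DataFrame({
        "neighborhood": ["A", "B", "A", "C", "B", "C"],
        "size_sqft": [850, 1200, 990, 2100, 1400, 1750],
        "dist_to_transit_km": [0.4, 2.1, 1.0, 5.3, 1.8, 3.2],
        "price": [310000, 420000, 355000, 610000, 450000, 540000],
    })

    # One-hot encode the location factor, pass numeric features through, then boost.
    pre = ColumnTransformer(
        [("neigh", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"])],
        remainder="passthrough",
    )
    model = Pipeline([("pre", pre), ("gbr", GradientBoostingRegressor(random_state=0))])
    scores = cross_val_score(model, df.drop(columns="price"), df["price"],
                             cv=3, scoring="neg_mean_absolute_error")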

    Validation

    Tested the models using historical data and cross-validation to ensure high prediction accuracy and robustness.

    Deployment

    Implemented the prediction system as a web-based platform, allowing users to input property details and receive price estimates and market insights.

    Continuous Monitoring & Improvement

    Established a feedback loop to update models with new data and refine predictions as market conditions evolved.

    Results

    Increased Prediction Accuracy

    The system delivered highly accurate property price forecasts, improving investor confidence and decision-making.

    Informed Investment Decisions

    Investors and buyers gained valuable insights into market trends and property values, enabling better strategies and reduced risks.

    Enhanced Market Insights

    The platform provided detailed analytics on neighborhood trends, demand patterns, and growth potential, helping users identify opportunities.

    Scalable Solution

    The system scaled seamlessly to include new locations, property types, and market dynamics.

    Improved User Experience

    The intuitive platform design made it easy for users to access predictions and insights, boosting engagement and satisfaction.

  13. GIOTTO RADIO SCIENCE EXPERIMENT DATA V1.0

    • data.nasa.gov
    • datasets.ai
    • +3more
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). GIOTTO RADIO SCIENCE EXPERIMENT DATA V1.0 [Dataset]. https://data.nasa.gov/Earth-Science/GIOTTO-RADIO-SCIENCE-EXPERIMENT-DATA-V1-0/tyz9-5y3g
    Explore at:
    csv, application/rdfxml, xml, tsv, json, application/rssxml (available download formats)
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The Giotto Radio Science Experiment data set consists of four tables. Each table contains a measurement value listed as a function of time. The measurements are: closed-loop receiver carrier signal amplitude, closed-loop receiver carrier frequency residual, open-loop receiver carrier signal amplitude, and open-loop receiver carrier frequency.

  14. Digital Geologic-GIS Map of The Loop and Druid Arch Quadrangles, Utah (NPS,...

    • s.cnmilf.com
    • catalog.data.gov
    Updated Mar 11, 2025
    Cite
    National Park Service (2025). Digital Geologic-GIS Map of The Loop and Druid Arch Quadrangles, Utah (NPS, GRD, GRI, CANY, THDR digital map) adapted from a U.S. Geological Survey Miscellaneous Field Studies Map by Billingsley, Block and Felger (2002) [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/digital-geologic-gis-map-of-the-loop-and-druid-arch-quadrangles-utah-nps-grd-gri-cany-thdr
    Explore at:
    Dataset updated
    Mar 11, 2025
    Dataset provided by
    National Park Service (http://www.nps.gov/)
    Area covered
    Utah
    Description

    The Digital Geologic-GIS Map of The Loop and Druid Arch Quadrangles, Utah is composed of GIS data layers and GIS tables, and is available in the following GRI-supported GIS data formats: 1.) an ESRI file geodatabase (thdr_geology.gdb), 2.) an Open Geospatial Consortium (OGC) geopackage, and 3.) a 2.2 KMZ/KML file for use in Google Earth; however, this format version of the map is limited in data layers presented and in access to GRI ancillary table information. The file geodatabase format is supported with an ArcGIS Pro map file (.mapx) (thdr_geology.mapx) and individual Pro layer (.lyrx) files (for each GIS data layer). The OGC geopackage is supported with a QGIS project (.qgz) file. Upon request, the GIS data is also available in ESRI shapefile format. Contact Stephanie O'Meara (see contact information below) to acquire the GIS data in these GIS data formats.

    In addition to the GIS data and supporting GIS files, three additional files comprise a GRI digital geologic-GIS dataset or map: 1.) a readme file (cany_geology_gis_readme.pdf), 2.) the GRI ancillary map information document (.pdf) file (cany_geology.pdf), which contains geologic unit descriptions as well as other ancillary map information and graphics from the source map(s) used by the GRI in the production of the GRI digital geologic-GIS data for the park, and 3.) a user-friendly FAQ PDF version of the metadata (thdr_geology_metadata_faq.pdf). Please read the cany_geology_gis_readme.pdf for information pertaining to the proper extraction of the GIS data and other map files. Google Earth software is available for free at: https://www.google.com/earth/versions/. QGIS software is available for free at: https://www.qgis.org/en/site/. Users are encouraged to only use the Google Earth data for basic visualization, and to use the GIS data for any type of data analysis or investigation.

    The data were completed as a component of the Geologic Resources Inventory (GRI) program, a National Park Service (NPS) Inventory and Monitoring (I&M) Division funded program that is administered by the NPS Geologic Resources Division (GRD). For a complete listing of GRI products visit the GRI publications webpage: https://www.nps.gov/subjects/geology/geologic-resources-inventory-products.htm. For more information about the Geologic Resources Inventory Program visit the GRI webpage: https://www.nps.gov/subjects/geology/gri.htm. At the bottom of that webpage is a "Contact Us" link if you need additional information. You may also directly contact the program coordinator, Jason Kenworthy (jason_kenworthy@nps.gov). Source geologic maps and data used to complete this GRI digital dataset were provided by the following: U.S. Geological Survey. Detailed information concerning the sources used and their contribution to the GRI product are listed in the Source Citation section(s) of this metadata record (thdr_geology_metadata.txt or thdr_geology_metadata_faq.pdf).

    Users of this data are cautioned about the locational accuracy of features within this dataset. Based on the source map scale of 1:24,000 and United States National Map Accuracy Standards, features are within (horizontally) 12.2 meters or 40 feet of their actual location as presented by this dataset. Users of this data should thus not assume the location of features is exactly where they are portrayed in Google Earth, ArcGIS Pro, QGIS or other software used to display this dataset. All GIS and ancillary tables were produced as per the NPS GRI Geology-GIS Geodatabase Data Model v. 2.3 (available at: https://www.nps.gov/articles/gri-geodatabase-model.htm).

  15. UGOS Drifter Dataset for ID # 09630 - Dataset - UGOS

    • ugos.info
    Updated Nov 12, 2024
    + more versions
    Cite
    (2024). UGOS Drifter Dataset for ID # 09630 - Dataset - UGOS [Dataset]. https://ugos.info/dataset/ugos-drifter-dataset-09630
    Explore at:
    Dataset updated
    Nov 12, 2024
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains surface current data collected from UGOS MASTR Drifters (Far Horizon Drifters) deployed in the Gulf. They were specifically positioned to capture dynamic features such as the Loop Current, Dry Tortugas Eddy, Florida Current, Gulf Stream, and Eddy Denali. The dataset includes quality-controlled, hourly measurements intended for model assimilation and validation. Data covers the temporal range from January to July 2024.

  16. ESM-2 embeddings for TCR-Epitope Binding Affinity Prediction Task

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 17, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tony Reina (2024). ESM-2 embeddings for TCR-Epitope Binding Affinity Prediction Task [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7502653
    Explore at:
    Dataset updated
    Jun 17, 2024
    Dataset authored and provided by
    Tony Reina
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the accompanying dataset that was generated by the GitHub project: https://github.com/tonyreina/tdc-tcr-epitope-antibody-binding. In that repository I show how to create machine learning models for predicting whether a T-cell receptor (TCR) and a protein epitope will bind to each other.

    A model that can predict how well a TCR binds to an epitope can lead to more effective immunotherapy treatments. For example, in anti-cancer therapies it is important for the T-cell receptor to bind to the protein marker on the cancer cell so that the T-cell (or, more precisely, the other immune cells it recruits) can kill the cancer cell.

    HuggingFace provides a "one-stop shop" to train and deploy AI models. In this case, we use Facebook's open-source Evolutionary Scale Model (ESM-2) to embed the protein sequences. These embeddings turn each protein sequence into a vector of numbers that can be fed into a mathematical model.
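
    The embeddings distributed here are precomputed, but comparable vectors can be generated with the HuggingFace transformers library. The sketch below is illustrative only: it uses the small facebook/esm2_t6_8M_UR50D checkpoint and simple mean pooling, which is an assumption and may differ from how this dataset's vectors were produced.

    import torch
    from transformers import AutoTokenizer, EsmModel

    # Small ESM-2 checkpoint; larger checkpoints give richer embeddings.
    checkpoint = "facebook/esm2_t6_8M_UR50D"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = EsmModel.from_pretrained(checkpoint)
    model.eval()

    sequence = "CASSLGQAYEQYF"  # illustrative CDR3-like peptide sequence
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Mean-pool the per-residue hidden states into one fixed-length vector.
    embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0)
    print(embedding.shape)  # e.g. torch.Size([320]) for this checkpoint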

    To load them into Python use the Pandas library:

    import pandas as pd

    train_data = pd.read_pickle("train_data.pkl")
    validation_data = pd.read_pickle("validation_data.pkl")
    test_data = pd.read_pickle("test_data.pkl")

    The epitope_aa and the tcr_full columns are the protein (peptide) sequences for the epitope and the T-cell receptor, respectively. The letters correspond to the standard amino acid codes.

    The epitope_smi column is the SMILES notation for the chemical structure of the epitope. We won't use this information. Instead, the ESM-1b embedder should be sufficient for the input to our binary classification model.

    The tcr column is the hypervariable CDR3 loop. It's the part of the TCR that actually binds (assuming it binds) to the epitope.

    The label column is whether the two proteins bind. 0 = No. 1 = Yes.

    The tcr_vector and epitope_vector columns are the bio-embeddings of the TCR and epitope sequences generated by the Facebook ESM-1b model. These two vectors can be used to create a machine learning model that predicts whether the combination will produce a successful protein binding.
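
    As a rough illustration of how these two columns might feed a downstream classifier, the sketch below concatenates the vectors and fits a logistic regression with scikit-learn. The feature handling and model choice are assumptions for demonstration, not necessarily the approach used in the accompanying repository.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    train_data = pd.read_pickle("train_data.pkl")
    validation_data = pd.read_pickle("validation_data.pkl")

    def to_features(df):
        # Stack the per-row embeddings and join the TCR and epitope vectors.
        tcr = np.vstack(df["tcr_vector"].to_numpy())
        epitope = np.vstack(df["epitope_vector"].to_numpy())
        return np.hstack([tcr, epitope])

    X_train, y_train = to_features(train_data), train_data["label"].to_numpy()
    X_val, y_val = to_features(validation_data), validation_data["label"].to_numpy()

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    probs = clf.predict_proba(X_val)[:, 1]
    print("Validation ROC AUC:", roc_auc_score(y_val, probs))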

    From the TDC website:

    T-cells are an integral part of the adaptive immune system, whose survival, proliferation, activation and function are all governed by the interaction of their T-cell receptor (TCR) with immunogenic peptides (epitopes). A large repertoire of T-cell receptors with different specificity is needed to provide protection against a wide range of pathogens. This new task aims to predict the binding affinity given a pair of TCR sequence and epitope sequence.

    Weber et al.

    Dataset Description: The dataset is from Weber et al., who assembled a large and diverse dataset from the VDJdb database and the ImmuneCODE project. It uses human TCR-beta chain sequences. Since this dataset is highly imbalanced, the authors exclude epitopes with fewer than 15 associated TCR sequences and downsample to a limit of 400 TCRs per epitope. The dataset contains amino acid sequences either for the entire TCR or only for the hypervariable CDR3 loop. Epitopes are available as amino acid sequences. Since Weber et al. proposed representing the peptides as SMILES strings (which reformulates the problem as protein-ligand binding prediction), the SMILES strings of the epitopes are also included. Negative samples (50% of the data) were generated by shuffling the pairs, i.e. associating TCR sequences with epitopes they have not been shown to bind.

    Task Description: Binary classification. Given the epitope (a peptide, either represented as amino acid sequence or as SMILES) and a T-cell receptor (amino acid sequence, either of the full protein complex or only of the hypervariable CDR3 loop), predict whether the epitope binds to the TCR.

    Dataset Statistics: 47,182 TCR-Epitope pairs between 192 epitopes and 23,139 TCRs.

    References:

    Weber, Anna, Jannis Born, and María Rodriguez Martínez. “TITAN: T-cell receptor specificity prediction with bimodal attention networks.” Bioinformatics 37.Supplement_1 (2021): i237-i244.

    Bagaev, Dmitry V., et al. “VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium.” Nucleic Acids Research 48.D1 (2020): D1057-D1062.

    Dines, Jennifer N., et al. “The ImmuneRACE study: A prospective multicohort study of immune response action to COVID-19 events with the ImmuneCODE™ open access database.” medRxiv (2020).

    Dataset License: CC BY 4.0.

    Contributed by: Anna Weber and Jannis Born.

    The Facebook ESM-2 model has the MIT license and was published in Lin et al., “Evolutionary-scale prediction of atomic-level protein structure with a language model” (Science, 2023).

    HuggingFace has several versions of the trained model.

    Checkpoint name       Number of layers   Number of parameters
    esm2_t48_15B_UR50D    48                 15B
    esm2_t36_3B_UR50D     36                 3B
    esm2_t33_650M_UR50D   33                 650M
    esm2_t30_150M_UR50D   30                 150M
    esm2_t12_35M_UR50D    12                 35M
    esm2_t6_8M_UR50D      6                  8M

  17. Data from: SNARE chaperone Sly1 directly mediates close-range vesicle tethering

    • datadryad.org
    zip
    Updated Feb 27, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rachael Plemel; Alex Merz; Mengtong Duan; Elizabeth Miller (2024). SNARE chaperone Sly1 directly mediates close-range vesicle tethering [Dataset]. http://doi.org/10.5061/dryad.dr7sqvb5b
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 27, 2024
    Dataset provided by
    Dryad
    Authors
    Rachael Plemel; Alex Merz; Mengtong Duan; Elizabeth Miller
    Time period covered
    2023
    Description

    Yeast Sly1 SGA, BioGrid, and Gene Ontology Supplemental Dataset

    https://doi.org/10.5061/dryad.dr7sqvb5b

    Description of the data and file structure:

    To gain genome-scale insight into the sly1∆loop allele’s loss of function, we used synthetic genetic array (SGA) analysis. SGA measures the synthetic sickness or rescue (suppression) of a query allele versus a genome-scale collection of loss-of-function alleles (Tong and Boone, 2005). The sly1∆loop allele was knocked into the genomic SLY1 locus. The SGA data from this analysis was then aligned with the BioGRID dataset.

    Contents of the dataset:

    sly1∆loop SGA contains the raw SGA dataset. The SGA score algorithm processes raw colony size data, normalizes them for a series of experimental systematic effects and calculates a quantitative genetic interaction score.

    LogRatios indicates the log-transformed ratio of the growth of the indicated double mutant to the growth of the single mutant with the indicated quer...
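
    For readers unfamiliar with genetic interaction scores, the usual starting point is the multiplicative model: the score is the deviation of the observed double-mutant fitness from the product of the two single-mutant fitnesses. The toy Python calculation below illustrates that idea only, with made-up values; it is not the normalization pipeline used to produce the SGA scores in this dataset.

    # Toy multiplicative-model interaction score (illustrative values only).
    fitness_query = 0.85    # e.g. relative fitness of the sly1∆loop query strain
    fitness_array = 0.90    # relative fitness of an array deletion strain
    fitness_double = 0.55   # observed relative fitness of the double mutant

    expected = fitness_query * fitness_array
    epsilon = fitness_double - expected   # negative = synthetic sick/lethal
    print("Expected fitness:", round(expected, 3))
    print("Interaction score (epsilon):", round(epsilon, 3))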

  18. Data from: Quality Control in Pharmaceuticals Dataset

    • paperswithcode.com
    Updated Mar 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Quality Control in Pharmaceuticals Dataset [Dataset]. https://paperswithcode.com/dataset/quality-control-in-pharmaceuticals
    Explore at:
    Dataset updated
    Mar 7, 2025
    Description

    Problem Statement


    A pharmaceutical manufacturer faced significant challenges in ensuring consistent quality during the production of medications. Manual quality control processes were prone to errors and inefficiencies, leading to product recalls and compliance risks. The company needed an advanced solution to automate quality control, reduce production errors, and comply with stringent regulatory standards.

    Challenge

    Implementing automated quality control in pharmaceutical manufacturing posed several challenges:

    Detecting microscopic defects, contamination, or irregularities in products and packaging.

    Ensuring high-speed inspection without disrupting production workflows.

    Meeting strict industry regulations for product quality and traceability.

    Solution Provided

    An AI-powered quality control system was developed using machine vision and advanced inspection algorithms. The solution was designed to:

    Automatically inspect pharmaceutical products for defects, contamination, and compliance with production standards.

    Analyze packaging integrity to detect labeling errors, seal defects, or missing components.

    Provide real-time quality control insights to production teams for immediate corrective actions.

    Development Steps

    Data Collection

    Captured high-resolution images and videos of pharmaceutical products during production, including tablets, capsules, and packaging components.

    Preprocessing

    Preprocessed visual data to enhance features such as shape, texture, and color, enabling accurate defect detection.
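
    No code accompanies this case study; purely as an illustrative sketch of the kind of preprocessing described (assumed filenames and parameters, not the deployed pipeline), an inspection image could be denoised and contrast-enhanced with OpenCV as follows.

    import cv2

    # Load an inspection image (hypothetical filename) and convert to grayscale.
    image = cv2.imread("tablet_inspection.png")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Reduce sensor noise, then boost local contrast so small surface
    # defects and edge irregularities stand out for the downstream model.
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(denoised)

    # A simple edge map is one possible shape/texture feature.
    edges = cv2.Canny(enhanced, 50, 150)
    cv2.imwrite("tablet_preprocessed.png", enhanced)
    cv2.imwrite("tablet_edges.png", edges)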

    Model Training

    Developed machine vision models to detect defects and anomalies at microscopic levels. Integrated AI algorithms to classify defects and provide actionable insights for process improvement.

    Validation

    Tested the system on a variety of production scenarios to ensure high accuracy and reliability in defect detection.

    Deployment

    Installed AI-powered inspection systems on production lines, integrating them with existing manufacturing processes and quality control frameworks.

    Continuous Monitoring & Improvement

    Established a feedback loop to refine models based on new production data and evolving quality standards.

  19. Data from: S1 Dataset -

    • plos.figshare.com
    xlsx
    Updated Jan 29, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yuh-Chin T. Huang; Luke Henriquez; Hengji Chen; Craig Henriquez (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0297519.s001
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jan 29, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yuh-Chin T. Huang; Luke Henriquez; Hengji Chen; Craig Henriquez
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Pulmonary function tests (PFTs) are usually interpreted by clinicians using rule-based strategies and pattern recognition. The interpretation, however, has variability due to patient and interpreter errors. Most PFTs have recognizable patterns that can be categorized into specific physiological defects. In this study, we developed a computerized algorithm using the Python package pdfplumber and validated it against clinicians’ interpretation. We downloaded PFT reports in PDF format from the electronic medical record system. We digitized the flow volume loop (FVL) and extracted numeric values from the reports. The algorithm used FEV1/FVC
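
    A minimal sketch of the PDF extraction step with pdfplumber is shown below. The report filename and the regular expression for pulling an FEV1/FVC value out of the extracted text are illustrative assumptions, not the authors' published code.

    import re
    import pdfplumber

    # Open one PFT report (hypothetical filename) and collect the raw text
    # from every page so numeric values can be parsed afterwards.
    with pdfplumber.open("pft_report.pdf") as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)

    # Example: look for an FEV1/FVC ratio reported in the text.
    match = re.search(r"FEV1/FVC\D*(\d+(?:\.\d+)?)", text)
    if match:
        print("FEV1/FVC:", float(match.group(1)))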

  20. HystLab Software v1.1.1 (NERC Grant NE/P017266/1) | gimi9.com

    • gimi9.com
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    HystLab Software v1.1.1 (NERC Grant NE/P017266/1) | gimi9.com [Dataset]. https://www.gimi9.com/dataset/uk_hystlab-software-v1-1-1-nerc-grant-ne-p017266-1/
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    HystLab (Hysteresis Loop analysis box) is MATLAB-based software for the advanced processing and analysis of magnetic hysteresis data. Hysteresis loops are one of the most ubiquitous rock magnetic measurements and, with the growing need for high-resolution analyses of ever larger datasets, there is a need to rapidly, consistently, and accurately process and analyze these data. HystLab is an easy-to-use graphical interface that is compatible with a wide range of software platforms. The software can read a wide range of data formats and rapidly process the data. It includes functionality to re-center loops, correct for drift, and perform a range of slope saturation corrections.
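
    HystLab itself is MATLAB software, but the idea behind a high-field slope saturation correction is easy to illustrate. The Python sketch below is a conceptual example only, with an assumed two-column input file and an assumed 70% high-field cutoff; it is not HystLab's algorithm.

    import numpy as np

    # Two-column loop file: applied field (T) and magnetic moment (Am^2).
    field, moment = np.loadtxt("hysteresis_loop.txt", unpack=True)

    # Fit a line to the high-field portion of both branches, where the
    # ferromagnetic signal is saturated and the residual slope reflects
    # para/diamagnetic contributions.
    cutoff = 0.7 * np.max(np.abs(field))
    high_field = np.abs(field) >= cutoff
    slope = np.polyfit(field[high_field], moment[high_field], 1)[0]

    # Subtract the linear contribution to leave the corrected loop.
    corrected = moment - slope * field
    print("High-field slope (Am^2/T):", slope)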
